WorldWideScience

Sample records for leave-one-case-out cross-validation method

  1. Empirical Performance of Cross-Validation With Oracle Methods in a Genomics Context.

    Science.gov (United States)

    Martinez, Josue G; Carroll, Raymond J; Müller, Samuel; Sampson, Joshua N; Chatterjee, Nilanjan

    2011-11-01

    When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNP), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to non-oracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso.
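
    The variability described above is easy to reproduce. The following minimal sketch (a hypothetical illustration with scikit-learn's LassoCV on simulated sparse, weak-signal data, not the authors' experiment) reruns 10-fold cross-validated Lasso selection under different random fold assignments and counts how many variables are selected each time.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:10] = 0.2                      # sparse truth with weak signals
y = X @ beta + rng.standard_normal(n)

counts = []
for seed in range(20):               # 20 independent 10-fold CV runs
    cv = KFold(n_splits=10, shuffle=True, random_state=seed)
    model = LassoCV(cv=cv, max_iter=5000).fit(X, y)
    counts.append(int(np.sum(model.coef_ != 0)))

print("number of selected variables per CV run:", counts)
```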

  2. Cross-Validation of Survival Bump Hunting by Recursive Peeling Methods.

    Science.gov (United States)

    Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J Sunil

    2014-08-01

    We introduce a survival/risk bump hunting framework to build a bump hunting model with a possibly censored time-to-event type of response and to validate model estimates. First, we describe the use of adequate survival peeling criteria to build a survival/risk bump hunting model based on recursive peeling methods. Our method called "Patient Recursive Survival Peeling" is a rule-induction method that makes use of specific peeling criteria such as hazard ratio or log-rank statistics. Second, to validate our model estimates and improve survival prediction accuracy, we describe a resampling-based validation technique specifically designed for the joint task of decision rule making by recursive peeling (i.e. decision-box) and survival estimation. This alternative technique, called "combined" cross-validation is done by combining test samples over the cross-validation loops, a design allowing for bump hunting by recursive peeling in a survival setting. We provide empirical results showing the importance of cross-validation and replication.
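
    The "combined" idea can be sketched generically: instead of computing a statistic within each test fold and averaging, the out-of-fold predictions are pooled across the cross-validation loops and a single statistic is computed on the pooled set. The snippet below illustrates only that pooling mechanic with an ordinary classifier and accuracy; it is a stand-in under stated assumptions, not the authors' survival-specific procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=300, n_features=10, random_state=1)
model = LogisticRegression(max_iter=1000)
cv = KFold(n_splits=5, shuffle=True, random_state=1)

pooled_pred = np.empty_like(y)
fold_scores = []
for train_idx, test_idx in cv.split(X):
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    pooled_pred[test_idx] = pred                      # collect out-of-fold predictions
    fold_scores.append(np.mean(pred == y[test_idx]))  # per-fold statistic

print("mean of per-fold statistics :", np.mean(fold_scores))
print("statistic on pooled test set:", np.mean(pooled_pred == y))  # "combined" CV
```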

  3. A statistical method (cross-validation) for bone loss region detection after spaceflight

    Science.gov (United States)

    Zhao, Qian; Li, Wenjun; Li, Caixia; Chu, Philip W.; Kornak, John; Lang, Thomas F.

    2010-01-01

    Astronauts experience bone loss after long spaceflight missions. Identifying specific regions that undergo the greatest losses (e.g. the proximal femur) could reveal information about the processes of bone loss in disuse and disease. Methods for detecting such regions, however, remain an open problem. This paper focuses on statistical methods to detect such regions. We perform statistical parametric mapping to obtain t-maps of changes in images, and propose a new cross-validation method to select an optimum suprathreshold for forming clusters of pixels. Once these candidate clusters are formed, we use permutation testing of longitudinal labels to derive significant changes. PMID:20632144

  4. Raman fiber-optical method for colon cancer detection: Cross-validation and outlier identification approach

    Science.gov (United States)

    Petersen, D.; Naveed, P.; Ragheb, A.; Niedieker, D.; El-Mashtoly, S. F.; Brechmann, T.; Kötting, C.; Schmiegel, W. H.; Freier, E.; Pox, C.; Gerwert, K.

    2017-06-01

    Endoscopy plays a major role in the early recognition of cancers that are not externally accessible and thereby in increasing the survival rate. Raman spectroscopic fiber-optical approaches can help to decrease the impact on the patient, increase objectivity in tissue characterization, reduce expenses and provide a significant time advantage in endoscopy. In gastroenterology, early recognition of malignant and precursor lesions is relevant. Instantaneous and precise differentiation between adenomas as precursor lesions for cancer and hyperplastic polyps on the one hand, and between high- and low-risk alterations on the other hand, is important. Raman fiber-optical measurements of colon biopsy samples taken during colonoscopy were carried out during a clinical study, and samples of adenocarcinoma (22), tubular adenomas (141), hyperplastic polyps (79) and normal tissue (101) from 151 patients were analyzed. This allows us to focus on the bioinformatic analysis and to set the stage for Raman endoscopic measurements. Since spectral differences between normal and cancerous biopsy samples are small, special care has to be taken in data analysis. Using a leave-one-patient-out cross-validation scheme, three different outlier identification methods were investigated to decrease the influence of systematic errors, such as a residual risk of misplacement of the sample and spectral dilution of marker bands (especially in cancerous tissue), and thereby to optimize the experimental design. Furthermore, other validation methods, namely leave-one-sample-out and leave-one-spectrum-out cross-validation schemes, were compared with leave-one-patient-out cross-validation. High-risk lesions were differentiated from low-risk lesions with a sensitivity of 79%, specificity of 74% and an accuracy of 77%, and cancer from normal tissue with a sensitivity of 79%, specificity of 83% and an accuracy of 81%. Additionally, the applied outlier identification enabled us to improve the recognition of neoplastic biopsy samples.

  5. Raman fiber-optical method for colon cancer detection: Cross-validation and outlier identification approach.

    Science.gov (United States)

    Petersen, D; Naveed, P; Ragheb, A; Niedieker, D; El-Mashtoly, S F; Brechmann, T; Kötting, C; Schmiegel, W H; Freier, E; Pox, C; Gerwert, K

    2017-06-15

    Endoscopy plays a major role in early recognition of cancer which is not externally accessible and therewith in increasing the survival rate. Raman spectroscopic fiber-optical approaches can help to decrease the impact on the patient, increase objectivity in tissue characterization, reduce expenses and provide a significant time advantage in endoscopy. In gastroenterology an early recognition of malign and precursor lesions is relevant. Instantaneous and precise differentiation between adenomas as precursor lesions for cancer and hyperplastic polyps on the one hand and between high and low-risk alterations on the other hand is important. Raman fiber-optical measurements of colon biopsy samples taken during colonoscopy were carried out during a clinical study, and samples of adenocarcinoma (22), tubular adenomas (141), hyperplastic polyps (79) and normal tissue (101) from 151 patients were analyzed. This allows us to focus on the bioinformatic analysis and to set stage for Raman endoscopic measurements. Since spectral differences between normal and cancerous biopsy samples are small, special care has to be taken in data analysis. Using a leave-one-patient-out cross-validation scheme, three different outlier identification methods were investigated to decrease the influence of systematic errors, like a residual risk in misplacement of the sample and spectral dilution of marker bands (esp. cancerous tissue) and therewith optimize the experimental design. Furthermore other validations methods like leave-one-sample-out and leave-one-spectrum-out cross-validation schemes were compared with leave-one-patient-out cross-validation. High-risk lesions were differentiated from low-risk lesions with a sensitivity of 79%, specificity of 74% and an accuracy of 77%, cancer and normal tissue with a sensitivity of 79%, specificity of 83% and an accuracy of 81%. Additionally applied outlier identification enabled us to improve the recognition of neoplastic biopsy samples. Copyright
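
    The distinction between leave-one-spectrum-out and leave-one-patient-out splitting can be illustrated with scikit-learn's grouped cross-validation utilities. The sketch below uses synthetic data in which several "spectra" share a patient-level signal; it only demonstrates the splitting mechanics, not the study's Raman pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
n_patients, spectra_per_patient = 30, 4
groups = np.repeat(np.arange(n_patients), spectra_per_patient)   # patient id per spectrum
patient_signal = rng.standard_normal(n_patients)[groups]         # shared within a patient
X = patient_signal[:, None] + 0.3 * rng.standard_normal((groups.size, 10))
y = (patient_signal > 0).astype(int)                             # label per spectrum

clf = LogisticRegression(max_iter=1000)
acc_spectrum = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
acc_patient = cross_val_score(clf, X, y, groups=groups, cv=LeaveOneGroupOut()).mean()
print(f"leave-one-spectrum-out accuracy: {acc_spectrum:.3f}")
print(f"leave-one-patient-out accuracy : {acc_patient:.3f}")
```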

  6. Cross-validation and Peeling Strategies for Survival Bump Hunting using Recursive Peeling Methods

    Science.gov (United States)

    Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J. Sunil

    2015-01-01

    We introduce a framework to build a survival/risk bump hunting model with a censored time-to-event response. Our Survival Bump Hunting (SBH) method is based on a recursive peeling procedure that uses a specific survival peeling criterion derived from non-/semi-parametric statistics such as the hazard ratio, the log-rank test or the Nelson-Aalen estimator. To optimize the tuning parameter of the model and validate it, we introduce an objective function based on survival or prediction-error statistics, such as the log-rank test and the concordance error rate. We also describe two alternative cross-validation techniques adapted to the joint task of decision-rule making by recursive peeling and survival estimation. Numerical analyses show the importance of replicated cross-validation and the differences between criteria and techniques in both low and high-dimensional settings. Although several non-parametric survival models exist, none addresses the problem of directly identifying local extrema. We show how SBH efficiently estimates extreme survival/risk subgroups unlike other models. This provides an insight into the behavior of commonly used models and suggests alternatives to be adopted in practice. Finally, our SBH framework was applied to a clinical dataset. In it, we identified subsets of patients characterized by clinical and demographic covariates with a distinct extreme survival outcome, for which tailored medical interventions could be made. An R package PRIMsrc (Patient Rule Induction Method in Survival, Regression and Classification settings) is available on CRAN (Comprehensive R Archive Network) and GitHub. PMID:27034730

  7. Estimating misclassification error: a closer look at cross-validation based methods

    Directory of Open Access Journals (Sweden)

    Ounpraseuth Songthip

    2012-11-01

    Full Text Available Abstract Background To estimate a classifier’s error in predicting future observations, bootstrap methods have been proposed as reduced-variation alternatives to traditional cross-validation (CV) methods based on sampling without replacement. Monte Carlo (MC) simulation studies aimed at estimating the true misclassification error conditional on the training set are commonly used to compare CV methods. We conducted an MC simulation study to compare a new method of bootstrap CV (BCV) to k-fold CV for estimating classification error. Findings For the low-dimensional conditions simulated, the modest positive bias of k-fold CV contrasted sharply with the substantial negative bias of the new BCV method. This behavior was corroborated using a real-world dataset of prognostic gene-expression profiles in breast cancer patients. Our simulation results demonstrate some extreme characteristics of variance and bias that can occur due to a fault in the design of CV exercises aimed at estimating the true conditional error of a classifier, and that appear not to have been fully appreciated in previous studies. Although CV is a sound practice for estimating a classifier’s generalization error, using CV to estimate the fixed misclassification error of a trained classifier conditional on the training set is problematic. While MC simulation of this estimation exercise can correctly represent the average bias of a classifier, it will overstate the between-run variance of the bias. Conclusions We recommend k-fold CV over the new BCV method for estimating a classifier’s generalization error. The extreme negative bias of BCV is too high a price to pay for its reduced variance.
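
    A generic comparison of the two families of estimators can be set up as below: a k-fold CV error estimate versus a simple bootstrap out-of-bag estimate. This is a hypothetical sketch for illustration; the specific BCV algorithm evaluated in the paper may differ from the out-of-bag variant used here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5)

# 10-fold CV estimate of the misclassification error
kfold_err = 1.0 - cross_val_score(clf, X, y, cv=10).mean()

# simple bootstrap estimate: train on a resample, test on the out-of-bag cases
oob_errs = []
for _ in range(200):
    boot = rng.integers(0, len(y), len(y))
    oob = np.setdiff1d(np.arange(len(y)), boot)
    clf.fit(X[boot], y[boot])
    oob_errs.append(1.0 - clf.score(X[oob], y[oob]))

print(f"10-fold CV error: {kfold_err:.3f}   bootstrap OOB error: {np.mean(oob_errs):.3f}")
```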

  8. Cross-validity of a portable capillary glucose monitor in relation to enzymatic spectrophotometric methods

    Directory of Open Access Journals (Sweden)

    William Alves Lima

    2006-09-01

    Full Text Available Glucose is an important substrate utilized during exercise. Accurate measurement of glucose is vital to obtain trustworthy results. Enzymatic spectrophotometric methods are generally considered the “gold standard” laboratory procedure for measuring glucose (GEnz), but they are time consuming, costly, and inappropriate for large-scale field testing. Compact and portable glucose monitors (GAccu) are a quick and easy way to assess glucose in large numbers of subjects. This study therefore aimed to test the cross-validity of GAccu. The sample was composed of 107 men (age = 35.4±10.7 years; stature = 168.4±6.9 cm; body mass = 73.4±11.2 kg; %fat = 20.9±8.3%, by dual-energy x-ray absorptiometry). Blood for measuring fasting glucose was taken from the basilar vein (GEnz, Bioplus: Bio-2000) and from the ring finger (GAccu: Accu-Chek Advantage) after a 12-hour overnight fast. GEnz was used as the criterion for cross-validity. A paired t-test showed differences (p

  9. Comparison of the Effects of Cross-validation Methods on Determining Performances of Classifiers Used in Diagnosing Congestive Heart Failure

    Directory of Open Access Journals (Sweden)

    Isler Yalcin

    2015-08-01

    Full Text Available Congestive heart failure (CHF) occurs when the heart is unable to provide sufficient pump action to maintain blood flow to meet the needs of the body. Early diagnosis is important since the mortality rate of patients with CHF is very high. There are different validation methods to measure the performance of classifier algorithms designed for this purpose. In this study, k-fold and leave-one-out cross-validation methods were tested for performance measures of five distinct classifiers in the diagnosis of patients with CHF. Each algorithm was run 100 times and the average and the standard deviation of classifier performances were recorded. As a result, it was observed that average performance was enhanced and the variability of performances was decreased when the number of data sections used in the cross-validation method was increased.
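
    The experiment can be mimicked in a few lines: repeat k-fold cross-validation many times for several values of k and record the mean and spread of the accuracy estimates. The sketch below (a generic classifier on a public dataset, not the study's CHF data or classifiers) shows the typical pattern of decreasing variability as the number of folds grows.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
clf = KNeighborsClassifier()

for k in (2, 5, 10):
    runs = [cross_val_score(clf, X, y,
                            cv=KFold(n_splits=k, shuffle=True, random_state=r)).mean()
            for r in range(100)]                 # 100 repeated runs per k
    print(f"{k:>2}-fold: mean accuracy = {np.mean(runs):.3f}, sd = {np.std(runs):.4f}")

# leave-one-out involves no random split, so a single run suffices
print(f"LOO    : mean accuracy = {cross_val_score(clf, X, y, cv=LeaveOneOut()).mean():.3f}")
```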

  10. The Bland-Altman Method Should Not Be Used in Regression Cross-Validation Studies

    Science.gov (United States)

    O'Connor, Daniel P.; Mahar, Matthew T.; Laughlin, Mitzi S.; Jackson, Andrew S.

    2011-01-01

    The purpose of this study was to demonstrate the bias in the Bland-Altman (BA) limits of agreement method when it is used to validate regression models. Data from 1,158 men were used to develop three regression equations to estimate maximum oxygen uptake (R² = 0.40, 0.61, and 0.82, respectively). The equations were evaluated in a…

  11. A cross-validation Delphi method approach to the diagnosis and treatment of personality disorders in older adults.

    Science.gov (United States)

    Rosowsky, Erlene; Young, Alexander S; Malloy, Mary C; van Alphen, S P J; Ellison, James M

    2018-03-01

    The Delphi method is a consensus-building technique using expert opinion to formulate a shared framework for understanding a topic with limited empirical support. This cross-validation study replicates one completed in the Netherlands and Belgium, and explores US experts' views on the diagnosis and treatment of older adults with personality disorders (PD). Twenty-one geriatric PD experts participated in a Delphi survey addressing diagnosis and treatment of older adults with PD. The European survey was translated and administered electronically. First-round consensus was reached for 16 out of 18 items relevant to diagnosis and specific mental health programs for personality disorders in older adults. Experts agreed on the usefulness of establishing criteria for specific types of treatments. The majority of psychologists did not initially agree on the usefulness of pharmacotherapy. Expert consensus was reached following two subsequent rounds after clarification addressing medication use. Study results suggest consensus among experts regarding psychosocial treatments. Limited acceptance among US psychologists about the suitability of pharmacotherapy for late-life PDs contrasted with the views expressed by experts surveyed in the Netherlands and Belgium studies.

  12. Cross validation in LULOO

    DEFF Research Database (Denmark)

    Sørensen, Paul Haase; Nørgård, Peter Magnus; Hansen, Lars Kai

    1996-01-01

    The leave-one-out cross-validation scheme for generalization assessment of neural network models is computationally expensive due to replicated training sessions. Linear unlearning of examples has recently been suggested as an approach to approximative cross-validation. Here we briefly review the linear unlearning scheme, dubbed LULOO, and we illustrate it on a system identification example. Further, we address the possibility of extracting confidence information (error bars) from the LULOO ensemble.

  13. Prediction of fat-free mass by bioelectrical impedance analysis in older adults from developing countries: a cross-validation study using the deuterium dilution method

    International Nuclear Information System (INIS)

    Mateo, H. Aleman; Romero, J. Esparza; Valencia, M.E.

    2010-01-01

    Objective: Several limitations of published bioelectrical impedance analysis (BIA) equations have been reported. The aims were to develop in a multiethnic, elderly population a new prediction equation and cross-validate it along with some published BIA equations for estimating fat-free mass using deuterium oxide dilution as the reference method. Design and setting: Cross-sectional study of elderly from five developing countries. Methods: Total body water (TBW) measured by deuterium dilution was used to determine fat-free mass (FFM) in 383 subjects. Anthropometric and BIA variables were also measured. Only 377 subjects were included for the analysis, randomly divided into development and cross-validation groups after being stratified by gender. Stepwise model selection was used to generate the model and Bland-Altman analysis was used to test agreement. Results: FFM = 2.95 - 3.89 (Gender) + 0.514 (Ht²/Z) + 0.090 (Waist) + 0.156 (Body weight). The model fit parameters were an R², total F-ratio, and SEE of 0.88, 314.3, and 3.3, respectively. None of the published BIA equations met the criteria for agreement. The new BIA equation underestimated FFM by just 0.3 kg in the cross-validation sample. The mean of the differences between FFM by TBW and by the new BIA equation was not significantly different from zero; 95% of the differences were between the limits of agreement of -6.3 to 6.9 kg of FFM. There was no significant association between the differences and their averages (r = 0.008 and p = 0.2). Conclusions: This new BIA equation offers a valid option compared with some of the current published BIA equations to estimate FFM in elderly subjects from five developing countries. (Authors)
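
    The reported equation can be transcribed directly into a small helper function. The unit conventions and the gender coding are not spelled out in the abstract, so the choices below (height in cm, impedance in ohms, waist in cm, weight in kg, gender coded 0 = male / 1 = female) are assumptions for illustration only.

```python
def fat_free_mass_kg(gender: int, height_cm: float, impedance_ohm: float,
                     waist_cm: float, weight_kg: float) -> float:
    """FFM = 2.95 - 3.89*Gender + 0.514*(Ht^2/Z) + 0.090*Waist + 0.156*Body weight.

    Gender coding (0 = male, 1 = female) and units are assumptions, not stated
    in the abstract above.
    """
    return (2.95
            - 3.89 * gender
            + 0.514 * (height_cm ** 2 / impedance_ohm)
            + 0.090 * waist_cm
            + 0.156 * weight_kg)

# example call with made-up values
print(f"{fat_free_mass_kg(0, 165.0, 520.0, 90.0, 70.0):.1f} kg")
```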

  14. Cross validation of gas chromatography-flame photometric detection and gas chromatography-mass spectrometry methods for measuring dialkylphosphate metabolites of organophosphate pesticides in human urine.

    Science.gov (United States)

    Prapamontol, Tippawan; Sutan, Kunrunya; Laoyang, Sompong; Hongsibsong, Surat; Lee, Grace; Yano, Yukiko; Hunter, Ronald Elton; Ryan, P Barry; Barr, Dana Boyd; Panuwet, Parinya

    2014-01-01

    We report two analytical methods for the measurement of dialkylphosphate (DAP) metabolites of organophosphate pesticides in human urine. These methods were independently developed/modified and implemented in two separate laboratories and cross validated. The aim was to develop simple, cost effective, and reliable methods that could use available resources and sample matrices in Thailand and the United States. While several methods already exist, we found that direct application of these methods required modification of sample preparation and chromatographic conditions to render accurate, reliable data. The problems encountered with existing methods were attributable to urinary matrix interferences, and differences in the pH of urine samples and reagents used during the extraction and derivatization processes. Thus, we provide information on key parameters that require attention during method modification and execution that affect the ruggedness of the methods. The methods presented here employ gas chromatography (GC) coupled with either flame photometric detection (FPD) or electron impact ionization-mass spectrometry (EI-MS) with isotopic dilution quantification. The limits of detection were reported from 0.10ng/mL urine to 2.5ng/mL urine (for GC-FPD), while the limits of quantification were reported from 0.25ng/mL urine to 2.5ng/mL urine (for GC-MS), for all six common DAP metabolites (i.e., dimethylphosphate, dimethylthiophosphate, dimethyldithiophosphate, diethylphosphate, diethylthiophosphate, and diethyldithiophosphate). Each method showed a relative recovery range of 94-119% (for GC-FPD) and 92-103% (for GC-MS), and relative standard deviations (RSD) of less than 20%. Cross-validation was performed on the same set of urine samples (n=46) collected from pregnant women residing in the agricultural areas of northern Thailand. The results from split sample analysis from both laboratories agreed well for each metabolite, suggesting that each method can produce

  15. A fast cross-validation method for alignment of electron tomography images based on Beer-Lambert law

    Science.gov (United States)

    Yan, Rui; Edwards, Thomas J.; Pankratz, Logan M.; Kuhn, Richard J.; Lanman, Jason K.; Liu, Jun; Jiang, Wen

    2015-01-01

    In electron tomography, accurate alignment of tilt series is an essential step in attaining high-resolution 3D reconstructions. Nevertheless, quantitative assessment of alignment quality has remained a challenging issue, even though many alignment methods have been reported. Here, we report a fast and accurate method, tomoAlignEval, based on the Beer-Lambert law, for the evaluation of alignment quality. Our method is able to globally estimate the alignment accuracy by measuring the goodness of log-linear relationship of the beam intensity attenuations at different tilt angles. Extensive tests with experimental data demonstrated its robust performance with stained and cryo samples. Our method is not only significantly faster but also more sensitive than measurements of tomogram resolution using Fourier shell correlation method (FSCe/o). From these tests, we also conclude that while current alignment methods are sufficiently accurate for stained samples, inaccurate alignments remain a major limitation for high resolution cryo-electron tomography. PMID:26455556

  16. A fast cross-validation method for alignment of electron tomography images based on Beer-Lambert law.

    Science.gov (United States)

    Yan, Rui; Edwards, Thomas J; Pankratz, Logan M; Kuhn, Richard J; Lanman, Jason K; Liu, Jun; Jiang, Wen

    2015-11-01

    In electron tomography, accurate alignment of tilt series is an essential step in attaining high-resolution 3D reconstructions. Nevertheless, quantitative assessment of alignment quality has remained a challenging issue, even though many alignment methods have been reported. Here, we report a fast and accurate method, tomoAlignEval, based on the Beer-Lambert law, for the evaluation of alignment quality. Our method is able to globally estimate the alignment accuracy by measuring the goodness of log-linear relationship of the beam intensity attenuations at different tilt angles. Extensive tests with experimental data demonstrated its robust performance with stained and cryo samples. Our method is not only significantly faster but also more sensitive than measurements of tomogram resolution using Fourier shell correlation method (FSCe/o). From these tests, we also conclude that while current alignment methods are sufficiently accurate for stained samples, inaccurate alignments remain a major limitation for high resolution cryo-electron tomography. Copyright © 2015 Elsevier Inc. All rights reserved.
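
    The underlying idea can be sketched in a few lines: under the Beer-Lambert law the transmitted intensity decays with path length, which for a slab grows as 1/cos(tilt angle), so log-intensity should be linear in 1/cos(theta) and the goodness of that fit indicates alignment quality. The toy example below is an assumption-based illustration, not the tomoAlignEval implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
tilts_deg = np.arange(-60, 61, 2)                   # hypothetical tilt scheme
thickness = 1.2                                     # slab thickness, arbitrary units
path = thickness / np.cos(np.deg2rad(tilts_deg))    # path length grows as 1/cos(theta)
intensity = np.exp(-path) * (1 + 0.02 * rng.standard_normal(tilts_deg.size))

x = 1.0 / np.cos(np.deg2rad(tilts_deg))
y = np.log(intensity)
slope, intercept = np.polyfit(x, y, 1)              # Beer-Lambert log-linear fit
residual = y - (slope * x + intercept)
r_squared = 1.0 - np.sum(residual**2) / np.sum((y - y.mean())**2)
print(f"slope = {slope:.3f}, R^2 of log-linear fit = {r_squared:.4f}")
```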

  17. Fast CSF MRI for brain segmentation; Cross-validation by comparison with 3D T1-based brain segmentation methods

    DEFF Research Database (Denmark)

    van der Kleij, Lisa A.; de Bresser, Jeroen; Hendrikse, Jeroen

    2018-01-01

    Objective: In previous work we have developed a fast sequence that focusses on cerebrospinal fluid (CSF) based on the long T2 of CSF. By processing the data obtained with this CSF MRI sequence, brain parenchymal volume (BPV) and intracranial volume (ICV) can be automatically obtained. The aim of this study was to assess the precision of the BPV and ICV measurements of the CSF MRI sequence and to validate the CSF MRI sequence by comparison with 3D T1-based brain segmentation methods. Materials and methods: Ten healthy volunteers (2 females; median age 28 years) were scanned (3T MRI) twice with repositioning in between. The mean absolute differences for BPV and ICV between the first and second scan for the low resolution (LR) CSF sequence (BPV/ICV: 12 ± 9/7 ± 4 cc) and the high resolution (HR) CSF sequence (5 ± 5/4 ± 2 cc) were comparable to FSL HR (9 ± 11/19 ± 23 cc), FSL LR (7 ± 4/6 ± 5 cc), FreeSurfer HR (5 ± 3/14 ± 8 cc), FreeSurfer LR (9 ± 8/12 ± 10 cc), SPM HR (5 ± 3/4 ± 7 cc), and SPM LR (5 ± 4/5 ± 3 cc). The correlation between the measured volumes of the CSF sequences and those measured by FSL, FreeSurfer and SPM was very good.

  18. Fast CSF MRI for brain segmentation; Cross-validation by comparison with 3D T1-based brain segmentation methods.

    Science.gov (United States)

    van der Kleij, Lisa A; de Bresser, Jeroen; Hendrikse, Jeroen; Siero, Jeroen C W; Petersen, Esben T; De Vis, Jill B

    2018-01-01

    In previous work we have developed a fast sequence that focusses on cerebrospinal fluid (CSF) based on the long T2 of CSF. By processing the data obtained with this CSF MRI sequence, brain parenchymal volume (BPV) and intracranial volume (ICV) can be automatically obtained. The aim of this study was to assess the precision of the BPV and ICV measurements of the CSF MRI sequence and to validate the CSF MRI sequence by comparison with 3D T1-based brain segmentation methods. Ten healthy volunteers (2 females; median age 28 years) were scanned (3T MRI) twice with repositioning in between. The scan protocol consisted of a low resolution (LR) CSF sequence (0:57min), a high resolution (HR) CSF sequence (3:21min) and a 3D T1-weighted sequence (6:47min). Data of the HR 3D-T1-weighted images were downsampled to obtain LR T1-weighted images (reconstructed imaging time: 1:59 min). Data of the CSF MRI sequences was automatically segmented using in-house software. The 3D T1-weighted images were segmented using FSL (5.0), SPM12 and FreeSurfer (5.3.0). The mean absolute differences for BPV and ICV between the first and second scan for CSF LR (BPV/ICV: 12±9/7±4cc) and CSF HR (5±5/4±2cc) were comparable to FSL HR (9±11/19±23cc), FSL LR (7±4, 6±5cc), FreeSurfer HR (5±3/14±8cc), FreeSurfer LR (9±8, 12±10cc), and SPM HR (5±3/4±7cc), and SPM LR (5±4, 5±3cc). The correlation between the measured volumes of the CSF sequences and that measured by FSL, FreeSurfer and SPM HR and LR was very good (all Pearson's correlation coefficients >0.83, R2 .67-.97). The results from the downsampled data and the high-resolution data were similar. Both CSF MRI sequences have a precision comparable to, and a very good correlation with established 3D T1-based automated segmentations methods for the segmentation of BPV and ICV. However, the short imaging time of the fast CSF MRI sequence is superior to the 3D T1 sequence on which segmentation with established methods is performed.

  19. Shuffling cross-validation-bee algorithm as a new descriptor selection method for retention studies of pesticides in biopartitioning micellar chromatography.

    Science.gov (United States)

    Zarei, Kobra; Atabati, Morteza; Ahmadi, Monire

    2017-05-04

    Bee algorithm (BA) is an optimization algorithm inspired by the natural foraging behaviour of honey bees to find the optimal solution, which can be applied to feature selection. In this paper, shuffling cross-validation-BA (CV-BA) was applied to select the best descriptors that could describe the retention factor (log k) in the biopartitioning micellar chromatography (BMC) of 79 heterogeneous pesticides. Six descriptors were obtained using BA and then the selected descriptors were applied for model development using multiple linear regression (MLR). The descriptor selection was also performed using stepwise, genetic algorithm and simulated annealing methods, and MLR was applied to model development; the results were then compared with those obtained from shuffling CV-BA. The results showed that shuffling CV-BA can be applied as a powerful descriptor selection method. Support vector machine (SVM) was also applied for model development using the six descriptors selected by BA. The statistical results obtained using SVM were better than those obtained using MLR: the root mean square error (RMSE) and correlation coefficient (R) for the whole data set (training and test) were 0.1863 and 0.9426, respectively, with the shuffling CV-BA-MLR method, and 0.0704 and 0.9922, respectively, with the shuffling CV-BA-SVM method.

  20. An intercomparison of a large ensemble of statistical downscaling methods for Europe: Overall results from the VALUE perfect predictor cross-validation experiment

    Science.gov (United States)

    Gutiérrez, Jose Manuel; Maraun, Douglas; Widmann, Martin; Huth, Radan; Hertig, Elke; Benestad, Rasmus; Roessler, Ole; Wibig, Joanna; Wilcke, Renate; Kotlarski, Sven

    2016-04-01

    VALUE is an open European network to validate and compare downscaling methods for climate change research (http://www.value-cost.eu). A key deliverable of VALUE is the development of a systematic validation framework to enable the assessment and comparison of both dynamical and statistical downscaling methods. This framework is based on a user-focused validation tree, guiding the selection of relevant validation indices and performance measures for different aspects of the validation (marginal, temporal, spatial, multi-variable). Moreover, several experiments have been designed to isolate specific points in the downscaling procedure where problems may occur (assessment of intrinsic performance, effect of errors inherited from the global models, effect of non-stationarity, etc.). The list of downscaling experiments includes 1) cross-validation with perfect predictors, 2) GCM predictors -aligned with EURO-CORDEX experiment- and 3) pseudo reality predictors (see Maraun et al. 2015, Earth's Future, 3, doi:10.1002/2014EF000259, for more details). The results of these experiments are gathered, validated and publicly distributed through the VALUE validation portal, allowing for a comprehensive community-open downscaling intercomparison study. In this contribution we describe the overall results from Experiment 1), consisting of a European wide 5-fold cross-validation (with consecutive 6-year periods from 1979 to 2008) using predictors from ERA-Interim to downscale precipitation and temperatures (minimum and maximum) over a set of 86 ECA&D stations representative of the main geographical and climatic regions in Europe. As a result of the open call for contribution to this experiment (closed in Dec. 2015), over 40 methods representative of the main approaches (MOS and Perfect Prognosis, PP) and techniques (linear scaling, quantile mapping, analogs, weather typing, linear and generalized regression, weather generators, etc.) were submitted, including information both data

  1. Online cross-validation-based ensemble learning.

    Science.gov (United States)

    Benkeser, David; Ju, Cheng; Lendle, Sam; van der Laan, Mark

    2018-01-30

    Online estimators update a current estimate with a new incoming batch of data without having to revisit past data, thereby providing streaming estimates that are scalable to big data. We develop flexible, ensemble-based online estimators of an infinite-dimensional target parameter, such as a regression function, in the setting where data are generated sequentially by a common conditional data distribution given summary measures of the past. This setting encompasses a wide range of time-series models and, as a special case, models for independent and identically distributed data. Our estimator considers a large library of candidate online estimators and uses online cross-validation to identify the algorithm with the best performance. We show that by basing estimates on the cross-validation-selected algorithm, we are asymptotically guaranteed to perform as well as the true, unknown best-performing algorithm. We provide extensions of this approach including online estimation of the optimal ensemble of candidate online estimators. We illustrate excellent performance of our methods using simulations and a real data example where we make streaming predictions of infectious disease incidence using data from a large database. Copyright © 2017 John Wiley & Sons, Ltd.
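
    A schematic version of online cross-validation over a small library of candidate online learners is sketched below: each incoming batch is first used to score the current fits (a held-out, "online CV" loss) and then to update them, and predictions are taken from the learner with the smallest cumulative held-out loss. This is a generic stand-in using scikit-learn's SGDRegressor, not the authors' estimator.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
library = {                                          # small library of online learners
    "sgd_weak_penalty": SGDRegressor(alpha=1e-4, random_state=0),
    "sgd_strong_penalty": SGDRegressor(alpha=1e-1, random_state=0),
}
cum_loss = {name: 0.0 for name in library}
seen_first_batch = False

for t in range(200):                                 # stream of incoming batches
    X = rng.standard_normal((20, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + 0.1 * rng.standard_normal(20)
    if seen_first_batch:
        for name, est in library.items():            # score before updating (held-out loss)
            cum_loss[name] += float(np.mean((est.predict(X) - y) ** 2))
    for est in library.values():                     # then update each candidate
        est.partial_fit(X, y)
    seen_first_batch = True

best = min(cum_loss, key=cum_loss.get)
print("online-CV-selected learner:", best)
print("cumulative held-out losses:", cum_loss)
```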

  2. Linear Unlearning for Cross-Validation

    DEFF Research Database (Denmark)

    Hansen, Lars Kai; Larsen, Jan

    1996-01-01

    The leave-one-out cross-validation scheme for generalization assessment of neural network models is computationally expensive due to replicated training sessions. In this paper we suggest linear unlearning of examples as an approach to approximative cross-validation. Further, we discuss ... Experiments on a time series prediction benchmark demonstrate the potential of the linear unlearning technique.

  3. A theory of cross-validation error

    OpenAIRE

    Turney, Peter D.

    1994-01-01

    This paper presents a theory of error in cross-validation testing of algorithms for predicting real-valued attributes. The theory justifies the claim that predicting real-valued attributes requires balancing the conflicting demands of simplicity and accuracy. Furthermore, the theory indicates precisely how these conflicting demands must be balanced, in order to minimize cross-validation error. A general theory is presented, then it is developed in detail for linear regression and instance-based learning.

  4. Improving lung cancer prognosis assessment by incorporating synthetic minority oversampling technique and score fusion method

    International Nuclear Information System (INIS)

    Yan, Shiju; Qian, Wei; Guan, Yubao; Zheng, Bin

    2016-01-01

    Purpose: This study aims to investigate the potential to improve lung cancer recurrence risk prediction performance for stage I NSCLS patients by integrating oversampling, feature selection, and score fusion techniques and develop an optimal prediction model. Methods: A dataset involving 94 early stage lung cancer patients was retrospectively assembled, which includes CT images, nine clinical and biological (CB) markers, and outcome of 3-yr disease-free survival (DFS) after surgery. Among the 94 patients, 74 remained DFS and 20 had cancer recurrence. Applying a computer-aided detection scheme, tumors were segmented from the CT images and 35 quantitative image (QI) features were initially computed. Two normalized Gaussian radial basis function network (RBFN) based classifiers were built based on QI features and CB markers separately. To improve prediction performance, the authors applied a synthetic minority oversampling technique (SMOTE) and a BestFirst based feature selection method to optimize the classifiers and also tested fusion methods to combine QI and CB based prediction results. Results: Using a leave-one-case-out cross-validation (K-fold cross-validation) method, the computed areas under a receiver operating characteristic curve (AUCs) were 0.716 ± 0.071 and 0.642 ± 0.061, when using the QI and CB based classifiers, respectively. By fusion of the scores generated by the two classifiers, AUC significantly increased to 0.859 ± 0.052 (p < 0.05) with an overall prediction accuracy of 89.4%. Conclusions: This study demonstrated the feasibility of improving prediction performance by integrating SMOTE, feature selection, and score fusion techniques. Combining QI features and CB markers and performing SMOTE prior to feature selection in classifier training enabled RBFN based classifier to yield improved prediction accuracy.

  5. Improving lung cancer prognosis assessment by incorporating synthetic minority oversampling technique and score fusion method

    Energy Technology Data Exchange (ETDEWEB)

    Yan, Shiju [School of Medical Instrument and Food Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China and School of Electrical and Computer Engineering, University of Oklahoma, Norman, Oklahoma 73019 (United States); Qian, Wei [Department of Electrical and Computer Engineering, University of Texas, El Paso, Texas 79968 and Sino-Dutch Biomedical and Information Engineering School, Northeastern University, Shenyang 110819 (China); Guan, Yubao [Department of Radiology, Guangzhou Medical University, Guangzhou 510182 (China); Zheng, Bin, E-mail: Bin.Zheng-1@ou.edu [School of Electrical and Computer Engineering, University of Oklahoma, Norman, Oklahoma 73019 (United States)

    2016-06-15

    Purpose: This study aims to investigate the potential to improve lung cancer recurrence risk prediction performance for stage I NSCLS patients by integrating oversampling, feature selection, and score fusion techniques and develop an optimal prediction model. Methods: A dataset involving 94 early stage lung cancer patients was retrospectively assembled, which includes CT images, nine clinical and biological (CB) markers, and outcome of 3-yr disease-free survival (DFS) after surgery. Among the 94 patients, 74 remained DFS and 20 had cancer recurrence. Applying a computer-aided detection scheme, tumors were segmented from the CT images and 35 quantitative image (QI) features were initially computed. Two normalized Gaussian radial basis function network (RBFN) based classifiers were built based on QI features and CB markers separately. To improve prediction performance, the authors applied a synthetic minority oversampling technique (SMOTE) and a BestFirst based feature selection method to optimize the classifiers and also tested fusion methods to combine QI and CB based prediction results. Results: Using a leave-one-case-out cross-validation (K-fold cross-validation) method, the computed areas under a receiver operating characteristic curve (AUCs) were 0.716 ± 0.071 and 0.642 ± 0.061, when using the QI and CB based classifiers, respectively. By fusion of the scores generated by the two classifiers, AUC significantly increased to 0.859 ± 0.052 (p < 0.05) with an overall prediction accuracy of 89.4%. Conclusions: This study demonstrated the feasibility of improving prediction performance by integrating SMOTE, feature selection, and score fusion techniques. Combining QI features and CB markers and performing SMOTE prior to feature selection in classifier training enabled RBFN based classifier to yield improved prediction accuracy.
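
    The logic of the training pipeline can be emulated with standard libraries, as in the sketch below. It assumes the third-party imbalanced-learn package for SMOTE, substitutes an RBF-kernel SVM for the paper's RBF network classifiers, and uses synthetic data split into a 35-feature "image" block and a 9-feature "clinical" block; within each leave-one-case-out fold, SMOTE and feature selection are fit on the training cases only and the two classifiers' scores are fused by averaging.

```python
import numpy as np
from imblearn.over_sampling import SMOTE            # third-party imbalanced-learn package
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

X, y = make_classification(n_samples=94, n_features=44, weights=[0.79], random_state=0)
qi, cb = X[:, :35], X[:, 35:]                        # toy "image" and "clinical" blocks

fused = np.empty(len(y))
for i in range(len(y)):                              # leave-one-case-out loop
    tr = np.arange(len(y)) != i
    scores = []
    for block in (qi, cb):
        # oversample the training cases only, then select features and fit
        Xb, yb = SMOTE(random_state=0).fit_resample(block[tr], y[tr])
        sel = SelectKBest(f_classif, k=min(5, block.shape[1])).fit(Xb, yb)
        clf = SVC(kernel="rbf", probability=True, random_state=0).fit(sel.transform(Xb), yb)
        scores.append(clf.predict_proba(sel.transform(block[i:i + 1]))[0, 1])
    fused[i] = np.mean(scores)                       # simple score fusion

auc = float(np.mean(fused[y == 1][:, None] > fused[y == 0][None, :]))
print(f"leave-one-case-out AUC of the fused score (toy data): {auc:.3f}")
```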

  6. Cross-validation of two commercial methods for volumetric high-resolution dose reconstruction on a phantom for non-coplanar VMAT beams

    International Nuclear Information System (INIS)

    Feygelman, Vladimir; Stambaugh, Cassandra; Opp, Daniel; Zhang, Geoffrey; Moros, Eduardo G.; Nelms, Benjamin E.

    2014-01-01

    Background and purpose: Delta 4 (ScandiDos AB, Uppsala, Sweden) and ArcCHECK with 3DVH software (Sun Nuclear Corp., Melbourne, FL, USA) are commercial quasi-three-dimensional diode dosimetry arrays capable of volumetric measurement-guided dose reconstruction. A method to reconstruct dose for non-coplanar VMAT beams with 3DVH is described. The Delta 4 3D dose reconstruction on its own phantom for VMAT delivery has not been thoroughly evaluated previously, and we do so by comparison with 3DVH. Materials and methods: Reconstructed volumetric doses for VMAT plans delivered with different table angles were compared between the Delta 4 and 3DVH using gamma analysis. Results: The average γ (2% local dose-error normalization/2mm) passing rate comparing the directly measured Delta 4 diode dose with 3DVH was 98.2 ± 1.6% (1SD). The average passing rate for the full volumetric comparison of the reconstructed doses on a homogeneous cylindrical phantom was 95.6 ± 1.5%. No dependence on the table angle was observed. Conclusions: Modified 3DVH algorithm is capable of 3D VMAT dose reconstruction on an arbitrary volume for the full range of table angles. Our comparison results between different dosimeters make a compelling case for the use of electronic arrays with high-resolution 3D dose reconstruction as primary means of evaluating spatial dose distributions during IMRT/VMAT verification

  7. Determination of lead content in drilling fueled soil using laser induced spectral analysis and its cross validation using ICP/OES method.

    Science.gov (United States)

    Rehan, I; Gondal, M A; Rehan, K

    2018-05-15

    A detection system based on Laser Induced Breakdown Spectroscopy (LIBS) was designed, optimized, and successfully employed for the estimation of lead (Pb) content in drilling fueled soil (DFS) collected from oil field drilling areas in Pakistan. The concentration of Pb was evaluated by the standard calibration curve method as well as by using an approach based on the integrated intensity of strongest emission of an element of interest. Remarkably, our investigation clearly demonstrated that the concentration of Pb in drilling fueled soil collected at the exact drilling site was greater than the safe permissible limits. Furthermore, the Pb concentration was observed to decline with increasing distance away from the specific drilling point. Analytical determinations were carried out under the assumptions that laser generated plasma was optically thin and in local thermodynamic equilibrium (LTE). In order to improve the sensitivity of our LIBS detection system, various parametric dependence studies were performed. To further validate the precision of our LIBS results, the concentration of Pb present in the acquired samples were also quantified via a standard analytical tool like inductively coupled plasma/optical emission spectroscopy (ICP/OES). Both results were in excellent agreement, implying remarkable reliability for the LIBS data. Furthermore, the Limit of detection (LOD) of our LIBS system for Pb was estimated to be 125.14 mg L -1 . Copyright © 2018 Elsevier B.V. All rights reserved.

  8. CVThresh: R Package for Level-Dependent Cross-Validation Thresholding

    Directory of Open Access Journals (Sweden)

    Donghoh Kim

    2006-04-01

    Full Text Available The core of the wavelet approach to nonparametric regression is thresholding of wavelet coefficients. This paper reviews a cross-validation method for the selection of the thresholding value in wavelet shrinkage of Oh, Kim, and Lee (2006), and introduces the R package CVThresh implementing details of the calculations for the procedures. This procedure is implemented by coupling a conventional cross-validation with a fast imputation method, so that it overcomes the limitation that the data length must be a power of 2. It can be easily applied to the classical leave-one-out cross-validation and K-fold cross-validation. Since the procedure is computationally fast, a level-dependent cross-validation can be developed for wavelet shrinkage of data with various sparseness according to levels.

  9. CVThresh: R Package for Level-Dependent Cross-Validation Thresholding

    Directory of Open Access Journals (Sweden)

    Donghoh Kim

    2006-04-01

    Full Text Available The core of the wavelet approach to nonparametric regression is thresholding of wavelet coefficients. This paper reviews a cross-validation method for the selection of the thresholding value in wavelet shrinkage of Oh, Kim, and Lee (2006, and introduces the R package CVThresh implementing details of the calculations for the procedures.This procedure is implemented by coupling a conventional cross-validation with a fast imputation method, so that it overcomes a limitation of data length, a power of 2. It can be easily applied to the classical leave-one-out cross-validation and K-fold cross-validation. Since the procedure is computationally fast, a level-dependent cross-validation can be developed for wavelet shrinkage of data with various sparseness according to levels.

  10. Cross-validation pitfalls when selecting and assessing regression and classification models.

    Science.gov (United States)

    Krstajic, Damjan; Buturovic, Ljubomir J; Leahy, David E; Thomas, Simon

    2014-03-29

    We address the problem of selecting and assessing classification and regression models using cross-validation. Current state-of-the-art methods can yield models with high variance, rendering them unsuitable for a number of practical applications including QSAR. In this paper we describe and evaluate best practices which improve reliability and increase confidence in selected models. A key operational component of the proposed methods is cloud computing which enables routine use of previously infeasible approaches. We describe in detail an algorithm for repeated grid-search V-fold cross-validation for parameter tuning in classification and regression, and we define a repeated nested cross-validation algorithm for model assessment. As regards variable selection and parameter tuning we define two algorithms (repeated grid-search cross-validation and double cross-validation), and provide arguments for using the repeated grid-search in the general case. We show results of our algorithms on seven QSAR datasets. The variation of the prediction performance, which is the result of choosing different splits of the dataset in V-fold cross-validation, needs to be taken into account when selecting and assessing classification and regression models. We demonstrate the importance of repeating cross-validation when selecting an optimal model, as well as the importance of repeating nested cross-validation when assessing a prediction error.
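
    The two procedures can be outlined compactly with scikit-learn: a grid-search V-fold cross-validation for tuning nested inside an outer cross-validation for assessment, with the whole construction repeated over different random splits. The sketch below is a generic illustration of that structure, not the paper's exact algorithms or datasets.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "gamma": [1e-4, 1e-3]}

outer_estimates = []
for repeat in range(5):                                  # repeat over different splits
    inner = KFold(n_splits=5, shuffle=True, random_state=repeat)
    outer = KFold(n_splits=5, shuffle=True, random_state=100 + repeat)
    tuner = GridSearchCV(SVC(), param_grid, cv=inner)    # grid-search V-fold CV (tuning)
    outer_estimates.append(cross_val_score(tuner, X, y, cv=outer).mean())  # nested CV

print("nested-CV accuracy per repeat:", np.round(outer_estimates, 3))
print("spread across repeats:", round(float(np.ptp(outer_estimates)), 3))
```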

  11. Ensemble Kalman filter regularization using leave-one-out data cross-validation

    KAUST Repository

    Rayo Schiappacasse, Lautaro Jerónimo; Hoteit, Ibrahim

    2012-01-01

    In this work, the classical leave-one-out cross-validation method for selecting a regularization parameter for the Tikhonov problem is implemented within the EnKF framework. Following the original concept, the regularization parameter is selected such that it minimizes the predictive error.

  12. Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models

    DEFF Research Database (Denmark)

    Vehtari, Aki; Mononen, Tommi; Tolvanen, Ville

    2016-01-01

    The future predictive performance of a Bayesian model can be estimated using Bayesian cross-validation. In this article, we consider Gaussian latent variable models where the integration over the latent values is approximated using the Laplace method or expectation propagation (EP). We study the properties of several Bayesian leave-one-out (LOO) cross-validation approximations that in most cases can be computed with a small additional cost after forming the posterior approximation given the full data. Our main objective is to assess the accuracy of the approximative LOO cross-validation estimators.

  13. EVALUATION OF TWO METHODS OF PHENOTYPIC STABILITY THROUGH CROSS-VALIDATION

    Directory of Open Access Journals (Sweden)

    Jairo Alberto Rueda Restrepo

    2009-12-01

    Full Text Available One of the main concerns of plant breeders is the evaluation of phenotypic stability through regional or multi-environment trials. Many methods have been proposed for analyzing these trials and estimating phenotypic stability. This paper compares the regression method proposed by Eberhart and Russell with the variance-components method proposed by Shukla, following a cross-validation scheme. Data from 20 multi-environment maize trials, each with nine genotypes, planted under a randomized complete block design with four replications, were used. It was found that the best model for predicting the future performance of a genotype in a given environment is the method of Eberhart and Russell, with a root mean square error of prediction 2.21% lower than that of Shukla's method and a prediction consistency of 90.6%.

  14. Cross-validated detection of crack initiation in aerospace materials

    Science.gov (United States)

    Vanniamparambil, Prashanth A.; Cuadra, Jefferson; Guclu, Utku; Bartoli, Ivan; Kontsos, Antonios

    2014-03-01

    A cross-validated nondestructive evaluation approach was employed to detect in situ the onset of damage in an aluminum alloy compact tension specimen. The approach consisted of the coordinated use of primarily the acoustic emission method, combined with infrared thermography and digital image correlation. Tensile loads were applied and the specimen was continuously monitored using the nondestructive approach. Crack initiation was witnessed visually and was confirmed by the characteristic load drop accompanying the ductile fracture process. The full-field deformation map provided by the nondestructive approach validated the formation of a pronounced plasticity zone near the crack tip. At the time of crack initiation, a burst in the temperature field ahead of the crack tip as well as a sudden increase of the acoustic recordings were observed. Although such experiments have been attempted and reported before in the literature, the presented approach provides for the first time a cross-validated nondestructive dataset that can be used for quantitative analyses of the crack initiation information content. It further allows future development of automated procedures for real-time identification of damage precursors, including the rarely explored crack incubation stage in fatigue conditions.

  15. Efficient approximate k-fold and leave-one-out cross-validation for ridge regression

    NARCIS (Netherlands)

    Meijer, R.J.; Goeman, J.J.

    2013-01-01

    In model building and model evaluation, cross-validation is a frequently used resampling method. Unfortunately, this method can be quite time consuming. In this article, we discuss an approximation method that is much faster and can be used in generalized linear models and Cox proportional hazards models.
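
    For plain ridge (Tikhonov) regression the leave-one-out residuals need not be computed by refitting: with the hat matrix H = X(X'X + lambda*I)^(-1)X', the identity e_i / (1 - h_ii) gives them from a single fit. The sketch below verifies that identity numerically; it is a textbook shortcut offered for orientation, not necessarily the approximation algorithm of the article, which also covers generalized linear and Cox models.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 60, 8, 1.0
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)

# single fit: hat matrix and closed-form LOO residuals (exact for plain ridge)
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
loo_fast = (y - H @ y) / (1.0 - np.diag(H))

# brute force: refit n times, each time leaving one observation out
loo_slow = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta = np.linalg.solve(X[keep].T @ X[keep] + lam * np.eye(p), X[keep].T @ y[keep])
    loo_slow[i] = y[i] - X[i] @ beta

print("max |fast - brute force| =", float(np.max(np.abs(loo_fast - loo_slow))))
```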

  16. Ensemble Kalman filter regularization using leave-one-out data cross-validation

    KAUST Repository

    Rayo Schiappacasse, Lautaro Jerónimo

    2012-09-19

    In this work, the classical leave-one-out cross-validation method for selecting a regularization parameter for the Tikhonov problem is implemented within the EnKF framework. Following the original concept, the regularization parameter is selected such that it minimizes the predictive error. Some ideas about the implementation, suitability and conceptual interest of the method are discussed. Finally, what will be called the data cross-validation regularized EnKF (dCVr-EnKF) is implemented in a 2D 2-phase synthetic oil reservoir experiment and the results analyzed.
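
    The selection principle ("choose the regularization parameter that minimizes the leave-one-out predictive error") can be written down directly for a generic Tikhonov-regularized least-squares problem. The brute-force sketch below is a conceptual illustration only; it is not the EnKF-specific dCVr-EnKF implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 10
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.5 * rng.standard_normal(n)

def loo_predictive_error(lam: float) -> float:
    """Mean squared leave-one-out prediction error for a Tikhonov parameter lam."""
    errs = []
    for i in range(n):
        keep = np.arange(n) != i
        beta = np.linalg.solve(X[keep].T @ X[keep] + lam * np.eye(p),
                               X[keep].T @ y[keep])
        errs.append((y[i] - X[i] @ beta) ** 2)
    return float(np.mean(errs))

grid = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
scores = {lam: loo_predictive_error(lam) for lam in grid}
best = min(scores, key=scores.get)
print("LOO predictive error per parameter:", {k: round(v, 4) for k, v in scores.items()})
print("selected regularization parameter :", best)
```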

  17. A cross-validation package driving Netica with python

    Science.gov (United States)

    Fienen, Michael N.; Plant, Nathaniel G.

    2014-01-01

    Bayesian networks (BNs) are powerful tools for probabilistically simulating natural systems and emulating process models. Cross validation is a technique to avoid overfitting resulting from overly complex BNs. Overfitting reduces predictive skill. Cross-validation for BNs is known but rarely implemented due partly to a lack of software tools designed to work with available BN packages. CVNetica is open-source, written in Python, and extends the Netica software package to perform cross-validation and read, rebuild, and learn BNs from data. Insights gained from cross-validation and implications on prediction versus description are illustrated with: a data-driven oceanographic application; and a model-emulation application. These examples show that overfitting occurs when BNs become more complex than allowed by supporting data and overfitting incurs computational costs as well as causing a reduction in prediction skill. CVNetica evaluates overfitting using several complexity metrics (we used level of discretization) and its impact on performance metrics (we used skill).

  18. Benchmarking protein classification algorithms via supervised cross-validation

    NARCIS (Netherlands)

    Kertész-Farkas, A.; Dhir, S.; Sonego, P.; Pacurar, M.; Netoteia, S.; Nijveen, H.; Kuzniar, A.; Leunissen, J.A.M.; Kocsor, A.; Pongor, S.

    2008-01-01

    Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold,

  19. Classification in hyperspectral images by independent component analysis, segmented cross-validation and uncertainty estimates

    Directory of Open Access Journals (Sweden)

    Beatriz Galindo-Prieto

    2018-02-01

    Full Text Available Independent component analysis combined with various strategies for cross-validation, uncertainty estimates by jack-knifing and critical Hotelling’s T2 limits estimation, proposed in this paper, is used for classification purposes in hyperspectral images. To the best of our knowledge, the combined approach of methods used in this paper has not been previously applied to hyperspectral imaging analysis for interpretation and classification in the literature. The data analysis performed here aims to distinguish between four different types of plastics, some of them containing brominated flame retardants, from their near infrared hyperspectral images. The results showed that the method approach used here can be successfully used for unsupervised classification. A comparison of validation approaches, especially leave-one-out cross-validation and regions of interest scheme validation is also evaluated.

  20. Cross-Validation of Aerobic Capacity Prediction Models in Adolescents.

    Science.gov (United States)

    Burns, Ryan Donald; Hannon, James C; Brusseau, Timothy A; Eisenman, Patricia A; Saint-Maurice, Pedro F; Welk, Greg J; Mahar, Matthew T

    2015-08-01

    Cardiorespiratory endurance is a component of health-related fitness. FITNESSGRAM recommends the Progressive Aerobic Cardiovascular Endurance Run (PACER) or One mile Run/Walk (1MRW) to assess cardiorespiratory endurance by estimating VO2 Peak. No research has cross-validated prediction models from both PACER and 1MRW, including the New PACER Model and PACER-Mile Equivalent (PACER-MEQ), using current standards. The purpose of this study was to cross-validate prediction models from PACER and 1MRW against measured VO2 Peak in adolescents. Cardiorespiratory endurance data were collected on 90 adolescents aged 13-16 years (Mean = 14.7 ± 1.3 years; 32 girls, 52 boys) who completed the PACER and 1MRW in addition to a laboratory maximal treadmill test to measure VO2 Peak. Multiple correlations among various models with measured VO2 Peak were considered moderately strong (R = 0.74-0.78), and prediction error (RMSE) ranged from 5.95 ml·kg⁻¹·min⁻¹ to 8.27 ml·kg⁻¹·min⁻¹. Criterion-referenced agreement into FITNESSGRAM's Healthy Fitness Zones was considered fair-to-good among models (Kappa = 0.31-0.62; Agreement = 75.5-89.9%; F = 0.08-0.65). In conclusion, prediction models demonstrated moderately strong linear relationships with measured VO2 Peak, fair prediction error, and fair-to-good criterion-referenced agreement with measured VO2 Peak into FITNESSGRAM's Healthy Fitness Zones.

  1. Cross validation of bioelectrical impedance equations for men

    Directory of Open Access Journals (Sweden)

    Maria Fátima Glaner

    2005-06-01

    Full Text Available The purpose of this study was to analyze the concurrent (cross) validity of bioelectrical impedance (BIA) equations for estimating the fat-free mass (FFM) of 44 men, with mean age of 24.98 ± 3.40 years and relative body fat (%BF) of 17.15 ± 6.41%. Dual-energy x-ray absorptiometry was used as the reference method for %BF and FFM, and total body resistance was assessed with a Biodynamics analyzer (Model 310). The equations analyzed in this study were: two equations (Eq. 1 and 2) developed by Carvalho and Pires Neto (1998); one equation (Eq. 3) developed by Rising et al. (1991); one equation (Eq. 4) developed by Oppliger et al. (1991); and two equations (Eq. 5, for %BF < 20%, and Eq. 6, for %BF ≥ 20%) developed by Segal et al. (1988). The validation criteria adopted were those proposed by Lohman (1991). All correlations were high and significant, ranging from 0.906 (Eq. 2) to 0.981 (Eq. 6). Equations 1 to 5 significantly overestimated FFM (p < 0.001), with constant errors ranging from 1.32 kg (Eq. 5) to 5.90 kg (Eq. 4). Equation 6 met all validation criteria, presenting a correlation of 0.981, a constant error of -0.38 kg, and a total error of 1.10 kg. This Segal et al. (1988) equation for men with relative body fat ≥ 20% (Eq. 6) was the only one that showed concurrent validity.
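
    A minimal sketch of the Lohman-style concurrent-validity statistics used above: Pearson r between predicted and reference FFM, constant error (CE, the mean difference), and total error (TE). The arrays are illustrative, not the study's data.

```python
import numpy as np

ffm_dxa = np.array([62.4, 58.1, 70.3, 65.0, 55.7])      # reference FFM from DXA (kg)
ffm_bia = np.array([63.0, 59.4, 71.1, 64.2, 57.0])      # FFM predicted by a BIA equation (kg)

r = np.corrcoef(ffm_dxa, ffm_bia)[0, 1]
ce = np.mean(ffm_bia - ffm_dxa)                          # constant error
te = np.sqrt(np.mean((ffm_bia - ffm_dxa) ** 2))          # total error
print(f"r={r:.3f}  CE={ce:+.2f} kg  TE={te:.2f} kg")
```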

  2. Cross-validating a bidimensional mathematics anxiety scale.

    Science.gov (United States)

    Haiyan Bai

    2011-03-01

    The psychometric properties of a 14-item bidimensional Mathematics Anxiety Scale-Revised (MAS-R) were empirically cross-validated with two independent samples consisting of 647 secondary school students. An exploratory factor analysis on the scale yielded strong construct validity with a clear two-factor structure. The results from a confirmatory factor analysis indicated an excellent model fit (χ² = 98.32, df = 62; normed fit index = .92, comparative fit index = .97; root mean square error of approximation = .04). The internal consistency (.85), test-retest reliability (.71), and interfactor correlation (.26) supported the scale as a measure of math anxiety. Math anxiety, as measured by MAS-R, correlated negatively with student achievement scores (r = -.38), suggesting that MAS-R may be a useful tool for classroom teachers and other educational personnel tasked with identifying students at risk of reduced math achievement because of anxiety.

  3. Development and Cross-Validation of Equation for Estimating Percent Body Fat of Korean Adults According to Body Mass Index

    Directory of Open Access Journals (Sweden)

    Hoyong Sung

    2017-06-01

    Full Text Available Background: Using BMI as an independent variable is the easiest way to estimate percent body fat. Thus far, few studies have investigated the development and cross-validation of an equation for estimating the percent body fat of Korean adults according to the BMI. The goals of this study were the development and cross-validation of an equation for estimating the percent fat of representative Korean adults using the BMI. Methods: Samples were obtained from the Korea National Health and Nutrition Examination Survey between 2008 and 2011. The samples from 2008-2009 and 2010-2011 were labeled as the validation group (n=10,624) and the cross-validation group (n=8,291), respectively. The percent fat was measured using dual-energy X-ray absorptiometry, and the body mass index, gender, and age were included as independent variables to estimate the measured percent fat. The coefficient of determination (R²), standard error of estimation (SEE), and total error (TE) were calculated to examine the accuracy of the developed equation. Results: The cross-validated R² was 0.731 for Model 1 and 0.735 for Model 2. The SEE was 3.978 for Model 1 and 3.951 for Model 2. The equations developed in this study are more accurate for estimating percent fat of the cross-validation group than those previously published by other researchers. Conclusion: The newly developed equations are comparatively accurate for the estimation of the percent fat of Korean adults.

  4. Cross validation for the classical model of structured expert judgment

    International Nuclear Information System (INIS)

    Colson, Abigail R.; Cooke, Roger M.

    2017-01-01

    We update the 2008 TU Delft structured expert judgment database with data from 33 professionally contracted Classical Model studies conducted between 2006 and March 2015 to evaluate its performance relative to other expert aggregation models. We briefly review alternative mathematical aggregation schemes, including harmonic weighting, before focusing on linear pooling of expert judgments with equal weights and performance-based weights. Performance weighting outperforms equal weighting in all but 1 of the 33 studies in-sample. True out-of-sample validation is rarely possible for Classical Model studies, and cross validation techniques that split calibration questions into a training and test set are used instead. Performance weighting incurs an “out-of-sample penalty” and its statistical accuracy out-of-sample is lower than that of equal weighting. However, as a function of training set size, the statistical accuracy of performance-based combinations reaches 75% of the equal weight value when the training set includes 80% of calibration variables. At this point the training set is sufficiently powerful to resolve differences in individual expert performance. The information of performance-based combinations is double that of equal weighting when the training set is at least 50% of the set of calibration variables. Previous out-of-sample validation work used a Total Out-of-Sample Validity Index based on all splits of the calibration questions into training and test subsets, which is expensive to compute and includes small training sets of dubious value. As an alternative, we propose an Out-of-Sample Validity Index based on averaging the product of statistical accuracy and information over all training sets sized at 80% of the calibration set. Performance weighting outperforms equal weighting on this Out-of-Sample Validity Index in 26 of the 33 post-2006 studies; the probability of 26 or more successes on 33 trials if there were no difference between performance

  5. Evaluation of Analysis by Cross-Validation, Part II: Diagnostic and Optimization of Analysis Error Covariance

    Directory of Open Access Journals (Sweden)

    Richard Ménard

    2018-02-01

    Full Text Available We present a general theory of estimation of analysis error covariances based on cross-validation as well as a geometric interpretation of the method. In particular, we use the variance of passive observation-minus-analysis residuals and show that the true analysis error variance can be estimated, without relying on the optimality assumption. This approach is used to obtain near optimal analyses that are then used to evaluate the air quality analysis error using several different methods at active and passive observation sites. We compare the estimates according to the method of Hollingsworth-Lönnberg, Desroziers et al., a new diagnostic we developed, and the perceived analysis error computed from the analysis scheme, to conclude that, as long as the analysis is near optimal, all estimates agree within a certain error margin.

  6. Efficient generalized cross-validation with applications to parametric image restoration and resolution enhancement.

    Science.gov (United States)

    Nguyen, N; Milanfar, P; Golub, G

    2001-01-01

    In many image restoration/resolution enhancement applications, the blurring process, i.e., point spread function (PSF) of the imaging system, is not known or is known only to within a set of parameters. We estimate these PSF parameters for this ill-posed class of inverse problem from raw data, along with the regularization parameters required to stabilize the solution, using the generalized cross-validation method (GCV). We propose efficient approximation techniques based on the Lanczos algorithm and Gauss quadrature theory, reducing the computational complexity of the GCV. Data-driven PSF and regularization parameter estimation experiments with synthetic and real image sequences are presented to demonstrate the effectiveness and robustness of our method.
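
    A minimal sketch of generalized cross-validation for choosing a regularization parameter, applied here to ridge-regularized least squares on synthetic data rather than the Lanczos/Gauss-quadrature accelerated PSF estimation described above; all names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 80, 30
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:5] = 2.0
y = X @ beta + rng.normal(0, 1.0, n)

def gcv(lam):
    """GCV(lam) = (||(I - H)y||^2 / n) / (trace(I - H) / n)^2 for the ridge hat matrix H."""
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - H @ y
    return (np.sum(resid ** 2) / n) / ((np.trace(np.eye(n) - H) / n) ** 2)

lams = np.logspace(-3, 3, 25)
best = lams[int(np.argmin([gcv(l) for l in lams]))]
print(f"lambda selected by GCV: {best:.3g}")
```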

  7. Asymptotic optimality and efficient computation of the leave-subject-out cross-validation

    KAUST Repository

    Xu, Ganggang

    2012-12-01

    Although the leave-subject-out cross-validation (CV) has been widely used in practice for tuning parameter selection for various nonparametric and semiparametric models of longitudinal data, its theoretical property is unknown and solving the associated optimization problem is computationally expensive, especially when there are multiple tuning parameters. In this paper, by focusing on the penalized spline method, we show that the leave-subject-out CV is optimal in the sense that it is asymptotically equivalent to the empirical squared error loss function minimization. An efficient Newton-type algorithm is developed to compute the penalty parameters that optimize the CV criterion. Simulated and real data are used to demonstrate the effectiveness of the leave-subject-out CV in selecting both the penalty parameters and the working correlation matrix. © 2012 Institute of Mathematical Statistics.
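
    A minimal sketch of leave-subject-out cross-validation for longitudinal data: all records from one subject are held out together, so the tuning score is not inflated by within-subject correlation. A polynomial ridge model stands in for the penalized spline of the paper; data and names are illustrative.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
subjects = np.repeat(np.arange(10), 8)                    # 10 subjects, 8 visits each
t = np.tile(np.linspace(0, 1, 8), 10)
y = np.sin(2 * np.pi * t) + rng.normal(0, 0.3, t.size) + rng.normal(0, 0.3, 10)[subjects]
X = PolynomialFeatures(degree=5).fit_transform(t[:, None])

def cv_error(alpha):
    """Mean squared error when each subject's visits are left out as one block."""
    errs = []
    for train, test in LeaveOneGroupOut().split(X, y, groups=subjects):
        model = Ridge(alpha=alpha).fit(X[train], y[train])
        errs.append(np.mean((model.predict(X[test]) - y[test]) ** 2))
    return np.mean(errs)

alphas = np.logspace(-4, 2, 13)
print("penalty selected by leave-subject-out CV:", alphas[int(np.argmin([cv_error(a) for a in alphas]))])
```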

  8. Asymptotic optimality and efficient computation of the leave-subject-out cross-validation

    KAUST Repository

    Xu, Ganggang; Huang, Jianhua Z.

    2012-01-01

    Although the leave-subject-out cross-validation (CV) has been widely used in practice for tuning parameter selection for various nonparametric and semiparametric models of longitudinal data, its theoretical property is unknown and solving the associated optimization problem is computationally expensive, especially when there are multiple tuning parameters. In this paper, by focusing on the penalized spline method, we show that the leave-subject-out CV is optimal in the sense that it is asymptotically equivalent to the empirical squared error loss function minimization. An efficient Newton-type algorithm is developed to compute the penalty parameters that optimize the CV criterion. Simulated and real data are used to demonstrate the effectiveness of the leave-subject-out CV in selecting both the penalty parameters and the working correlation matrix. © 2012 Institute of Mathematical Statistics.

  9. SU-E-T-231: Cross-Validation of 3D Gamma Comparison Tools

    International Nuclear Information System (INIS)

    Alexander, KM; Jechel, C; Pinter, C; Lasso, A; Fichtinger, G; Salomons, G; Schreiner, LJ

    2015-01-01

    Purpose: Moving the computational analysis for 3D gel dosimetry into the 3D Slicer (www.slicer.org) environment has made gel dosimetry more clinically accessible. To ensure accuracy, we cross-validate the 3D gamma comparison module in 3D Slicer with an independently developed algorithm using simulated and measured dose distributions. Methods: Two reference dose distributions were generated using the Varian Eclipse treatment planning system. The first distribution consisted of a four-field box irradiation delivered to a plastic water phantom and the second, a VMAT plan delivered to a gel dosimeter phantom. The first reference distribution was modified within Eclipse to create an evaluated dose distribution by spatially shifting one field by 3mm, increasing the monitor units of the second field, applying a dynamic wedge for the third field, and leaving the fourth field unchanged. The VMAT plan was delivered to a gel dosimeter and the evaluated dose in the gel was calculated from optical CT measurements. Results from the gamma comparison tool built into the SlicerRT toolbox were compared to results from our in-house gamma algorithm implemented in Matlab (via MatlabBridge in 3D Slicer). The effects of noise, resolution and the exchange of reference and evaluated designations on the gamma comparison were also examined. Results: Perfect agreement was found between the gamma results obtained using the SlicerRT tool and our Matlab implementation for both the four-field box and gel datasets. The behaviour of the SlicerRT comparison with respect to changes in noise, resolution and the role of the reference and evaluated dose distributions was consistent with previous findings. Conclusion: Two independently developed gamma comparison tools have been cross-validated and found to be identical. As we transition our gel dosimetry analysis from Matlab to 3D Slicer, this validation serves as an important test towards ensuring the consistency of dose comparisons using the 3D Slicer
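
    A minimal sketch of a global gamma comparison on 1-D dose profiles (the SlicerRT and Matlab tools above operate on full 3-D grids): for each reference point the gamma value is the smallest combined dose-difference and distance-to-agreement metric over the evaluated profile, and the pass rate is the fraction of points with gamma <= 1. The profiles are synthetic.

```python
import numpy as np

def gamma_1d(ref_dose, eval_dose, spacing_mm, dose_crit=0.03, dta_mm=3.0):
    """Brute-force global gamma for two 1-D dose profiles on the same grid."""
    x = np.arange(ref_dose.size) * spacing_mm
    norm = dose_crit * ref_dose.max()                      # global dose criterion
    gammas = np.empty(ref_dose.size)
    for i in range(ref_dose.size):
        dd = (eval_dose - ref_dose[i]) / norm              # dose-difference term
        dist = (x - x[i]) / dta_mm                         # distance-to-agreement term
        gammas[i] = np.sqrt(dd ** 2 + dist ** 2).min()
    return gammas

ref = np.exp(-0.5 * ((np.arange(100) - 50) / 12.0) ** 2)   # synthetic reference profile
ev = np.exp(-0.5 * ((np.arange(100) - 52) / 12.0) ** 2)    # evaluated profile, shifted 2 mm
g = gamma_1d(ref, ev, spacing_mm=1.0)
print(f"gamma pass rate (3%/3 mm): {np.mean(g <= 1):.1%}")
```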

  10. The cross-validated AUC for MCP-logistic regression with high-dimensional data.

    Science.gov (United States)

    Jiang, Dingfeng; Huang, Jian; Zhang, Ying

    2013-10-01

    We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed for optimizing the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and its comparison with the existing methods including the Akaike information criterion (AIC), Bayesian information criterion (BIC) or Extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and smaller classification error than those with tuning parameters selected using the AIC, BIC or EBIC. We illustrate the application of the MCP-logistic regression with the CV-AUC criterion on three microarray datasets from the studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
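
    A minimal sketch of selecting the penalty strength of a sparse logistic regression by cross-validated AUC. scikit-learn does not implement the MCP penalty, so an L1 penalty stands in for it here; the high-dimensional binary data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=200, n_features=500, n_informative=10, random_state=0)
model = LogisticRegressionCV(Cs=20, cv=5, penalty="l1", solver="liblinear",
                             scoring="roc_auc", max_iter=5000).fit(X, y)
print("penalty strength selected by CV-AUC (C):", model.C_[0])
print("non-zero coefficients:", int(np.sum(model.coef_ != 0)))
```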

  11. Cross Validation of Rain Drop Size Distribution between GPM and Ground Based Polarmetric radar

    Science.gov (United States)

    Chandra, C. V.; Biswas, S.; Le, M.; Chen, H.

    2017-12-01

    Dual-frequency precipitation radar (DPR) on board the Global Precipitation Measurement (GPM) core satellite has reflectivity measurements at two independent frequencies, Ku- and Ka-band. Dual-frequency retrieval algorithms have been developed traditionally through forward, backward, and recursive approaches. However, these algorithms suffer from the "dual-value" problem when they retrieve medium volume diameter from the dual-frequency ratio (DFR) in the rain region. To this end, a hybrid method has been proposed to perform raindrop size distribution (DSD) retrieval for GPM using a linear constraint of DSD along the rain profile to avoid the "dual-value" problem (Le and Chandrasekar, 2015). In the current GPM level 2 algorithm (Iguchi et al. 2017, Algorithm Theoretical Basis Document), the Solver module retrieves a vertical profile of drop size distribution from dual-frequency observations and path integrated attenuations. The algorithm details can be found in Seto et al. (2013). On the other hand, ground based polarimetric radars have been used for a long time to estimate drop size distributions (e.g., Gorgucci et al. 2002). In addition, coincident GPM and ground based observations have been cross validated using careful overpass analysis. In this paper, we perform cross validation on raindrop size distribution retrieval from three sources, namely the hybrid method, the standard products from the Solver module, and DSD retrievals from ground polarimetric radars. The results are presented from two NEXRAD radars located in Dallas-Fort Worth, Texas (i.e., KFWS radar) and Melbourne, Florida (i.e., KMLB radar). The results demonstrate the ability of DPR observations to produce DSD estimates, which can be used subsequently to generate global DSD maps. References: Seto, S., T. Iguchi, T. Oki, 2013: The basic performance of a precipitation retrieval algorithm for the Global Precipitation Measurement mission's single/dual-frequency radar measurements. IEEE Transactions on Geoscience and

  12. Biased binomial assessment of cross-validated estimation of classification accuracies illustrated in diagnosis predictions

    Directory of Open Access Journals (Sweden)

    Quentin Noirhomme

    2014-01-01

    Full Text Available Multivariate classification is used in neuroimaging studies to infer brain activation or in medical applications to infer diagnosis. Their results are often assessed through either a binomial or a permutation test. Here, we simulated classification results of generated random data to assess the influence of the cross-validation scheme on the significance of results. Distributions built from classification of random data with cross-validation did not follow the binomial distribution. The binomial test is therefore not adapted. On the contrary, the permutation test was unaffected by the cross-validation scheme. The influence of the cross-validation was further illustrated on real-data from a brain–computer interface experiment in patients with disorders of consciousness and from an fMRI study on patients with Parkinson disease. Three out of 16 patients with disorders of consciousness had significant accuracy on binomial testing, but only one showed significant accuracy using permutation testing. In the fMRI experiment, the mental imagery of gait could discriminate significantly between idiopathic Parkinson's disease patients and healthy subjects according to the permutation test but not according to the binomial test. Hence, binomial testing could lead to biased estimation of significance and false positive or negative results. In our view, permutation testing is thus recommended for clinical application of classification with cross-validation.

  13. Biased binomial assessment of cross-validated estimation of classification accuracies illustrated in diagnosis predictions.

    Science.gov (United States)

    Noirhomme, Quentin; Lesenfants, Damien; Gomez, Francisco; Soddu, Andrea; Schrouff, Jessica; Garraux, Gaëtan; Luxen, André; Phillips, Christophe; Laureys, Steven

    2014-01-01

    Multivariate classification is used in neuroimaging studies to infer brain activation or in medical applications to infer diagnosis. Their results are often assessed through either a binomial or a permutation test. Here, we simulated classification results of generated random data to assess the influence of the cross-validation scheme on the significance of results. Distributions built from classification of random data with cross-validation did not follow the binomial distribution. The binomial test is therefore not adapted. On the contrary, the permutation test was unaffected by the cross-validation scheme. The influence of the cross-validation was further illustrated on real-data from a brain-computer interface experiment in patients with disorders of consciousness and from an fMRI study on patients with Parkinson disease. Three out of 16 patients with disorders of consciousness had significant accuracy on binomial testing, but only one showed significant accuracy using permutation testing. In the fMRI experiment, the mental imagery of gait could discriminate significantly between idiopathic Parkinson's disease patients and healthy subjects according to the permutation test but not according to the binomial test. Hence, binomial testing could lead to biased estimation of significance and false positive or negative results. In our view, permutation testing is thus recommended for clinical application of classification with cross-validation.
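
    A minimal sketch of the recommendation above: assessing the significance of a cross-validated accuracy with a permutation test (labels reshuffled and the full cross-validation repeated) rather than a binomial test on the pooled fold results. The data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, permutation_test_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=60, n_features=100, n_informative=5, random_state=0)
score, perm_scores, p_value = permutation_test_score(
    SVC(kernel="linear"), X, y,
    cv=StratifiedKFold(5), n_permutations=500, random_state=0)
print(f"cross-validated accuracy = {score:.2f}, permutation p-value = {p_value:.3f}")
```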

  14. Compressive Sensing with Cross-Validation and Stop-Sampling for Sparse Polynomial Chaos Expansions

    Energy Technology Data Exchange (ETDEWEB)

    Huan, Xun; Safta, Cosmin; Sargsyan, Khachik; Vane, Zachary Phillips; Lacaze, Guilhem; Oefelein, Joseph C.; Najm, Habib N.

    2017-07-01

    Compressive sensing is a powerful technique for recovering sparse solutions of underdetermined linear systems, which is often encountered in uncertainty quantification analysis of expensive and high-dimensional physical models. We perform numerical investigations employing several compressive sensing solvers that target the unconstrained LASSO formulation, with a focus on linear systems that arise in the construction of polynomial chaos expansions. With core solvers of l1_ls, SpaRSA, CGIST, FPC_AS, and ADMM, we develop techniques to mitigate overfitting through an automated selection of the regularization constant based on cross-validation, and a heuristic strategy to guide the stop-sampling decision. Practical recommendations on parameter settings for these techniques are provided and discussed. The overall method is applied to a series of numerical examples of increasing complexity, including large eddy simulations of supersonic turbulent jet-in-crossflow involving a 24-dimensional input. Through empirical phase-transition diagrams and convergence plots, we illustrate sparse recovery performance under structures induced by polynomial chaos, accuracy and computational tradeoffs between polynomial bases of different degrees, and practicability of conducting compressive sensing for a realistic, high-dimensional physical application. Across test cases studied in this paper, we find ADMM to have demonstrated empirical advantages through consistently lower errors and faster computational times.
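
    A minimal sketch of cross-validated selection of the regularization constant for a sparse (LASSO-type) recovery problem; the ADMM/SpaRSA-style solvers and the stop-sampling heuristic of the report are not reproduced here, and the underdetermined linear system is a synthetic stand-in for a polynomial chaos regression matrix.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
n_samples, n_basis = 60, 200                         # underdetermined system
A = rng.normal(size=(n_samples, n_basis))
coeffs = np.zeros(n_basis)
coeffs[rng.choice(n_basis, 8, replace=False)] = rng.normal(0, 2, 8)   # sparse truth
b = A @ coeffs + rng.normal(0, 0.05, n_samples)

fit = LassoCV(cv=5).fit(A, b)                        # regularization chosen by 5-fold CV
print(f"alpha chosen by CV: {fit.alpha_:.4g}, recovered non-zeros: {int(np.sum(fit.coef_ != 0))}")
```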

  15. Accelerating cross-validation with total variation and its application to super-resolution imaging.

    Directory of Open Access Journals (Sweden)

    Tomoyuki Obuchi

    Full Text Available We develop an approximation formula for the cross-validation error (CVE of a sparse linear regression penalized by ℓ1-norm and total variation terms, which is based on a perturbative expansion utilizing the largeness of both the data dimensionality and the model. The developed formula allows us to reduce the necessary computational cost of the CVE evaluation significantly. The practicality of the formula is tested through application to simulated black-hole image reconstruction on the event-horizon scale with super resolution. The results demonstrate that our approximation reproduces the CVE values obtained via literally conducted cross-validation with reasonably good precision.

  16. A Cross-Validation Study of Police Recruit Performance as Predicted by the IPI and MMPI.

    Science.gov (United States)

    Shusman, Elizabeth J.; And Others

    Validation and cross-validation studies were conducted using the Minnesota Multiphasic Personality Inventory (MMPI) and Inwald Personality Inventory (IPI) to predict job performance for 698 urban male police officers who completed a six-month training academy. Job performance criteria evaluated included absence, lateness, derelictions, negative…

  17. Cross validation of two partitioning-based sampling approaches in mesocosms containing PCB contaminated field sediment, biota, and activated carbon amendment

    DEFF Research Database (Denmark)

    Nørgaard Schmidt, Stine; Wang, Alice P.; Gidley, Philip T

    2017-01-01

    with multiple thicknesses of silicone and in situ pre-equilibrium sampling with low density polyethylene (LDPE) loaded with performance reference compounds were applied independently to measure polychlorinated biphenyls (PCBs) in mesocosms with (1) New Bedford Harbor sediment (MA, USA), (2) sediment and biota......, and (3) activated carbon amended sediment and biota. The aim was to cross validate the two different sampling approaches. Around 100 PCB congeners were quantified in the two sampling polymers, and the results confirmed the good precision of both methods and were in overall good agreement with recently...... published silicone to LDPE partition ratios. Further, the methods yielded Cfree in good agreement for all three experiments. The average ratio between Cfree determined by the two methods was factor 1.4±0.3 (range: 0.6-2.0), and the results thus cross-validated the two sampling approaches. For future...

  18. Development and Cross-Validation of the Short Form of the Cultural Competence Scale for Nurses

    Directory of Open Access Journals (Sweden)

    Duckhee Chae, PhD, RN

    2018-03-01

    Full Text Available Purpose: To develop and validate the short form of the Korean adaptation of the Cultural Competence Scale for Nurses. Methods: To shorten the 33-item Cultural Competence Scale for Nurses, an expert panel (N = 6) evaluated its content validity. The revised items were pilot tested using a sample of nine nurses, and clarity was assessed through cognitive interviews with respondents. The original instrument was shortened and validated through item analysis, exploratory factor analysis, convergent validity, and reliability using data from 277 hospital nurses. The 14-item final version was cross-validated through confirmatory factor analysis, convergent validity, discriminant validity, known-group comparisons, and reliability using data from 365 nurses belonging to 19 hospitals. Results: A 4-factor, 14-item model demonstrated satisfactory fit with significant factor loadings. The convergent validity between the developed tool and transcultural self-efficacy was significant (r = .55, p < .001). The convergent validity evaluated using the Average Variance Extracted and the discriminant validity were acceptable. Known-group comparisons revealed significant differences in the mean scores of the groups who spent more than one month abroad (p = .002), were able to communicate in a foreign language (p < .001), and had education to care for foreign patients (p = .039). Cronbach's α was .89, and the reliability of the subscales ranged from .74 to .91. Conclusion: The Cultural Competence Scale for Nurses-Short Form demonstrated good reliability and validity. It is a short and appropriate instrument for use in clinical and research settings to assess nurses' cultural competence. Keywords: cultural competence, psychometric properties, nurse

  19. Cross Validation Through Two-Dimensional Solution Surface for Cost-Sensitive SVM.

    Science.gov (United States)

    Gu, Bin; Sheng, Victor S; Tay, Keng Yeow; Romano, Walter; Li, Shuo

    2017-06-01

    Model selection plays an important role in cost-sensitive SVM (CS-SVM). It has been proven that the global minimum cross validation (CV) error can be efficiently computed based on the solution path for one-parameter learning problems. However, it is a challenge to obtain the global minimum CV error for CS-SVM based on a one-dimensional solution path and traditional grid search, because CS-SVM has two regularization parameters. In this paper, we propose a solution- and error-surface-based CV approach (CV-SES). More specifically, we first compute a two-dimensional solution surface for CS-SVM based on a bi-parameter space partition algorithm, which can fit solutions of CS-SVM for all values of both regularization parameters. Then, we compute a two-dimensional validation error surface for each CV fold, which can fit validation errors of CS-SVM for all values of both regularization parameters. Finally, we obtain the CV error surface by superposing K validation error surfaces, which can find the global minimum CV error of CS-SVM. Experiments are conducted on seven datasets for cost-sensitive learning and on four datasets for imbalanced learning. Experimental results not only show that our proposed CV-SES has a better generalization ability than CS-SVM with various hybrids between grid search and solution path methods, and than the recently proposed cost-sensitive hinge loss SVM with three-dimensional grid search, but also show that CV-SES uses less running time.
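
    A minimal sketch of the problem the solution-surface method addresses: a cost-sensitive SVM has two regularization-type parameters (here C and the positive-class weight), and the baseline alternative is an expensive two-dimensional grid search with K-fold cross-validation. The imbalanced data are synthetic and the parameter grid is illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=0)
grid = {"C": np.logspace(-2, 2, 9),
        "class_weight": [{0: 1.0, 1: w} for w in (1, 2, 5, 10, 20)]}  # misclassification costs
search = GridSearchCV(SVC(kernel="linear"), grid, cv=5, scoring="balanced_accuracy").fit(X, y)
print("best parameters:", search.best_params_)
print("cross-validated balanced accuracy:", round(search.best_score_, 3))
```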

  20. On the use of the observation-wise k-fold operation in PCA cross-validation

    NARCIS (Netherlands)

    Saccenti, E.; Camacho, J.

    2015-01-01

    Cross-validation (CV) is a common approach for determining the optimal number of components in a principal component analysis model. To guarantee the independence between model testing and calibration, the observation-wise k-fold operation is commonly implemented in each cross-validation step. This

  1. Attempted development and cross-validation of predictive models of individual-level and organizational-level turnover of nuclear power operators

    International Nuclear Information System (INIS)

    Vasa-Sideris, S.J.

    1989-01-01

    Nuclear power accounts for 20% of the electric power generated in the U.S. by 107 nuclear plants, which employ over 8,700 operators. Operator turnover is significant to utilities from the economic point of view since it costs almost three hundred thousand dollars to train and qualify one operator, and because turnover affects plant operability and therefore plant safety. The study purpose was to develop and cross-validate individual-level and organizational-level models of turnover of nuclear power plant operators. Data were obtained by questionnaires and from published data for 1983 and 1984 on a number of individual, organizational, and environmental predictors. Plants had been in operation for two or more years. Questionnaires were returned by 29 out of 50 plants on over 1600 operators. The objectives were to examine the reliability of the turnover criterion, to determine the classification accuracy of the multivariate predictive models and of categories of predictors (individual, organizational, and environmental), and to determine if a homology existed between the individual-level and organizational-level models. The method was to examine the shrinkage that occurred between foldback design (in which the predictive models were reapplied to the data used to develop them) and cross-validation. The results did not support the study hypotheses. Turnover data were accurate but not stable between the two years. No significant differences were detected between the low and high turnover groups at the organization or individual level in cross-validation. Lack of stability in the criterion, restriction of range, and small sample size at the organizational level were serious limitations of this study. The results did support the methods. Considerable shrinkage occurred between foldback and cross-validation of the models

  2. Prediction of cognitive and motor development in preterm children using exhaustive feature selection and cross-validation of near-term white matter microstructure.

    Science.gov (United States)

    Schadl, Kornél; Vassar, Rachel; Cahill-Rowley, Katelyn; Yeom, Kristin W; Stevenson, David K; Rose, Jessica

    2018-01-01

    Advanced neuroimaging and computational methods offer opportunities for more accurate prognosis. We hypothesized that near-term regional white matter (WM) microstructure, assessed on diffusion tensor imaging (DTI) and analyzed using exhaustive feature selection with cross-validation, would predict neurodevelopment in preterm children. Near-term MRI and DTI obtained at 36.6 ± 1.8 weeks postmenstrual age in 66 very-low-birth-weight preterm neonates were assessed. 60/66 had follow-up neurodevelopmental evaluation with the Bayley Scales of Infant-Toddler Development, 3rd edition (BSID-III) at 18-22 months. Linear models with exhaustive feature selection and leave-one-out cross-validation, computed from the DTI measures, identified the sets of three brain regions most predictive of cognitive and motor function; logistic regression models were computed to classify high-risk infants scoring one standard deviation below the mean. Cognitive impairment was predicted (100% sensitivity, 100% specificity; AUC = 1) by near-term right middle-temporal gyrus MD, right cingulate-cingulum MD, left caudate MD. Motor impairment was predicted (90% sensitivity, 86% specificity; AUC = 0.912) by left precuneus FA, right superior occipital gyrus MD, right hippocampus FA. Cognitive score variance was explained (29.6%, cross-validated R² = 0.296) by left posterior-limb-of-internal-capsule MD, Genu RD, right fusiform gyrus AD. Motor score variance was explained (31.7%, cross-validated R² = 0.317) by left posterior-limb-of-internal-capsule MD, right parahippocampal gyrus AD, right middle-temporal gyrus AD. Searching the large DTI feature space more accurately identified neonatal neuroimaging correlates of neurodevelopment.
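
    A minimal sketch of exhaustive feature selection with leave-one-out cross-validation as described above: every combination of three candidate features is scored by its cross-validated R², and the best triplet is reported. The feature matrix is synthetic, not the study's DTI data.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(4)
n_subjects, n_regions = 60, 12                       # e.g. regional FA/MD/AD/RD measures
X = rng.normal(size=(n_subjects, n_regions))
outcome = 2 * X[:, 0] - 1.5 * X[:, 3] + X[:, 7] + rng.normal(0, 1, n_subjects)

def loocv_r2(cols):
    """Cross-validated R^2 of a linear model restricted to the given feature columns."""
    pred = cross_val_predict(LinearRegression(), X[:, list(cols)], outcome, cv=LeaveOneOut())
    return 1 - np.sum((outcome - pred) ** 2) / np.sum((outcome - outcome.mean()) ** 2)

best = max(combinations(range(n_regions), 3), key=loocv_r2)
print("best 3-feature set:", best, " cross-validated R^2:", round(loocv_r2(best), 3))
```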

  3. The development and cross-validation of an MMPI typology of murderers.

    Science.gov (United States)

    Holcomb, W R; Adams, N A; Ponder, H M

    1985-06-01

    A sample of 80 male offenders charged with premeditated murder was divided into five personality types using MMPI scores. A hierarchical clustering procedure was used, with a subsequent internal cross-validation analysis using a second sample of 80 premeditated murderers. A discriminant analysis resulted in a 96.25% correct classification of subjects from the second sample into the five types. Clinical data from a mental status interview schedule supported the external validity of these types. There were significant differences among the five types in hallucinations, disorientation, hostility, depression, and paranoid thinking. Both similarities and differences between the present typology and prior research were discussed. Additional research questions were suggested.

  4. Cross-validation of an employee safety climate model in Malaysia.

    Science.gov (United States)

    Bahari, Siti Fatimah; Clarke, Sharon

    2013-06-01

    Whilst substantial research has investigated the nature of safety climate, and its importance as a leading indicator of organisational safety, much of this research has been conducted with Western industrial samples. The current study focuses on the cross-validation of a safety climate model in the non-Western industrial context of Malaysian manufacturing. The first-order factorial validity of Cheyne et al.'s (1998) [Cheyne, A., Cox, S., Oliver, A., Tomas, J.M., 1998. Modelling safety climate in the prediction of levels of safety activity. Work and Stress, 12(3), 255-271] model was tested, using confirmatory factor analysis, in a Malaysian sample. Results showed that the model fit indices were below accepted levels, indicating that the original Cheyne et al. (1998) safety climate model was not supported. An alternative three-factor model was developed using exploratory factor analysis. Although these findings are not consistent with previously reported cross-validation studies, we argue that previous studies have focused on validation across Western samples, and that the current study demonstrates the need to take account of cultural factors in the development of safety climate models intended for use in non-Western contexts. The results have important implications for the transferability of existing safety climate models across cultures (for example, in global organisations) and highlight the need for future research to examine cross-cultural issues in relation to safety climate. Copyright © 2013 National Safety Council and Elsevier Ltd. All rights reserved.

  5. Diversity shrinkage: Cross-validating pareto-optimal weights to enhance diversity via hiring practices.

    Science.gov (United States)

    Song, Q Chelsea; Wee, Serena; Newman, Daniel A

    2017-12-01

    To reduce adverse impact potential and improve diversity outcomes from personnel selection, one promising technique is De Corte, Lievens, and Sackett's (2007) Pareto-optimal weighting strategy. De Corte et al.'s strategy has been demonstrated on (a) a composite of cognitive and noncognitive (e.g., personality) tests (De Corte, Lievens, & Sackett, 2008) and (b) a composite of specific cognitive ability subtests (Wee, Newman, & Joseph, 2014). Both studies illustrated how Pareto-weighting (in contrast to unit weighting) could lead to substantial improvement in diversity outcomes (i.e., diversity improvement), sometimes more than doubling the number of job offers for minority applicants. The current work addresses a key limitation of the technique-the possibility of shrinkage, especially diversity shrinkage, in the Pareto-optimal solutions. Using Monte Carlo simulations, sample size and predictor combinations were varied and cross-validated Pareto-optimal solutions were obtained. Although diversity shrinkage was sizable for a composite of cognitive and noncognitive predictors when sample size was at or below 500, diversity shrinkage was typically negligible for a composite of specific cognitive subtest predictors when sample size was at least 100. Diversity shrinkage was larger when the Pareto-optimal solution suggested substantial diversity improvement. When sample size was at least 100, cross-validated Pareto-optimal weights typically outperformed unit weights-suggesting that diversity improvement is often possible, despite diversity shrinkage. Implications for Pareto-optimal weighting, adverse impact, sample size of validation studies, and optimizing the diversity-job performance tradeoff are discussed. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  6. Cross-validation and hypothesis testing in neuroimaging: An irenic comment on the exchange between Friston and Lindquist et al.

    Science.gov (United States)

    Reiss, Philip T

    2015-08-01

    The "ten ironic rules for statistical reviewers" presented by Friston (2012) prompted a rebuttal by Lindquist et al. (2013), which was followed by a rejoinder by Friston (2013). A key issue left unresolved in this discussion is the use of cross-validation to test the significance of predictive analyses. This note discusses the role that cross-validation-based and related hypothesis tests have come to play in modern data analyses, in neuroimaging and other fields. It is shown that such tests need not be suboptimal and can fill otherwise-unmet inferential needs. Copyright © 2015 Elsevier Inc. All rights reserved.

  7. Test-retest reliability and cross validation of the functioning everyday with a wheelchair instrument.

    Science.gov (United States)

    Mills, Tamara L; Holm, Margo B; Schmeler, Mark

    2007-01-01

    The purpose of this study was to establish the test-retest reliability and content validity of an outcomes tool designed to measure the effectiveness of seating-mobility interventions on the functional performance of individuals who use wheelchairs or scooters as their primary seating-mobility device. The instrument, Functioning Everyday With a Wheelchair (FEW), is a questionnaire designed to measure perceived user function related to wheelchair/scooter use. Using consumer-generated items, FEW Beta Version 1.0 was developed and test-retest reliability was established. Cross-validation of FEW Beta Version 1.0 was then carried out with five samples of seating-mobility users to establish content validity. Based on the content validity study, FEW Version 2.0 was developed and administered to seating-mobility consumers to examine its test-retest reliability. FEW Beta Version 1.0 yielded an intraclass correlation coefficient (ICC[3,k]) of .92, and the content validity results revealed that FEW Beta Version 1.0 captured 55% of seating-mobility goals reported by consumers across five samples. FEW Version 2.0 yielded ICC(3,k) = .86, and the content validity of FEW Version 2.0 was confirmed. FEW Beta Version 1.0 and FEW Version 2.0 were highly stable in their measurement of participants' seating-mobility goals over a 1-week interval.

  8. Sound quality indicators for urban places in Paris cross-validated by Milan data.

    Science.gov (United States)

    Ricciardi, Paola; Delaitre, Pauline; Lavandier, Catherine; Torchia, Francesca; Aumond, Pierre

    2015-10-01

    A specific smartphone application was developed to collect perceptive and acoustic data in Paris. About 3400 questionnaires were analyzed, regarding the global sound environment characterization, the perceived loudness of some emergent sources and the presence time ratio of sources that do not emerge from the background. Sound pressure level was recorded each second from the mobile phone's microphone during a 10-min period. The aim of this study is to propose indicators of urban sound quality based on linear regressions with perceptive variables. A cross validation of the quality models extracted from Paris data was carried out by conducting the same survey in Milan. The proposed sound quality general model is correlated with the real perceived sound quality (72%). Another model without visual amenity and familiarity is 58% correlated with perceived sound quality. In order to improve the sound quality indicator, a site classification was performed by Kohonen's Artificial Neural Network algorithm, and seven specific class models were developed. These specific models attribute more importance on source events and are slightly closer to the individual data than the global model. In general, the Parisian models underestimate the sound quality of Milan environments assessed by Italian people.

  9. Application of Monte Carlo cross-validation to identify pathway cross-talk in neonatal sepsis.

    Science.gov (United States)

    Zhang, Yuxia; Liu, Cui; Wang, Jingna; Li, Xingxia

    2018-03-01

    To explore genetic pathway cross-talk in neonates with sepsis, an integrated approach was used in this paper. To explore the potential relationships between pathways and the genes differentially expressed between normal uninfected neonates and neonates with sepsis, genetic profiling and biologic signaling pathways were first integrated. For each pathway, a score was obtained from the gene expression data by quantitatively analyzing the pathway cross-talk. The paired pathways with high cross-talk were identified by random forest classification. The purpose of the work was to find the best pairs of pathways able to discriminate sepsis samples versus normal samples. The analysis identified 10 pairs of pathways, which were probably able to discriminate neonates with sepsis versus normal uninfected neonates. Among them, the best two paired pathways were identified according to analysis of extensive literature. Impact statement: To find the best pairs of pathways able to discriminate sepsis samples versus normal samples, an RF classifier, the DS obtained from the DEGs of significantly associated paired pathways, and Monte Carlo cross-validation were applied in this paper. Ten pairs of pathways were probably able to discriminate neonates with sepsis versus normal uninfected neonates. Among them, the best two paired pathways ((7) IL-6 Signaling and Phospholipase C Signaling (PLC); (8) Glucocorticoid Receptor (GR) Signaling and Dendritic Cell Maturation) were identified according to analysis of extensive literature.
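
    A minimal sketch of Monte Carlo cross-validation: the data are repeatedly split at random into training and test sets, a random forest is refit on each split, and the test accuracies are averaged. The feature matrix is a synthetic stand-in for the pathway-score data described above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score

X, y = make_classification(n_samples=80, n_features=20, n_informative=6, random_state=0)
mccv = StratifiedShuffleSplit(n_splits=100, test_size=0.3, random_state=0)  # 100 random splits
scores = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                         X, y, cv=mccv)
print(f"Monte Carlo CV accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```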

  10. Assessing behavioural changes in ALS: cross-validation of ALS-specific measures.

    Science.gov (United States)

    Pinto-Grau, Marta; Costello, Emmet; O'Connor, Sarah; Elamin, Marwa; Burke, Tom; Heverin, Mark; Pender, Niall; Hardiman, Orla

    2017-07-01

    The Beaumont Behavioural Inventory (BBI) is a behavioural proxy report for the assessment of behavioural changes in ALS. This tool has been validated against the FrSBe, a non-ALS-specific behavioural assessment, and further comparison of the BBI against a disease-specific tool was considered. This study cross-validates the BBI against the ALS-FTD-Q. Sixty ALS patients, 8% also meeting criteria for FTD, were recruited. All patients were evaluated using the BBI and the ALS-FTD-Q, completed by a carer. Correlational analysis was performed to assess construct validity. Precision, sensitivity, specificity, and overall accuracy of the BBI, when compared to the ALS-FTD-Q, were obtained. The mean score of the whole sample on the BBI was 11.45 ± 13.06. ALS-FTD patients scored significantly higher than non-demented ALS patients (31.6 ± 14.64 vs. 9.62 ± 11.38). A strong correlation between the BBI and the ALS-FTD-Q was observed (r = 0.807). Good construct validity has been further confirmed when the BBI is compared to an ALS-specific tool. Furthermore, the BBI is a more comprehensive behavioural assessment for ALS, as it measures the whole behavioural spectrum in this condition.

  11. A Cross-Validation Study of the Kirton Adaption-Innovation Inventory in Three Research and Development Organizations.

    Science.gov (United States)

    Keller, Robert T.; Holland, Winford E.

    1979-01-01

    A cross-validation study of the Kirton Adaption-Innovation Inventory (KAI) was conducted with 256 professional employees from three applied research and development organizations. The KAI correlated well with both direct and indirect measures of innovativeness in all three organizations. (Author/MH)

  12. Cross-validation of the Student Perceptions of Team-Based Learning Scale in the United States

    Directory of Open Access Journals (Sweden)

    Donald H. Lein

    2017-06-01

    Full Text Available Purpose The purpose of this study was to cross-validate the factor structure of the previously developed Student Perceptions of Team-Based Learning (TBL) Scale among students in an entry-level doctor of physical therapy (DPT) program in the United States. Methods Toward the end of the semester in 2 patient/client management courses taught using TBL, 115 DPT students completed the Student Perceptions of TBL Scale, with a response rate of 87%. Principal component analysis (PCA) and confirmatory factor analysis (CFA) were conducted to replicate and confirm the underlying factor structure of the scale. Results Based on the PCA for the validation sample, the original 2-factor structure (preference for TBL and preference for teamwork) of the Student Perceptions of TBL Scale was replicated. The overall goodness-of-fit indices from the CFA suggested that the original 2-factor structure for the 15 items of the scale demonstrated a good model fit (comparative fit index, 0.95; non-normed fit index/Tucker-Lewis index, 0.93; root mean square error of approximation, 0.06; and standardized root mean square residual, 0.07). The 2 factors demonstrated high internal consistency (alpha = 0.83 and 0.88, respectively). DPT students taught using TBL viewed the factor of preference for teamwork more favorably than preference for TBL. Conclusion Our findings provide evidence supporting the replicability of the internal structure of the Student Perceptions of TBL Scale when assessing perceptions of TBL among DPT students in patient/client management courses.

  13. Vascular Adaptation: Pattern Formation and Cross Validation between an Agent Based Model and a Dynamical System.

    Science.gov (United States)

    Garbey, Marc; Casarin, Stefano; Berceli, Scott A

    2017-09-21

    Myocardial infarction is the global leading cause of mortality (Go et al., 2014). Coronary artery occlusion is its main etiology, and it is commonly treated by Coronary Artery Bypass Graft (CABG) surgery (Wilson et al., 2007). The long-term outcome remains unsatisfactory (Benedetto, 2016) as the graft faces the phenomenon of restenosis during the post-surgical period, which consists of re-occlusion of the lumen and usually requires secondary intervention even within one year after the initial surgery (Harskamp, 2013). In this work, we propose an extensive study of the restenosis phenomenon by implementing two mathematical models previously developed by our group: a heuristic Dynamical System (DS) (Garbey and Berceli, 2013), and a stochastic Agent Based Model (ABM) (Garbey et al., 2015). With an extensive use of the ABM, we retrieved the pattern formations of the cellular events that mainly drive the restenosis, especially focusing on mitosis in the intima, caused by alteration in shear stress, and mitosis in the media, fostered by alteration in wall tension. A deep understanding of the elements at the base of restenosis is indeed crucial in order to improve the final outcome of vein graft bypass. We also turned the ABM closer to the physiological reality by abating its original assumption of circumferential symmetry. This allowed us to finely replicate the trigger event of the restenosis, i.e. the loss of the endothelium in the early stage of the post-surgical follow up (Roubos et al., 1995), and to simulate the encroachment of the lumen in a fashion aligned with histological evidence (Owens et al., 2015). Finally, we cross-validated the two models by creating an accurate matching procedure. In this way we added the degree of accuracy given by the ABM to a simplified model (DS) that can serve as a powerful predictive tool for the clinic. Copyright © 2017 Elsevier Ltd. All rights reserved.

  14. Evidence-based cross validation for acoustic power transmission for a novel treatment system.

    Science.gov (United States)

    Mihcin, Senay; Strehlow, Jan; Demedts, Daniel; Schwenke, Michael; Levy, Yoav; Melzer, Andreas

    2017-06-01

    The novel Trans-Fusimo Treatment System (TTS) is designed to control Magnetic Resonance guided Focused Ultrasound (MRgFUS) therapy to ablate liver tumours under respiratory motion. It is crucial to deliver the acoustic power within tolerance limits for effective liver tumour treatment via MRgFUS. Before application in a clinical setting, evidence of reproducibility and reliability is a must for safe practice. The TTS software delivers the acoustic power via ExAblate-2100 Conformal Bone System (CBS) transducer. A built-in quality assurance application was developed to measure the force values, using a novel protocol to measure the efficiency for the electrical power values of 100 and 150W for 6s of sonication. This procedure was repeated 30 times by two independent users against the clinically approved ExAblate-2100 CBS for cross-validation. Both systems proved to deliver the power within the accepted efficiency levels (70-90%). Two sample t-tests were used to assess the differences in force values between the ExAblate-2100 CBS and the TTS (p > 0.05). Bland-Altman plots were used to demonstrate the limits of agreement between the two systems falling within the 10% limits of agreement. Two sample t-tests indicated that TTS does not have user dependency (p > 0.05). The TTS software proved to deliver the acoustic power without exceeding the safety levels. Results provide evidence as a part of ISO13485 regulations for CE marking purposes. The developed methodology could be utilised as a part of quality assurance system in clinical settings; when the TTS is used in clinical practice.

  15. Body fat measurement by bioelectrical impedance and air displacement plethysmography: a cross-validation study to design bioelectrical impedance equations in Mexican adults

    Directory of Open Access Journals (Sweden)

    Valencia Mauro E

    2007-08-01

    Full Text Available Abstract Background The study of body composition in specific populations by techniques such as bio-impedance analysis (BIA) requires validation based on standard reference methods. The aim of this study was to develop and cross-validate a predictive equation for bioelectrical impedance using air displacement plethysmography (ADP) as the standard method to measure body composition in Mexican adult men and women. Methods This study included 155 male and female subjects from northern Mexico, 20–50 years of age, from low, middle, and upper income levels. Body composition was measured by ADP. Body weight (BW, kg) and height (Ht, cm) were obtained by standard anthropometric techniques. Resistance, R (ohms), and reactance, Xc (ohms), were also measured. A random-split method was used to obtain two samples: one was used to derive the equation by the "all possible regressions" procedure, and it was cross-validated in the other sample to test predicted versus measured values of fat-free mass (FFM). Results and Discussion The final model was: FFM (kg) = 0.7374 * (Ht²/R) + 0.1763 * (BW) - 0.1773 * (Age) + 0.1198 * (Xc) - 2.4658. R² was 0.97; the square root of the mean square error (SRMSE) was 1.99 kg, and the pure error (PE) was 2.96. There was no difference between FFM predicted by the new equation (48.57 ± 10.9 kg) and that measured by ADP (48.43 ± 11.3 kg). The new equation did not differ from the line of identity, had a high R² and a low SRMSE, and showed no significant bias (0.87 ± 2.84 kg). Conclusion The new bioelectrical impedance equation based on the two-compartment model (2C) was accurate, precise, and free of bias. This equation can be used to assess body composition and nutritional status in populations similar in anthropometric and physical characteristics to this sample.
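
    A minimal sketch implementing the published prediction equation above and the pure-error statistic used in its cross-validation; the example subject values and the small arrays are illustrative, not the study's data.

```python
import numpy as np

def ffm_kg(height_cm, resistance_ohm, weight_kg, age_yr, reactance_ohm):
    """Fat-free mass (kg) from the BIA equation reported in the study."""
    return (0.7374 * (height_cm ** 2 / resistance_ohm)
            + 0.1763 * weight_kg
            - 0.1773 * age_yr
            + 0.1198 * reactance_ohm
            - 2.4658)

predicted = ffm_kg(height_cm=172.0, resistance_ohm=480.0, weight_kg=74.0,
                   age_yr=35.0, reactance_ohm=55.0)
print(f"predicted FFM: {predicted:.1f} kg")

# Pure error against a reference method (e.g. ADP) over a cross-validation sample:
ffm_pred = np.array([48.2, 55.6, 61.3])
ffm_ref = np.array([47.5, 56.4, 60.1])
print(f"pure error: {np.sqrt(np.mean((ffm_pred - ffm_ref) ** 2)):.2f} kg")
```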

  16. Cross-Validation of a Glucose-Insulin-Glucagon Pharmacodynamics Model for Simulation using Data from Patients with Type 1 Diabetes

    DEFF Research Database (Denmark)

    Wendt, Sabrina Lyngbye; Ranjan, Ajenthen; Møller, Jan Kloppenborg

    2017-01-01

    … for concentrations of glucagon, insulin, and glucose. We fitted pharmacokinetic (PK) models to insulin and glucagon data using maximum likelihood and maximum a posteriori estimation methods. Similarly, we fitted a pharmacodynamic (PD) model to glucose data. The PD model included multiplicative effects of insulin … and glucagon on EGP. Bias and precision of PD model test fits were assessed by mean predictive error (MPE) and mean absolute predictive error (MAPE). Results: Assuming constant variables in a subject across nonoutlier visits and using thresholds of ±15% MPE and 20% MAPE, we accepted at least one and at most three PD model test fits in each of the seven subjects. Thus, we successfully validated the PD model by leave-one-out cross-validation in seven out of eight T1D patients. Conclusions: The PD model accurately simulates glucose excursions based on plasma insulin and glucagon concentrations. The reported …

  17. Cross-validation analysis for genetic evaluation models for ranking in endurance horses.

    Science.gov (United States)

    García-Ballesteros, S; Varona, L; Valera, M; Gutiérrez, J P; Cervantes, I

    2018-01-01

    Ranking trait was used as a selection criterion for competition horses to estimate racing performance. In the literature the most common approaches to estimate breeding values are the linear or threshold statistical models. However, recent studies have shown that a Thurstonian approach was able to fix the race effect (competitive level of the horses that participate in the same race), thus suggesting a better prediction accuracy of breeding values for ranking trait. The aim of this study was to compare the predictability of linear, threshold and Thurstonian approaches for genetic evaluation of ranking in endurance horses. For this purpose, eight genetic models were used for each approach with different combinations of random effects: rider, rider-horse interaction and environmental permanent effect. All genetic models included gender, age and race as systematic effects. The database that was used contained 4065 ranking records from 966 horses and that for the pedigree contained 8733 animals (47% Arabian horses), with an estimated heritability around 0.10 for the ranking trait. The prediction ability of the models for racing performance was evaluated using a cross-validation approach. The average correlation between real and predicted performances across genetic models was around 0.25 for threshold, 0.58 for linear and 0.60 for Thurstonian approaches. Although no significant differences were found between models within approaches, the best genetic model included: the rider and rider-horse random effects for threshold, only rider and environmental permanent effects for linear approach and all random effects for Thurstonian approach. The absolute correlations of predicted breeding values among models were higher between threshold and Thurstonian: 0.90, 0.91 and 0.88 for all animals, top 20% and top 5% best animals. For rank correlations these figures were 0.85, 0.84 and 0.86. The lower values were those between linear and threshold approaches (0.65, 0.62 and 0.51). In

  18. Cultural Orientations Framework (COF) Assessment Questionnaire in Cross-Cultural Coaching: A Cross-Validation with Wave Focus Styles

    OpenAIRE

    Rojon, C; McDowall, A

    2010-01-01

    This paper outlines a cross-validation of the Cultural Orientations Framework assessment questionnaire (COF, Rosinski, 2007; a new tool designed for cross-cultural coaching) with the Saville Consulting Wave Focus Styles questionnaire (Saville Consulting, 2006; an existing validated measure of occupational personality), using data from UK and German participants (N = 222). The convergent and divergent validity of the questionnaire was adequate. Contrary to previous findings which u...

  19. Screening for postdeployment conditions: development and cross-validation of an embedded validity scale in the neurobehavioral symptom inventory.

    Science.gov (United States)

    Vanderploeg, Rodney D; Cooper, Douglas B; Belanger, Heather G; Donnell, Alison J; Kennedy, Jan E; Hopewell, Clifford A; Scott, Steven G

    2014-01-01

    To develop and cross-validate internal validity scales for the Neurobehavioral Symptom Inventory (NSI). Four existing data sets were used: (1) outpatient clinical traumatic brain injury (TBI)/neurorehabilitation database from a military site (n = 403), (2) National Department of Veterans Affairs TBI evaluation database (n = 48 175), (3) Florida National Guard nonclinical TBI survey database (n = 3098), and (4) a cross-validation outpatient clinical TBI/neurorehabilitation database combined across 2 military medical centers (n = 206). Secondary analysis of existing cohort data to develop (study 1) and cross-validate (study 2) internal validity scales for the NSI. The NSI, Mild Brain Injury Atypical Symptoms, and Personality Assessment Inventory scores. Study 1: Three NSI validity scales were developed, composed of 5 unusual items (Negative Impression Management [NIM5]), 6 low-frequency items (LOW6), and the combination of 10 nonoverlapping items (Validity-10). Cut scores maximizing sensitivity and specificity on these measures were determined, using a Mild Brain Injury Atypical Symptoms score of 8 or more as the criterion for invalidity. Study 2: The same validity scale cut scores again resulted in the highest classification accuracy and optimal balance between sensitivity and specificity in the cross-validation sample, using a Personality Assessment Inventory Negative Impression Management scale with a T score of 75 or higher as the criterion for invalidity. The NSI is widely used in the Department of Defense and Veterans Affairs as a symptom-severity assessment following TBI, but is subject to symptom overreporting or exaggeration. This study developed embedded NSI validity scales to facilitate the detection of invalid response styles. The NSI Validity-10 scale appears to hold considerable promise for validity assessment when the NSI is used as a population-screening tool.
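
    As a generic illustration of how a cut score can be selected to balance sensitivity and specificity against a binary invalidity criterion (made-up scores and labels, not the study data), a minimal Python sketch:

```python
import numpy as np

# Hypothetical illustration of cut-score selection against a binary invalidity criterion.
def evaluate_cutoffs(scores, invalid, cutoffs):
    """Return (cutoff, sensitivity, specificity) tuples; 'invalid' is a boolean criterion."""
    scores = np.asarray(scores, dtype=float)
    invalid = np.asarray(invalid, dtype=bool)
    results = []
    for c in cutoffs:
        flagged = scores >= c
        sens = (flagged & invalid).sum() / invalid.sum()
        spec = (~flagged & ~invalid).sum() / (~invalid).sum()
        results.append((c, sens, spec))
    return results

# Made-up validity-scale scores and criterion labels
scores = [2, 5, 7, 9, 12, 3, 8, 11, 1, 10]
invalid = [0, 0, 1, 1, 1, 0, 0, 1, 0, 1]
for c, sens, spec in evaluate_cutoffs(scores, invalid, cutoffs=range(1, 13)):
    print(c, round(sens, 2), round(spec, 2))
```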

  20. Accuracies of genomic breeding values in American Angus beef cattle using K-means clustering for cross-validation

    Directory of Open Access Journals (Sweden)

    Saatchi Mahdi

    2011-11-01

    Full Text Available Abstract Background Genomic selection is a recently developed technology that is beginning to revolutionize animal breeding. The objective of this study was to estimate marker effects to derive prediction equations for direct genomic values for 16 routinely recorded traits of American Angus beef cattle and quantify corresponding accuracies of prediction. Methods Deregressed estimated breeding values were used as observations in a weighted analysis to derive direct genomic values for 3570 sires genotyped using the Illumina BovineSNP50 BeadChip. These bulls were clustered into five groups using K-means clustering on pedigree estimates of additive genetic relationships between animals, with the aim of increasing within-group and decreasing between-group relationships. All five combinations of four groups were used for model training, with cross-validation performed in the group not used in training. Bivariate animal models were used for each trait to estimate the genetic correlation between deregressed estimated breeding values and direct genomic values. Results Accuracies of direct genomic values ranged from 0.22 to 0.69 for the studied traits, with an average of 0.44. Predictions were more accurate when animals within the validation group were more closely related to animals in the training set. When training and validation sets were formed by random allocation, the accuracies of direct genomic values ranged from 0.38 to 0.85, with an average of 0.65, reflecting the greater relationship between animals in training and validation. The accuracies of direct genomic values obtained from training on older animals and validating in younger animals were intermediate to the accuracies obtained from K-means clustering and random clustering for most traits. The genetic correlation between deregressed estimated breeding values and direct genomic values ranged from 0.15 to 0.80 for the traits studied. Conclusions These results suggest that genomic estimates
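
    A minimal sketch of the fold-construction idea described above, assuming a placeholder relationship matrix and scikit-learn's KMeans (not the study's pipeline): cluster animals into five groups so that relatives tend to fall in the same fold, then train on four groups and validate on the fifth:

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative sketch (not the study's pipeline): form cross-validation folds by
# K-means clustering on rows of an additive relationship matrix A, so that
# animals within a fold are more related to each other than to other folds.
rng = np.random.default_rng(0)
n = 200
A = np.eye(n) + 0.1 * rng.random((n, n))      # placeholder relationship matrix
A = (A + A.T) / 2                              # symmetrize

folds = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(A)

for k in range(5):
    train = np.where(folds != k)[0]            # four groups for model training
    test = np.where(folds == k)[0]             # held-out group for validation
    print(k, len(train), len(test))
```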

  1. Exact Cross-Validation for kNN and applications to passive and active learning in classification

    OpenAIRE

    Célisse, Alain; Mary-Huard, Tristan

    2011-01-01

    In the binary classification framework, a closed form expression of the cross-validation Leave-p-Out (LpO) risk estimator for the k Nearest Neighbor algorithm (kNN) is derived. It is first used to study the LpO risk minimization strategy for choosing k in the passive learning setting. The impact of p on the choice of k and the LpO estimation of the risk are inferred. In the active learning setting, a procedure is proposed that selects new examples using a LpO committee of kNN classifiers. The...
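
    For orientation, a brute-force Leave-p-Out risk estimate for a 1-nearest-neighbour classifier on toy data (the paper's contribution is a closed-form expression that avoids this enumeration):

```python
import numpy as np
from itertools import combinations

# Brute-force Leave-p-Out risk for a 1-nearest-neighbour classifier on toy data.
# This naive version enumerates every held-out subset; the closed form derived in
# the paper avoids that combinatorial cost.
def lpo_risk_1nn(X, y, p=1):
    X, y = np.asarray(X, float), np.asarray(y)
    n = len(y)
    errors, count = 0, 0
    for held_out in combinations(range(n), p):
        train = np.setdiff1d(np.arange(n), held_out)
        for i in held_out:
            d = np.linalg.norm(X[train] - X[i], axis=1)
            pred = y[train][np.argmin(d)]
            errors += int(pred != y[i])
            count += 1
    return errors / count

X = [[0.0], [0.2], [1.0], [1.1], [2.0], [2.2]]
y = [0, 0, 1, 1, 0, 0]
print(lpo_risk_1nn(X, y, p=1))
print(lpo_risk_1nn(X, y, p=2))
```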

  2. Simultaneous estimation of cross-validation errors in least squares collocation applied for statistical testing and evaluation of the noise variance components

    Science.gov (United States)

    Behnabian, Behzad; Mashhadi Hossainali, Masoud; Malekzadeh, Ahad

    2018-02-01

    The cross-validation technique is a popular method to assess and improve the quality of prediction by least squares collocation (LSC). We present a formula for direct estimation of the vector of cross-validation errors (CVEs) in LSC which is much faster than element-wise CVE computation. We show that a quadratic form of CVEs follows Chi-squared distribution. Furthermore, a posteriori noise variance factor is derived by the quadratic form of CVEs. In order to detect blunders in the observations, estimated standardized CVE is proposed as the test statistic which can be applied when noise variances are known or unknown. We use LSC together with the methods proposed in this research for interpolation of crustal subsidence in the northern coast of the Gulf of Mexico. The results show that after detection and removing outliers, the root mean square (RMS) of CVEs and estimated noise standard deviation are reduced about 51 and 59%, respectively. In addition, RMS of LSC prediction error at data points and RMS of estimated noise of observations are decreased by 39 and 67%, respectively. However, RMS of LSC prediction error on a regular grid of interpolation points covering the area is only reduced about 4% which is a consequence of sparse distribution of data points for this case study. The influence of gross errors on LSC prediction results is also investigated by lower cutoff CVEs. It is indicated that after elimination of outliers, RMS of this type of errors is also reduced by 19.5% for a 5 km radius of vicinity. We propose a method using standardized CVEs for classification of dataset into three groups with presumed different noise variances. The noise variance components for each of the groups are estimated using restricted maximum-likelihood method via Fisher scoring technique. Finally, LSC assessment measures were computed for the estimated heterogeneous noise variance model and compared with those of the homogeneous model. The advantage of the proposed method is the
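
    The LSC-specific formula is not reproduced in the record above. As a loose analogue only, ordinary least squares admits a similarly direct computation of all leave-one-out cross-validation errors from a single fit via the hat matrix, e_cv,i = e_i/(1 − h_ii); a short Python sketch on simulated data:

```python
import numpy as np

# Ordinary least squares analogue (assumption: NOT the paper's LSC formula) of computing
# all leave-one-out cross-validation errors from a single fit, via the hat matrix.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=50)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat
H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
h = np.diag(H)
cve = residuals / (1.0 - h)                   # leave-one-out errors, no refitting needed

print(np.sqrt(np.mean(cve ** 2)))             # RMS of cross-validation errors
```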

  3. Genomic prediction using different estimation methodology, blending and cross-validation techniques for growth traits and visual scores in Hereford and Braford cattle.

    Science.gov (United States)

    Campos, G S; Reimann, F A; Cardoso, L L; Ferreira, C E R; Junqueira, V S; Schmidt, P I; Braccini Neto, J; Yokoo, M J I; Sollero, B P; Boligon, A A; Cardoso, F F

    2018-05-07

    The objective of the present study was to evaluate the accuracy and bias of direct and blended genomic predictions using different methods and cross-validation techniques for growth traits (weight and weight gains) and visual scores (conformation, precocity, muscling and size) obtained at weaning and at yearling in Hereford and Braford breeds. Phenotypic data contained 126,290 animals belonging to the Delta G Connection genetic improvement program, and a set of 3,545 animals genotyped with the 50K chip and 131 sires with the 777K. After quality control, 41,045 markers remained for all animals. An animal model was used to estimate (co)variance components and to predict breeding values, which were later used to calculate the deregressed estimated breeding values (DEBV). Animals with genotype and phenotype for the traits studied were divided into four or five groups by random and k-means clustering cross-validation strategies. The accuracies of the direct genomic values (DGV) were of moderate to high magnitude for the traits at weaning and at yearling, ranging from 0.19 to 0.45 for the k-means and 0.23 to 0.78 for random clustering among all traits. The greatest gain in relation to the pedigree BLUP (PBLUP) was 9.5% with the BayesB method with both the k-means and the random clustering. Blended genomic value accuracies ranged from 0.19 to 0.56 for k-means and from 0.21 to 0.82 for random clustering. The analyses using the historical pedigree and phenotypes contributed additional information to calculate the GEBV and, in general, the largest gains were for the single-step (ssGBLUP) method in bivariate analyses with a mean increase of 43.00% among all traits measured at weaning and of 46.27% for those evaluated at yearling. The accuracy values for the marker effects estimation methods were lower for k-means clustering, indicating that the training set relationship to the selection candidates is a major factor affecting accuracy of genomic predictions. The gains in

  4. Cross-validation of the factorial structure of the Neighborhood Environment Walkability Scale (NEWS) and its abbreviated form (NEWS-A)

    Directory of Open Access Journals (Sweden)

    Cerin Ester

    2009-06-01

    Full Text Available Abstract Background The Neighborhood Environment Walkability Scale (NEWS) and its abbreviated form (NEWS-A) assess perceived environmental attributes believed to influence physical activity. A multilevel confirmatory factor analysis (MCFA) conducted on a sample from Seattle, WA showed that, at the respondent level, the factor-analyzable items of the NEWS and NEWS-A measured 11 and 10 constructs of perceived neighborhood environment, respectively. At the census blockgroup (used by the US Census Bureau as a subunit of census tracts) level, the MCFA yielded five factors for both NEWS and NEWS-A. The aim of this study was to cross-validate the individual- and blockgroup-level measurement models of the NEWS and NEWS-A in a geographical location and population different from those used in the original validation study. Methods A sample of 912 adults was recruited from 16 selected neighborhoods (116 census blockgroups) in the Baltimore, MD region. Neighborhoods were stratified according to their socio-economic status and transport-related walkability level measured using Geographic Information Systems. Participants self-completed the NEWS. MCFA was used to cross-validate the individual- and blockgroup-level measurement models of the NEWS and NEWS-A. Results The data provided sufficient support for the factorial validity of the original individual-level measurement models, which consisted of 11 (NEWS) and 10 (NEWS-A) correlated factors. The original blockgroup-level measurement model of the NEWS and NEWS-A showed poor fit to the data and required substantial modifications. These included the combining of aspects of building aesthetics with safety from crime into one factor; the separation of natural aesthetics and building aesthetics into two factors; and for the NEWS-A, the separation of presence of sidewalks/walking routes from other infrastructure for walking. Conclusion This study provided support for the generalizability of the individual

  5. Sediment transport patterns in the San Francisco Bay Coastal System from cross-validation of bedform asymmetry and modeled residual flux

    Science.gov (United States)

    Barnard, Patrick L.; Erikson, Li H.; Elias, Edwin P.L.; Dartnell, Peter; Barnard, P.L.; Jaffee, B.E.; Schoellhamer, D.H.

    2013-01-01

    The morphology of ~ 45,000 bedforms from 13 multibeam bathymetry surveys was used as a proxy for identifying net bedload sediment transport directions and pathways throughout the San Francisco Bay estuary and adjacent outer coast. The spatially-averaged shape asymmetry of the bedforms reveals distinct pathways of ebb and flood transport. Additionally, the region-wide, ebb-oriented asymmetry of 5% suggests net seaward-directed transport within the estuarine-coastal system, with significant seaward asymmetry at the mouth of San Francisco Bay (11%), through the northern reaches of the Bay (7–8%), and among the largest bedforms (21% for λ > 50 m). This general indication for the net transport of sand to the open coast strongly suggests that anthropogenic removal of sediment from the estuary, particularly along clearly defined seaward transport pathways, will limit the supply of sand to chronically eroding, open-coast beaches. The bedform asymmetry measurements significantly agree (up to ~ 76%) with modeled annual residual transport directions derived from a hydrodynamically-calibrated numerical model, and the orientation of adjacent, flow-sculpted seafloor features such as mega-flute structures, providing a comprehensive validation of the technique. The methods described in this paper to determine well-defined, cross-validated sediment transport pathways can be applied to estuarine-coastal systems globally where bedforms are present. The results can inform and improve regional sediment management practices to more efficiently utilize often limited sediment resources and mitigate current and future sediment supply-related impacts.

  6. Continuously revised assurance cases with stakeholders’ cross-validation: a DEOS experience

    Directory of Open Access Journals (Sweden)

    Kimio Kuramitsu

    2016-12-01

    Full Text Available Recently, assurance cases have received much attention in the field of software-based computer systems and IT services. However, software changes very often, and there are no strong regulations for software. These facts are two main challenges to be addressed in the development of software assurance cases. We propose a method of developing assurance cases by means of continuous revision at every stage of the system life cycle, including in operation and service recovery in failure cases. Instead of a regulator, dependability arguments are validated by multiple stakeholders competing with each other. This paper reports our experience with the proposed method in the case of the Aspen education service. The case study demonstrates that continuous revisions enable stakeholders to share dependability problems across software life cycle stages, which will lead to the long-term improvement of service dependability.

  7. Improved GRACE regional mass balance estimates of the Greenland ice sheet cross-validated with the input-output method

    NARCIS (Netherlands)

    Xu, Zheng; Schrama, Ernst J. O.; van der Wal, Wouter; van den Broeke, Michiel; Enderlin, Ellyn M.

    2016-01-01

    In this study, we use satellite gravimetry data from the Gravity Recovery and Climate Experiment (GRACE) to estimate regional mass change of the Greenland ice sheet (GrIS) and neighboring glaciated regions using a least squares inversion approach. We also consider results from the input–output

  8. Improved GRACE regional mass balance estimates of the Greenland ice sheet cross-validated with the input–output method

    NARCIS (Netherlands)

    Xu, Z.; Schrama, E.J.O.; van der Wal, W.; van den Broeke, MR; Enderlin, EM

    2016-01-01

    In this study, we use satellite gravimetry data from the Gravity Recovery and Climate Experiment (GRACE) to estimate regional mass change of the Greenland ice sheet (GrIS) and neighboring glaciated regions using a least squares inversion approach. We also consider results from the input–output

  9. Fast CSF MRI for brain segmentation; Cross-validation by comparison with 3D T1-based brain segmentation methods

    NARCIS (Netherlands)

    van der Kleij, Lisa A; de Bresser, Jeroen; Hendrikse, Jeroen; Siero, Jeroen C W; Petersen, Esben T; De Vis, Jill B

    2018-01-01

    OBJECTIVE: In previous work we have developed a fast sequence that focusses on cerebrospinal fluid (CSF) based on the long T2 of CSF. By processing the data obtained with this CSF MRI sequence, brain parenchymal volume (BPV) and intracranial volume (ICV) can be automatically obtained. The aim of

  10. Cerebral Blood Flow Measurement Using fMRI and PET: A Cross-Validation Study

    Directory of Open Access Journals (Sweden)

    Jean J. Chen

    2008-01-01

    Full Text Available An important aspect of functional magnetic resonance imaging (fMRI) is the study of brain hemodynamics, and MR arterial spin labeling (ASL) perfusion imaging has gained wide acceptance as a robust and noninvasive technique. However, the cerebral blood flow (CBF) measurements obtained with ASL fMRI have not been fully validated, particularly during global CBF modulations. We present a comparison of cerebral blood flow changes (ΔCBF) measured using a flow-sensitive alternating inversion recovery (FAIR) ASL perfusion method to those obtained using H2O15 PET, which is the current gold standard for in vivo imaging of CBF. To study regional and global CBF changes, a group of 10 healthy volunteers were imaged under identical experimental conditions during presentation of 5 levels of visual stimulation and one level of hypercapnia. The CBF changes were compared using 3 types of region-of-interest (ROI) masks. FAIR measurements of CBF changes were found to be slightly lower than those measured with PET (average ΔCBF of 21.5 ± 8.2% for FAIR versus 28.2 ± 12.8% for PET at maximum stimulation intensity). Nonetheless, there was a strong correlation between measurements of the two modalities. Finally, a t-test comparison of the slopes of the linear fits of PET versus ASL ΔCBF for all 3 ROI types indicated no significant difference from unity (P > .05).

  11. Cross Validation on the Equality of UAV-Based and Contour-Based DEMs

    Science.gov (United States)

    Ma, R.; Xu, Z.; Wu, L.; Liu, S.

    2018-04-01

    Unmanned Aerial Vehicles (UAV) have been widely used for Digital Elevation Model (DEM) generation in geographic applications. This paper proposes a novel framework of generating DEM from UAV images. It starts with the generation of the point clouds by image matching, where the flight control data are used as reference for searching for the corresponding images, leading to a significant time saving. Besides, a set of ground control points (GCP) obtained from field surveying are used to transform the point clouds to the user's coordinate system. Following that, we use a multi-feature based supervised classification method for discriminating non-ground points from ground ones. In the end, we generate the DEM by constructing triangular irregular networks and rasterization. The experiments are conducted in the east of Jilin province in China, which has suffered from soil erosion for several years. The quality of the UAV-based DEM (UAV-DEM) is compared with that generated from contour interpolation (Contour-DEM). The comparison shows a higher resolution, as well as higher accuracy, of the UAV-DEMs, which contain more geographic information. In addition, the RMSEs of the UAV-DEMs generated from point clouds with and without GCPs are ±0.5 m and ±20 m, respectively.

  12. Psychophysiological Associations between Chronic Tinnitus and Sleep: A Cross Validation of Tinnitus and Insomnia Questionnaires

    Directory of Open Access Journals (Sweden)

    Martin Schecklmann

    2015-01-01

    Full Text Available Background. The aim of the present study was to assess the prevalence of insomnia in chronic tinnitus and the association of tinnitus distress and sleep disturbance. Methods. We retrospectively analysed data of 182 patients with chronic tinnitus who completed the Tinnitus Questionnaire (TQ) and the Regensburg Insomnia Scale (RIS). Descriptive comparisons with the validation sample of the RIS including exclusively patients with primary/psychophysiological insomnia, correlation analyses of the RIS with TQ scales, and principal component analyses (PCA) in the tinnitus sample were performed. TQ total score was corrected for the TQ sleep items. Results. Prevalence of insomnia was high in tinnitus patients (76%) and tinnitus distress correlated with sleep disturbance (r = 0.558). TQ sleep subscore correlated with the RIS sum score (r = 0.690). PCA with all TQ and RIS items showed one sleep factor consisting of all RIS and the TQ sleep items. PCA with only TQ sleep and RIS items showed sleep- and tinnitus-specific factors. The sleep factors (only RIS items) were sleep depth and fearful focusing. The TQ sleep items represented tinnitus-related sleep problems. Discussion. Chronic tinnitus and primary insomnia are highly related and might share similar psychological and neurophysiological mechanisms leading to impaired sleep quality.

  13. Evidence cross-validation and Bayesian inference of MAST plasma equilibria

    Energy Technology Data Exchange (ETDEWEB)

    Nessi, G. T. von; Hole, M. J. [Research School of Physical Sciences and Engineering, Australian National University, Canberra ACT 0200 (Australia); Svensson, J. [Max-Planck-Institut fuer Plasmaphysik, D-17491 Greifswald (Germany); Appel, L. [EURATOM/CCFE Fusion Association, Culham Science Centre, Abingdon, Oxon OX14 3DB (United Kingdom)

    2012-01-15

    In this paper, current profiles for plasma discharges on the mega-ampere spherical tokamak are directly calculated from pickup coil, flux loop, and motional-Stark effect observations via methods based in the statistical theory of Bayesian analysis. By representing toroidal plasma current as a series of axisymmetric current beams with rectangular cross-section and inferring the current for each one of these beams, flux-surface geometry and q-profiles are subsequently calculated by elementary application of Biot-Savart's law. The use of this plasma model in the context of Bayesian analysis was pioneered by Svensson and Werner on the joint-European tokamak [Svensson and Werner, Plasma Phys. Controlled Fusion 50(8), 085002 (2008)]. In this framework, linear forward models are used to generate diagnostic predictions, and the probability distribution for the currents in the collection of plasma beams was subsequently calculated directly via application of Bayes' formula. In this work, we introduce a new diagnostic technique to identify and remove outlier observations associated with diagnostics falling out of calibration or suffering from an unidentified malfunction. These modifications enable a good agreement between Bayesian inference of the last-closed flux-surface with other corroborating data, such as that from force balance considerations using EFIT++ [Appel et al., "A unified approach to equilibrium reconstruction" Proceedings of the 33rd EPS Conference on Plasma Physics (Rome, Italy, 2006)]. In addition, this analysis also yields errors on the plasma current profile and flux-surface geometry as well as directly predicting the Shafranov shift of the plasma core.

  14. Evidence cross-validation and Bayesian inference of MAST plasma equilibria

    International Nuclear Information System (INIS)

    Nessi, G. T. von; Hole, M. J.; Svensson, J.; Appel, L.

    2012-01-01

    In this paper, current profiles for plasma discharges on the mega-ampere spherical tokamak are directly calculated from pickup coil, flux loop, and motional-Stark effect observations via methods based in the statistical theory of Bayesian analysis. By representing toroidal plasma current as a series of axisymmetric current beams with rectangular cross-section and inferring the current for each one of these beams, flux-surface geometry and q-profiles are subsequently calculated by elementary application of Biot-Savart's law. The use of this plasma model in the context of Bayesian analysis was pioneered by Svensson and Werner on the joint-European tokamak [Svensson and Werner, Plasma Phys. Controlled Fusion 50(8), 085002 (2008)]. In this framework, linear forward models are used to generate diagnostic predictions, and the probability distribution for the currents in the collection of plasma beams was subsequently calculated directly via application of Bayes' formula. In this work, we introduce a new diagnostic technique to identify and remove outlier observations associated with diagnostics falling out of calibration or suffering from an unidentified malfunction. These modifications enable a good agreement between Bayesian inference of the last-closed flux-surface with other corroborating data, such as that from force balance considerations using EFIT++ [Appel et al., "A unified approach to equilibrium reconstruction" Proceedings of the 33rd EPS Conference on Plasma Physics (Rome, Italy, 2006)]. In addition, this analysis also yields errors on the plasma current profile and flux-surface geometry as well as directly predicting the Shafranov shift of the plasma core.

  15. A Procedure for Identification of Appropriate State Space and ARIMA Models Based on Time-Series Cross-Validation

    Directory of Open Access Journals (Sweden)

    Patrícia Ramos

    2016-11-01

    Full Text Available In this work, a cross-validation procedure is used to identify an appropriate Autoregressive Integrated Moving Average model and an appropriate state space model for a time series. A minimum size for the training set is specified. The procedure is based on one-step forecasts and uses different training sets, each containing one more observation than the previous one. All possible state space models and all ARIMA models where the orders are allowed to range reasonably are fitted considering raw data and log-transformed data with regular differencing (up to second order differences) and, if the time series is seasonal, seasonal differencing (up to first order differences). The root mean squared error for each model is calculated by averaging over the one-step forecasts obtained. The model which has the lowest root mean squared error value and passes the Ljung–Box test using all of the available data with a reasonable significance level is selected among all the ARIMA and state space models considered. The procedure is exemplified in this paper with a case study of retail sales of different categories of women’s footwear from a Portuguese retailer, and its accuracy is compared with three reliable forecasting approaches. The results show that our procedure consistently forecasts more accurately than the other approaches and the improvements in the accuracy are significant.
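
    A minimal sketch of the rolling-origin, one-step-ahead evaluation described above (made-up data, a single fixed ARIMA order, and statsmodels as an assumed dependency; the paper searches over many ARIMA and state space specifications):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Sketch of rolling-origin one-step forecasting: the training set grows by one
# observation at each step and the one-step-ahead errors are pooled into an RMSE.
rng = np.random.default_rng(2)
y = np.cumsum(rng.normal(size=120)) + 50      # made-up series

min_train = 60                                # minimum size of the training set
errors = []
for t in range(min_train, len(y)):
    fit = ARIMA(y[:t], order=(1, 1, 1)).fit() # refit on the expanding training set
    forecast = fit.forecast(steps=1)[0]       # one-step-ahead forecast
    errors.append(y[t] - forecast)

rmse = np.sqrt(np.mean(np.square(errors)))
print(round(rmse, 3))
```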

  16. Latent structure and reliability analysis of the measure of body apperception: cross-validation for head and neck cancer patients.

    Science.gov (United States)

    Jean-Pierre, Pascal; Fundakowski, Christopher; Perez, Enrique; Jean-Pierre, Shadae E; Jean-Pierre, Ashley R; Melillo, Angelica B; Libby, Rachel; Sargi, Zoukaa

    2013-02-01

    Cancer and its treatments are associated with psychological distress that can negatively impact self-perception, psychosocial functioning, and quality of life. Patients with head and neck cancers (HNC) are particularly susceptible to psychological distress. This study involved a cross-validation of the Measure of Body Apperception (MBA) for HNC patients. One hundred and twenty-two English-fluent HNC patients between 20 and 88 years of age completed the MBA on a Likert scale ranging from "1 = disagree" to "4 = agree." We assessed the latent structure and internal consistency reliability of the MBA using Principal Components Analysis (PCA) and Cronbach's coefficient alpha (α), respectively. We determined convergent and divergent validities of the MBA using correlations with the Hospital Anxiety and Depression Scale (HADS), observer disfigurement rating, and patients' clinical and demographic variables. The PCA revealed a coherent set of items that explained 38% of the variance. The Kaiser-Meyer-Olkin measure of sampling adequacy was 0.73 and Bartlett's test of sphericity was statistically significant (χ²(28) = 253.64; p < 0.05). The MBA is a valid and reliable screening measure of body apperception for HNC patients.

  17. Cross-validation of generalised body composition equations with diverse young men and women: the Training Intervention and Genetics of Exercise Response (TIGER) Study

    Science.gov (United States)

    Generalised skinfold equations developed in the 1970s are commonly used to estimate laboratory-measured percentage fat (BF%). The equations were developed on predominately white individuals using Siri's two-component percentage fat equation (BF%-GEN). We cross-validated the Jackson-Pollock (JP) gene...

  18. Cross-Validation of a Recently Published Equation Predicting Energy Expenditure to Run or Walk a Mile in Normal-Weight and Overweight Adults

    Science.gov (United States)

    Morris, Cody E.; Owens, Scott G.; Waddell, Dwight E.; Bass, Martha A.; Bentley, John P.; Loftin, Mark

    2014-01-01

    An equation published by Loftin, Waddell, Robinson, and Owens (2010) was cross-validated using ten normal-weight walkers, ten overweight walkers, and ten distance runners. Energy expenditure was measured at preferred walking (normal-weight and overweight walkers) or running pace (distance runners) for 5 min and corrected to a mile. Energy…

  19. Rasch Validation and Cross-validation of the Health of Nation Outcome Scales (HoNOS) for Monitoring of Psychiatric Disability in Traumatized Refugees in Western Psychiatric Care

    DEFF Research Database (Denmark)

    Palic, Sabina; Kappel, Michelle Lind; Makransky, Guido

    2016-01-01

    group. A revised 10-item HoNOS fit the Rasch model at pre-treatment, and also showed excellent fit within the cross-validation data. Culture, gender, and need for translation did not exert serious bias on the measure’s performance. The results establish good monitoring properties of the 10-item Ho...

  20. WE-DE-201-04: Cross Validation of Knowledge-Based Treatment Planning for Prostate LDR Brachytherapy Using Principle Component Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Roper, J; Ghavidel, B; Godette, K; Schreibmann, E [Winship Cancer Institute of Emory University, GA (United States); Chanyavanich, V [Rocky Mountain Cancer Centers, CO (United States)

    2016-06-15

    Purpose: To validate a knowledge-based algorithm for prostate LDR brachytherapy treatment planning. Methods: A dataset of 100 cases was compiled from an active prostate seed implant service. Cases were randomized into 10 subsets. For each subset, the 90 remaining library cases were registered to a common reference frame and then characterized on a point by point basis using principal component analysis (PCA). Each test case was converted to PCA vectors using the same process and compared with each library case using a Mahalanobis distance to evaluate similarity. Rank order PCA scores were used to select the best-matched library case. The seed arrangement was extracted from the best-matched case and used as a starting point for planning the test case. Any subsequent modifications were recorded that required input from a treatment planner to achieve V100>95%, V150<60%, V200<20%. To simulate operating-room planning constraints, seed activity was held constant, and the seed count could not increase. Results: The computational time required to register test-case contours and evaluate PCA similarity across the library was 10 s. Preliminary analysis of 2 subsets shows that 9 of 20 test cases did not require any seed modifications to obtain an acceptable plan. Five test cases required fewer than 10 seed modifications or a grid shift. Another 5 test cases required approximately 20 seed modifications. An acceptable plan was not achieved for 1 outlier, which was substantially larger than its best match. Modifications took between 5 s and 6 min. Conclusion: A knowledge-based treatment planning algorithm for prostate LDR brachytherapy is being cross-validated using 100 prior cases. Preliminary results suggest that for this size library, acceptable plans can be achieved without planner input in about half of the cases while varying amounts of planner input are needed in remaining cases. Computational time and planning time are compatible with clinical practice.
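
    A schematic of the matching step described above, using random stand-in feature vectors rather than registered contour points (not the clinical code): characterize each case by its PCA scores and rank library cases by Mahalanobis distance to the test case:

```python
import numpy as np
from sklearn.decomposition import PCA

# Schematic of PCA-based case matching with a Mahalanobis distance (toy vectors only).
rng = np.random.default_rng(3)
library = rng.normal(size=(90, 500))          # 90 library cases, stand-in geometry features
test_case = rng.normal(size=500)

pca = PCA(n_components=10).fit(library)
lib_scores = pca.transform(library)
test_scores = pca.transform(test_case.reshape(1, -1))[0]

cov_inv = np.linalg.inv(np.cov(lib_scores, rowvar=False))
diffs = lib_scores - test_scores
maha = np.sqrt(np.einsum('ij,jk,ik->i', diffs, cov_inv, diffs))

best_match = int(np.argmin(maha))             # this case's seed arrangement seeds the plan
print(best_match, round(maha[best_match], 2))
```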

  1. Accuracies of genomic breeding values in American Angus beef cattle using K-means clustering for cross-validation.

    Science.gov (United States)

    Saatchi, Mahdi; McClure, Mathew C; McKay, Stephanie D; Rolf, Megan M; Kim, JaeWoo; Decker, Jared E; Taxis, Tasia M; Chapple, Richard H; Ramey, Holly R; Northcutt, Sally L; Bauck, Stewart; Woodward, Brent; Dekkers, Jack C M; Fernando, Rohan L; Schnabel, Robert D; Garrick, Dorian J; Taylor, Jeremy F

    2011-11-28

    Genomic selection is a recently developed technology that is beginning to revolutionize animal breeding. The objective of this study was to estimate marker effects to derive prediction equations for direct genomic values for 16 routinely recorded traits of American Angus beef cattle and quantify corresponding accuracies of prediction. Deregressed estimated breeding values were used as observations in a weighted analysis to derive direct genomic values for 3570 sires genotyped using the Illumina BovineSNP50 BeadChip. These bulls were clustered into five groups using K-means clustering on pedigree estimates of additive genetic relationships between animals, with the aim of increasing within-group and decreasing between-group relationships. All five combinations of four groups were used for model training, with cross-validation performed in the group not used in training. Bivariate animal models were used for each trait to estimate the genetic correlation between deregressed estimated breeding values and direct genomic values. Accuracies of direct genomic values ranged from 0.22 to 0.69 for the studied traits, with an average of 0.44. Predictions were more accurate when animals within the validation group were more closely related to animals in the training set. When training and validation sets were formed by random allocation, the accuracies of direct genomic values ranged from 0.38 to 0.85, with an average of 0.65, reflecting the greater relationship between animals in training and validation. The accuracies of direct genomic values obtained from training on older animals and validating in younger animals were intermediate to the accuracies obtained from K-means clustering and random clustering for most traits. The genetic correlation between deregressed estimated breeding values and direct genomic values ranged from 0.15 to 0.80 for the traits studied. These results suggest that genomic estimates of genetic merit can be produced in beef cattle at a young age but

  2. The Cross-Calibration of Spectral Radiances and Cross-Validation of CO2 Estimates from GOSAT and OCO-2

    Directory of Open Access Journals (Sweden)

    Fumie Kataoka

    2017-11-01

    Full Text Available The Greenhouse gases Observing SATellite (GOSAT), launched in January 2009, has provided radiance spectra with a Fourier Transform Spectrometer for more than eight years. The Orbiting Carbon Observatory 2 (OCO-2), launched in July 2014, collects radiance spectra using an imaging grating spectrometer. Both sensors observe sunlight reflected from Earth’s surface and retrieve atmospheric carbon dioxide (CO2) concentrations, but use different spectrometer technologies, observing geometries, and ground track repeat cycles. To demonstrate the effectiveness of satellite remote sensing for CO2 monitoring, the GOSAT and OCO-2 teams have worked together pre- and post-launch to cross-calibrate the instruments and cross-validate their retrieval algorithms and products. In this work, we first compare observed radiance spectra within three narrow bands centered at 0.76, 1.60 and 2.06 µm, at temporally coincident and spatially collocated points from September 2014 to March 2017. We reconciled the differences in observation footprint size, viewing geometry and associated differences in surface bidirectional reflectance distribution function (BRDF). We conclude that the spectral radiances measured by the two instruments agree within 5% for all bands. Second, we estimated mean bias and standard deviation of column-averaged CO2 dry air mole fraction (XCO2) retrieved from GOSAT and OCO-2 from September 2014 to May 2016. GOSAT retrievals used Build 7.3 (V7.3) of the Atmospheric CO2 Observations from Space (ACOS) algorithm while OCO-2 retrievals used Version 7 of the OCO-2 retrieval algorithm. The mean biases and standard deviations are −0.57 ± 3.33 ppm over land with high gain, −0.17 ± 1.48 ppm over ocean with high gain and −0.19 ± 2.79 ppm over land with medium gain. Finally, our study is complemented with an analysis of error sources: retrieved surface pressure (Psurf), aerosol optical depth (AOD), BRDF and surface albedo inhomogeneity. We found no change in XCO2

  3. Robustness of two single-item self-esteem measures: cross-validation with a measure of stigma in a sample of psychiatric patients.

    Science.gov (United States)

    Bagley, Christopher

    2005-08-01

    Robins' Single-item Self-esteem Inventory was compared with a single item from the Coopersmith Self-esteem. Although a new scoring format was used, there was good evidence of cross-validation in 83 current and former psychiatric patients who completed Harvey's adapted measure of stigma felt and experienced by users of mental health services. Scores on the two single-item self-esteem measures correlated .76 (p self-esteem in users of mental health services.

  4. Methods to compute reliabilities for genomic predictions of feed intake

    Science.gov (United States)

    For new traits without historical reference data, cross-validation is often the preferred method to validate reliability (REL). Time truncation is less useful because few animals gain substantial REL after the truncation point. Accurate cross-validation requires separating genomic gain from pedigree...

  5. Genome-Wide Association Studies and Comparison of Models and Cross-Validation Strategies for Genomic Prediction of Quality Traits in Advanced Winter Wheat Breeding Lines

    Directory of Open Access Journals (Sweden)

    Peter S. Kristensen

    2018-02-01

    Full Text Available The aim of this study was to identify SNP markers associated with five important wheat quality traits (grain protein content, Zeleny sedimentation, test weight, thousand-kernel weight, and falling number), and to investigate the predictive abilities of GBLUP and Bayesian Power Lasso models for genomic prediction of these traits. In total, 635 winter wheat lines from two breeding cycles in the Danish plant breeding company Nordic Seed A/S were phenotyped for the quality traits and genotyped for 10,802 SNPs. GWAS were performed using single marker regression and Bayesian Power Lasso models. SNPs with large effects on Zeleny sedimentation were found on chromosomes 1B, 1D, and 5D. However, GWAS failed to identify single SNPs with significant effects on the other traits, indicating that these traits were controlled by many QTL with small effects. The predictive abilities of the models for genomic prediction were studied using different cross-validation strategies. Leave-One-Out cross-validations resulted in correlations between observed phenotypes corrected for fixed effects and genomic estimated breeding values of 0.50 for grain protein content, 0.66 for thousand-kernel weight, 0.70 for falling number, 0.71 for test weight, and 0.79 for Zeleny sedimentation. Alternative cross-validations showed that the genetic relationship between lines in training and validation sets had a bigger impact on predictive abilities than the number of lines included in the training set. Using Bayesian Power Lasso instead of GBLUP models gave similar or slightly higher predictive abilities. Genomic prediction based on all SNPs was more effective than prediction based on few associated SNPs.
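
    As a compact illustration of the leave-one-out predictive-ability measure reported above (the correlation between phenotypes and predictions), a sketch using ridge regression on simulated marker codes as a rough stand-in for GBLUP:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Rough stand-in for GBLUP (ridge regression on simulated SNP codes); illustrates the
# leave-one-out predictive ability, i.e. cor(observed, predicted), on made-up data.
rng = np.random.default_rng(4)
n_lines, n_snps = 100, 300
X = rng.integers(0, 3, size=(n_lines, n_snps)).astype(float)   # 0/1/2 marker codes
true_effects = rng.normal(scale=0.1, size=n_snps)
y = X @ true_effects + rng.normal(scale=1.0, size=n_lines)     # simulated phenotypes

pred = cross_val_predict(Ridge(alpha=10.0), X, y, cv=LeaveOneOut())
predictive_ability = np.corrcoef(y, pred)[0, 1]
print(round(predictive_ability, 2))
```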

  6. Derivation and Cross-Validation of Cutoff Scores for Patients With Schizophrenia Spectrum Disorders on WAIS-IV Digit Span-Based Performance Validity Measures.

    Science.gov (United States)

    Glassmire, David M; Toofanian Ross, Parnian; Kinney, Dominique I; Nitch, Stephen R

    2016-06-01

    Two studies were conducted to identify and cross-validate cutoff scores on the Wechsler Adult Intelligence Scale-Fourth Edition Digit Span-based embedded performance validity (PV) measures for individuals with schizophrenia spectrum disorders. In Study 1, normative scores were identified on Digit Span-embedded PV measures among a sample of patients (n = 84) with schizophrenia spectrum diagnoses who had no known incentive to perform poorly and who put forth valid effort on external PV tests. Previously identified cutoff scores resulted in unacceptable false positive rates and lower cutoff scores were adopted to maintain specificity levels ≥90%. In Study 2, the revised cutoff scores were cross-validated within a sample of schizophrenia spectrum patients (n = 96) committed as incompetent to stand trial. Performance on Digit Span PV measures was significantly related to Full Scale IQ in both studies, indicating the need to consider the intellectual functioning of examinees with psychotic spectrum disorders when interpreting scores on Digit Span PV measures. © The Author(s) 2015.

  7. Comparison of a new expert elicitation model with the Classical Model, equal weights and single experts, using a cross-validation technique

    Energy Technology Data Exchange (ETDEWEB)

    Flandoli, F. [Dip.to di Matematica Applicata, Universita di Pisa, Pisa (Italy); Giorgi, E. [Dip.to di Matematica Applicata, Universita di Pisa, Pisa (Italy); Istituto Nazionale di Geofisica e Vulcanologia, Sezione di Pisa, via della Faggiola 32, 56126 Pisa (Italy); Aspinall, W.P. [Dept. of Earth Sciences, University of Bristol, and Aspinall and Associates, Tisbury (United Kingdom); Neri, A., E-mail: neri@pi.ingv.it [Istituto Nazionale di Geofisica e Vulcanologia, Sezione di Pisa, via della Faggiola 32, 56126 Pisa (Italy)

    2011-10-15

    The problem of ranking and weighting experts' performances when quantitative judgments are being elicited for decision support is considered. A new scoring model, the Expected Relative Frequency model, is presented, based on the closeness between central values provided by the expert and known values used for calibration. Using responses from experts in five different elicitation datasets, a cross-validation technique is used to compare this new approach with the Cooke Classical Model, the Equal Weights model, and individual experts. The analysis is performed using alternative reward schemes designed to capture proficiency either in quantifying uncertainty, or in estimating true central values. Results show that although there is only a limited probability that one approach is consistently better than another, the Cooke Classical Model is generally the most suitable for assessing uncertainties, whereas the new ERF model should be preferred if the goal is central value estimation accuracy. - Highlights: > A new expert elicitation model, named Expected Relative Frequency (ERF), is presented. > A cross-validation approach to evaluate the performance of different elicitation models is applied. > The new ERF model shows the best performance with respect to the point-wise estimates.

  8. Cross-validation of theoretically quantified fiber continuum generation and absolute pulse measurement by MIIPS for a broadband coherently controlled optical source

    DEFF Research Database (Denmark)

    Tu, H.; Liu, Y.; Lægsgaard, Jesper

    2012-01-01

    source with the MIIPS-integrated pulse shaper produces compressed transform-limited 9.6 fs (FWHM) pulses or arbitrarily shaped pulses at a central wavelength of 1020 nm, an average power over 100 mW, and a repetition rate of 76 MHz. In comparison to the 229-fs pump laser pulses that generate the fiber......The predicted spectral phase of a fiber continuum pulsed source rigorously quantified by the scalar generalized nonlinear Schrödinger equation is found to be in excellent agreement with that measured by multiphoton intrapulse interference phase scan (MIIPS) with background subtraction. This cross......-validation confirms the absolute pulse measurement by MIIPS and the transform-limited compression of the fiber continuum pulses by the pulse shaper performing the MIIPS measurement, and permits the subsequent coherent control on the fiber continuum pulses by this pulse shaper. The combination of the fiber continuum...

  9. Remote sensing and GIS-based landslide hazard analysis and cross-validation using multivariate logistic regression model on three test areas in Malaysia

    Science.gov (United States)

    Pradhan, Biswajeet

    2010-05-01

    This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis on the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map landcover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, landcover, rainfall precipitation, and normalized difference vegetation index (NDVI), were extracted from the spatial database and the logistic regression coefficient of each factor was computed. Then the landslide hazard was analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases of the application of logistic regression coefficients in the same study area, the case of Selangor based on the Selangor logistic regression coefficients showed the highest accuracy (94%), whereas Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases from the cross application of logistic regression coefficients in the other two areas, the case of Selangor based on the logistic coefficients of Cameron showed the highest (90%) prediction accuracy, whereas the case of Penang based on the Selangor logistic regression coefficients showed the lowest accuracy (79%). Qualitatively, the cross
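
    A schematic of the cross-area validation logic (synthetic factor values standing in for the ten GIS-derived layers): fit logistic regression coefficients in one area and score both the same area and another area with them:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Schematic of cross-area application of logistic regression coefficients
# (synthetic factor values; hypothetical area labels in the comments).
rng = np.random.default_rng(5)

def make_area(n, shift):
    X = rng.normal(loc=shift, size=(n, 10))          # ten landslide factors
    logit = X @ np.linspace(0.5, -0.5, 10) + 0.2
    y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
    return X, y

X_a, y_a = make_area(500, shift=0.0)                 # e.g. "area A"
X_b, y_b = make_area(500, shift=0.3)                 # e.g. "area B"

model = LogisticRegression(max_iter=1000).fit(X_a, y_a)
print("same-area accuracy :", round(model.score(X_a, y_a), 2))
print("cross-area accuracy:", round(model.score(X_b, y_b), 2))
```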

  10. Using quantitative image analysis to classify axillary lymph nodes on breast MRI: A new application for the Z 0011 Era

    Energy Technology Data Exchange (ETDEWEB)

    Schacht, David V., E-mail: dschacht@radiology.bsd.uchicago.edu; Drukker, Karen, E-mail: kdrukker@uchicago.edu; Pak, Iris, E-mail: irisgpak@gmail.com; Abe, Hiroyuki, E-mail: habe@radiology.bsd.uchicago.edu; Giger, Maryellen L., E-mail: m-giger@uchicago.edu

    2015-03-15

    Highlights: •Quantitative image analysis showed promise in evaluating axillary lymph nodes. •13 of 28 features performed better than guessing at metastatic status. •When all features were used together, a considerably higher AUC was obtained. -- Abstract: Purpose: To assess the performance of computer-extracted feature analysis of dynamic contrast enhanced (DCE) magnetic resonance images (MRI) of axillary lymph nodes. To determine which quantitative features best predict nodal metastasis. Methods: This institutional review board-approved, HIPAA-compliant study, in which informed patient consent was waived, collected enhanced T1 images of the axilla from patients with breast cancer. Lesion segmentation and feature analysis were performed on 192 nodes using a laboratory-developed quantitative image analysis (QIA) workstation. The importance of 28 features was assessed. Classification used the features as input to a neural net classifier in a leave-one-case-out cross-validation and was evaluated with receiver operating characteristic (ROC) analysis. Results: The area under the ROC curve (AUC) values for features in the task of distinguishing between positive and negative nodes ranged from just over 0.50 to 0.70. Five features yielded AUCs greater than 0.65: two morphological and three textural features. In cross-validation, the neural net classifier obtained an AUC of 0.88 (SE 0.03) for the task of distinguishing between positive and negative nodes. Conclusion: QIA of DCE MRI demonstrated promising performance in discriminating between positive and negative axillary nodes.
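
    A generic leave-one-case-out sketch of the classification step described above, with synthetic features and a scikit-learn MLP standing in for the laboratory's neural net: hold out each node in turn, train on the rest, and pool the held-out scores into a single ROC analysis:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score

# Leave-one-case-out cross-validation with pooled ROC analysis (synthetic data).
rng = np.random.default_rng(6)
n_nodes, n_features = 80, 28
X = rng.normal(size=(n_nodes, n_features))
y = (X[:, :3].sum(axis=1) + rng.normal(scale=1.5, size=n_nodes) > 0).astype(int)

clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0))
scores = cross_val_predict(clf, X, y, cv=LeaveOneOut(), method="predict_proba")[:, 1]
print(round(roc_auc_score(y, scores), 2))
```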

  11. Geostatistical validation and cross-validation of magnetometric measurements of soil pollution with Potentially Toxic Elements in problematic areas

    Science.gov (United States)

    Fabijańczyk, Piotr; Zawadzki, Jarosław

    2016-04-01

    Field magnetometry is a fast method that has previously been used effectively to assess potential soil pollution. One of the most popular devices used to measure soil magnetic susceptibility at the soil surface is the MS2D Bartington. A single reading of soil magnetic susceptibility with the MS2D device takes little time but is often affected by considerable errors related to the instrument or to environmental and lithogenic factors. In this connection, measured values of soil magnetic susceptibility usually have to be validated using more precise, but also much more expensive, chemical measurements. The goal of this study was to analyze validation methods for magnetometric measurements using chemical analyses of element concentrations in soil. Additionally, validation of surface measurements of soil magnetic susceptibility was performed using selected parameters of the distribution of magnetic susceptibility in a soil profile. Validation was performed using selected geostatistical measures of cross-correlation. The geostatistical approach was compared with validation performed using classic statistics. Measurements were performed at selected areas located in the Upper Silesian Industrial Area in Poland, and in selected parts of Norway. In these areas soil magnetic susceptibility was measured on the soil surface using an MS2D Bartington device and in the soil profile using an MS2C Bartington device. Additionally, soil samples were taken in order to perform chemical measurements. Acknowledgment The research leading to these results has received funding from the Polish-Norwegian Research Programme operated by the National Centre for Research and Development under the Norwegian Financial Mechanism 2009-2014 in the frame of Project IMPACT - Contract No Pol-Nor/199338/45/2013.

  12. Defining the most probable location of the parahippocampal place area using cortex-based alignment and cross-validation.

    Science.gov (United States)

    Weiner, Kevin S; Barnett, Michael A; Witthoft, Nathan; Golarai, Golijeh; Stigliani, Anthony; Kay, Kendrick N; Gomez, Jesse; Natu, Vaidehi S; Amunts, Katrin; Zilles, Karl; Grill-Spector, Kalanit

    2018-04-15

    The parahippocampal place area (PPA) is a widely studied high-level visual region in the human brain involved in place and scene processing. The goal of the present study was to identify the most probable location of place-selective voxels in medial ventral temporal cortex. To achieve this goal, we first used cortex-based alignment (CBA) to create a probabilistic place-selective region of interest (ROI) from one group of 12 participants. We then tested how well this ROI could predict place selectivity in each hemisphere within a new group of 12 participants. Our results reveal that a probabilistic ROI (pROI) generated from one group of 12 participants accurately predicts the location and functional selectivity in individual brains from a new group of 12 participants, despite between subject variability in the exact location of place-selective voxels relative to the folding of parahippocampal cortex. Additionally, the prediction accuracy of our pROI is significantly higher than that achieved by volume-based Talairach alignment. Comparing the location of the pROI of the PPA relative to published data from over 500 participants, including data from the Human Connectome Project, shows a striking convergence of the predicted location of the PPA and the cortical location of voxels exhibiting the highest place selectivity across studies using various methods and stimuli. Specifically, the most predictive anatomical location of voxels exhibiting the highest place selectivity in medial ventral temporal cortex is the junction of the collateral and anterior lingual sulci. Methodologically, we make this pROI freely available (vpnl.stanford.edu/PlaceSelectivity), which provides a means to accurately identify a functional region from anatomical MRI data when fMRI data are not available (for example, in patient populations). Theoretically, we consider different anatomical and functional factors that may contribute to the consistent anatomical location of place selectivity

  13. Slips of action and sequential decisions: a cross-validation study of tasks assessing habitual and goal-directed action control

    Directory of Open Access Journals (Sweden)

    Zsuzsika Sjoerds

    2016-12-01

    Full Text Available Instrumental learning and decision-making rely on two parallel systems: a goal-directed and a habitual system. In the past decade, several paradigms have been developed to study these systems in animals and humans by means of e.g. overtraining, devaluation procedures and sequential decision-making. These different paradigms are thought to measure the same constructs, but cross-validation has rarely been investigated. In this study we compared two widely used paradigms that assess aspects of goal-directed and habitual behavior. We correlated parameters from a two-step sequential decision-making task that assesses model-based and model-free learning with a slips-of-action paradigm that assesses the ability to suppress cue-triggered, learnt responses when the outcome has been devalued and is therefore no longer desirable. Model-based control during the two-step task showed a very moderately positive correlation with goal-directed devaluation sensitivity, whereas model-free control did not. Interestingly, parameter estimates of model-based and goal-directed behavior in the two tasks were positively correlated with higher-order cognitive measures (e.g. visual short-term memory). These cognitive measures seemed to (at least partly) mediate the association between model-based control during sequential decision-making and goal-directed behavior after instructed devaluation. This study provides moderate support for a common framework to describe the propensity towards goal-directed behavior as measured with two frequently used tasks. However, we have to caution that the amount of shared variance between the goal-directed and model-based system in both tasks was rather low, suggesting that each task does also pick up distinct aspects of goal-directed behavior. Further investigation of the commonalities and differences between the model-free and habit systems as measured with these, and other, tasks is needed. Also, a follow-up cross-validation on the neural

  14. Improved GRACE regional mass balance estimates of the Greenland Ice Sheet cross-validated with the input-output method (discussion paper)

    NARCIS (Netherlands)

    Xu, Z.; Schrama, E.J.O.; Van der Wal, W.; Van den Broeke, M.; Enderlin, E.M.

    2015-01-01

    In this study, we use satellite gravimetry data from the Gravity Recovery and Climate Experiment (GRACE) to estimate regional mass changes of the Greenland ice sheet (GrIS) and neighbouring glaciated regions using a least-squares inversion approach. We also consider results from the input-output

  15. Estimation of influential points in any data set from coefficient of determination and its leave-one-out cross-validated counterpart.

    Science.gov (United States)

    Tóth, Gergely; Bodai, Zsolt; Héberger, Károly

    2013-10-01

    Coefficient of determination (R²) and its leave-one-out cross-validated analogue (denoted by Q² or R²cv) are the most frequently published values to characterize the predictive performance of models. In this article we use R² and Q² in a reversed aspect to determine uncommon points, i.e. influential points, in any data set. The term (1 − Q²)/(1 − R²) corresponds to the ratio of the predictive residual sum of squares to the residual sum of squares. The ratio correlates with the number of influential points in experimental and random data sets. We propose an (approximate) F test on the (1 − Q²)/(1 − R²) term to quickly pre-estimate the presence of influential points in the training sets of models. The test is founded upon the routinely calculated Q² and R² values and warns the model builders to verify the training set, to perform influence analysis or even to change to robust modeling.
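
    The ratio above can be computed directly from any fitted regression model. The following is a minimal sketch (not the authors' code) that estimates R², its leave-one-out cross-validated counterpart Q², and an approximate F-type statistic on (1 − Q²)/(1 − R²) for an ordinary linear model; the synthetic data, the planted outlier, and the choice of degrees of freedom are illustrative assumptions only.

      import numpy as np
      from scipy import stats
      from sklearn.linear_model import LinearRegression
      from sklearn.model_selection import LeaveOneOut, cross_val_predict

      rng = np.random.default_rng(0)
      X = rng.normal(size=(40, 3))
      y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=40)
      y[0] += 6.0  # plant one influential point for illustration

      model = LinearRegression()
      y_fit = model.fit(X, y).predict(X)
      y_loo = cross_val_predict(model, X, y, cv=LeaveOneOut())

      ss_tot = np.sum((y - y.mean()) ** 2)
      r2 = 1 - np.sum((y - y_fit) ** 2) / ss_tot   # coefficient of determination
      q2 = 1 - np.sum((y - y_loo) ** 2) / ss_tot   # leave-one-out cross-validated analogue
      ratio = (1 - q2) / (1 - r2)                  # PRESS / RSS

      # Approximate F-type test on the ratio; using (n - p - 1, n - p - 1) degrees
      # of freedom is an assumption of this sketch, not necessarily the authors' exact test.
      n, p = X.shape
      df = n - p - 1
      p_value = stats.f.sf(ratio, df, df)
      print(f"R2={r2:.3f}  Q2={q2:.3f}  ratio={ratio:.3f}  p={p_value:.3f}")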

  16. Cross-validation of the Dot Counting Test in a large sample of credible and non-credible patients referred for neuropsychological testing.

    Science.gov (United States)

    McCaul, Courtney; Boone, Kyle B; Ermshar, Annette; Cottingham, Maria; Victor, Tara L; Ziegler, Elizabeth; Zeller, Michelle A; Wright, Matthew

    2018-01-18

    To cross-validate the Dot Counting Test in a large neuropsychological sample. Dot Counting Test scores were compared in credible (n = 142) and non-credible (n = 335) neuropsychology referrals. Non-credible patients scored significantly higher than credible patients on all Dot Counting Test scores. While the original E-score cut-off of ≥17 achieved excellent specificity (96.5%), it was associated with mediocre sensitivity (52.8%). However, the cut-off could be substantially lowered to ≥13.80, while still maintaining adequate specificity (≥90%), and raising sensitivity to 70.0%. Examination of non-credible subgroups revealed that Dot Counting Test sensitivity in feigned mild traumatic brain injury (mTBI) was 55.8%, whereas sensitivity was 90.6% in patients with non-credible cognitive dysfunction in the context of claimed psychosis, and 81.0% in patients with non-credible cognitive performance in depression or severe TBI. Thus, the Dot Counting Test may have a particular role in detection of non-credible cognitive symptoms in claimed psychiatric disorders. As an alternative to use of the E-score, failure on ≥1 cut-offs applied to individual Dot Counting Test scores (≥6.0″ for mean grouped dot counting time, ≥10.0″ for mean ungrouped dot counting time, and ≥4 errors) occurred in 11.3% of the credible sample, while nearly two-thirds (63.6%) of the non-credible sample failed one or more of these cut-offs. An E-score cut-off of 13.80, or failure on ≥1 individual score cut-offs, resulted in few false positive identifications in credible patients and achieved high sensitivity (64.0-70.0%), and therefore appears appropriate for use in identifying neurocognitive performance invalidity.

  17. A model of prediction and cross-validation of fat-free mass in men with motor complete spinal cord injury.

    Science.gov (United States)

    Gorgey, Ashraf S; Dolbow, David R; Gater, David R

    2012-07-01

    To establish and validate prediction equations by using body weight to predict legs, trunk, and whole-body fat-free mass (FFM) in men with chronic complete spinal cord injury (SCI). Cross-sectional design. Research setting in a large medical center. Individuals with SCI (N=63) divided into prediction (n=42) and cross-validation (n=21) groups. Not applicable. Whole-body FFM and regional FFM were determined by using dual-energy x-ray absorptiometry. Body weight was measured by using a wheelchair weighing scale after subtracting the weight of the chair. Body weight predicted legs FFM (legs FFM = .09 × body weight + 6.1; R² = .25, standard error of the estimate [SEE] = 3.1 kg), trunk FFM (trunk FFM = .21 × body weight + 8.6; R² = .56, SEE = 3.6 kg), and whole-body FFM (whole-body FFM = .288 × body weight + 26.3; R² = .53, SEE = 5.3 kg). Whole-body FFM(predicted) (FFM predicted from the derived equations) shared 86% of the variance in whole-body FFM(measured) (FFM measured using the dual-energy x-ray absorptiometry scan) (R² = .86, SEE = 1.8 kg) and 66% of the variance in legs FFM(measured). The trunk FFM(predicted) shared 69% of the variance in trunk FFM(measured) (R² = .69, SEE = 2.7 kg), and legs FFM(predicted) shared 67% of the variance in legs FFM(measured) (R² = .67, SEE = 2.8 kg). FFM did not differ between the prediction and validation groups. Body weight can be used to predict whole-body FFM and regional FFM. The predicted whole-body FFM improved the prediction of trunk FFM and legs FFM. Copyright © 2012 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
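
    A minimal sketch of the prediction/cross-validation split described above, on synthetic data: fit a linear equation of FFM on body weight in a prediction group and report R² and the standard error of the estimate (SEE) in an independent validation group. The sizes, coefficients, and variable names are illustrative assumptions, not the study data.

      import numpy as np
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(1)
      body_weight = rng.normal(75, 12, size=63)                   # kg, synthetic
      ffm = 0.29 * body_weight + 26 + rng.normal(0, 5, size=63)   # synthetic whole-body FFM

      # Split into a prediction group (n=42) and a cross-validation group (n=21)
      idx = rng.permutation(63)
      train, test = idx[:42], idx[42:]

      eq = LinearRegression().fit(body_weight[train].reshape(-1, 1), ffm[train])
      pred = eq.predict(body_weight[test].reshape(-1, 1))

      ss_res = np.sum((ffm[test] - pred) ** 2)
      ss_tot = np.sum((ffm[test] - ffm[test].mean()) ** 2)
      r2 = 1 - ss_res / ss_tot
      see = np.sqrt(ss_res / (len(test) - 2))  # standard error of the estimate
      print(f"FFM = {eq.coef_[0]:.3f} x weight + {eq.intercept_:.1f}; R2 = {r2:.2f}, SEE = {see:.1f} kg")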

  18. Evolution of Precipitation Structure During the November DYNAMO MJO Event: Cloud-Resolving Model Intercomparison and Cross Validation Using Radar Observations

    Science.gov (United States)

    Li, Xiaowen; Janiga, Matthew A.; Wang, Shuguang; Tao, Wei-Kuo; Rowe, Angela; Xu, Weixin; Liu, Chuntao; Matsui, Toshihisa; Zhang, Chidong

    2018-04-01

    The evolution of precipitation structures is simulated and compared with radar observations for the November Madden-Julian Oscillation (MJO) event during the DYNAmics of the MJO (DYNAMO) field campaign. Three ground-based, ship-borne, and spaceborne precipitation radars and three cloud-resolving models (CRMs) driven by observed large-scale forcing are used to study precipitation structures at different locations over the central equatorial Indian Ocean. Convective strength is represented by 0-dBZ echo-top heights, and convective organization by contiguous 17-dBZ areas. The multi-radar and multi-model framework allows for more stringent model validations. The emphasis is on testing models' ability to simulate subtle differences observed at different radar sites when the MJO event passed through. The results show that CRMs forced by site-specific large-scale forcing can reproduce not only common features in cloud populations but also subtle variations observed by different radars. The comparisons also reveal common deficiencies in CRM simulations, in which they underestimate radar echo-top heights for the strongest convection within large, organized precipitation features. Cross validations with multiple radars and models also enable quantitative comparisons in CRM sensitivity studies using different large-scale forcing, microphysical schemes and parameters, resolutions, and domain sizes. In terms of radar echo-top height temporal variations, many model sensitivity tests have better correlations than radar/model comparisons, indicating robustness in model performance on this aspect. It is further shown that well-validated model simulations could be used to constrain uncertainties in observed echo-top heights when the low-resolution surveillance scanning strategy is used.

  19. Higher risks when working unusual times? A cross-validation of the effects on safety, health, and work-life balance.

    Science.gov (United States)

    Greubel, Jana; Arlinghaus, Anna; Nachreiner, Friedhelm; Lombardi, David A

    2016-11-01

    Replication and cross-validation of results on health and safety risks of work at unusual times. Data from two independent surveys (European Working Conditions Surveys 2005 and 2010; EU 2005: n = 23,934 and EU 2010: n = 35,187) were used to examine the relative risks of working at unusual times (evenings, Saturdays, and Sundays) on work-life balance, work-related health complaints, and occupational accidents using logistic regression while controlling for potential confounders such as demographics, work load, and shift work. For the EU 2005 survey, evening work was significantly associated with an increased risk of poor work-life balance (OR 1.69) and work-related health complaints (OR 1.14), Saturday work with poor work-life balance (OR 1.49) and occupational accidents (OR 1.34), and Sunday work with poor work-life balance (OR 1.15) and work-related health complaints (OR 1.17). For EU 2010, evening work was associated with poor work-life balance (OR 1.51) and work-related health complaints (OR 1.12), Saturday work with poor work-life balance (OR 1.60) and occupational accidents (OR 1.19) but a decrease in risk for work-related health complaints (OR 0.86) and Sunday work with work-related health complaints (OR 1.13). Risk estimates in both samples yielded largely similar results with comparable ORs and overlapping confidence intervals. Work at unusual times constitutes a considerable risk to social participation and health and showed structurally consistent effects over time and across samples.
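
    As an illustration of the type of analysis described above, the sketch below fits a logistic regression with confounders and reports adjusted odds ratios with 95% confidence intervals. The variable names and simulated data are assumptions for illustration and do not reproduce the survey analysis.

      import numpy as np
      import pandas as pd
      import statsmodels.api as sm

      rng = np.random.default_rng(2)
      n = 5000
      df = pd.DataFrame({
          "evening_work": rng.integers(0, 2, n),
          "shift_work": rng.integers(0, 2, n),   # confounder controlled for
          "workload": rng.normal(0, 1, n),       # confounder controlled for
      })
      logit = -2.0 + 0.5 * df["evening_work"] + 0.3 * df["shift_work"] + 0.2 * df["workload"]
      df["poor_wlb"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

      X = sm.add_constant(df[["evening_work", "shift_work", "workload"]])
      fit = sm.Logit(df["poor_wlb"], X).fit(disp=0)

      odds_ratios = np.exp(fit.params)   # adjusted odds ratios
      ci = np.exp(fit.conf_int())        # 95% confidence intervals
      print(pd.concat([odds_ratios.rename("OR"),
                       ci.rename(columns={0: "2.5%", 1: "97.5%"})], axis=1))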

  20. A cross-validation trial of an Internet-based prevention program for alcohol and cannabis: Preliminary results from a cluster randomised controlled trial.

    Science.gov (United States)

    Champion, Katrina E; Newton, Nicola C; Stapinski, Lexine; Slade, Tim; Barrett, Emma L; Teesson, Maree

    2016-01-01

    Replication is an important step in evaluating evidence-based preventive interventions and is crucial for establishing the generalizability and wider impact of a program. Despite this, few replications have occurred in the prevention science field. This study aims to fill this gap by conducting a cross-validation trial of the Climate Schools: Alcohol and Cannabis course, an Internet-based prevention program, among a new cohort of Australian students. A cluster randomized controlled trial was conducted among 1103 students (Mage: 13.25 years) from 13 schools in Australia in 2012. Six schools received the Climate Schools course and 7 schools were randomized to a control group (health education as usual). All students completed a self-report survey at baseline and immediately post-intervention. Mixed-effects regressions were conducted for all outcome variables. Outcomes assessed included alcohol and cannabis use, knowledge and intentions to use these substances. Compared to the control group, immediately post-intervention the intervention group reported significantly greater alcohol (d = 0.67) and cannabis knowledge (d = 0.72), were less likely to have consumed any alcohol (even a sip or taste) in the past 6 months (odds ratio = 0.69) and were less likely to intend on using alcohol in the future (odds ratio = 0.62). However, there were no effects for binge drinking, cannabis use or intentions to use cannabis. These preliminary results provide some support for the Internet-based Climate Schools: Alcohol and Cannabis course as a feasible way of delivering alcohol and cannabis prevention. Intervention effects for alcohol and cannabis knowledge were consistent with results from the original trial; however, analyses of longer-term follow-up data are needed to provide a clearer indication of the efficacy of the intervention, particularly in relation to behavioral changes. © The Royal Australian and New Zealand College of Psychiatrists 2015.

  1. Applying a computer-aided scheme to detect a new radiographic image marker for prediction of chemotherapy outcome

    International Nuclear Information System (INIS)

    Wang, Yunzhi; Qiu, Yuchen; Thai, Theresa; Moore, Kathleen; Liu, Hong; Zheng, Bin

    2016-01-01

    To investigate the feasibility of automated segmentation of visceral and subcutaneous fat areas from computed tomography (CT) images of ovarian cancer patients and of applying the computed adiposity-related image features to predict chemotherapy outcome. A computerized image processing scheme was developed to segment visceral and subcutaneous fat areas and to compute adiposity-related image features. Then, logistic regression models were applied to analyze the association between the scheme-generated assessment scores and progression-free survival (PFS) of patients, using a leave-one-case-out cross-validation method and a dataset involving 32 patients. The correlation coefficients between automated and radiologist's manual segmentation of visceral and subcutaneous fat areas were 0.76 and 0.89, respectively. The scheme-generated prediction scores based on adiposity-related radiographic image features were significantly associated with patients' PFS (p < 0.01). Using a computerized scheme enables visceral and subcutaneous fat areas to be segmented more efficiently and robustly. The computed adiposity-related image features also have potential to improve accuracy in predicting chemotherapy outcome.
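
    A minimal sketch of a leave-one-case-out evaluation of a logistic regression model on a small dataset, in the spirit of the analysis described above. The features, labels, and dataset size are synthetic assumptions, not the study data.

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score
      from sklearn.model_selection import LeaveOneOut

      rng = np.random.default_rng(3)
      n_cases = 32
      X = rng.normal(size=(n_cases, 4))   # e.g. adiposity-related image features (synthetic)
      y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=n_cases) > 0).astype(int)

      scores = np.zeros(n_cases)
      for train_idx, test_idx in LeaveOneOut().split(X):
          # Refit the model with one case held out; score only that case
          clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
          scores[test_idx] = clf.predict_proba(X[test_idx])[:, 1]

      print("Leave-one-case-out AUC:", round(roc_auc_score(y, scores), 3))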

  2. Near surface geotechnical and geophysical data cross validated for site characterization applications. The cases of selected accelerometric stations in Crete island (Greece)

    Science.gov (United States)

    Loupasakis, Constantinos; Tsangaratos, Paraskevas; Rozos, Dimitrios; Rondoyianni, Theodora; Vafidis, Antonis; Steiakakis, Emanouil; Agioutantis, Zacharias; Savvaidis, Alexandros; Soupios, Pantelis; Papadopoulos, Ioannis; Papadopoulos, Nikos; Sarris, Apostolos; Mangriotis, Maria-Dafni; Dikmen, Unal

    2015-04-01

    The near surface ground conditions are highly important for the design of civil constructions. These conditions determine primarily the ability of the foundation formations to bear loads, the stress - strain relations and the corresponding deformations, as well as the soil amplification and corresponding peak ground motion in case of dynamic loading. The static and dynamic geotechnical parameters as well as the ground-type/soil-category can be determined by combining geotechnical and geophysical methods, such as engineering geological surface mapping, geotechnical drilling, in situ and laboratory testing and geophysical investigations. The above-mentioned methods were combined for the site characterization in selected sites of the Hellenic Accelerometric Network (HAN) in the area of Crete Island. The combination of the geotechnical and geophysical methods in thirteen (13) sites provided sufficient information about their limitations, setting up the minimum test requirements in relation to the type of the geological formations. The reduced accuracy of the surface mapping in urban sites, the uncertainties introduced by the geophysical survey in sites with complex geology and the 1-D data provided by the geotechnical drills are some of the causes affecting the right order and the quantity of the necessary investigation methods. Through this study the gradual improvement in the accuracy of the site characterization data in regard to the applied investigation techniques is presented by providing characteristic examples from the total number of thirteen sites. As an example of the gradual improvement of the knowledge about the ground conditions, the case of the AGN1 strong motion station, located at Agios Nikolaos city (Eastern Crete), is briefly presented. According to the medium scale geological map of IGME the station was supposed to be founded over limestone. The detailed geological mapping revealed that a few meters of loose alluvial deposits occupy the area, expected

  3. Increased anxiety and depression in Danish cardiac patients with a type D personality: cross-validation of the Type D Scale (DS14)

    DEFF Research Database (Denmark)

    Spindler, Helle; Kruse, Charlotte; Zwisler, Ann-Dorthe

    2009-01-01

    BACKGROUND: Type D personality is an emerging risk factor in cardiovascular disease. We examined the psychometric properties of the Danish version of the Type D Scale (DS14) and the impact of Type D on anxiety and depression in cardiac patients. METHOD: Cardiac patients (n = 707) completed the DS14......, the Hospital Anxiety and Depression Scale, and the Eysenck Personality Questionnaire. A subgroup (n = 318) also completed the DS14 at 3 or 12 weeks. RESULTS: The two-factor structure of the DS14 was confirmed; the subscales negative affectivity and social inhibition were shown to be valid, internally...... consistent (Cronbach's alpha = 0.87/0.91; mean inter-item correlations = 0.49/0.59), and stable over 3 and 12 weeks (r = 0.85/0.78; 0.83/0.79; ps depression (beta, 0.47; p
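
    Cronbach's alpha, used above to summarize the internal consistency of the DS14 subscales, can be computed from an item-response matrix as shown below. This is a generic sketch on synthetic responses, not the study data.

      import numpy as np

      def cronbach_alpha(items: np.ndarray) -> float:
          """items: respondents x items matrix of scores."""
          k = items.shape[1]
          item_var_sum = items.var(axis=0, ddof=1).sum()
          total_var = items.sum(axis=1).var(ddof=1)
          return k / (k - 1) * (1 - item_var_sum / total_var)

      rng = np.random.default_rng(4)
      latent = rng.normal(size=(300, 1))                         # common trait
      responses = latent + rng.normal(scale=0.8, size=(300, 7))  # 7 correlated items
      print(round(cronbach_alpha(responses), 2))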

  4. Determination of Metals Present in Textile Dyes Using Laser-Induced Breakdown Spectroscopy and Cross-Validation Using Inductively Coupled Plasma/Atomic Emission Spectroscopy

    Directory of Open Access Journals (Sweden)

    K. Rehan

    2017-01-01

    Full Text Available Laser-induced breakdown spectroscopy (LIBS) was used for the quantitative analysis of elements present in textile dyes at ambient pressure via the fundamental mode (1064 nm) of a Nd:YAG pulsed laser. Three samples were collected for this purpose. Spectra of textile dyes were acquired using an HR spectrometer (LIBS2000+, Ocean Optics, Inc.) having an optical resolution of 0.06 nm in the spectral range of 200 to 720 nm. Toxic metals like Cr, Cu, Fe, Ni, and Zn, along with other elements like Al, Mg, Ca, and Na, were revealed to exist in the samples. The percentage concentrations of the detected elements were measured by means of the standard calibration curve method, the intensities of every emission from every species, and the calibration-free (CF) LIBS approach. Only Sample 3 was found to contain heavy metals like Cr, Cu, and Ni above the prescribed limit. The results using LIBS were found to be in good agreement with the outcomes of inductively coupled plasma/atomic emission spectroscopy (ICP/AES).

  5. Cross-validation of the Spanish HP-Version of the Jefferson Scale of Empathy confirmed with some cross-cultural differences

    Directory of Open Access Journals (Sweden)

    Adelina Alcorta-Garza

    2016-07-01

    Full Text Available Context: Medical educators agree that empathy is essential for physicians' professionalism. The Health Professional Version of the Jefferson Scale of Empathy (JSE-HP) was developed in response to a need for a psychometrically sound instrument to measure empathy in the context of patient care. Although extensive support for its validity and reliability is available, the authors recognize the necessity to examine psychometrics of the JSE-HP in different socio-cultural contexts to assure the psychometric soundness of this instrument. The first aim of this study was to confirm its psychometric properties in the cross-cultural context of Spain and Latin American countries. The second aim was to measure the influence of social and cultural factors on the development of medical empathy in health practitioners. Methods: The original English version of the JSE-HP was translated into International Spanish using back-translation procedures. The Spanish version of the JSE-HP was administered to 896 physicians from Spain and thirteen Latin American countries. Data were subjected to exploratory factor analysis using principal component analysis with oblique rotation (promax) to allow for correlation among the resulting factors, followed by a second analysis, using confirmatory factor analysis. Two theoretical models, one based on the English JSE-HP and another on the first Spanish student version of the JSE (JSE-S), were tested. Demographic variables were compared using group comparisons. Results: A total of 715 (80%) surveys were returned fully completed. Cronbach's alpha coefficient of the JSE for the entire sample was 0.84. The psychometric properties of the Spanish JSE-HP matched those of the original English JSE-HP. However, the Spanish JSE-S model proved more appropriate than the original English model for the sample in this study. Group comparisons among physicians classified by gender, medical specialties, cultural and cross-cultural backgrounds yielded

  6. Cross-validation of commercial enzyme-linked immunosorbent assay and radioimmunoassay for porcine C-peptide concentration measurements in non-human primate serum.

    Science.gov (United States)

    Gresch, Sarah C; Mutch, Lucas A; Janecek, Jody L; Hegstad-Davies, Rebecca L; Graham, Melanie L

    2017-09-01

    C-peptide concentration is widely used as a marker of insulin secretion and is especially relevant in evaluating islet graft function following transplantation, because its measurement is not confounded by the presence of exogenous insulin. To address the shortage of human islet donors, the use of porcine islets has been proposed as a possible solution, and the stringent pig-to-non-human primate (NHP) model is often the most relevant for pre-clinical evaluation of the potential for diabetes reversal resulting from an islet xenograft. The Millipore radioimmunoassay (RIA) was exclusively used to measure porcine C-peptide (PCP) until 2013, when the assay was discontinued; subsequently, a commercially available enzyme-linked immunosorbent assay (ELISA) from Mercodia has been widely adopted. Both assays have been used in pre-clinical trials evaluating the therapeutic potential of xenograft products in reversing diabetes in the pig-to-NHP model; to interpret data in a comparable way, it may be useful to perform a harmonization of C-peptide measurements. We performed a method comparison by determining the PCP concentration in 620 serum samples collected from 20 diabetic cynomolgus macaques transplanted with adult porcine islets. All analyses were performed according to manufacturer instructions. With both assays, we demonstrated an acceptable detection limit, precision, and recovery. Linearity of the ELISA met acceptance criteria at all concentrations tested, while linearity of the RIA only met acceptance criteria at five of the eight concentrations tested. The RIA had a detection limit of 0.16 ng/mL, and recovery ranged from 82% to 96% and met linearity acceptance criteria at 0.35 ng/mL and from 0.78 to 2.33 ng/mL. The ELISA had a detection limit of 0.03 ng/mL, and recovery ranged from 81% to 115% and met linearity acceptance criteria from 0.08 to 0.85 ng/mL. Both assays had acceptable intra-assay precision, and the ELISA demonstrated a significant correlation with the RIA (R

  7. Cross Validating Ocean Prediction and Monitoring Systems

    National Research Council Canada - National Science Library

    Mooers, Christopher; Meinen, Christopher; Baringer, Molly; Bang, Inkweon; Rhodes, Robert C; Barron, Charlie N; Bub, Frank

    2005-01-01

    With the ongoing development of ocean circulation models and real-time observing systems, routine estimation of the synoptic state of the ocean is becoming feasible for practical and scientific purposes...

  8. Mammographic quantitative image analysis and biologic image composition for breast lesion characterization and classification

    Energy Technology Data Exchange (ETDEWEB)

    Drukker, Karen, E-mail: kdrukker@uchicago.edu; Giger, Maryellen L.; Li, Hui [Department of Radiology, University of Chicago, Chicago, Illinois 60637 (United States); Duewer, Fred; Malkov, Serghei; Joe, Bonnie; Kerlikowske, Karla; Shepherd, John A. [Radiology Department, University of California, San Francisco, California 94143 (United States); Flowers, Chris I. [Department of Radiology, University of South Florida, Tampa, Florida 33612 (United States); Drukteinis, Jennifer S. [Department of Radiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida 33612 (United States)

    2014-03-15

    Purpose: To investigate whether biologic image composition of mammographic lesions can improve upon existing mammographic quantitative image analysis (QIA) in estimating the probability of malignancy. Methods: The study population consisted of 45 breast lesions imaged with dual-energy mammography prior to breast biopsy, with final diagnosis resulting in 10 invasive ductal carcinomas, 5 ductal carcinomas in situ, 11 fibroadenomas, and 19 other benign diagnoses. Analysis was threefold: (1) The raw low-energy mammographic images were analyzed with an established in-house QIA method, “QIA alone,” (2) the three-compartment breast (3CB) composition measures—derived from the dual-energy mammography—of water, lipid, and protein thickness were assessed, “3CB alone”, and (3) information from QIA and 3CB was combined, “QIA + 3CB.” Analysis was initiated from radiologist-indicated lesion centers and was otherwise fully automated. Steps of the QIA and 3CB methods were lesion segmentation, characterization, and subsequent classification for malignancy in leave-one-case-out cross-validation. Performance assessment included box plots, Bland–Altman plots, and Receiver Operating Characteristic (ROC) analysis. Results: The area under the ROC curve (AUC) for distinguishing between benign and malignant lesions (invasive and DCIS) was 0.81 (standard error 0.07) for the “QIA alone” method, 0.72 (0.07) for the “3CB alone” method, and 0.86 (0.04) for “QIA + 3CB” combined. The difference in AUC was 0.043 between “QIA + 3CB” and “QIA alone” but failed to reach statistical significance (95% confidence interval [–0.17 to +0.26]). Conclusions: In this pilot study analyzing the new 3CB imaging modality, knowledge of the composition of breast lesions and their periphery appeared additive in combination with existing mammographic QIA methods for the distinction between different benign and malignant lesion types.

  9. Computer-aided classification of breast masses using contrast-enhanced digital mammograms

    Science.gov (United States)

    Danala, Gopichandh; Aghaei, Faranak; Heidari, Morteza; Wu, Teresa; Patel, Bhavika; Zheng, Bin

    2018-02-01

    By taking advantage of both mammography and breast MRI, contrast-enhanced digital mammography (CEDM) has emerged as a promising new imaging modality to improve the efficacy of breast cancer screening and diagnosis. The primary objective of this study is to develop and evaluate a new computer-aided detection and diagnosis (CAD) scheme of CEDM images to classify between malignant and benign breast masses. A CEDM dataset consisting of 111 patients (33 benign and 78 malignant) was retrospectively assembled. Each case includes two types of images, namely low-energy (LE) and dual-energy subtracted (DES) images. First, the CAD scheme applied a hybrid segmentation method to automatically segment masses depicted on LE and DES images separately. Optimal segmentation results from DES images were also mapped to LE images and vice versa. Next, a set of 109 quantitative image features related to mass shape and density heterogeneity was initially computed. Last, four multilayer perceptron-based machine learning classifiers integrated with a correlation-based feature subset evaluator and a leave-one-case-out cross-validation method were built to classify mass regions depicted on LE and DES images, respectively. When the CAD scheme was applied to the original segmentations of DES and LE images, the areas under the ROC curves were 0.7585+/-0.0526 and 0.7534+/-0.0470, respectively. After optimal segmentation mapping from DES to LE images, the AUC value of the CAD scheme significantly increased to 0.8477+/-0.0376. Because DES images suppress overlapping dense breast tissue around lesions, segmentation accuracy was significantly improved as compared to regular mammograms, and the study demonstrated that computer-aided classification of breast masses using CEDM images yielded higher performance.

  10. Multi-probe-based resonance-frequency electrical impedance spectroscopy for detection of suspicious breast lesions: improving performance using partial ROC optimization

    Science.gov (United States)

    Lederman, Dror; Zheng, Bin; Wang, Xingwei; Wang, Xiao Hui; Gur, David

    2011-03-01

    We have developed a multi-probe resonance-frequency electrical impedance spectroscope (REIS) system to detect breast abnormalities. Based on assessing asymmetry in REIS signals acquired between left and right breasts, we developed several machine learning classifiers to classify younger women (i.e., under 50 YO) into two groups of having high and low risk for developing breast cancer. In this study, we investigated a new method to optimize performance based on the area under a selected partial receiver operating characteristic (ROC) curve when optimizing an artificial neural network (ANN), and tested whether it could improve classification performance. From an ongoing prospective study, we selected a dataset of 174 cases for whom we have both REIS signals and diagnostic status verification. The dataset includes 66 "positive" cases recommended for biopsy due to detection of highly suspicious breast lesions and 108 "negative" cases determined by imaging-based examinations. A set of REIS-based feature differences, extracted from the two breasts using a mirror-matched approach, was computed and constituted an initial feature pool. Using a leave-one-case-out cross-validation method, we applied a genetic algorithm (GA) to train the ANN with an optimal subset of features. Two optimization criteria were separately used in GA optimization, namely the area under the entire ROC curve (AUC) and the partial area under the ROC curve, up to a predetermined threshold (i.e., 90% specificity). The results showed that although the ANN optimized using the entire AUC yielded higher overall performance (AUC = 0.83 versus 0.76), the ANN optimized using the partial ROC area criterion achieved substantially higher operational performance (i.e., increasing sensitivity level from 28% to 48% at 95% specificity and/or from 48% to 58% at 90% specificity).
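
    The partial-area criterion restricts the ROC area to a high-specificity operating region. The sketch below compares the full AUC with a partial AUC computed up to a 10% false-positive rate (90% specificity) using scikit-learn's max_fpr option; the scores are synthetic and the snippet does not reproduce the GA/ANN pipeline of the study.

      import numpy as np
      from sklearn.metrics import roc_auc_score

      rng = np.random.default_rng(5)
      y_true = rng.integers(0, 2, size=174)
      y_score = y_true * 0.8 + rng.normal(scale=0.7, size=174)  # synthetic classifier output

      full_auc = roc_auc_score(y_true, y_score)
      # Partial AUC over the region FPR <= 0.10 (specificity >= 90%),
      # standardized by scikit-learn to the [0.5, 1] range (McClish correction).
      partial_auc = roc_auc_score(y_true, y_score, max_fpr=0.10)
      print(f"AUC={full_auc:.3f}  partial AUC (FPR<=0.1)={partial_auc:.3f}")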

  11. Prediction of breast cancer risk using a machine learning approach embedded with a locality preserving projection algorithm

    Science.gov (United States)

    Heidari, Morteza; Zargari Khuzani, Abolfazl; Hollingsworth, Alan B.; Danala, Gopichandh; Mirniaharikandehei, Seyedehnafiseh; Qiu, Yuchen; Liu, Hong; Zheng, Bin

    2018-02-01

    In order to automatically identify a set of effective mammographic image features and build an optimal breast cancer risk stratification model, this study aims to investigate advantages of applying a machine learning approach embedded with a locality preserving projection (LPP) based feature combination and regeneration algorithm to predict short-term breast cancer risk. A dataset involving negative mammograms acquired from 500 women was assembled. This dataset was divided into two age-matched classes of 250 high risk cases in which cancer was detected in the next subsequent mammography screening and 250 low risk cases, which remained negative. First, a computer-aided image processing scheme was applied to segment fibro-glandular tissue depicted on mammograms and initially compute 44 features related to the bilateral asymmetry of mammographic tissue density distribution between left and right breasts. Next, a multi-feature fusion based machine learning classifier was built to predict the risk of cancer detection in the next mammography screening. A leave-one-case-out (LOCO) cross-validation method was applied to train and test the machine learning classifier embedded with the LPP algorithm, which generated a new operational vector with 4 features using a maximal variance approach in each LOCO process. Results showed a 9.7% increase in risk prediction accuracy when using this LPP-embedded machine learning approach. An increasing trend of adjusted odds ratios was also detected, in which odds ratios increased from 1.0 to 11.2. This study demonstrated that applying the LPP algorithm effectively reduced feature dimensionality, and yielded higher and potentially more robust performance in predicting short-term breast cancer risk.

  12. Computer-aided diagnosis of lung cancer: the effect of training data sets on classification accuracy of lung nodules

    Science.gov (United States)

    Gong, Jing; Liu, Ji-Yu; Sun, Xi-Wen; Zheng, Bin; Nie, Sheng-Dong

    2018-02-01

    This study aims to develop a computer-aided diagnosis (CADx) scheme for classification between malignant and benign lung nodules, and also assess whether CADx performance changes in detecting nodules associated with early and advanced stage lung cancer. The study involves 243 biopsy-confirmed pulmonary nodules. Among them, 76 are benign, 81 are stage I and 86 are stage III malignant nodules. The cases are separated into three data sets involving: (1) all nodules, (2) benign and stage I malignant nodules, and (3) benign and stage III malignant nodules. A CADx scheme is applied to segment lung nodules depicted on computed tomography images and we initially computed 66 3D image features. Then, three machine learning models namely, a support vector machine, naïve Bayes classifier and linear discriminant analysis, are separately trained and tested by using three data sets and a leave-one-case-out cross-validation method embedded with a Relief-F feature selection algorithm. When separately using three data sets to train and test three classifiers, the average areas under receiver operating characteristic curves (AUC) are 0.94, 0.90 and 0.99, respectively. When using the classifiers trained using data sets with all nodules, average AUC values are 0.88 and 0.99 for detecting early and advanced stage nodules, respectively. AUC values computed from three classifiers trained using the same data set are consistent without statistically significant difference (p  >  0.05). This study demonstrates (1) the feasibility of applying a CADx scheme to accurately distinguish between benign and malignant lung nodules, and (2) a positive trend between CADx performance and cancer progression stage. Thus, in order to increase CADx performance in detecting subtle and early cancer, training data sets should include more diverse early stage cancer cases.
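
    A key point when embedding feature selection within the leave-one-case-out loop, as described above, is that the selector is refit on each training fold so the held-out case never influences the feature choice. The sketch below illustrates this with a univariate filter standing in for Relief-F (which is not part of scikit-learn); the data, feature counts, and classifier settings are illustrative assumptions.

      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.feature_selection import SelectKBest, f_classif
      from sklearn.metrics import roc_auc_score
      from sklearn.model_selection import LeaveOneOut
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(6)
      X = rng.normal(size=(243, 66))   # 66 image features per nodule, synthetic
      y = (X[:, :3].sum(axis=1) + rng.normal(size=243) > 0).astype(int)

      # Feature selection is refit inside every fold via the pipeline,
      # so the left-out case never leaks into the selection step.
      pipe = make_pipeline(SelectKBest(f_classif, k=10), LinearDiscriminantAnalysis())

      scores = np.zeros(len(y))
      for tr, te in LeaveOneOut().split(X):
          pipe.fit(X[tr], y[tr])
          scores[te] = pipe.predict_proba(X[te])[:, 1]
      print("LOCO AUC:", round(roc_auc_score(y, scores), 3))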

  13. Improving performance of breast cancer risk prediction using a new CAD-based region segmentation scheme

    Science.gov (United States)

    Heidari, Morteza; Zargari Khuzani, Abolfazl; Danala, Gopichandh; Qiu, Yuchen; Zheng, Bin

    2018-02-01

    The objective of this study is to develop and test a new computer-aided detection (CAD) scheme with improved region of interest (ROI) segmentation combined with an image feature extraction framework to improve performance in predicting short-term breast cancer risk. A dataset involving 570 sets of "prior" negative mammography screening cases was retrospectively assembled. In the next sequential "current" screening, 285 cases were positive and 285 cases remained negative. A CAD scheme was applied to all 570 "prior" negative images to stratify cases into high and low risk groups of having cancer detected in the "current" screening. First, a new ROI segmentation algorithm was used to automatically remove useless areas of the mammograms. Second, from the matched bilateral craniocaudal view images, a set of 43 image features related to frequency characteristics of the ROIs was initially computed from the discrete cosine transform and the spatial domain of the images. Third, a support vector machine based machine learning classifier was used to optimally classify the selected optimal image features and build a CAD-based risk prediction model. The classifier was trained using a leave-one-case-out based cross-validation method. Applying this improved CAD scheme to the testing dataset yielded an area under the ROC curve of AUC = 0.70+/-0.04, which was significantly higher than that obtained by extracting features directly from the dataset without the improved ROI segmentation step (AUC = 0.63+/-0.04). This study demonstrated that the proposed approach could improve accuracy in predicting short-term breast cancer risk, which may play an important role in helping eventually establish an optimal personalized breast cancer paradigm.

  14. Evolutionary pruning of transfer learned deep convolutional neural network for breast cancer diagnosis in digital breast tomosynthesis.

    Science.gov (United States)

    Samala, Ravi K; Chan, Heang-Ping; Hadjiiski, Lubomir M; Helvie, Mark A; Richter, Caleb; Cha, Kenny

    2018-05-01

    Deep learning models are highly parameterized, resulting in difficulty in inference and transfer learning for image recognition tasks. In this work, we propose a layered pathway evolution method to compress a deep convolutional neural network (DCNN) for classification of masses in digital breast tomosynthesis (DBT). The objective is to prune the number of tunable parameters while preserving the classification accuracy. In the first stage transfer learning, 19 632 augmented regions-of-interest (ROIs) from 2454 mass lesions on mammograms were used to train a pre-trained DCNN on ImageNet. In the second stage transfer learning, the DCNN was used as a feature extractor followed by feature selection and random forest classification. The pathway evolution was performed using genetic algorithm in an iterative approach with tournament selection driven by count-preserving crossover and mutation. The second stage was trained with 9120 DBT ROIs from 228 mass lesions using leave-one-case-out cross-validation. The DCNN was reduced by 87% in the number of neurons, 34% in the number of parameters, and 95% in the number of multiply-and-add operations required in the convolutional layers. The test AUC on 89 mass lesions from 94 independent DBT cases before and after pruning were 0.88 and 0.90, respectively, and the difference was not statistically significant (p  >  0.05). The proposed DCNN compression approach can reduce the number of required operations by 95% while maintaining the classification performance. The approach can be extended to other deep neural networks and imaging tasks where transfer learning is appropriate.

  15. Fusion of classifiers for REIS-based detection of suspicious breast lesions

    Science.gov (United States)

    Lederman, Dror; Wang, Xingwei; Zheng, Bin; Sumkin, Jules H.; Tublin, Mitchell; Gur, David

    2011-03-01

    After developing a multi-probe resonance-frequency electrical impedance spectroscopy (REIS) system aimed at detecting women with breast abnormalities that may indicate a developing breast cancer, we have been conducting a prospective clinical study to explore the feasibility of applying this REIS system to classify younger women (under 50 years old) into groups having high and low risk for developing breast cancer. The system comprises one central probe placed in contact with the nipple, and six additional probes uniformly distributed along an outside circle to be placed in contact with six points on the outer breast skin surface. In this preliminary study, we selected an initial set of 174 examinations on participants who have completed REIS examinations and have clinical status verification. Among these, 66 examinations were recommended for biopsy due to findings of a highly suspicious breast lesion ("positives"), and 108 were determined as negative during imaging-based procedures ("negatives"). A set of REIS-based features, extracted using a mirror-matched approach, was computed and fed into five machine learning classifiers. A genetic algorithm was used to select an optimal subset of features for each of the five classifiers. Three fusion rules, namely the sum rule, weighted sum rule and weighted median rule, were used to combine the results of the classifiers. Performance evaluation was performed using a leave-one-case-out cross-validation method. The results indicated that REIS may provide a new technology to identify younger women with higher than average risk of having or developing breast cancer. Furthermore, it was shown that fusion rules, such as the weighted median fusion rule and the weighted sum fusion rule, may improve performance as compared with the highest performing single classifier.
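
    The three fusion rules named above can be written compactly. The sketch below combines probability-like scores from several hypothetical classifiers using the sum, weighted sum, and weighted median rules; the scores and weights are synthetic assumptions (in the study they would come from the trained classifiers).

      import numpy as np

      def fuse(scores: np.ndarray, weights: np.ndarray, rule: str) -> np.ndarray:
          """scores: cases x classifiers matrix of probability-like outputs."""
          if rule == "sum":
              return scores.mean(axis=1)
          if rule == "weighted_sum":
              return scores @ (weights / weights.sum())
          if rule == "weighted_median":
              # weighted median per case: sort the scores and take the value where
              # the cumulative weight first reaches half of the total weight
              fused = np.empty(scores.shape[0])
              for i, row in enumerate(scores):
                  order = np.argsort(row)
                  cum = np.cumsum(weights[order])
                  fused[i] = row[order][np.searchsorted(cum, cum[-1] / 2.0)]
              return fused
          raise ValueError(rule)

      rng = np.random.default_rng(7)
      scores = rng.random((174, 5))            # outputs of 5 classifiers, synthetic
      weights = np.array([1.0, 0.8, 0.6, 0.9, 0.7])
      for rule in ("sum", "weighted_sum", "weighted_median"):
          print(rule, fuse(scores, weights, rule)[:3].round(3))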

  16. Applying a CAD-generated imaging marker to assess short-term breast cancer risk

    Science.gov (United States)

    Mirniaharikandehei, Seyedehnafiseh; Zarafshani, Ali; Heidari, Morteza; Wang, Yunzhi; Aghaei, Faranak; Zheng, Bin

    2018-02-01

    Whether using computer-aided detection (CAD) helps improve radiologists' performance in reading and interpreting mammograms remains controversial, in part due to higher false-positive detection rates. The objective of this study is to investigate and test a new hypothesis that CAD-generated false-positives, in particular the bilateral summation of false-positives, constitute a potential imaging marker associated with short-term breast cancer risk. An image dataset involving negative screening mammograms acquired from 1,044 women was retrospectively assembled. Each case involves 4 images, the craniocaudal (CC) and mediolateral oblique (MLO) views of the left and right breasts. In the next subsequent mammography screening, 402 cases had cancer detected and 642 remained negative. A CAD scheme was applied to process all "prior" negative mammograms. Several features were extracted from the CAD results, including the detection seeds, the total number of false-positive regions, the average detection score, and the sum of detection scores in CC and MLO view images. The features computed from the two bilateral images of the left and right breasts, from either the CC or the MLO view, were then combined. In order to predict the likelihood of each testing case being positive in the next subsequent screening, two logistic regression models were trained and tested using a leave-one-case-out based cross-validation method. Data analysis demonstrated a maximum prediction accuracy with an area under the ROC curve of AUC = 0.65+/-0.017 and a maximum adjusted odds ratio of 4.49 with a 95% confidence interval of [2.95, 6.83]. The results also illustrated an increasing trend in the adjusted odds ratio and risk prediction scores, supporting the hypothesis that bilaterally summed CAD-generated false-positives are associated with short-term breast cancer risk.

  17. Applying a new computer-aided detection scheme generated imaging marker to predict short-term breast cancer risk

    Science.gov (United States)

    Mirniaharikandehei, Seyedehnafiseh; Hollingsworth, Alan B.; Patel, Bhavika; Heidari, Morteza; Liu, Hong; Zheng, Bin

    2018-05-01

    This study aims to investigate the feasibility of identifying a new quantitative imaging marker based on false-positives generated by a computer-aided detection (CAD) scheme to help predict short-term breast cancer risk. An image dataset including four view mammograms acquired from 1044 women was retrospectively assembled. All mammograms were originally interpreted as negative by radiologists. In the next subsequent mammography screening, 402 women were diagnosed with breast cancer and 642 remained negative. An existing CAD scheme was applied 'as is' to process each image. From CAD-generated results, four detection features including the total number of (1) initial detection seeds and (2) the final detected false-positive regions, (3) average and (4) sum of detection scores, were computed from each image. Then, by combining the features computed from two bilateral images of left and right breasts from either the craniocaudal or the mediolateral oblique view, two logistic regression models were trained and tested using a leave-one-case-out cross-validation method to predict the likelihood of each testing case being positive in the next subsequent screening. The new prediction model yielded the maximum prediction accuracy with an area under the ROC curve of AUC = 0.65 ± 0.017 and the maximum adjusted odds ratio of 4.49 with a 95% confidence interval of (2.95, 6.83). The results also showed an increasing trend in the adjusted odds ratio and risk prediction scores, supporting the feasibility of using this CAD-generated imaging marker to help predict short-term breast cancer risk.

  18. Cross-validated stable-isotope dilution GC-MS and LC-MS/MS assays for monoacylglycerol lipase (MAGL) activity by measuring arachidonic acid released from the endocannabinoid 2-arachidonoyl glycerol.

    Science.gov (United States)

    Kayacelebi, Arslan Arinc; Schauerte, Celina; Kling, Katharina; Herbers, Jan; Beckmann, Bibiana; Engeli, Stefan; Jordan, Jens; Zoerner, Alexander A; Tsikas, Dimitrios

    2017-03-15

    2-Arachidonoyl glycerol (2AG) is an endocannabinoid that activates cannabinoid (CB) receptors CB1 and CB2. Monoacylglycerol lipase (MAGL) inactivates 2AG through hydrolysis to arachidonic acid (AA) and glycerol, thus modulating the activity at CB receptors. In the brain, AA released from 2AG by the action of MAGL serves as a substrate for cyclooxygenases which produce pro-inflammatory prostaglandins. Here we report stable-isotope GC-MS and LC-MS/MS assays for the reliable measurement of MAGL activity. The assays utilize deuterium-labeled 2AG (d8-2AG; 10 μM) as the MAGL substrate and measure deuterium-labeled AA (d8-AA; range 0-1 μM) as the MAGL product. Unlabelled AA (d0-AA, 1 μM) serves as the internal standard. d8-AA and d0-AA are extracted from the aqueous buffered incubation mixtures by ethyl acetate. Upon solvent evaporation the residue is reconstituted in the mobile phase prior to LC-MS/MS analysis or in anhydrous acetonitrile for GC-MS analysis. LC-MS/MS analysis is performed in the negative electrospray ionization mode by selected-reaction monitoring of the mass transitions [M-H]- → [M-H-CO2]-, i.e., m/z 311 → m/z 267 for d8-AA and m/z 303 → m/z 259 for d0-AA. Prior to GC-MS analysis, d8-AA and d0-AA were converted to their pentafluorobenzyl (PFB) esters by means of PFB-Br. GC-MS analysis is performed in the electron-capture negative-ion chemical ionization mode by selected-ion monitoring of the ions [M-PFB]-, i.e., m/z 311 for d8-AA and m/z 303 for d0-AA. The GC-MS and LC-MS/MS assays were cross-validated. Linear regression analysis between the concentration (range, 0-1 μM) of d8-AA measured by LC-MS/MS (y) and that by GC-MS (x) revealed a straight line (r² = 0.9848) with the regression equation y = 0.003 + 0.898x, indicating good agreement. In dog liver, we detected MAGL activity that was inhibitable by the MAGL inhibitor JZL-184. Exogenous eicosatetraynoic acid is suitable as internal standard for the quantitative determination

  19. Method

    Directory of Open Access Journals (Sweden)

    Ling Fiona W.M.

    2017-01-01

    Full Text Available Rapid prototyping of microchannels has gained a lot of attention from researchers along with the rapid development of microfluidic technology. The conventional methods carry several disadvantages, such as high cost, long fabrication time, the need for high operating pressure and temperature, and the need for expertise in operating the equipment. In this work, a new method adapting xurography is introduced to replace the conventional methods of microchannel fabrication. The novelty in this study is replacing the adhesion film with a clear plastic film, which was used to cut the design of the microchannel, as this material is more suitable for fabricating more complex microchannel designs. The microchannel was then molded using polydimethylsiloxane (PDMS) and bonded with clean glass to produce a closed microchannel. The microchannel produced had a clean edge, indicating that a good master mold was produced using the cutting plotter, and the bonding between the PDMS and the glass was good, with no leakage observed. The materials used in this method are cheap and the total time consumed is less than 5 hours, making this method suitable for rapid prototyping of microchannels.

  1. Detecting generalized synchronization of chaotic dynamical systems. A kernel-based method and choice of its parameter

    International Nuclear Information System (INIS)

    Suetani, Hiromichi; Iba, Yukito; Aihara, Kazuyuki

    2006-01-01

    An approach based on kernel methods for capturing the nonlinear interdependence between two signals is introduced. It is demonstrated with a simple example that the proposed approach is useful for characterizing generalized synchronization. An attempt to choose an optimal kernel parameter based on cross validation is also discussed. (author)
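
    A hedged sketch of the general idea: quantify nonlinear interdependence by how well one signal predicts the other under a kernel regression model, with the kernel parameter chosen by cross-validation. This generic kernel ridge regression example illustrates the approach; it is not the authors' specific estimator, and the signals and parameter grids are assumptions.

      import numpy as np
      from sklearn.kernel_ridge import KernelRidge
      from sklearn.model_selection import GridSearchCV

      rng = np.random.default_rng(8)
      x = rng.uniform(-1, 1, size=(500, 1))                   # samples of the driver signal
      y = np.sin(3 * x[:, 0]) + 0.05 * rng.normal(size=500)   # nonlinearly related response

      # Choose the RBF kernel width (gamma) and regularization by cross-validation
      grid = GridSearchCV(
          KernelRidge(kernel="rbf"),
          {"gamma": np.logspace(-2, 2, 9), "alpha": [1e-3, 1e-2, 1e-1]},
          cv=5,
          scoring="r2",
      )
      grid.fit(x, y)
      # A high cross-validated R2 suggests a strong functional interdependence
      # between the two signals under this kernel model.
      print(grid.best_params_, round(grid.best_score_, 3))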

  2. method

    Directory of Open Access Journals (Sweden)

    L. M. Kimball

    2002-01-01

    Full Text Available This paper presents an interior point algorithm to solve the multiperiod hydrothermal economic dispatch (HTED). The multiperiod HTED is a large-scale nonlinear programming problem. Various optimization methods have been applied to the multiperiod HTED, but most neglect important network characteristics or require decomposition into thermal and hydro subproblems. The algorithm described here exploits the special bordered block-diagonal structure and sparsity of the Newton system for the first-order necessary conditions, resulting in a fast, efficient algorithm that can account for all network aspects. Applying this new algorithm challenges a conventional method for the use of available hydro resources known as the peak shaving heuristic.

  3. Multiple inflammatory biomarker detection in a prospective cohort study: a cross-validation between well-established single-biomarker techniques and electrochemiluminescense-based multi-array platform

    NARCIS (Netherlands)

    Bussel, van B.C.T.; Ferreira, I.; Waarenburg, M.P.H.; Greevenbroek, van M.M.J.; Kallen, van der C.J.H.; Henry, R.M.A.; Feskens, E.J.M.; Stehouwer, C.D.A.; Schalkwijk, C.G.

    2013-01-01

    Background - In terms of time, effort and quality, multiplex technology is an attractive alternative for well-established single-biomarker measurements in clinical studies. However, limited data comparing these methods are available. Methods - We measured, in a large ongoing cohort study (n = 574),

  4. Multiple inflammatory biomarker detection in a prospective cohort study: a cross-validation between well-established single-biomarker techniques and an electrochemiluminescense-based multi-array platform.

    Directory of Open Access Journals (Sweden)

    Bas C T van Bussel

    Full Text Available BACKGROUND: In terms of time, effort and quality, multiplex technology is an attractive alternative for well-established single-biomarker measurements in clinical studies. However, limited data comparing these methods are available. METHODS: We measured, in a large ongoing cohort study (n = 574), by means of both a 4-plex multi-array biomarker assay developed by MesoScaleDiscovery (MSD) and single-biomarker techniques (ELISA or immunoturbidimetric assay), the following biomarkers of low-grade inflammation: C-reactive protein (CRP), serum amyloid A (SAA), soluble intercellular adhesion molecule 1 (sICAM-1) and soluble vascular cell adhesion molecule 1 (sVCAM-1). These measures were realigned by weighted Deming regression and compared across a wide spectrum of subjects' cardiovascular risk factors by ANOVA. RESULTS: Although both methods ranked individuals' levels of biomarkers very similarly (Pearson's r all ≥ 0.755), absolute concentrations of all biomarkers differed significantly between methods. Equations retrieved by the Deming regression enabled proper realignment of the data to overcome these differences, such that intra-class correlation coefficients were then 0.996 (CRP), 0.711 (SAA), 0.895 (sICAM-1) and 0.858 (sVCAM-1). Additionally, individual biomarkers differed across categories of glucose metabolism, weight, metabolic syndrome and smoking status to a similar extent by either method. CONCLUSIONS: Multiple low-grade inflammatory biomarker data obtained by the 4-plex multi-array platform of MSD or by well-established single-biomarker methods are comparable after proper realignment of differences in absolute concentrations, and are equally associated with cardiovascular risk factors, regardless of such differences. Given its greater efficiency, the MSD platform is a potential tool for the quantification of multiple biomarkers of low-grade inflammation in large ongoing and future clinical studies.
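
    Deming regression treats both assays as error-prone and yields the slope and intercept used to realign one method onto the other. The sketch below implements the unweighted form (a simplification of the weighted Deming regression used in the study) on synthetic paired measurements; assay names, bias, and noise levels are illustrative assumptions.

      import numpy as np

      def deming(x, y, lam=1.0):
          """Unweighted Deming regression of y on x with error-variance ratio lam."""
          mx, my = x.mean(), y.mean()
          sxx = np.sum((x - mx) ** 2)
          syy = np.sum((y - my) ** 2)
          sxy = np.sum((x - mx) * (y - my))
          slope = (syy - lam * sxx + np.sqrt((syy - lam * sxx) ** 2 + 4 * lam * sxy ** 2)) / (2 * sxy)
          return slope, my - slope * mx

      rng = np.random.default_rng(9)
      truth = rng.uniform(1, 10, 574)
      multiplex = truth * 1.3 + 0.5 + rng.normal(0, 0.4, 574)   # e.g. multi-array panel, biased scale
      singleplex = truth + rng.normal(0, 0.4, 574)              # e.g. single-biomarker ELISA

      slope, intercept = deming(singleplex, multiplex)
      realigned = (multiplex - intercept) / slope   # map multiplex values onto the single-assay scale
      print(f"slope={slope:.2f} intercept={intercept:.2f}")
      print("correlation after realignment:", round(np.corrcoef(realigned, singleplex)[0, 1], 3))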

  5. Hybrid RGSA and Support Vector Machine Framework for Three-Dimensional Magnetic Resonance Brain Tumor Classification

    Directory of Open Access Journals (Sweden)

    R. Rajesh Sharma

    2015-01-01

    algorithm (RGSA). Support vector machines, over a backpropagation network and k-nearest neighbor, are used to evaluate the goodness of the classifier approach. The preliminary evaluation of the system is performed using 320 real-time brain MRI images. The system is trained and tested by using a leave-one-case-out method. The performance of the classifier is tested using the area under the receiver operating characteristic curve, 0.986 (±0.002). The experimental results demonstrate the contribution of the systematic and efficient feature extraction and feature selection algorithm to the performance of state-of-the-art feature classification methods.

  6. Doubly stochastic radial basis function methods

    Science.gov (United States)

    Yang, Fenglian; Yan, Liang; Ling, Leevan

    2018-06-01

    We propose a doubly stochastic radial basis function (DSRBF) method for function recoveries. Instead of a constant, we treat the RBF shape parameters as stochastic variables whose distribution is determined by a stochastic leave-one-out cross validation (LOOCV) estimation. A careful operation count is provided in order to determine the ranges of all the parameters in our methods. The overhead cost for setting up the proposed DSRBF method is O(n²) for function recovery problems with n basis functions. Numerical experiments confirm that the proposed method not only outperforms the constant shape parameter formulation (in terms of accuracy with comparable computational cost) but also the optimal LOOCV formulation (in terms of both accuracy and computational cost).
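
    For a fixed (non-stochastic) shape parameter, the LOOCV criterion mentioned above can be evaluated in closed form via Rippa's identity: the leave-one-out residuals equal the interpolation coefficients divided by the diagonal of the inverse interpolation matrix. The sketch below uses that identity to pick a single Gaussian RBF shape parameter; the doubly stochastic treatment of the paper, which randomizes the shape parameters, is not reproduced here, and the test function and search range are assumptions.

      import numpy as np

      def loocv_cost(eps: float, x: np.ndarray, f: np.ndarray) -> float:
          """Rippa's closed-form LOOCV error norm for a Gaussian RBF interpolant."""
          r2 = (x[:, None] - x[None, :]) ** 2
          A = np.exp(-(eps ** 2) * r2)        # interpolation matrix
          A_inv = np.linalg.inv(A)
          coeffs = A_inv @ f
          errors = coeffs / np.diag(A_inv)    # leave-one-out residuals
          return float(np.linalg.norm(errors))

      x = np.linspace(0.0, 1.0, 12)
      f = np.sin(2 * np.pi * x) + 0.3 * x     # samples of the function to recover

      # Keep the search away from the flat (small-eps) limit, where the
      # interpolation matrix becomes severely ill-conditioned.
      shapes = np.logspace(0.3, 1.5, 20)
      costs = [loocv_cost(e, x, f) for e in shapes]
      best = shapes[int(np.argmin(costs))]
      print(f"LOOCV-selected shape parameter: {best:.3f}")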

  7. Cross-Validation of Indicators of Cognitive Workload

    National Research Council Canada - National Science Library

    Marshall, Sandra P; Bartels, Mike

    2005-01-01

    .... The current study replicated the human performance findings of the previous phase of AMBR and added eye tracking analyses to enhance understanding of participants' behavior and to compare NASA TLX...

  8. Dioscorea deltoidea in Nepal: Cross Validating Uses and Ethnopharmacological Relevance

    Czech Academy of Sciences Publication Activity Database

    Rokaya, Maan Bahadur; Sharma, L.

    2016-01-01

    Vol. 2, No. 2 (2016), pp. 17-26. ISSN 2455-4812. R&D Projects: GA MŠk(CZ) LO1415. Institutional support: RVO:67179843. Keywords: food; poisoning; herbarium specimen; identification. Subject RIV: EH - Ecology, Behaviour

  9. Cross Validated Temperament Scale Validities Computed Using Profile Similarity Metrics

    Science.gov (United States)

    2017-04-27

    U.S. Army Research Institute for the Behavioral and Social Sciences. ... respondent's scale score is equal to the mean of the non-reversed and recoded-reversed items. Table 1 portrays the conventional scoring algorithm on

  10. Internet Attack Traceback: Cross-Validation and Pebble-Trace

    Science.gov (United States)

    2013-02-28


  11. Comparative analysis of clustering methods for gene expression time course data

    Directory of Open Access Journals (Sweden)

    Ivan G. Costa

    2004-01-01

    Full Text Available This work performs a data-driven comparative study of clustering methods used in the analysis of gene expression time courses (or time series). Five clustering methods found in the literature of gene expression analysis are compared: agglomerative hierarchical clustering, CLICK, dynamical clustering, k-means and self-organizing maps. In order to evaluate the methods, a k-fold cross-validation procedure adapted to unsupervised methods is applied. The accuracy of the results is assessed by comparing the partitions obtained in these experiments with gene annotation, such as protein function and series classification.

  12. Bladder cancer staging in CT urography: effect of stage labels on statistical modeling of a decision support system

    Science.gov (United States)

    Gandikota, Dhanuj; Hadjiiski, Lubomir; Cha, Kenny H.; Chan, Heang-Ping; Caoili, Elaine M.; Cohan, Richard H.; Weizer, Alon; Alva, Ajjai; Paramagul, Chintana; Wei, Jun; Zhou, Chuan

    2018-02-01

    In bladder cancer, stage T2 is an important threshold in the decision of administering neoadjuvant chemotherapy. Our long-term goal is to develop a quantitative computerized decision support system (CDSS-S) to aid clinicians in accurate staging. In this study, we examined the effect of stage labels of the training samples on modeling such a system. We used a data set of 84 bladder cancers imaged with CT Urography (CTU). At clinical staging prior to treatment, 43 lesions were staged as below stage T2 and 41 were stage T2 or above. After cystectomy and pathological staging that is considered the gold standard, 10 of the lesions were upstaged to stage T2 or above. After correcting the stage labels, 33 lesions were below stage T2, and 51 were stage T2 or above. For the CDSS-S, the lesions were segmented using our AI-CALS method and radiomic features were extracted. We trained a linear discriminant analysis (LDA) classifier with leave-one-case-out cross-validation to distinguish between bladder lesions of stage T2 or above and those below stage T2. The CDSS-S was trained and tested with the corrected post-cystectomy labels, and as a comparison, CDSS-S was also trained with understaged pre-treatment labels and tested on lesions with corrected labels. The test AUC for the CDSS-S trained with corrected labels was 0.89 ± 0.04. For the CDSS-S trained with understaged pre-treatment labels and tested on the lesions with corrected labels, the test AUC was 0.86 ± 0.04. The likelihood of stage T2 or above for 9 out of the 10 understaged lesions was correctly increased for the CDSS-S trained with corrected labels. The CDSS-S is sensitive to the accuracy of stage labeling. The CDSS-S trained with correct labels shows promise in prediction of the bladder cancer stage.
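
    A minimal sketch of the leave-one-case-out evaluation loop described here, using scikit-learn's linear discriminant analysis on placeholder radiomic-style features (the AI-CALS segmentation and the actual CTU features are not reproduced; X and y below are synthetic):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n_cases, n_features = 84, 6
X = rng.normal(size=(n_cases, n_features))       # placeholder radiomic features per lesion
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, n_cases) > 0).astype(int)  # 1 = stage T2 or above

scores = np.empty(n_cases)
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = LinearDiscriminantAnalysis()
    clf.fit(X[train_idx], y[train_idx])
    scores[test_idx] = clf.decision_function(X[test_idx])   # score for the single held-out case

print("leave-one-case-out AUC:", round(roc_auc_score(y, scores), 3))
```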

  13. A Cross-Validation Study of the Other Customers Perceptions Scale in the Context of Sport and Fitness Centres. [Un estudio de validación cruzada sobre la escala de percepción de otros consumidores en el contexto de centros deportivos y de fitness].

    Directory of Open Access Journals (Sweden)

    Nicholas D Theodorakis

    2014-01-01

    Full Text Available This study aimed to extend the use of the Other Customer Perception (OCP) scale by testing its psychometric properties and its generalizability in the context of sport and fitness centres. 360 members of three fitness clubs in Greece participated in the study. They were randomly divided into two subsamples (a calibration and a validation sample). Using confirmatory factor analysis and composite reliability estimates, the construct validity of OCP was supported. A cross-validation approach using invariance testing procedures across the two samples further supported the validity and generalizability of OCP in sport and fitness settings. OCP was found to be a reliable and valid scale for assessing the role of other customers in the service experience. Resumen (translated): This study sought to extend the use of the Other Customer Perception (OCP) scale by evaluating its psychometric properties and its generalizability in the context of sport and fitness centres. The sample comprised 360 members of three fitness clubs in Greece, who were divided into two subsamples (calibration and validation, respectively). After applying confirmatory factor analysis and composite reliability estimates, the results indicate the construct validity of the scale. In addition, an invariance analysis was carried out for the cross-validation study, which supported the generalizability of its validity in this study context. Therefore, this scale is reliable and valid for assessing the role of other customers in the service experience.

  14. Parallelization of the ROOT Machine Learning Methods

    CERN Document Server

    Vakilipourtakalou, Pourya

    2016-01-01

    Today computation is an inseparable part of scientific research, especially in particle physics when there is a classification problem, such as discriminating signals from backgrounds originating from the collisions of particles. On the other hand, Monte Carlo simulations can be used in order to generate a known data set of signals and backgrounds based on theoretical physics. The aim of machine learning is to train algorithms on a known data set and then apply these trained algorithms to unknown data sets. However, the most common framework for data analysis in particle physics is ROOT. In order to use machine learning methods, a Toolkit for Multivariate Data Analysis (TMVA) has been added to ROOT. The major consideration in this report is the parallelization of some TMVA methods, especially cross-validation and BDT.

  15. Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods.

    Science.gov (United States)

    Qu, Kaiyang; Han, Ke; Wu, Song; Wang, Guohua; Wei, Leyi

    2017-09-22

    DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The current main prediction method is based on machine learning, and its accuracy mainly depends on the feature extraction method. Therefore, using an efficient feature representation method is important to enhance the classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method, which combines three feature representation methods, namely, K-Skip-N-Grams, Information theory, and Sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. In addition, the classifier is a support vector machine. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors, which are obtained from a combination of three feature extractions, show the best performance in 10-fold cross-validation, both without dimensionality reduction and with dimensionality reduction by max-relevance-max-distance. Moreover, the reduced mixed feature method performs better than the non-reduced mixed feature technique. The feature vectors, which are a combination of SSF and K-Skip-N-Grams, show the best performance in the test set. Among these methods, mixed features exhibit superiority over the single features.

  16. Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods

    Directory of Open Access Journals (Sweden)

    Kaiyang Qu

    2017-09-01

    Full Text Available DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The current main prediction method is based on machine learning, and its accuracy mainly depends on the feature extraction method. Therefore, using an efficient feature representation method is important to enhance the classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method, which combines three feature representation methods, namely, K-Skip-N-Grams, Information theory, and Sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. In addition, the classifier is a support vector machine. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors, which are obtained from a combination of three feature extractions, show the best performance in 10-fold cross-validation, both without dimensionality reduction and with dimensionality reduction by max-relevance-max-distance. Moreover, the reduced mixed feature method performs better than the non-reduced mixed feature technique. The feature vectors, which are a combination of SSF and K-Skip-N-Grams, show the best performance in the test set. Among these methods, mixed features exhibit superiority over the single features.

  17. PREDICTING THE BOILING POINT OF PCDD/Fs BY THE QSPR METHOD BASED ON THE MOLECULAR DISTANCE-EDGE VECTOR INDEX

    Directory of Open Access Journals (Sweden)

    Long Jiao

    2015-05-01

    Full Text Available The quantitative structure property relationship (QSPR) for the boiling point (Tb) of polychlorinated dibenzo-p-dioxins and polychlorinated dibenzofurans (PCDD/Fs) was investigated. The molecular distance-edge vector (MDEV) index was used as the structural descriptor. The quantitative relationship between the MDEV index and Tb was modeled by using multivariate linear regression (MLR) and artificial neural network (ANN), respectively. Leave-one-out cross validation and external validation were carried out to assess the prediction performance of the models developed. For the MLR method, the prediction root mean square relative error (RMSRE) of leave-one-out cross validation and external validation was 1.77 and 1.23, respectively. For the ANN method, the prediction RMSRE of leave-one-out cross validation and external validation was 1.65 and 1.16, respectively. A quantitative relationship between the MDEV index and Tb of PCDD/Fs was demonstrated. Both MLR and ANN are practicable for modeling this relationship. The MLR model and ANN model developed can be used to predict the Tb of PCDD/Fs. Thus, the Tb of each PCDD/F was predicted by the developed models.
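
    For context, a leave-one-out RMSRE of the kind reported for the MLR model can be computed as sketched below; the descriptor values and boiling points are synthetic stand-ins rather than the MDEV indices and Tb values of the actual PCDD/F congeners:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(1)
X = rng.uniform(1.0, 10.0, size=(75, 2))                       # placeholder MDEV-style descriptors
Tb = 300 + 25 * X[:, 0] + 10 * X[:, 1] + rng.normal(0, 5, 75)  # synthetic boiling points (K)

rel_sq_errors = []
for train, test in LeaveOneOut().split(X):
    model = LinearRegression().fit(X[train], Tb[train])
    pred = model.predict(X[test])[0]
    rel_sq_errors.append(((pred - Tb[test][0]) / Tb[test][0]) ** 2)

rmsre = 100 * np.sqrt(np.mean(rel_sq_errors))                  # root mean square relative error, %
print(f"leave-one-out RMSRE: {rmsre:.2f}%")
```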

  18. Impact of Statistical Learning Methods on the Predictive Power of Multivariate Normal Tissue Complication Probability Models

    Energy Technology Data Exchange (ETDEWEB)

    Xu Chengjian, E-mail: c.j.xu@umcg.nl [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands); Schaaf, Arjen van der; Schilstra, Cornelis; Langendijk, Johannes A.; Veld, Aart A. van 't [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands)

    2012-03-15

    Purpose: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. Methods and Materials: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. Results: It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. Conclusions: The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended.

  19. Impact of Statistical Learning Methods on the Predictive Power of Multivariate Normal Tissue Complication Probability Models

    International Nuclear Information System (INIS)

    Xu Chengjian; Schaaf, Arjen van der; Schilstra, Cornelis; Langendijk, Johannes A.; Veld, Aart A. van’t

    2012-01-01

    Purpose: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. Methods and Materials: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. Results: It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. Conclusions: The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended.

  20. Impact of statistical learning methods on the predictive power of multivariate normal tissue complication probability models.

    Science.gov (United States)

    Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A; van't Veld, Aart A

    2012-03-15

    To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended. Copyright © 2012 Elsevier Inc. All rights reserved.
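
    The repeated cross-validation scheme used to compare the learning methods can be illustrated as follows. This sketch evaluates only an L1-penalized (LASSO-type) logistic model on synthetic predictors and is not the authors' NTCP modelling code; the outcome variable is a placeholder for the xerostomia endpoint:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 30))                    # placeholder dosimetric/clinical predictors
y = (X[:, 0] - 0.7 * X[:, 3] + rng.normal(0, 1.5, 200) > 0).astype(int)  # complication yes/no

lasso_like = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5),   # LASSO-style shrinkage/selection
)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=20, random_state=0)
aucs = cross_val_score(lasso_like, X, y, scoring="roc_auc", cv=cv)
print(f"repeated-CV AUC: {aucs.mean():.3f} +/- {aucs.std():.3f}")
```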

  1. Pseudo-cubic thin-plate type Spline method for analyzing experimental data

    Energy Technology Data Exchange (ETDEWEB)

    Crecy, F de

    1994-12-31

    A mathematical tool, using pseudo-cubic thin-plate type Spline, has been developed for analysis of experimental data points. The main purpose is to obtain, without any a priori given model, a mathematical predictor with related uncertainties, usable at any point in the multidimensional parameter space. The smoothing parameter is determined by a generalized cross validation method. The residual standard deviation obtained is significantly smaller than that of a least square regression. An example of use is given with critical heat flux data, showing a significant decrease of the conception criterion (minimum allowable value of the DNB ratio). (author) 4 figs., 1 tab., 7 refs.

  2. Pseudo-cubic thin-plate type Spline method for analyzing experimental data

    International Nuclear Information System (INIS)

    Crecy, F. de.

    1993-01-01

    A mathematical tool, using pseudo-cubic thin-plate type Spline, has been developed for analysis of experimental data points. The main purpose is to obtain, without any a priori given model, a mathematical predictor with related uncertainties, usable at any point in the multidimensional parameter space. The smoothing parameter is determined by a generalized cross validation method. The residual standard deviation obtained is significantly smaller than that of a least square regression. An example of use is given with critical heat flux data, showing a significant decrease of the conception criterion (minimum allowable value of the DNB ratio). (author) 4 figs., 1 tab., 7 refs
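
    The generalized cross-validation criterion used in these two records to pick the smoothing parameter can be written compactly for any linear smoother with hat matrix H(λ): GCV(λ) = n‖(I − H(λ))y‖² / tr(I − H(λ))². The ridge-type smoother below is a simplified stand-in for the pseudo-cubic thin-plate spline, and the data are synthetic:

```python
import numpy as np

def gcv_score(y, H):
    """GCV(lambda) = n * ||(I - H) y||^2 / tr(I - H)^2 for a linear smoother."""
    n = len(y)
    resid = y - H @ y
    return n * np.sum(resid ** 2) / (n - np.trace(H)) ** 2

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 60)
y = np.sin(3 * np.pi * x) + rng.normal(0, 0.2, x.size)

# Simple ridge-penalized basis expansion standing in for the spline smoother
B = np.column_stack([np.ones_like(x), x, x ** 2] +
                    [np.sin(k * np.pi * x) for k in range(1, 10)])

lambdas = np.logspace(-6, 1, 40)
scores = []
for lam in lambdas:
    H = B @ np.linalg.solve(B.T @ B + lam * np.eye(B.shape[1]), B.T)  # hat matrix
    scores.append(gcv_score(y, H))

best_lam = lambdas[int(np.argmin(scores))]
print(f"GCV-optimal smoothing parameter: {best_lam:.2e}")
```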

  3. Development of gait segmentation methods for wearable foot pressure sensors.

    Science.gov (United States)

    Crea, S; De Rossi, S M M; Donati, M; Reberšek, P; Novak, D; Vitiello, N; Lenzi, T; Podobnik, J; Munih, M; Carrozza, M C

    2012-01-01

    We present an automated segmentation method based on the analysis of plantar pressure signals recorded from two synchronized wireless foot insoles. Given the strict limits on computational power and power consumption typical of wearable electronic components, our aim is to investigate the capability of a Hidden Markov Model machine-learning method to detect gait phases with different levels of complexity in the processing of the wearable pressure sensor signals. Therefore three different datasets are developed: raw voltage values, calibrated sensor signals and a calibrated estimation of total ground reaction force and position of the plantar center of pressure. The method is tested on a pool of 5 healthy subjects, through a leave-one-out cross validation. The results show high classification performances achieved using estimated biomechanical variables, averaging 96%. Calibrated signals and raw voltage values show higher delays and dispersions in phase transition detection, suggesting a lower reliability for online applications.
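
    The leave-one-out validation over subjects described above is a grouped cross-validation. The sketch below uses scikit-learn's LeaveOneGroupOut with a generic classifier, since the Hidden Markov Model stage and the insole signals are not reproduced here; features, gait-phase labels and subject identifiers are synthetic placeholders:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(5)
n_subjects, frames = 5, 400
X = rng.normal(size=(n_subjects * frames, 4))       # e.g. estimated total GRF and CoP features
y = rng.integers(0, 4, size=n_subjects * frames)    # gait-phase label for each frame (0-3)
subjects = np.repeat(np.arange(n_subjects), frames) # which subject each frame belongs to

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, groups=subjects, cv=LeaveOneGroupOut())
print("per-subject accuracies:", np.round(scores, 3))
```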

  4. ChloroP, a neural network-based method for predicting chloroplast transitpeptides and their cleavage sites

    DEFF Research Database (Denmark)

    Emanuelsson, O.; Nielsen, Henrik; von Heijne, Gunnar

    1999-01-01

    We present a neural network based method (ChloroP) for identifying chloroplast transit peptides and their cleavage sites. Using cross-validation, 88% of the sequences in our homology reduced training set were correctly classified as transit peptides or nontransit peptides. This performance level … the cleavage sites given in SWISS-PROT. An analysis of 715 Arabidopsis thaliana sequences from SWISS-PROT suggests that the ChloroP method should be useful for the identification of putative transit peptides in genome-wide sequence data. The ChloroP predictor is available as a web-server at http://…

  5. Empirical evaluation of data normalization methods for molecular classification.

    Science.gov (United States)

    Huang, Huei-Chung; Qin, Li-Xuan

    2018-01-01

    Data artifacts due to variations in experimental handling are ubiquitous in microarray studies, and they can lead to biased and irreproducible findings. A popular approach to correct for such artifacts is through post hoc data adjustment such as data normalization. Statistical methods for data normalization have been developed and evaluated primarily for the discovery of individual molecular biomarkers. Their performance has rarely been studied for the development of multi-marker molecular classifiers, an increasingly important application of microarrays in the era of personalized medicine. In this study, we set out to evaluate the performance of three commonly used methods for data normalization in the context of molecular classification, using extensive simulations based on re-sampling from a unique pair of microRNA microarray datasets for the same set of samples. The data and code for our simulations are freely available as R packages at GitHub. In the presence of confounding handling effects, all three normalization methods tended to improve the accuracy of the classifier when evaluated in independent test data. The level of improvement and the relative performance among the normalization methods depended on the relative level of molecular signal, the distributional pattern of handling effects (e.g., location shift vs scale change), and the statistical method used for building the classifier. In addition, cross-validation was associated with biased estimation of classification accuracy in the over-optimistic direction for all three normalization methods. Normalization may improve the accuracy of molecular classification for data with confounding handling effects; however, it cannot circumvent the over-optimistic findings associated with cross-validation for assessing classification accuracy.

  6. A New Method for Optimal Regularization Parameter Determination in the Inverse Problem of Load Identification

    Directory of Open Access Journals (Sweden)

    Wei Gao

    2016-01-01

    Full Text Available According to the regularization method in the inverse problem of load identification, a new method for determining the optimal regularization parameter is proposed. Firstly, a quotient function (QF) is defined by utilizing the regularization parameter as a variable based on the least squares solution of the minimization problem. Secondly, the quotient function method (QFM) is proposed to select the optimal regularization parameter based on the quadratic programming theory. For employing the QFM, the characteristics of the values of QF with respect to the different regularization parameters are taken into consideration. Finally, numerical and experimental examples are utilized to validate the performance of the QFM. Furthermore, the Generalized Cross-Validation (GCV) method and the L-curve method are taken as the comparison methods. The results indicate that the proposed QFM is adaptive to different measuring points, noise levels, and types of dynamic load.

  7. Genomic Selection in Plant Breeding: Methods, Models, and Perspectives.

    Science.gov (United States)

    Crossa, José; Pérez-Rodríguez, Paulino; Cuevas, Jaime; Montesinos-López, Osval; Jarquín, Diego; de Los Campos, Gustavo; Burgueño, Juan; González-Camacho, Juan M; Pérez-Elizalde, Sergio; Beyene, Yoseph; Dreisigacker, Susanne; Singh, Ravi; Zhang, Xuecai; Gowda, Manje; Roorkiwal, Manish; Rutkoski, Jessica; Varshney, Rajeev K

    2017-11-01

    Genomic selection (GS) facilitates the rapid selection of superior genotypes and accelerates the breeding cycle. In this review, we discuss the history, principles, and basis of GS and genomic-enabled prediction (GP) as well as the genetics and statistical complexities of GP models, including genomic genotype×environment (G×E) interactions. We also examine the accuracy of GP models and methods for two cereal crops and two legume crops based on random cross-validation. GS applied to maize breeding has shown tangible genetic gains. Based on GP results, we speculate how GS in germplasm enhancement (i.e., prebreeding) programs could accelerate the flow of genes from gene bank accessions to elite lines. Recent advances in hyperspectral image technology could be combined with GS and pedigree-assisted breeding. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. A Statistical Method of Identifying Interactions in Neuron–Glia Systems Based on Functional Multicell Ca2+ Imaging

    Science.gov (United States)

    Nakae, Ken; Ikegaya, Yuji; Ishikawa, Tomoe; Oba, Shigeyuki; Urakubo, Hidetoshi; Koyama, Masanori; Ishii, Shin

    2014-01-01

    Crosstalk between neurons and glia may constitute a significant part of information processing in the brain. We present a novel method of statistically identifying interactions in a neuron–glia network. We attempted to identify neuron–glia interactions from neuronal and glial activities via maximum-a-posteriori (MAP)-based parameter estimation by developing a generalized linear model (GLM) of a neuron–glia network. The interactions in our interest included functional connectivity and response functions. We evaluated the cross-validated likelihood of GLMs that resulted from the addition or removal of connections to confirm the existence of specific neuron-to-glia or glia-to-neuron connections. We only accepted addition or removal when the modification improved the cross-validated likelihood. We applied the method to a high-throughput, multicellular in vitro Ca2+ imaging dataset obtained from the CA3 region of a rat hippocampus, and then evaluated the reliability of connectivity estimates using a statistical test based on a surrogate method. Our findings based on the estimated connectivity were in good agreement with currently available physiological knowledge, suggesting our method can elucidate undiscovered functions of neuron–glia systems. PMID:25393874

  9. Recurrence predictive models for patients with hepatocellular carcinoma after radiofrequency ablation using support vector machines with feature selection methods.

    Science.gov (United States)

    Liang, Ja-Der; Ping, Xiao-Ou; Tseng, Yi-Ju; Huang, Guan-Tarn; Lai, Feipei; Yang, Pei-Ming

    2014-12-01

    Recurrence of hepatocellular carcinoma (HCC) is an important issue despite effective treatments with tumor eradication. Identification of patients who are at high risk for recurrence may provide more efficacious screening and detection of tumor recurrence. The aim of this study was to develop recurrence predictive models for HCC patients who received radiofrequency ablation (RFA) treatment. From January 2007 to December 2009, 83 newly diagnosed HCC patients receiving RFA as their first treatment were enrolled. Five feature selection methods including genetic algorithm (GA), simulated annealing (SA) algorithm, random forests (RF) and hybrid methods (GA+RF and SA+RF) were utilized for selecting an important subset of features from a total of 16 clinical features. These feature selection methods were combined with support vector machine (SVM) for developing predictive models with better performance. Five-fold cross-validation was used to train and test SVM models. The developed SVM-based predictive models with hybrid feature selection methods and 5-fold cross-validation had average sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and area under the ROC curve of 67%, 86%, 82%, 69%, 90%, and 0.69, respectively. The SVM-derived predictive model can suggest high-risk patients for recurrence, who should be closely followed up after complete RFA treatment. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  10. A Comparison of Classification Methods for Telediagnosis of Parkinson’s Disease

    Directory of Open Access Journals (Sweden)

    Haydar Ozkan

    2016-03-01

    Full Text Available Parkinson’s disease (PD) is a progressive and chronic nervous system disease that impairs the ability of speech, gait, and complex muscle-and-nerve actions. Early diagnosis of PD is quite important for alleviating the symptoms. Cost effective and convenient telemedicine technology helps to distinguish the patients with PD from healthy people using variations of dysphonia, gait or motor skills. In this study, a novel telemedicine technology was developed to detect PD remotely using dysphonia features. Feature transformation and several machine learning (ML) methods with 2-, 5- and 10-fold cross-validations were implemented on the vocal features. It was observed that the combination of principal component analysis (PCA) as a feature transformation (FT) and k-nearest neighbor (k-NN) as a classifier with 10-fold cross-validation achieved the best accuracy, 99.1%. All ML processes were applied to the prerecorded PD dataset using a newly created program named ParkDet 2.0. Additionally, a blind test interface was created in ParkDet so that users can detect new patients with PD in the future. Clinicians or medical technicians, without any knowledge of ML, will be able to use the blind test interface to detect PD at a clinic or remote location utilizing the internet as a telemedicine application.
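
    The PCA-plus-k-NN combination with 10-fold cross-validation can be expressed as a single pipeline so that the transformation is refitted inside every fold, avoiding information leakage. The feature matrix below is a synthetic placeholder, not the dysphonia data used by ParkDet 2.0:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(11)
X = rng.normal(size=(195, 22))                      # placeholder vocal/dysphonia features
y = (X[:, :3].sum(axis=1) + rng.normal(0, 1, 195) > 0).astype(int)  # 1 = PD, 0 = healthy

model = make_pipeline(StandardScaler(),
                      PCA(n_components=10),         # feature transformation
                      KNeighborsClassifier(n_neighbors=5))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
acc = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"10-fold accuracy: {acc.mean():.3f} +/- {acc.std():.3f}")
```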

  11. Uncertainty management in stratigraphic well correlation and stratigraphic architectures: A training-based method

    Science.gov (United States)

    Edwards, Jonathan; Lallier, Florent; Caumon, Guillaume; Carpentier, Cédric

    2018-02-01

    We discuss the sampling and the volumetric impact of stratigraphic correlation uncertainties in basins and reservoirs. From an input set of wells, we evaluate the probability for two stratigraphic units to be associated using an analog stratigraphic model. In the presence of multiple wells, this method sequentially updates a stratigraphic column defining the stratigraphic layering for each possible set of realizations. The resulting correlations are then used to create stratigraphic grids in three dimensions. We apply this method on a set of synthetic wells sampling a forward stratigraphic model built with Dionisos. To perform cross-validation of the method, we introduce a distance comparing the relative geological time of two models for each geographic position, and we compare the models in terms of volumes. Results show the ability of the method to automatically generate stratigraphic correlation scenarios, and also highlight some challenges when sampling stratigraphic uncertainties from multiple wells.

  12. In silico prediction of Tetrahymena pyriformis toxicity for diverse industrial chemicals with substructure pattern recognition and machine learning methods.

    Science.gov (United States)

    Cheng, Feixiong; Shen, Jie; Yu, Yue; Li, Weihua; Liu, Guixia; Lee, Philip W; Tang, Yun

    2011-03-01

    There is an increasing need for the rapid safety assessment of chemicals by both industries and regulatory agencies throughout the world. In silico techniques are practical alternatives in the environmental hazard assessment. This is especially true for addressing the persistence, bioaccumulation and toxicity potentials of organic chemicals. Tetrahymena pyriformis toxicity is often used as a toxic endpoint. In this study, 1571 diverse unique chemicals were collected from the literature and composed the largest diverse data set for T. pyriformis toxicity. Classification predictive models of T. pyriformis toxicity were developed by substructure pattern recognition and different machine learning methods, including support vector machine (SVM), C4.5 decision tree, k-nearest neighbors and random forest. The results of a 5-fold cross-validation showed that the SVM method performed better than other algorithms. The overall predictive accuracies of the SVM classification model with the radial basis function kernel were 92.2% for the 5-fold cross-validation and 92.6% for the external validation set, respectively. Furthermore, several representative substructure patterns for characterizing T. pyriformis toxicity were also identified via the information gain analysis methods. Copyright © 2010 Elsevier Ltd. All rights reserved.

  13. Improved pulmonary nodule classification utilizing quantitative lung parenchyma features.

    Science.gov (United States)

    Dilger, Samantha K N; Uthoff, Johanna; Judisch, Alexandra; Hammond, Emily; Mott, Sarah L; Smith, Brian J; Newell, John D; Hoffman, Eric A; Sieren, Jessica C

    2015-10-01

    Current computer-aided diagnosis (CAD) models for determining pulmonary nodule malignancy characterize nodule shape, density, and border in computed tomography (CT) data. Analyzing the lung parenchyma surrounding the nodule has been minimally explored. We hypothesize that improved nodule classification is achievable by including features quantified from the surrounding lung tissue. To explore this hypothesis, we have developed expanded quantitative CT feature extraction techniques, including volumetric Laws texture energy measures for the parenchyma and nodule, border descriptors using ray-casting and rubber-band straightening, histogram features characterizing densities, and global lung measurements. Using stepwise forward selection and leave-one-case-out cross-validation, a neural network was used for classification. When applied to 50 nodules (22 malignant and 28 benign) from high-resolution CT scans, 52 features (8 nodule, 39 parenchymal, and 5 global) were statistically significant. Nodule-only features yielded an area under the ROC curve of 0.918 (including nodule size) and 0.872 (excluding nodule size). Performance was improved through inclusion of parenchymal (0.938) and global features (0.932). These results show a trend toward increased performance when the parenchyma is included, coupled with the large number of significant parenchymal features that support our hypothesis: the pulmonary parenchyma is influenced differentially by malignant versus benign nodules, assisting CAD-based nodule characterizations.

  14. Accuracy Evaluation of C4.5 and Naive Bayes Classifiers Using Attribute Ranking Method

    Directory of Open Access Journals (Sweden)

    S. Sivakumari

    2009-03-01

    Full Text Available This paper intends to classify the Ljubljana Breast Cancer dataset using C4.5 Decision Tree and Naïve Bayes classifiers. In this work, classification is carried out using two methods. In the first method, the dataset is analysed using all the attributes in the dataset. In the second method, attributes are ranked using the information gain ranking technique and only the high-ranked attributes are used to build the classification model. We evaluate the results of the C4.5 Decision Tree and Naïve Bayes classifiers in terms of classifier accuracy for various folds of cross validation. Our results show that both classifiers achieve good accuracy on the dataset.
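
    Attribute ranking followed by classification on only the top-ranked attributes can be sketched as below. Scikit-learn's mutual_info_classif is used here as a stand-in for an information-gain ranker, and the data are synthetic rather than the Ljubljana Breast Cancer dataset:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.integers(0, 5, size=(286, 9)).astype(float)     # placeholder attribute values
y = (X[:, 0] + X[:, 2] + rng.normal(0, 2, 286) > 4).astype(int)

gain = mutual_info_classif(X, y, discrete_features=True, random_state=0)
top = np.argsort(gain)[::-1][:4]                        # keep the 4 highest-ranked attributes

full = cross_val_score(GaussianNB(), X, y, cv=10).mean()
reduced = cross_val_score(GaussianNB(), X[:, top], y, cv=10).mean()
print(f"all attributes: {full:.3f}, top-ranked attributes only: {reduced:.3f}")
```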

  15. Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method

    DEFF Research Database (Denmark)

    Nielsen, Morten; Lundegaard, Claus; Lund, Ole

    2007-01-01

    …the correct alignment of a peptide in the binding groove a crucial part of identifying the core of an MHC class II binding motif. Here, we present a novel stabilization matrix alignment method, SMM-align, that allows for direct prediction of peptide:MHC binding affinities. The predictive performance of the method is validated on a large MHC class II benchmark data set covering 14 HLA-DR (human MHC) and three mouse H2-IA alleles. RESULTS: The predictive performance of the SMM-align method was demonstrated to be superior to that of the Gibbs sampler, TEPITOPE, SVRMHC, and MHCpred methods. Cross validation between peptide data sets obtained from different sources demonstrated that direct incorporation of peptide length potentially results in over-fitting of the binding prediction method. Focusing on amino terminal peptide flanking residues (PFR), we demonstrate a consistent gain in predictive performance…

  16. A hybrid network-based method for the detection of disease-related genes

    Science.gov (United States)

    Cui, Ying; Cai, Meng; Dai, Yang; Stanley, H. Eugene

    2018-02-01

    Detecting disease-related genes is crucial in disease diagnosis and drug design. The accepted view is that neighbors of a disease-causing gene in a molecular network tend to cause the same or similar diseases, and network-based methods have been recently developed to identify novel hereditary disease-genes in available biomedical networks. Despite the steady increase in the discovery of disease-associated genes, there is still a large fraction of disease genes that remains hidden below the tip of the iceberg. In this paper we exploit the topological properties of the protein-protein interaction (PPI) network to detect disease-related genes. We compute, analyze, and compare the topological properties of disease genes with non-disease genes in PPI networks. We also design an improved random forest classifier based on these network topological features, and a cross-validation test confirms that our method performs better than previous similar studies.

  17. Assessment of global and local region-based bilateral mammographic feature asymmetry to predict short-term breast cancer risk

    Science.gov (United States)

    Li, Yane; Fan, Ming; Cheng, Hu; Zhang, Peng; Zheng, Bin; Li, Lihua

    2018-01-01

    This study aims to develop and test a new imaging marker-based short-term breast cancer risk prediction model. An age-matched dataset of 566 screening mammography cases was used. All ‘prior’ images acquired in the two screening series were negative, while in the ‘current’ screening images, 283 cases were positive for cancer and 283 cases remained negative. For each case, two bilateral cranio-caudal view mammograms acquired from the ‘prior’ negative screenings were selected and processed by a computer-aided image processing scheme, which segmented the entire breast area into nine strip-based local regions, extracted the element regions using difference of Gaussian filters, and computed both global- and local-based bilateral asymmetrical image features. An initial feature pool included 190 features related to the spatial distribution and structural similarity of grayscale values, as well as of the magnitude and phase responses of multidirectional Gabor filters. Next, a short-term breast cancer risk prediction model based on a generalized linear model was built using an embedded stepwise regression analysis method to select features and a leave-one-case-out cross-validation method to predict the likelihood of each woman having image-detectable cancer in the next sequential mammography screening. The area under the receiver operating characteristic curve (AUC) values significantly increased from 0.5863 ± 0.0237, when the model was trained with image features extracted from the global regions only, to 0.6870 ± 0.0220, when it was trained with features extracted from both the global and the matched local regions (p = 0.0001). The odds ratio values monotonically increased from 1.00 to 8.11 with a significantly increasing trend in slope (p = 0.0028) as the model-generated risk score increased. In addition, the AUC values were 0.6555 ± 0.0437, 0.6958 ± 0.0290, and 0.7054 ± 0.0529 for the three age groups of 37…

  18. Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology.

    Science.gov (United States)

    Fox, Eric W; Hill, Ryan A; Leibowitz, Scott G; Olsen, Anthony R; Thornbrugh, Darren J; Weber, Marc H

    2017-07-01

    Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological data sets, there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables are used or stepwise procedures are employed which iteratively remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating data set consists of the good/poor condition of n = 1365 stream survey sites from the 2008/2009 National Rivers and Stream Assessment, and a large set (p = 212) of landscape features from the StreamCat data set as potential predictors. We compare two types of RF models: a full variable set model with all 212 predictors and a reduced variable set model selected using a backward elimination approach. We assess model accuracy using RF's internal out-of-bag estimate, and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substantial improvement in cross-validated accuracy as a result of variable reduction. Moreover, the backward elimination procedure tended to select too few variables and exhibited numerous issues such as upwardly biased out-of-bag accuracy estimates and instabilities in the spatial predictions. We use simulations to further support and generalize results from the analysis of real data. A main purpose of this work is to elucidate issues of model selection bias and instability to ecologists interested in
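
    A minimal backward-elimination loop of the kind examined in the paper is sketched below, tracking the out-of-bag accuracy of a random forest as the least important predictor is dropped at each step; the predictor matrix is synthetic, not the StreamCat landscape features, and, as the paper cautions, accuracy should also be checked with validation folds external to the selection:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(8)
n, p = 600, 25
X = rng.normal(size=(n, p))                                   # placeholder landscape predictors
y = (X[:, 0] + 0.8 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(0, 1, n) > 0).astype(int)  # good/poor

features = list(range(p))
trace = []
while len(features) > 2:
    rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0, n_jobs=-1)
    rf.fit(X[:, features], y)
    trace.append((len(features), rf.oob_score_))
    # drop the least important remaining predictor
    least = features[int(np.argmin(rf.feature_importances_))]
    features.remove(least)

for n_feat, oob in trace:
    print(f"{n_feat:3d} predictors -> OOB accuracy {oob:.3f}")
```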

  19. Machine learning methods enable predictive modeling of antibody feature:function relationships in RV144 vaccinees.

    Science.gov (United States)

    Choi, Ickwon; Chung, Amy W; Suscovich, Todd J; Rerks-Ngarm, Supachai; Pitisuttithum, Punnee; Nitayaphan, Sorachai; Kaewkungwal, Jaranit; O'Connell, Robert J; Francis, Donald; Robb, Merlin L; Michael, Nelson L; Kim, Jerome H; Alter, Galit; Ackerman, Margaret E; Bailey-Kellogg, Chris

    2015-04-01

    The adaptive immune response to vaccination or infection can lead to the production of specific antibodies to neutralize the pathogen or recruit innate immune effector cells for help. The non-neutralizing role of antibodies in stimulating effector cell responses may have been a key mechanism of the protection observed in the RV144 HIV vaccine trial. In an extensive investigation of a rich set of data collected from RV144 vaccine recipients, we here employ machine learning methods to identify and model associations between antibody features (IgG subclass and antigen specificity) and effector function activities (antibody dependent cellular phagocytosis, cellular cytotoxicity, and cytokine release). We demonstrate via cross-validation that classification and regression approaches can effectively use the antibody features to robustly predict qualitative and quantitative functional outcomes. This integration of antibody feature and function data within a machine learning framework provides a new, objective approach to discovering and assessing multivariate immune correlates.

  20. Machine learning methods enable predictive modeling of antibody feature:function relationships in RV144 vaccinees.

    Directory of Open Access Journals (Sweden)

    Ickwon Choi

    2015-04-01

    Full Text Available The adaptive immune response to vaccination or infection can lead to the production of specific antibodies to neutralize the pathogen or recruit innate immune effector cells for help. The non-neutralizing role of antibodies in stimulating effector cell responses may have been a key mechanism of the protection observed in the RV144 HIV vaccine trial. In an extensive investigation of a rich set of data collected from RV144 vaccine recipients, we here employ machine learning methods to identify and model associations between antibody features (IgG subclass and antigen specificity) and effector function activities (antibody dependent cellular phagocytosis, cellular cytotoxicity, and cytokine release). We demonstrate via cross-validation that classification and regression approaches can effectively use the antibody features to robustly predict qualitative and quantitative functional outcomes. This integration of antibody feature and function data within a machine learning framework provides a new, objective approach to discovering and assessing multivariate immune correlates.

  1. Application of a General Computer Algorithm Based on the Group-Additivity Method for the Calculation of Two Molecular Descriptors at Both Ends of Dilution: Liquid Viscosity and Activity Coefficient in Water at Infinite Dilution

    Directory of Open Access Journals (Sweden)

    Rudolf Naef

    2017-12-01

    Full Text Available The application of a commonly used computer algorithm based on the group-additivity method for the calculation of the liquid viscosity coefficient at 293.15 K and the activity coefficient at infinite dilution in water at 298.15 K of organic molecules is presented. The method is based on the complete breakdown of the molecules into their constituting atoms, further subdividing them by their immediate neighborhood. A fast Gauss–Seidel fitting method using experimental data from literature is applied for the calculation of the atom groups’ contributions. Plausibility tests have been carried out on each of the calculations using a ten-fold cross-validation procedure, which confirms the excellent predictive quality of the method. The goodness of fit (Q²) and the standard deviation (σ) of the cross-validation calculations for the viscosity coefficient, expressed as log(η), were 0.9728 and 0.11, respectively, for 413 test molecules; for the activity coefficient log(γ∞), the corresponding values were 0.9736 and 0.31, respectively, for 621 test compounds. The present approach has proven its versatility in that it enabled the simultaneous evaluation of the liquid viscosity of normal organic compounds as well as of ionic liquids.
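
    The cross-validated goodness of fit Q² quoted above is computed from the pooled out-of-fold predictions as 1 − PRESS/TSS. The sketch below shows the calculation for a generic linear group-contribution-style model on synthetic atom-group counts:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(4)
X = rng.poisson(2.0, size=(413, 15)).astype(float)            # placeholder atom-group counts
y = X @ rng.normal(0.2, 0.1, 15) + rng.normal(0, 0.11, 413)   # e.g. log(viscosity coefficient)

pred = np.empty_like(y)
for train, test in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train], y[train])
    pred[test] = model.predict(X[test])

press = np.sum((y - pred) ** 2)          # predictive residual sum of squares
tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
q2 = 1.0 - press / tss
sigma = np.sqrt(press / len(y))
print(f"Q^2 = {q2:.4f}, sigma = {sigma:.3f}")
```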

  2. A Method of Particle Swarm Optimized SVM Hyper-spectral Remote Sensing Image Classification

    International Nuclear Information System (INIS)

    Liu, Q J; Jing, L H; Wang, L M; Lin, Q Z

    2014-01-01

    Support Vector Machine (SVM) has been proved to be suitable for classification of remote sensing images and has been proposed to overcome the Hughes phenomenon. Hyper-spectral sensors are intrinsically designed to discriminate among a broad range of land cover classes, which may lead to high computational time in SVM multi-class algorithms. Model selection for SVM, involving selection of the kernel and the margin parameter values, is usually time-consuming and greatly impacts the training efficiency of the SVM model and the final classification accuracy of the SVM hyper-spectral remote sensing image classifier. Firstly, based on combinatorial optimization theory and the cross-validation method, a particle swarm algorithm is introduced for the optimal selection of the SVM (PSSVM) kernel parameter σ and margin parameter C to improve the modelling efficiency of the SVM model. Then an experiment classifying an AVIRIS image of the Indian Pines site in the USA was performed to evaluate the novel PSSVM as well as a traditional SVM classifier with the general grid-search cross-validation method (GSSVM). Evaluation indexes, including SVM model training time, classification overall accuracy (OA) and Kappa index, of both PSSVM and GSSVM were analyzed quantitatively. It is demonstrated that the OA of PSSVM on the test samples and the whole image is 85% and 82%, and the differences from GSSVM are both within 0.08%. The Kappa indexes reach 0.82 and 0.77, and the differences from GSSVM are both within 0.001, while the modelling time of PSSVM is only about 1/10 of that of GSSVM. Therefore, PSSVM is a fast and accurate algorithm for hyper-spectral image classification and is superior to GSSVM.
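
    A bare-bones particle swarm search over the two SVM hyper-parameters (margin parameter C and RBF kernel width), with cross-validated accuracy as the fitness, is sketched below. It uses a small generic dataset and a simple global-best PSO rather than the exact PSSVM formulation and AVIRIS data of the paper:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)            # small stand-in for the hyper-spectral samples

def fitness(position):
    """Cross-validated accuracy for position = (log10 C, log10 gamma)."""
    c, g = 10.0 ** position[0], 10.0 ** position[1]
    model = make_pipeline(StandardScaler(), SVC(C=c, gamma=g))
    return cross_val_score(model, X, y, cv=5).mean()

rng = np.random.default_rng(0)
n_particles, n_iter = 12, 15
lo, hi = np.array([-2.0, -5.0]), np.array([4.0, 1.0])   # search box in log10 space
pos = rng.uniform(lo, hi, size=(n_particles, 2))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[np.argmax(pbest_val)].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((n_particles, 2)), rng.random((n_particles, 2))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)  # inertia + cognitive + social
    pos = np.clip(pos + vel, lo, hi)
    vals = np.array([fitness(p) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[np.argmax(pbest_val)].copy()

print(f"best C = 10^{gbest[0]:.2f}, gamma = 10^{gbest[1]:.2f}, CV accuracy = {pbest_val.max():.3f}")
```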

  3. The Naïve Overfitting Index Selection (NOIS): A new method to optimize model complexity for hyperspectral data

    Science.gov (United States)

    Rocha, Alby D.; Groen, Thomas A.; Skidmore, Andrew K.; Darvishzadeh, Roshanak; Willemen, Louise

    2017-11-01

    The growing number of narrow spectral bands in hyperspectral remote sensing improves the capacity to describe and predict biological processes in ecosystems. But it also poses a challenge to fitting empirical models based on such high dimensional data, which often contain correlated and noisy predictors. As sample sizes for training and validating empirical models do not seem to be increasing at the same rate, overfitting has become a serious concern. Overly complex models lead to overfitting by capturing more than the underlying relationship, and also through fitting random noise in the data. Many regression techniques claim to overcome these problems by using different strategies to constrain complexity, such as limiting the number of terms in the model, creating latent variables or shrinking parameter coefficients. This paper proposes a new method, named Naïve Overfitting Index Selection (NOIS), which makes use of artificially generated spectra to quantify the relative model overfitting and to select an optimal model complexity supported by the data. The robustness of this new method is assessed by comparing it to a traditional model selection based on cross-validation. The optimal model complexity is determined for seven different regression techniques, such as partial least squares regression, support vector machine, artificial neural network and tree-based regressions, using five hyperspectral datasets. The NOIS method selects less complex models, which present accuracies similar to the cross-validation method. The NOIS method reduces the chance of overfitting, thereby avoiding models that present accurate predictions that are only valid for the data used, and too complex to make inferences about the underlying process.

  4. Feature Selection Method Based on Artificial Bee Colony Algorithm and Support Vector Machines for Medical Datasets Classification

    Directory of Open Access Journals (Sweden)

    Mustafa Serter Uzer

    2013-01-01

    Full Text Available This paper offers a hybrid approach that uses the artificial bee colony (ABC) algorithm for feature selection and support vector machines for classification. The purpose of this paper is to test the effect of eliminating the unimportant and obsolete features of the datasets on the success of the classification, using the SVM classifier. The approach is applied to the diagnosis of liver diseases and diabetes, which are commonly observed and reduce the quality of life. For the diagnosis of these diseases, the hepatitis, liver disorders and diabetes datasets from the UCI database were used, and the proposed system reached classification accuracies of 94.92%, 74.81%, and 79.29%, respectively. For these datasets, the classification accuracies were obtained with the help of the 10-fold cross-validation method. The results show that the performance of the method is highly successful compared to other results attained and seems very promising for pattern recognition applications.

  5. Breast cancer tumor classification using LASSO method selection approach

    International Nuclear Information System (INIS)

    Celaya P, J. M.; Ortiz M, J. A.; Martinez B, M. R.; Solis S, L. O.; Castaneda M, R.; Garza V, I.; Martinez F, M.; Ortiz R, J. M.

    2016-10-01

    Breast cancer is one of the leading causes of deaths worldwide among women. Early tumor detection is key in reducing breast cancer deaths and screening mammography is the widest available method for early detection. Mammography is the most common and effective breast cancer screening test. However, the rate of positive findings is very low, making the radiologic interpretation monotonous and biased toward errors. In an attempt to alleviate radiological workload, this work presents a computer-aided diagnosis (CADx) method aimed to automatically classify tumor lesions into malignant or benign as a means to a second opinion. The CADx method extracts image features and classifies the screening mammogram abnormality into one of two categories: subject at risk of having a malignant tumor (malignant), and healthy subject (benign). In this study, 143 abnormal segmentations (57 malignant and 86 benign) from the Breast Cancer Digital Repository (BCDR) public database were used to train and evaluate the CADx system. Percentile-rank (p-rank) was used to standardize the data. Using the LASSO feature selection methodology, the model achieved a leave-one-out cross-validation area under the receiver operating characteristic curve (AUC) of 0.950. The proposed method has the potential to rank abnormal lesions with high probability of malignant findings, aiding in the detection of potential malignant cases as a second opinion to the radiologist. (Author)
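
    A rough sketch of pairing a LASSO-type selector with leave-one-out evaluation is shown below; the L1-penalized logistic model is refitted inside every fold so that feature selection does not leak information, and the percentile-ranked features are synthetic placeholders rather than the BCDR lesions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(9)
n, p = 143, 20
X = rng.random(size=(n, p))                     # placeholder p-rank standardized features in [0, 1]
y = (X[:, 0] + 0.8 * X[:, 1] + rng.normal(0, 0.4, n) > 1.0).astype(int)  # 1 = malignant

scores = np.empty(n)
n_selected = []
for train, test in LeaveOneOut().split(X):
    # L1 penalty performs embedded, LASSO-style feature selection in each fold
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    clf.fit(X[train], y[train])
    scores[test] = clf.decision_function(X[test])
    n_selected.append(int(np.sum(clf.coef_ != 0)))

print("LOOCV AUC:", round(roc_auc_score(y, scores), 3))
print("mean number of selected features per fold:", round(np.mean(n_selected), 1))
```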

  6. GLOBAL OPTIMIZATION METHODS FOR GRAVITATIONAL LENS SYSTEMS WITH REGULARIZED SOURCES

    International Nuclear Information System (INIS)

    Rogers, Adam; Fiege, Jason D.

    2012-01-01

    Several approaches exist to model gravitational lens systems. In this study, we apply global optimization methods to find the optimal set of lens parameters using a genetic algorithm. We treat the full optimization procedure as a two-step process: an analytical description of the source plane intensity distribution is used to find an initial approximation to the optimal lens parameters; the second stage of the optimization uses a pixelated source plane with the semilinear method to determine an optimal source. Regularization is handled by means of an iterative method and the generalized cross validation (GCV) and unbiased predictive risk estimator (UPRE) functions that are commonly used in standard image deconvolution problems. This approach simultaneously estimates the optimal regularization parameter and the number of degrees of freedom in the source. Using the GCV and UPRE functions, we are able to justify an estimation of the number of source degrees of freedom found in previous work. We test our approach by applying our code to a subset of the lens systems included in the SLACS survey.

  7. Breast cancer tumor classification using LASSO method selection approach

    Energy Technology Data Exchange (ETDEWEB)

    Celaya P, J. M.; Ortiz M, J. A.; Martinez B, M. R.; Solis S, L. O.; Castaneda M, R.; Garza V, I.; Martinez F, M.; Ortiz R, J. M., E-mail: morvymm@yahoo.com.mx [Universidad Autonoma de Zacatecas, Av. Ramon Lopez Velarde 801, Col. Centro, 98000 Zacatecas, Zac. (Mexico)

    2016-10-15

    Breast cancer is one of the leading causes of deaths worldwide among women. Early tumor detection is key in reducing breast cancer deaths and screening mammography is the widest available method for early detection. Mammography is the most common and effective breast cancer screening test. However, the rate of positive findings is very low, making the radiologic interpretation monotonous and biased toward errors. In an attempt to alleviate radiological workload, this work presents a computer-aided diagnosis (CADx) method aimed to automatically classify tumor lesions into malignant or benign as a means to a second opinion. The CADx method extracts image features and classifies the screening mammogram abnormality into one of two categories: subject at risk of having a malignant tumor (malignant), and healthy subject (benign). In this study, 143 abnormal segmentations (57 malignant and 86 benign) from the Breast Cancer Digital Repository (BCDR) public database were used to train and evaluate the CADx system. Percentile-rank (p-rank) was used to standardize the data. Using the LASSO feature selection methodology, the model achieved a leave-one-out cross-validation area under the receiver operating characteristic curve (AUC) of 0.950. The proposed method has the potential to rank abnormal lesions with high probability of malignant findings, aiding in the detection of potential malignant cases as a second opinion to the radiologist. (Author)

  8. Estimation of Posterior Probabilities Using Multivariate Smoothing Splines and Generalized Cross-Validation.

    Science.gov (United States)

    1983-09-01


  9. Translation, cultural adaptation, cross-validation of the Turkish diabetes quality-of-life (DQOL) measure.

    Science.gov (United States)

    Yildirim, Aysegul; Akinci, Fevzi; Gozu, Hulya; Sargin, Haluk; Orbay, Ekrem; Sargin, Mehmet

    2007-06-01

    The aim of this study was to test the validity and reliability of the Turkish version of the diabetes quality of life (DQOL) questionnaire for use with patients with diabetes. The Turkish versions of the generic quality of life (QoL) scale 15D and the DQOL, together with socio-demographic and clinical parameter characteristics, were administered to 150 patients with type 2 diabetes. Study participants were randomly sampled from the Endocrinology and Diabetes Outpatient Department of Dr. Lutfi Kirdar Kartal Education and Research Hospital in Istanbul, Turkey. The Cronbach alpha coefficient of the overall DQOL scale was 0.89; the Cronbach alpha coefficients ranged from 0.80 to 0.94 for the subscales. Distress, discomfort and symptoms, depression, mobility, usual activities, and vitality on the 15D scale had statistically significant correlations with social/vocational worry and diabetes-related worry on the DQOL scale, indicating good convergent validity. Factor analysis identified four subscales: "satisfaction", "impact", "diabetes-related worry", and "social/vocational worry". Statistical analyses showed that the Turkish version of the DQOL is a valid and reliable instrument to measure disease-related QoL in patients with diabetes. It is a simple and quick screening tool, with an administration time of about 15 ± 5.8 min, for measuring QoL in this population.

  10. Cross validation of bi-modal health-related stress assessment

    NARCIS (Netherlands)

    van den Broek, Egon; van der Sluis, Frans; Dijkstra, Ton

    This study explores the feasibility of objective and ubiquitous stress assessment. 25 post-traumatic stress disorder patients participated in a controlled storytelling (ST) study and an ecologically valid reliving (RL) study. The two studies were meant to represent an early and a late therapy

  11. Cross-Validation of AFOQT Form S for Cyberspace Operations - Cyberspace Control

    Science.gov (United States)

    2016-04-20

    TECHNICIAN, ACADEMIC, VERBAL, and QUANTITATIVE. Thompson, Skinner, Gould, Alley, & Shore (2010) provided a full description of the subtest and the ... Strategic Research and Assessment Branch. Thompson, N., Skinner, J., Gould, R. B., Alley, W., & Shore, W. (2010). Development of the Air Force

  12. Cross-validation of picture completion effort indices in personal injury litigants and disability claimants.

    Science.gov (United States)

    Davis, Jeremy J; McHugh, Tara S; Bagley, Amy D; Axelrod, Bradley N; Hanks, Robin A

    2011-12-01

    Picture Completion (PC) indices from the Wechsler Adult Intelligence Scale, Third Edition, were investigated as performance validity indicators (PVIs) in a sample referred for independent neuropsychological examination. Participants from an archival database were included in the study if they were between the ages of 18 and 65 and were administered at least two PVIs. Effort measure performance yielded a group that passed all measures or failed one measure (Pass; n = 95) and a group that failed two or more PVIs (Fail-2; n = 61). The Pass group performed better on PC than the Fail-2 group. PC cut scores were compared for their ability to differentiate the Pass and Fail-2 groups. A PC raw score of ≤12 showed the best classification accuracy in this sample, correctly classifying 91% of Pass and 41% of Fail-2 cases. Overall, PC indices show good specificity and low sensitivity for exclusive use as PVIs, demonstrating promise for use as adjunctive embedded measures.

  13. Cross-Validation of Levenson's Psychopathy Scale in a Sample of Federal Female Inmates

    Science.gov (United States)

    Brinkley, Chad A.; Diamond, Pamela M.; Magaletta, Philip R.; Heigel, Caron P.

    2008-01-01

    Levenson, Kiehl, and Fitzpatrick's Self-Report Psychopathy Scale (LSRPS) is evaluated to determine the factor structure and concurrent validity of the instrument among 430 federal female inmates. Confirmatory factor analysis fails to validate the expected 2-factor structure. Subsequent exploratory factor analysis reveals a 3-factor structure…

  14. A Cross-Validation Study of the School Attitude Assessment Survey (SAAS).

    Science.gov (United States)

    McCoach, D. Betsy

    Factors commonly associated with underachievement in the research literature include low self-concept, low self-motivation/self-regulation, negative attitude toward school, and negative peer influence. This study attempts to isolate these four factors within a secondary school population. The purpose of the study was to design a valid and reliable…

  15. Cross-Validation of the Emotion Awareness Questionnaire for Children in Three Populations

    Science.gov (United States)

    Lahaye, Magali; Mikolajczak, Moira; Rieffe, Carolien; Villanueva, Lidon; Van Broeck, Nady; Bodart, Eddy; Luminet, Olivier

    2011-01-01

    The main aim of the present study was to examine the cross-cultural equivalence of a newly developed questionnaire, the Emotion Awareness Questionnaire (EAQ30) that assesses emotional awareness of children through self-report. Participants were recruited in three countries: the Netherlands (N = 665), Spain (N = 464), and Belgium (N = 707),…

  16. PARAMETER SELECTION IN LEAST SQUARES-SUPPORT VECTOR MACHINES REGRESSION ORIENTED, USING GENERALIZED CROSS-VALIDATION

    Directory of Open Access Journals (Sweden)

    ANDRÉS M. ÁLVAREZ MEZA

    2012-01-01

    Full Text Available ABSTRACT: This work proposes a methodology for the automatic selection of the free parameters of the least squares support vector machine (LS-SVM) regression technique, based on a multidimensional generalized cross-validation analysis over the LS-SVM set of linear equations. The developed technique does not require prior knowledge from the user about the influence of the free parameters on the results. Experiments were carried out on two artificial datasets and two real-world datasets. According to the results obtained, the developed algorithm computes appropriate regressions with competitive relative errors.
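
    As a rough illustration of the parameter-selection problem addressed above, the sketch below tunes the two free parameters of kernel ridge regression, a close relative of LS-SVM regression, by ordinary grid-search cross-validation. This is a simple stand-in for the paper's multidimensional generalized cross-validation over the LS-SVM linear system; the data are synthetic.

```python
# Minimal sketch, not the paper's algorithm: tune regularization strength and
# RBF kernel width of a kernel ridge regressor by grid-search cross-validation.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sinc(X).ravel() + 0.1 * rng.standard_normal(200)   # synthetic regression data

param_grid = {"alpha": np.logspace(-4, 1, 6),    # regularization term
              "gamma": np.logspace(-2, 1, 6)}    # RBF kernel width
search = GridSearchCV(KernelRidge(kernel="rbf"), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print("selected parameters:", search.best_params_)
```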

  17. Inter-hospital Cross-validation of Irregular Discharge Patterns for Young vs. Old Psychiatric Patients

    Science.gov (United States)

    Mozdzierz, Gerald J.; Davis, William E.

    1975-01-01

    Type of discharge (irregular vs. regular) and length of time hospitalized were used as unobtrusive measures of psychiatric patient acceptance of hospital treatment regime among two groups (18-27 years and 45 years and above) of patients. (Author)

  18. The Adolescent Religious Coping Scale: Development, Validation, and Cross-Validation

    Science.gov (United States)

    Bjorck, Jeffrey P.; Braese, Robert W.; Tadie, Joseph T.; Gililland, David D.

    2010-01-01

    Research literature on adolescent coping is growing, but typically such studies have ignored religious coping strategies and their potential impact on functioning. To address this lack, we developed the Adolescent Religious Coping Scale and used its seven subscales to examine the relationship between religious coping and emotional functioning. A…

  19. A cross-validation study of the TGMD-2: The case of an adolescent population.

    Science.gov (United States)

    Issartel, Johann; McGrane, Bronagh; Fletcher, Richard; O'Brien, Wesley; Powell, Danielle; Belton, Sarahjane

    2017-05-01

    This study proposes an extension of a widely used test evaluating fundamental movement skill proficiency to an adolescent population, with a specific emphasis on validity and reliability for this older age group. Cross-sectional observational study. A total of 844 participants (n=456 male, 12.03±0.49) participated in this study. The 12 fundamental movement skills of the TGMD-2 were assessed. Inter-rater reliability was examined to ensure a minimum of 95% consistency between coders. Confirmatory factor analysis was undertaken with a one-factor model (all 12 skills) and a two-factor model (6 locomotor skills and 6 object-control skills) as proposed by Ulrich et al. (2000). The model fit was examined using χ2, TLI, CFI and RMSEA. Test-retest reliability was carried out with a subsample of 35 participants. Test-retest reliability reached intraclass correlation coefficients of 0.78 (locomotor), 0.76 (object related) and 0.91 (gross motor skill proficiency). The confirmatory factor analysis did not display a good fit for either the one-factor or two-factor model, owing to a very low contribution from several skills. A reduction in the number of skills to just seven (run, gallop, hop, horizontal jump, bounce, kick and roll) revealed an overall good fit by TLI, CFI and RMSEA measures. The proposed new model offers the possibility of longitudinal studies to track the maturation of fundamental movement skills across the child and adolescent spectrum, while also giving researchers a valid assessment tool to evaluate adolescents' fundamental movement skill proficiency level. Copyright © 2016 Sports Medicine Australia. All rights reserved.

  20. Cross-Validation of the PAI Negative Distortion Scale for Feigned Mental Disorders: A Research Report

    Science.gov (United States)

    Rogers, Richard; Gillard, Nathan D.; Wooley, Chelsea N.; Kelsey, Katherine R.

    2013-01-01

    A major strength of the Personality Assessment Inventory (PAI) is its systematic assessment of response styles, including feigned mental disorders. Recently, Mogge, Lepage, Bell, and Ragatz developed and provided the initial validation for the Negative Distortion Scale (NDS). Using rare symptoms as its detection strategy for feigning, the…

  1. Cross-validation of independent ultra-low-frequency magnetic recording systems for active fault studies

    Science.gov (United States)

    Wang, Can; Bin, Chen; Christman, Lilianna E.; Glen, Jonathan M. G.; Klemperer, Simon L.; McPhee, Darcy K.; Kappler, Karl N.; Bleier, Tom E.; Dunson, J. Clark

    2018-04-01

    When working with ultra-low-frequency (ULF) magnetic datasets, as with most geophysical time-series data, it is important to be able to distinguish between cultural signals, internal instrument noise, and natural external signals with their induced telluric fields. This distinction is commonly attempted using simultaneously recorded data from a spatially remote reference site. Here, instead, we compared data recorded by two systems with different instrumental characteristics at the same location over the same time period. We collocated two independent ULF magnetic systems, one from the QuakeFinder network and the other from the United States Geological Survey (USGS)-Stanford network, in order to cross-compare their data, characterize data reproducibility, and characterize signal origin. In addition, we used simultaneous measurements at a remote geomagnetic observatory to distinguish global atmospheric signals from local cultural signals. We demonstrated that the QuakeFinder and USGS-Stanford systems have excellent coherence, despite their different sensors and digitizers. Rare instances of isolated signals recorded by only one system or only one sensor indicate that caution is needed when attributing specific recorded signal features to specific origins.

  2. Repeated holdout Cross-Validation of Model to Estimate Risk of Lyme Disease by Landscape Attributes

    Science.gov (United States)

    We previously modeled Lyme disease (LD) risk at the landscape scale; here we evaluate the model's overall goodness-of-fit using holdout validation. Landscapes were characterized within road-bounded analysis units (AU). Observed LD cases (obsLD) were ascertained per AU. Data were ...

  3. Bi-national cross-validation of an evidence-based conduct problem prevention model.

    Science.gov (United States)

    Porta, Carolyn M; Bloomquist, Michael L; Garcia-Huidobro, Diego; Gutiérrez, Rafael; Vega, Leticia; Balch, Rosita; Yu, Xiaohui; Cooper, Daniel K

    2018-04-01

    To (a) explore the preferences of Mexican parents and Spanish-speaking professionals working with migrant Latino families in Minnesota regarding the Mexican-adapted brief model versus the original conduct problems intervention and (b) identify the potential challenges, and preferred solutions, to implementation of a conduct problems preventive intervention. The core practice elements of a conduct problems prevention program originating in the United States were adapted for prevention efforts in Mexico. Three focus groups were conducted in the United States, with Latino parents (n = 24; 2 focus groups) and professionals serving Latino families (n = 9; 1 focus group), to compare and discuss the Mexican-adapted model and the original conduct problems prevention program. Thematic analysis was conducted on the verbatim focus group transcripts in the original language spoken. Participants preferred the Mexican-adapted model. The following key areas were identified for cultural adaptation when delivering a conduct problems prevention program with Latino families: recruitment/enrollment strategies, program delivery format, and program content (i.e., child skills training, parent skills training, child-parent activities, and child-parent support). For both models, strengths, concerns, barriers, and strategies for overcoming concerns and barriers were identified. We summarize recommendations offered by participants to strengthen the effective implementation of a conduct problems prevention model with Latino families in the United States. This project demonstrates the strength of binational collaboration in critically examining cultural adaptations of evidence-based prevention programs that could be useful to diverse communities, families, and youth in other settings. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  4. Cross-Validation of the Implementation Leadership Scale (ILS) in Child Welfare Service Organizations.

    Science.gov (United States)

    Finn, Natalie K; Torres, Elisa M; Ehrhart, Mark G; Roesch, Scott C; Aarons, Gregory A

    2016-08-01

    The Implementation Leadership Scale (ILS) is a brief, pragmatic, and efficient measure that can be used for research or organizational development to assess leader behaviors and actions that actively support effective implementation of evidence-based practices (EBPs). The ILS was originally validated with mental health clinicians. This study validates the ILS factor structure with providers in community-based organizations (CBOs) providing child welfare services. Participants were 214 service providers working in 12 CBOs that provide child welfare services. All participants completed the ILS, reporting on their immediate supervisor. Confirmatory factor analyses were conducted to examine the factor structure of the ILS. Internal consistency reliability and measurement invariance were also examined. Confirmatory factor analyses showed acceptable fit to the hypothesized first- and second-order factor structure. Internal consistency reliability was strong and there was partial measurement invariance for the first-order factor structure when comparing child welfare and mental health samples. The results support the use of the ILS to assess leadership for implementation of EBPs in child welfare organizations. © The Author(s) 2016.

  5. Cross-Validation of Numerical and Experimental Studies of Transitional Airfoil Performance

    DEFF Research Database (Denmark)

    Frere, Ariane; Hillewaert, Koen; Sarlak, Hamid

    2015-01-01

    The aerodynamic performance characteristics of airfoils are the main input for estimating wind turbine blade loading as well as the annual energy production of wind farms. For transitional flow regimes these data are difficult to obtain, both experimentally and numerically, due to the very high sensitivity of the flow to perturbations, large-scale separation and performance hysteresis. The objective of this work is to improve the understanding of transitional airfoil flow performance by studying the S826 NREL airfoil at low Reynolds numbers (Re = 4×10⁴ and 1×10⁵) with two inherently different...

  6. An assessment of air pollutant exposure methods in Mexico City, Mexico.

    Science.gov (United States)

    Rivera-González, Luis O; Zhang, Zhenzhen; Sánchez, Brisa N; Zhang, Kai; Brown, Daniel G; Rojas-Bracho, Leonora; Osornio-Vargas, Alvaro; Vadillo-Ortega, Felipe; O'Neill, Marie S

    2015-05-01

    Geostatistical interpolation methods to estimate individual exposure to outdoor air pollutants can be used in pregnancy cohorts where personal exposure data are not collected. Our objectives were to a) develop four assessment methods (citywide average (CWA); nearest monitor (NM); inverse distance weighting (IDW); and ordinary Kriging (OK)), and b) compare daily metrics and cross-validations of interpolation models. We obtained 2008 hourly data from Mexico City's outdoor air monitoring network for PM10, PM2.5, O3, CO, NO2, and SO2 and constructed daily exposure metrics for 1,000 simulated individual locations across five populated geographic zones. Descriptive statistics from all methods were calculated for dry and wet seasons, and by zone. We also evaluated IDW and OK methods' ability to predict measured concentrations at monitors using cross validation and a coefficient of variation (COV). All methods were performed using SAS 9.3, except ordinary Kriging which was modeled using R's gstat package. Overall, mean concentrations and standard deviations were similar among the different methods for each pollutant. Correlations between methods were generally high (r=0.77 to 0.99). However, ranges of estimated concentrations determined by NM, IDW, and OK were wider than the ranges for CWA. Root mean square errors for OK were consistently equal to or lower than for the IDW method. OK standard errors varied considerably between pollutants and the computed COVs ranged from 0.46 (least error) for SO2 and PM10 to 3.91 (most error) for PM2.5. OK predicted concentrations measured at the monitors better than IDW and NM. Given the similarity in results for the exposure methods, OK is preferred because this method alone provides predicted standard errors which can be incorporated in statistical models. The daily estimated exposures calculated using these different exposure methods provide flexibility to evaluate multiple windows of exposure during pregnancy, not just trimester or

  7. Seeing It All: Evaluating Supervised Machine Learning Methods for the Classification of Diverse Otariid Behaviours.

    Directory of Open Access Journals (Sweden)

    Monique A Ladds

    Full Text Available Constructing activity budgets for marine animals when they are at sea and cannot be directly observed is challenging, but recent advances in bio-logging technology offer solutions to this problem. Accelerometers can potentially identify a wide range of behaviours for animals based on unique patterns of acceleration. However, when analysing data derived from accelerometers, there are many statistical techniques available which, when applied to different data sets, produce different classification accuracies. We investigated a selection of supervised machine learning methods for interpreting behavioural data from captive otariids (fur seals and sea lions). We conducted controlled experiments with 12 seals, where their behaviours were filmed while they were wearing 3-axis accelerometers. From video we identified 26 behaviours that could be grouped into one of four categories (foraging, resting, travelling and grooming) representing key behaviour states for wild seals. We used data from 10 seals to train four predictive classification models: stochastic gradient boosting (GBM), random forests, support vector machine using four different kernels, and a baseline model: penalised logistic regression. We then took the best parameters from each model and cross-validated the results on the two seals unseen so far. We also investigated the influence of feature statistics (describing some characteristic of the seal), testing the models both with and without these. Cross-validation accuracies were lower than training accuracy, but the SVM with a polynomial kernel was still able to classify seal behaviour with high accuracy (>70%). Adding feature statistics improved accuracies across all models tested. Most categories of behaviour - resting, grooming and feeding - were all predicted with reasonable accuracy (52-81%) by the SVM, while travelling was poorly categorised (31-41%). These results show that model selection is important when classifying behaviour and that by using
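
    The sketch below illustrates the validation design described above: an SVM with a polynomial kernel trained on accelerometer-derived features and evaluated on individuals held out of training, mirroring the "train on 10 seals, test on 2 unseen seals" idea. Feature names and data are synthetic stand-ins, not the study's dataset.

```python
# Minimal sketch under assumptions: individual-level hold-out for a polynomial-kernel SVM.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GroupShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n = 1200
X = rng.standard_normal((n, 12))                 # e.g. per-window acceleration summaries
y = rng.integers(0, 4, size=n)                   # 0=foraging, 1=resting, 2=travelling, 3=grooming
seal_id = rng.integers(0, 12, size=n)            # which of 12 seals each window came from

# Hold out whole individuals, not random windows, so the test seals are unseen.
splitter = GroupShuffleSplit(n_splits=1, test_size=2/12, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=seal_id))

clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3, C=1.0))
clf.fit(X[train_idx], y[train_idx])
print("held-out accuracy:", accuracy_score(y[test_idx], clf.predict(X[test_idx])))
```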

  8. A Comparison of Machine Learning Methods in a High-Dimensional Classification Problem

    Directory of Open Access Journals (Sweden)

    Zekić-Sušac Marijana

    2014-09-01

    Full Text Available Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and in the post-processing stage. However, such a reduction usually provides less information and yields a lower accuracy of the model. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing entrepreneurial intentions of students by machine learning methods. Methods/Approach: Four methods were tested: artificial neural networks, CART classification trees, support vector machines, and k-nearest neighbour, on the same dataset in order to compare their efficiency in terms of classification accuracy. The performance of each method was compared on ten subsamples in a 10-fold cross-validation procedure in order to compute the sensitivity and specificity of each model. Results: The artificial neural network model based on the multilayer perceptron yielded a higher classification rate than the models produced by other methods. The pairwise t-test showed a statistically significant difference between the artificial neural network and the k-nearest neighbour model, while the differences among the other methods were not statistically significant. Conclusions: The tested machine learning methods are able to learn fast and achieve high classification accuracy. However, further advancement can be assured by testing a few additional methodological refinements in machine learning methods.
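
    The comparison design described above (several classifiers scored on identical 10-fold splits of one dataset) is sketched below with scikit-learn on synthetic data; the specific models and settings are assumptions, not the study's configuration.

```python
# Minimal sketch: compare several classifiers with 10-fold cross-validation on the same folds.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=400, n_features=60, n_informative=10, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

models = {
    "MLP (neural network)": MLPClassifier(max_iter=2000, random_state=0),
    "CART tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "k-NN": KNeighborsClassifier(),
}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=cv)          # same folds for every model
    print(f"{name}: mean accuracy {scores.mean():.3f} +/- {scores.std():.3f}")
```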

  9. Correlation- and covariance-supported normalization method for estimating orthodontic trainer treatment for clenching activity.

    Science.gov (United States)

    Akdenur, B; Okkesum, S; Kara, S; Günes, S

    2009-11-01

    In this study, electromyography signals sampled from children undergoing orthodontic treatment were used to estimate the effect of an orthodontic trainer on the anterior temporal muscle. A novel data normalization method, called the correlation- and covariance-supported normalization method (CCSNM), based on correlation and covariance between features in a data set, is proposed to provide predictive guidance to the orthodontic technique. The method was tested in two stages: first, data normalization using the CCSNM; second, prediction of normalized values of the anterior temporal muscles using an artificial neural network (ANN) with a Levenberg-Marquardt learning algorithm. The data set consists of electromyography signals from right anterior temporal muscles, recorded from 20 children aged 8-13 years with class II malocclusion. The signals were recorded at the start and end of a 6-month treatment. In order to train and test the ANN, two-fold cross-validation was used. The CCSNM was compared with four normalization methods: minimum-maximum normalization, z-score, decimal scaling, and line base normalization. The performance of the proposed method was assessed with prevalent performance measures: the mean square error and mean absolute error as mathematical measures, and the statistical relation factor R2 and the average deviation. The results show that the CCSNM was the best of the compared normalization methods for estimating the effect of the trainer.

  10. Machine Learning Methods for Prediction of CDK-Inhibitors

    Science.gov (United States)

    Ramana, Jayashree; Gupta, Dinesh

    2010-01-01

    Progression through the cell cycle involves the coordinated activities of a suite of cyclin/cyclin-dependent kinase (CDK) complexes. The activities of the complexes are regulated by CDK inhibitors (CDKIs). Apart from their role as cell cycle regulators, CDKIs are involved in apoptosis, transcriptional regulation, cell fate determination, cell migration and cytoskeletal dynamics. As the complexes perform crucial and diverse functions, they are important drug targets for tumour and stem cell therapeutic interventions. However, CDKIs are represented by proteins with considerable sequence heterogeneity and may fail to be identified by simple similarity search methods. In this work we have evaluated and developed machine learning methods for identification of CDKIs. We used different compositional features and evolutionary information in the form of PSSMs, from CDKIs and non-CDKIs, for generating SVM and ANN classifiers. In the first stage, both the ANN and SVM models were evaluated using leave-one-out cross-validation, and in the second stage these were tested on independent data sets. The PSSM-based SVM model emerged as the best classifier in both stages and is publicly available through a user-friendly web interface at http://bioinfo.icgeb.res.in/cdkipred.

  11. A method to determine the mammographic regions that show early changes due to the development of breast cancer

    Science.gov (United States)

    Karemore, Gopal; Nielsen, Mads; Karssemeijer, Nico; Brandt, Sami S.

    2014-11-01

    It is well understood nowadays that changes in the mammographic parenchymal pattern are an indicator of breast cancer risk, and we have developed a statistical method that estimates the mammogram regions where the parenchymal changes due to breast cancer occur. This region of interest is computed from a score map by utilising the anatomical breast coordinate system developed in our previous work. The method also makes an automatic scale selection to avoid overfitting while the region estimates are computed by a nested cross-validation scheme. In this way, it is possible to recover those mammogram regions that show a significant difference in classification scores between the cancer and the control group. Our experiments suggested that the most significant mammogram region is the region behind the nipple, which can be justified by previous findings from other research groups. This result was obtained on the basis of cross-validation experiments on independent training, validation and testing sets from the case-control study of 490 women, of which 245 women were diagnosed with breast cancer within a period of 2-4 years after the baseline mammograms. We additionally generalised the estimated region to another, mini-MIAS study and showed that the transferred region estimate gives at least a similar classification result when compared to the case where the whole breast region is used. In all, by following our method, one most likely improves both preclinical and follow-up breast cancer screening, but a larger study population will be required to test this hypothesis.

  12. A method to determine the mammographic regions that show early changes due to the development of breast cancer

    International Nuclear Information System (INIS)

    Karemore, Gopal; Nielsen, Mads; Brandt, Sami S; Karssemeijer, Nico

    2014-01-01

    It is well understood nowadays that changes in the mammographic parenchymal pattern are an indicator of breast cancer risk, and we have developed a statistical method that estimates the mammogram regions where the parenchymal changes due to breast cancer occur. This region of interest is computed from a score map by utilising the anatomical breast coordinate system developed in our previous work. The method also makes an automatic scale selection to avoid overfitting while the region estimates are computed by a nested cross-validation scheme. In this way, it is possible to recover those mammogram regions that show a significant difference in classification scores between the cancer and the control group. Our experiments suggested that the most significant mammogram region is the region behind the nipple, which can be justified by previous findings from other research groups. This result was obtained on the basis of cross-validation experiments on independent training, validation and testing sets from the case-control study of 490 women, of which 245 women were diagnosed with breast cancer within a period of 2–4 years after the baseline mammograms. We additionally generalised the estimated region to another, mini-MIAS study and showed that the transferred region estimate gives at least a similar classification result when compared to the case where the whole breast region is used. In all, by following our method, one most likely improves both preclinical and follow-up breast cancer screening, but a larger study population will be required to test this hypothesis. (paper)

  13. A quantitative structure- property relationship of gas chromatographic/mass spectrometric retention data of 85 volatile organic compounds as air pollutant materials by multivariate methods

    Directory of Open Access Journals (Sweden)

    Sarkhosh Maryam

    2012-05-01

    Full Text Available Abstract A quantitative structure-property relationship (QSPR) study is suggested for the prediction of retention times of volatile organic compounds. Various kinds of molecular descriptors were calculated to represent the molecular structure of the compounds. Modeling of the retention times of these compounds as a function of the theoretically derived descriptors was established by multiple linear regression (MLR) and artificial neural network (ANN) methods. Stepwise regression was used for the selection of the variables that give the best-fitted models. After variable selection, the ANN and MLR methods were used with leave-one-out cross-validation for building the regression models. The prediction results are in very good agreement with the experimental values. MLR, as the linear regression method, shows good ability in the prediction of the retention times of the prediction set. This provides a new and effective method for predicting the chromatographic retention index for volatile organic compounds.
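
    The sketch below illustrates the final modelling step described above: a multiple linear regression of retention time on a handful of pre-selected descriptors, evaluated with leave-one-out cross-validation. The descriptors here are random stand-ins; the paper selects them by stepwise regression from a much larger pool.

```python
# Minimal sketch under assumptions: MLR with leave-one-out cross-validated Q^2.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_compounds, n_descriptors = 85, 6
X = rng.standard_normal((n_compounds, n_descriptors))        # selected descriptors (stand-ins)
true_coef = rng.uniform(0.5, 2.0, n_descriptors)
y = X @ true_coef + 0.3 * rng.standard_normal(n_compounds)   # synthetic retention times

pred = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())
print("LOO-CV Q^2:", r2_score(y, pred))                      # cross-validated R^2
```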

  14. Stochastic rainfall synthesis for urban applications using different regionalization methods

    Science.gov (United States)

    Callau Poduje, A. C.; Leimbach, S.; Haberlandt, U.

    2017-12-01

    The proper design and efficient operation of urban drainage systems require long and continuous rainfall series at a high temporal resolution. Unfortunately, these time series are usually available at only a few locations, and it is therefore desirable to develop a stochastic precipitation model to generate rainfall at locations without observations. The model presented is based on an alternating renewal process and involves an external and an internal structure. The members of these structures are described by probability distributions which are site specific. Different regionalization methods based on site descriptors are presented, which are used for estimating the distributions for locations without observations. Regional frequency analysis, multiple linear regression and a vine-copula method are applied for this purpose. An area located in the north-west of Germany, involving a total of 81 stations with 5 min rainfall records, is used to compare the different methods. The site descriptors include information available for the whole region: position, topography and hydrometeorological characteristics estimated from long-term observations. The methods are compared directly by cross-validation of different rainfall statistics. Given that the model is stochastic, the evaluation is performed on ensembles of many long synthetic time series which are compared with observed ones. The performance is also indirectly evaluated by setting up a fictional urban hydrological system to test the capability of the different methods regarding flooding and overflow characteristics. The results show a good representation of the seasonal variability and good performance in reproducing the sample statistics of the rainfall characteristics. The copula-based method proves to be the most robust of the three methods. Advantages and disadvantages of the different methods are presented and discussed.

  15. Assessment of forward head posture in females: observational and photogrammetry methods.

    Science.gov (United States)

    Salahzadeh, Zahra; Maroufi, Nader; Ahmadi, Amir; Behtash, Hamid; Razmjoo, Arash; Gohari, Mahmoud; Parnianpour, Mohamad

    2014-01-01

    There are different methods to assess forward head posture (FHP), but the accuracy and discrimination ability of these methods are not clear. Here, we compare three postural angles for FHP assessment and also study the discrimination accuracy of three photogrammetric methods in differentiating groups categorized by the observational method. Seventy-eight healthy female participants (23 ± 2.63 years) were classified into three groups: moderate-severe FHP, slight FHP and no FHP, based on observational postural assessment rules. Three photogrammetric methods - craniovertebral angle, head tilt angle and head position angle - were applied to measure FHP objectively. A one-way ANOVA test showed a significant difference in the craniovertebral angle among the three groups (P < 0.05, F = 83.07). There was no marked difference in the head tilt angle or head position angle among the three groups. According to the Linear Discriminant Analysis (LDA) results, the canonical discriminant function (Wilks' Lambda) was 0.311 for the craniovertebral angle, with 79.5% of cross-validated grouped cases correctly classified. Our results showed that the craniovertebral angle method may discriminate females with moderate-severe and no FHP more accurately than the head position angle and head tilt angle. The photogrammetric method had excellent inter- and intra-rater reliability for assessing head and cervical posture.

  16. QSAR Modeling of COX-2 Inhibitory Activity of Some Dihydropyridine and Hydroquinoline Derivatives Using Multiple Linear Regression (MLR) Method.

    Science.gov (United States)

    Akbari, Somaye; Zebardast, Tannaz; Zarghi, Afshin; Hajimahdi, Zahra

    2017-01-01

    COX-2 inhibitory activities of some 1,4-dihydropyridine and 5-oxo-1,4,5,6,7,8-hexahydroquinoline derivatives were modeled by quantitative structure-activity relationship (QSAR) analysis using the stepwise multiple linear regression (SW-MLR) method. The built model was robust and predictive, with correlation coefficients (R2) of 0.972 and 0.531 for the training and test groups, respectively. The quality of the model was evaluated by leave-one-out (LOO) cross-validation (LOO correlation coefficient (Q2) of 0.943) and Y-randomization. We also employed a leverage approach for defining the applicability domain of the model. Based on the QSAR model results, the COX-2 inhibitory activity of the selected data set correlated with the BEHm6 (highest eigenvalue n. 6 of Burden matrix/weighted by atomic masses), Mor03u (signal 03/unweighted) and IVDE (mean information content on the vertex degree equality) descriptors derived from their structures.

  17. COMPUTER-BASED PREDICTION OF TOXICITY USING THE ELECTRON-CONFORMATIONAL METHOD. APPLICATION TO FRAGRANCE ALLERGENS AND OTHER ENVIRONMENTAL POLLUTANTS

    Directory of Open Access Journals (Sweden)

    Natalia N. Gorinchoy

    2012-06-01

    Full Text Available The electron-conformational (EC) method is employed for toxicophore (Tph) identification and quantitative prediction of toxicity using a training set of 24 compounds that are considered fragrance allergens. LD50 values for oral exposure of rats were chosen as the measure of toxicity. EC parameters are evaluated on the basis of conformational analysis and ab initio electronic structure calculations (including solvent influence). The Tph consists of four sites, which in this series of compounds are represented by three carbon atoms and one oxygen atom, but may be any other atoms that have the same electronic and geometric features within the tolerance limits. The regression model, taking into consideration the Tph flexibility, anti-Tph shielding, and the influence of out-of-Tph functional groups, predicts the experimental values of toxicity well (R2 = 0.93) with a reasonable leave-one-out cross-validation.

  18. Evaluating current automatic de-identification methods with Veteran’s health administration clinical documents

    Directory of Open Access Journals (Sweden)

    Ferrández Oscar

    2012-07-01

    Full Text Available Abstract Background The increased use and adoption of Electronic Health Records (EHR) causes a tremendous growth in digital information useful for clinicians, researchers and many other operational purposes. However, this information is rich in Protected Health Information (PHI), which severely restricts its access and possible uses. A number of investigators have developed methods for automatically de-identifying EHR documents by removing PHI, as specified in the Health Insurance Portability and Accountability Act “Safe Harbor” method. This study focuses on the evaluation of existing automated text de-identification methods and tools, as applied to Veterans Health Administration (VHA) clinical documents, to assess which methods perform better with each category of PHI found in our clinical notes; and when new methods are needed to improve performance. Methods We installed and evaluated five text de-identification systems “out-of-the-box” using a corpus of VHA clinical documents. The systems based on machine learning methods were trained with the 2006 i2b2 de-identification corpora and evaluated with our VHA corpus, and also evaluated with a ten-fold cross-validation experiment using our VHA corpus. We counted exact, partial, and fully contained matches with reference annotations, considering each PHI type separately, or only one unique ‘PHI’ category. Performance of the systems was assessed using recall (equivalent to sensitivity) and precision (equivalent to positive predictive value) metrics, as well as the F2-measure. Results Overall, systems based on rules and pattern matching achieved better recall, and precision was always better with systems based on machine learning approaches. The highest “out-of-the-box” F2-measure was 67% for partial matches; the best precision and recall were 95% and 78%, respectively. Finally, the ten-fold cross validation experiment allowed for an increase of the F2-measure to 79% with partial matches
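
    For reference, the metrics named above are illustrated below on a toy token-level example (1 = PHI, 0 = non-PHI): recall, precision, and the F2-measure, which weights recall twice as heavily as precision because missed PHI is costlier than over-redaction. The labels are invented for illustration only.

```python
# Minimal illustration of the evaluation metrics used in the study above.
from sklearn.metrics import precision_score, recall_score, fbeta_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]   # reference PHI annotations
y_pred = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]   # system output

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F2:       ", fbeta_score(y_true, y_pred, beta=2))
```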

  19. Comparison of Flood Frequency Analysis Methods for Ungauged Catchments in France

    Directory of Open Access Journals (Sweden)

    Jean Odry

    2017-09-01

    Full Text Available The objective of flood frequency analysis (FFA) is to associate flood intensity with a probability of exceedance. Many methods are currently employed for this, ranging from statistical distribution fitting to simulation approaches. In many cases the site of interest is actually ungauged, and a regionalisation scheme has to be associated with the FFA method, leading to a multiplication of the number of possible methods available. This paper presents the results of a wide-range comparison of FFA methods from statistical and simulation families associated with different regionalisation schemes based on regression, or spatial or physical proximity. The methods are applied to a set of 1535 French catchments, and a k-fold cross-validation procedure is used to consider the ungauged configuration. The results suggest that FFA from the statistical family largely relies on the regionalisation step, whereas the simulation-based method is more stable regarding regionalisation. This conclusion emphasises the difficulty of the regionalisation process. The results are also contrasted depending on the type of climate: the Mediterranean catchments tend to aggravate the differences between the methods.

  20. A large-scale benchmark of gene prioritization methods.

    Science.gov (United States)

    Guala, Dimitri; Sonnhammer, Erik L L

    2017-04-21

    In order to maximize the use of results from high-throughput experimental studies, e.g. GWAS, for identification and diagnostics of new disease-associated genes, it is important to have properly analyzed and benchmarked gene prioritization tools. While prospective benchmarks are underpowered to provide statistically significant results in their attempt to differentiate the performance of gene prioritization tools, a strategy for retrospective benchmarking has been missing, and new tools usually only provide internal validations. The Gene Ontology (GO) contains genes clustered around annotation terms. This intrinsic property of GO can be utilized in the construction of robust benchmarks, objective to the problem domain. We demonstrate how this can be achieved for network-based gene prioritization tools, utilizing the FunCoup network. We use cross-validation and a set of appropriate performance measures to compare state-of-the-art gene prioritization algorithms: three based on network diffusion (NetRank and two implementations of Random Walk with Restart), and MaxLink, which utilizes the network neighborhood. Our benchmark suite provides a systematic and objective way to compare the multitude of available and future gene prioritization tools, enabling researchers to select the best gene prioritization tool for the task at hand, and helping to guide the development of more accurate methods.

  1. Statistical methods for transverse beam position diagnostics with higher order modes in third harmonic 3.9 GHz superconducting accelerating cavities at FLASH

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Pei, E-mail: pei.zhang@desy.de [School of Physics and Astronomy, The University of Manchester, Oxford Road, Manchester M13 9PL (United Kingdom); Deutsches Elektronen-Synchrotron (DESY), Notkestraße 85, D-22607 Hamburg (Germany); Cockcroft Institute of Science and Technology, Daresbury WA4 4AD (United Kingdom); Baboi, Nicoleta [Deutsches Elektronen-Synchrotron (DESY), Notkestraße 85, D-22607 Hamburg (Germany); Jones, Roger M. [School of Physics and Astronomy, The University of Manchester, Oxford Road, Manchester M13 9PL (United Kingdom); Cockcroft Institute of Science and Technology, Daresbury WA4 4AD (United Kingdom)

    2014-01-11

    Beam-excited higher order modes (HOMs) can be used to provide beam diagnostics. Here we focus on 3.9 GHz superconducting accelerating cavities. In particular we study dipole mode excitation and its application to beam position determinations. In order to extract beam position information, linear regression can be used. Due to the large number of sampling points in the waveforms, statistical methods are used to effectively reduce the dimension of the system, such as singular value decomposition (SVD) and k-means clustering. These are compared with direct linear regression (DLR) on the entire waveforms. A cross-validation technique is used to study the sample-independent precision of the position predictions given by these three methods. An RMS prediction error in the beam position of approximately 50 μm can be achieved by DLR and SVD, while k-means clustering suggests 70 μm.
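
    The SVD-plus-regression idea described above is sketched below on synthetic data: long waveforms are reduced to a few singular-vector amplitudes, a linear regression maps those amplitudes to beam position, and the RMS prediction error is estimated by cross-validation. The waveform model and dimensions are assumptions, not the FLASH analysis itself.

```python
# Minimal sketch: dimension reduction by truncated SVD, then cross-validated linear regression.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)
n_bunches, n_samples = 300, 2000
positions = rng.uniform(-1.0, 1.0, size=(n_bunches, 2))        # (x, y) beam offsets, in mm
basis = rng.standard_normal((2, n_samples))                     # synthetic dipole-mode "signatures"
waveforms = positions @ basis + 0.05 * rng.standard_normal((n_bunches, n_samples))

model = make_pipeline(TruncatedSVD(n_components=10, random_state=0), LinearRegression())
pred = cross_val_predict(model, waveforms, positions,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
rms_error_mm = np.sqrt(np.mean((pred - positions) ** 2, axis=0))
print("cross-validated RMS error (x, y) [mm]:", rms_error_mm)
```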

  2. Statistical methods for transverse beam position diagnostics with higher order modes in third harmonic 3.9 GHz superconducting accelerating cavities at FLASH

    CERN Document Server

    Zhang, P; Jones, R M

    2014-01-01

    Beam-excited higher order modes (HOMs) can be used to provide beam diagnostics. Here we focus on 3.9 GHz superconducting accelerating cavities. In particular we study dipole mode excitation and its application to beam position determinations. In order to extract beam position information, linear regression can be used. Due to the large number of sampling points in the waveforms, statistical methods are used to effectively reduce the dimension of the system, such as singular value decomposition (SVD) and k-means clustering. These are compared with direct linear regression (DLR) on the entire waveforms. A cross-validation technique is used to study the sample-independent precision of the position predictions given by these three methods. An RMS prediction error in the beam position of approximately 50 microns can be achieved by DLR and SVD, while k-means clustering suggests 70 microns.

  3. Day ahead price forecasting of electricity markets by a mixed data model and hybrid forecast method

    International Nuclear Information System (INIS)

    Amjady, Nima; Keynia, Farshid

    2008-01-01

    In a competitive electricity market, forecasts of energy prices are key information for the market participants. However, the price signal usually has a complex behavior due to its nonlinearity, nonstationarity, and time variance. In spite of all the research performed in this area in recent years, there is still an essential need for more accurate and robust price forecast methods. In this paper, a combination of wavelet transform (WT) and a hybrid forecast method is proposed for this purpose. The hybrid method is composed of cascaded forecasters where each forecaster consists of a neural network (NN) and an evolutionary algorithm (EA). Both time domain and wavelet domain features are considered in a mixed data model for price forecasting, in which the candidate input variables are refined by a feature selection technique. The adjustable parameters of the whole method are fine-tuned by a cross-validation technique. The proposed method is examined on the PJM electricity market and compared with some of the most recent price forecast methods. (author)

  4. A method to screen obstructive sleep apnea using multi-variable non-intrusive measurements

    International Nuclear Information System (INIS)

    De Silva, S; Abeyratne, U R; Hukins, C

    2011-01-01

    Obstructive sleep apnea (OSA) is a serious sleep disorder. The current standard OSA diagnosis method is polysomnography (PSG) testing. PSG requires an overnight hospital stay while physically connected to 10–15 channels of measurement. PSG is expensive, inconvenient and requires the extensive involvement of a sleep technologist. As such, it is not suitable for community screening. OSA is a widespread disease and more than 80% of sufferers remain undiagnosed. Simplified, unattended and cheap OSA screening methods are urgently needed. Snoring is commonly associated with OSA but is not fully utilized in clinical diagnosis. Snoring contains pseudo-periodic packets of energy that produce characteristic vibrating sounds familiar to humans. In this paper, we propose a multi-feature vector that represents pitch information, formant information, a measure of periodic structure existence in snore episodes and the neck circumference of the subject to characterize OSA condition. Snore features were estimated from snore signals recorded in a sleep laboratory. The multi-feature vector was applied to a neural network for OSA/non-OSA classification and K-fold cross-validated using a random sub-sampling technique. We also propose a simple method to remove a specific class of background interference. Our method resulted in a sensitivity of 91 ± 6% and a specificity of 89 ± 5% for test data for AHI THRESHOLD = 15 for a database consisting of 51 subjects. This method has the potential as a non-intrusive, unattended technique to screen OSA using snore sound as the primary signal

  5. Diagnostic Method of Diabetes Based on Support Vector Machine and Tongue Images

    Directory of Open Access Journals (Sweden)

    Jianfeng Zhang

    2017-01-01

    Full Text Available Objective. The purpose of this research is to develop a diagnostic method for diabetes based on standardized tongue images using a support vector machine (SVM). Methods. Tongue images of 296 diabetic subjects and 531 nondiabetic subjects were collected with the TDA-1 digital tongue instrument. Tongue body and tongue coating were separated by the division-merging method and the chrominance-threshold method. With extracted color and texture features of the tongue image as input variables, the diagnostic model of diabetes with SVM was trained. After optimizing the combination of SVM kernel parameters and input variables, the influences of the combinations on the model were analyzed. Results. After normalizing the parameters of the tongue images, the accuracy rate of diabetes prediction increased from 77.83% to 78.77%. The accuracy rate and area under the curve (AUC) were not reduced after reducing the dimensions of the tongue features with principal component analysis (PCA), while substantially saving training time. During the training for selecting SVM parameters by genetic algorithm (GA), the cross-validation accuracy rate grew from about 72% to 83.06%. Finally, we compare with several state-of-the-art algorithms, and the experimental results show that our algorithm has the best predictive accuracy. Conclusions. The diagnostic method for diabetes on the basis of tongue images in Traditional Chinese Medicine (TCM) is of great value, indicating the feasibility of digitalized tongue diagnosis.
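
    The PCA-plus-SVM pipeline with cross-validated parameter selection described above is sketched below; grid search is used here as a simple stand-in for the paper's genetic-algorithm search, and the data are synthetic rather than tongue-image features.

```python
# Minimal sketch under assumptions: PCA dimension reduction, SVM classifier,
# and cross-validated search over kernel parameters.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 296 diabetic vs. 531 nondiabetic subjects, with stand-in colour/texture features.
X, y = make_classification(n_samples=827, n_features=40, n_informative=12,
                           weights=[531/827, 296/827], random_state=0)

pipe = Pipeline([("scale", StandardScaler()),
                 ("pca", PCA(n_components=15)),
                 ("svm", SVC())])
param_grid = {"svm__C": [0.1, 1, 10, 100], "svm__gamma": [1e-3, 1e-2, 1e-1]}
search = GridSearchCV(pipe, param_grid, scoring="roc_auc",
                      cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X, y)
print("best parameters:", search.best_params_,
      "cross-validated AUC:", round(search.best_score_, 3))
```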

  6. A Meta-Path-Based Prediction Method for Human miRNA-Target Association

    Directory of Open Access Journals (Sweden)

    Jiawei Luo

    2016-01-01

    Full Text Available MicroRNAs (miRNAs) are short noncoding RNAs that play important roles in regulating gene expression, and perturbed miRNAs are often associated with development and tumorigenesis as they have effects on their target mRNAs. Predicting potential miRNA-target associations from multiple types of genomic data is a considerable problem in bioinformatics research. However, most of the existing methods did not fully use the experimentally validated miRNA-mRNA interactions. Here, we developed RMLM and RMLMSe to predict the relationship between miRNAs and their targets. RMLM and RMLMSe are global approaches as they can reconstruct the missing associations for all miRNA-target pairs simultaneously, and RMLMSe demonstrates that the integration of sequence information can improve the performance of RMLM. In RMLM, we use the RM measure to evaluate the different relatedness between a miRNA and its target based on different meta-paths; logistic regression and the MLE method are employed to estimate the weights of the different meta-paths. In RMLMSe, sequence information is utilized to improve the performance of RMLM. Here, we carry out fivefold cross-validation and pathway enrichment analysis to prove the performance of our methods. The fivefold experiments show that our methods have higher AUC scores compared with other methods and that the integration of sequence information can improve the performance of miRNA-target association prediction.

  7. BENCHMARK OF MACHINE LEARNING METHODS FOR CLASSIFICATION OF A SENTINEL-2 IMAGE

    Directory of Open Access Journals (Sweden)

    F. Pirotti

    2016-06-01

    Full Text Available Thanks mainly to ESA and USGS, a large bulk of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging issue since land cover of a specific class may present a large spatial and spectral variability and objects may appear at different scales and orientations. In this study, we report the results of benchmarking 9 machine learning algorithms tested for accuracy and speed in training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi-layered perceptron, multi-layered perceptron ensemble, ctree, boosting, logarithmic regression. The validation is carried out using a control dataset which consists of an independent classification into 11 land-cover classes of an area of about 60 km2, obtained by manual visual interpretation of high-resolution images (20 cm ground sampling distance) by experts. In this study five out of the eleven classes are used, since the others have too few samples (pixels) for the testing and validating subsets. The classes used are the following: (i) urban, (ii) sowable areas, (iii) water, (iv) tree plantations, (v) grasslands. Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset and applying cross-validation with the k-fold method (kfold), and (iii) using all pixels from the control dataset. Five accuracy indices are calculated for the comparison between the values predicted with each model and control values over three sets of data: the training dataset (train), the whole control dataset (full) and with k-fold cross-validation (kfold) with ten folds. Results from validation of predictions of the whole dataset (full) show the

  8. Bladder cancer treatment response assessment with radiomic, clinical, and radiologist semantic features

    Science.gov (United States)

    Gordon, Marshall N.; Cha, Kenny H.; Hadjiiski, Lubomir M.; Chan, Heang-Ping; Cohan, Richard H.; Caoili, Elaine M.; Paramagul, Chintana; Alva, Ajjai; Weizer, Alon Z.

    2018-02-01

    We are developing a decision support system for assisting clinicians in assessment of response to neoadjuvant chemotherapy for bladder cancer. Accurate treatment response assessment is crucial for identifying responders and improving quality of life for non-responders. An objective machine learning decision support system may help reduce variability and inaccuracy in treatment response assessment. We developed a predictive model to assess the likelihood that a patient will respond based on image and clinical features. With IRB approval, we retrospectively collected a data set of pre- and post-treatment CT scans along with clinical information from surgical pathology from 98 patients. A linear discriminant analysis (LDA) classifier was used to predict the likelihood that a patient would respond to treatment based on radiomic features extracted from CT urography (CTU), a radiologist's semantic feature, and a clinical feature extracted from surgical and pathology reports. The classification accuracy was evaluated using the area under the ROC curve (AUC) with leave-one-case-out cross-validation. The classification accuracy was compared for systems based on radiomic features, the clinical feature, and the radiologist's semantic feature. For the system based on only radiomic features, the AUC was 0.75. With the addition of clinical information from examination under anesthesia (EUA), the AUC improved to 0.78. Our study demonstrated the potential of designing a decision support system to assist in treatment response assessment. The combination of clinical features, radiologist semantic features and CTU radiomic features improved the performance of the classifier and the accuracy of treatment response assessment.

  9. Perturbation methods

    CERN Document Server

    Nayfeh, Ali H

    2008-01-01

    Contents: 1. Introduction; 2. Straightforward Expansions and Sources of Nonuniformity; 3. The Method of Strained Coordinates; 4. The Methods of Matched and Composite Asymptotic Expansions; 5. Variation of Parameters and Methods of Averaging; 6. The Method of Multiple Scales; 7. Asymptotic Solutions of Linear Equations; References and Author Index; Subject Index.

  10. A novel method for in silico identification of regulatory SNPs in human genome.

    Science.gov (United States)

    Li, Rong; Zhong, Dexing; Liu, Ruiling; Lv, Hongqiang; Zhang, Xinman; Liu, Jun; Han, Jiuqiang

    2017-02-21

    Regulatory single nucleotide polymorphisms (rSNPs), a kind of functional noncoding genetic variant, can affect gene expression in a regulatory way, and they are thought to be associated with increased susceptibility to complex diseases. Here a novel computational approach to identify potential rSNPs is presented. Unlike most other rSNP-finding methods, which are based on the hypothesis that SNPs causing large allele-specific changes in transcription factor binding affinities are more likely to play regulatory roles, we use a set of documented, experimentally verified rSNPs and nonfunctional background SNPs to train classifiers, so that the discriminating features are found. To characterize variants, an extensive range of characteristics, such as sequence context, DNA structure and evolutionary conservation, is analyzed. A support vector machine is adopted to build the classifier model, together with an ensemble method to deal with unbalanced data. The 10-fold cross-validation result shows that our method can achieve an accuracy with sensitivity of ~78% and specificity of ~82%. Furthermore, our method performs better than some other algorithms based on the aforementioned hypothesis in handling false positives. The original data and the source MATLAB code involved are available at https://sourceforge.net/projects/rsnppredict/. Copyright © 2016 Elsevier Ltd. All rights reserved.
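
    The sketch below illustrates the evaluation setup described above: a classifier on an imbalanced variant dataset scored by 10-fold cross-validated sensitivity and specificity. Class weighting is used here as a simple stand-in for the paper's ensemble handling of imbalance, and the features are synthetic.

```python
# Minimal sketch under assumptions: SVM on imbalanced data, 10-fold CV,
# reporting sensitivity (recall of the positive class) and specificity.
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.metrics import make_scorer, recall_score

X, y = make_classification(n_samples=1000, n_features=30, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)   # rSNPs as the rare class

clf = make_pipeline(StandardScaler(), SVC(class_weight="balanced"))
scoring = {"sensitivity": make_scorer(recall_score, pos_label=1),
           "specificity": make_scorer(recall_score, pos_label=0)}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
res = cross_validate(clf, X, y, cv=cv, scoring=scoring)
print("sensitivity:", res["test_sensitivity"].mean(),
      "specificity:", res["test_specificity"].mean())
```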

  11. Discrimination of Rice with Different Pretreatment Methods by Using a Voltammetric Electronic Tongue

    Directory of Open Access Journals (Sweden)

    Li Wang

    2015-07-01

    Full Text Available In this study, an application of a voltammetric electronic tongue for discrimination and prediction of different varieties of rice was investigated. Different pretreatment methods were selected and subsequently used for the discrimination of different varieties of rice and the prediction of unknown rice samples. To this aim, a voltammetric array of sensors based on metallic electrodes was used as the sensing part. The different samples were analyzed by cyclic voltammetry with two sample-pretreatment methods. Discriminant factorial analysis was used to visualize the different categories of rice samples, while a radial basis function (RBF) artificial neural network with the leave-one-out cross-validation method was employed for prediction modeling. The collected signal data were first compressed employing the fast Fourier transform (FFT) and then significant features were extracted from the voltammetric signals. The experimental results indicated that the sample solutions obtained by the non-crushed pretreatment method were sufficient for discrimination and recognition. Satisfactory prediction results of the voltammetric electronic tongue based on the RBF artificial neural network were obtained with less than five-fold dilution of the sample solution. The main objective of this study was to develop primary research on the application of an electronic tongue system for the discrimination and prediction of solid foods and provide an objective assessment tool for the food industry.

  12. The same analysis approach: Practical protection against the pitfalls of novel neuroimaging analysis methods.

    Science.gov (United States)

    Görgen, Kai; Hebart, Martin N; Allefeld, Carsten; Haynes, John-Dylan

    2017-12-27

    Standard neuroimaging data analysis based on traditional principles of experimental design, modelling, and statistical inference is increasingly complemented by novel analysis methods, driven, for example, by machine learning. While these novel approaches provide new insights into neuroimaging data, they often have unexpected properties, generating a growing literature on possible pitfalls. We propose to meet this challenge by adopting a habit of systematic testing of experimental design, analysis procedures, and statistical inference. Specifically, we suggest applying the analysis method used for the experimental data also to aspects of the experimental design, simulated confounds, simulated null data, and control data. We stress the importance of keeping the analysis method the same in the main and test analyses, because only in this way can possible confounds and unexpected properties be reliably detected and avoided. We describe and discuss this Same Analysis Approach in detail, and demonstrate it in two worked examples using multivariate decoding. With these examples, we reveal two sources of error: a mismatch between counterbalancing (crossover designs) and cross-validation, which leads to systematic below-chance accuracies, and linear decoding of a nonlinear effect, a difference in variance. Copyright © 2017 Elsevier Inc. All rights reserved.
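
    One concrete instance of the Same Analysis Approach, sketched under assumed pipeline details: the exact cross-validated decoding pipeline that would be used for the real data is re-run on simulated null data, where accuracy should sit at chance level.

```python
# Illustrative SAA check: the single decoding pipeline is reused unchanged on simulated
# null data (labels unrelated to features); the resulting accuracy should be at chance.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.model_selection import StratifiedKFold, cross_val_score

def decode(X, y):
    """One analysis pipeline, reused identically for real, control and null data."""
    pipe = make_pipeline(StandardScaler(), LinearSVC())
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    return cross_val_score(pipe, X, y, cv=cv).mean()

rng = np.random.default_rng(2)
X_null = rng.normal(size=(80, 200))            # simulated null "brain" patterns
y_null = np.repeat([0, 1], 40)                 # labels carry no signal by construction
print(f"null-data accuracy: {decode(X_null, y_null):.2f} (should be near 0.5)")
```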

  13. The Integrative Method Based on the Module-Network for Identifying Driver Genes in Cancer Subtypes

    Directory of Open Access Journals (Sweden)

    Xinguo Lu

    2018-01-01

    Full Text Available With advances in next-generation sequencing (NGS) technologies, large amounts of multiple types of high-throughput genomics data have become available. A great challenge in exploring cancer progression is to identify driver genes from variant genes by analyzing and integrating multiple types of genomics data. Breast cancer is known to be a heterogeneous disease. The identification of subtype-specific driver genes is critical to guide the diagnosis, assessment of prognosis and treatment of breast cancer. We developed an integrated framework based on gene expression profiles and copy number variation (CNV) data to identify breast cancer subtype-specific driver genes. In this framework, we employed a statistical machine-learning method to select gene subsets and utilized a module-network analysis method to identify potential candidate driver genes. The final subtype-specific driver genes were acquired by pairwise comparison between subtypes. To validate the specificity of the driver genes, the gene expression data of these genes were used to classify the patient samples with 10-fold cross-validation, and enrichment analysis was also conducted on the identified driver genes. The experimental results show that the proposed integrative method can identify potential driver genes, and a classifier built with these genes achieved better performance than one built with genes identified by other methods.

  14. Geostatistical methods in evaluating spatial variability of groundwater quality in Al-Kharj Region, Saudi Arabia

    Science.gov (United States)

    Al-Omran, Abdulrasoul M.; Aly, Anwar A.; Al-Wabel, Mohammad I.; Al-Shayaa, Mohammad S.; Sallam, Abdulazeam S.; Nadeem, Mahmoud E.

    2017-11-01

    The analyses of 180 groundwater samples from Al-Kharj, Saudi Arabia, showed that most groundwater is unsuitable for drinking use due to high salinity; however, it can be used for irrigation with some restrictions. The electrical conductivity of the studied groundwater ranged between 1.05 and 10.15 dS m-1, with an average of 3.0 dS m-1. Nitrate was also found in high concentrations in some groundwater. Piper diagrams revealed that the majority of water samples are of the magnesium-calcium/sulfate-chloride water type. The Gibbs diagram revealed that chemical weathering of rock-forming minerals and evaporation influence the groundwater chemistry. A kriging method was used for predicting the spatial distribution of salinity (EC, dS m-1) and NO3- (mg L-1) in Al-Kharj's groundwater using data from 180 different locations. After normalization of the data, a variogram was drawn; the model with the lowest residual sum of squares was selected to fit the experimental variogram. Cross-validation and the root mean square error were then used to select the best interpolation method. The kriging method was found to be a suitable method for groundwater interpolation and management using either GS+ or ArcGIS.
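
    The study itself used ordinary kriging in GS+ and ArcGIS; as an illustrative stand-in, the sketch below uses scikit-learn's Gaussian process regressor (the machine-learning counterpart of kriging) with the same leave-one-out cross-validation/RMSE check, on hypothetical well locations and EC values.

```python
# Sketch of cross-validated spatial interpolation. A Gaussian process regressor stands in
# for ordinary kriging; leave-one-out RMSE is used for model checking as in the paper.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(3)
coords = rng.uniform(0, 50, size=(180, 2))                        # hypothetical well locations (km)
ec = 3.0 + np.sin(coords[:, 0] / 10) + rng.normal(0, 0.3, 180)    # synthetic EC values (dS/m)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=10.0) + WhiteKernel(0.1),
                              normalize_y=True)
pred = cross_val_predict(gp, coords, ec, cv=LeaveOneOut())
rmse = np.sqrt(np.mean((pred - ec) ** 2))
print(f"leave-one-out RMSE: {rmse:.3f} dS/m")
```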

  15. Distillation methods

    International Nuclear Information System (INIS)

    Konecny, C.

    1975-01-01

    Two main methods of separation using the distillation method are given and evaluated, namely evaporation and distillation in carrier gas flow. Two basic apparatus are described for illustrating the methods used. The use of the distillation method in radiochemistry is documented by a number of examples of the separation of elements in elemental state, volatile halogenides and oxides. Tables give a survey of distillation methods used for the separation of the individual elements and give conditions under which this separation takes place. The suitability of the use of distillation methods in radiochemistry is discussed with regard to other separation methods. (L.K.)

  16. Galerkin's methods

    African Journals Online (AJOL)

    user

    The assumed deflection shapes used in the approximate methods such as in the Galerkin's method were normally ... to direct compressive forces Nx, was derived by Navier. [3]. ..... tend to give higher frequency and stiffness, as well as.

  17. SVM-PB-Pred: SVM based protein block prediction method using sequence profiles and secondary structures.

    Science.gov (United States)

    Suresh, V; Parthasarathy, S

    2014-01-01

    We developed a support vector machine-based web server called SVM-PB-Pred to predict the Protein Block for any given amino acid sequence. The input features of SVM-PB-Pred include i) sequence profiles (PSSM) and ii) actual secondary structures (SS) from the DSSP method or predicted secondary structures from the NPS@ and GOR4 methods. Three combined input features, PSSM+SS(DSSP), PSSM+SS(NPS@) and PSSM+SS(GOR4), were used to train and test the SVM models. Similarly, four datasets, RS90, DB433, LI1264 and SP1577, were used to develop the SVM models. The four SVM models developed were evaluated using three different benchmarking tests, namely (i) self-consistency, (ii) seven-fold cross-validation and (iii) independent case tests. The maximum possible prediction accuracy of ~70% was observed in the self-consistency test for the SVM models of both the LI1264 and SP1577 datasets, where the PSSM+SS(DSSP) input features were used for testing. The prediction accuracies were reduced to ~53% for PSSM+SS(NPS@) and ~43% for PSSM+SS(GOR4) in the independent case test for the SVM models of the same two datasets. Using our method, it is possible to predict the protein block letters for any query protein sequence with ~53% accuracy when the SP1577 dataset and predicted secondary structure from the NPS@ server are used. The SVM-PB-Pred server can be freely accessed through http://bioinfo.bdu.ac.in/~svmpbpred.

  18. AO–MW–PLS method applied to rapid quantification of teicoplanin with near-infrared spectroscopy

    Directory of Open Access Journals (Sweden)

    Jiemei Chen

    2017-01-01

    Full Text Available Teicoplanin (TCP) is an important lipoglycopeptide antibiotic produced by fermenting Actinoplanes teichomyceticus. The change in TCP concentration is important to measure during the fermentation process. In this study, a reagent-free and rapid quantification method for TCP in TCP–Tris–HCl mixture samples was developed using near-infrared (NIR) spectroscopy, with a focus on the TCP fermentation process. Absorbance optimization (AO) partial least squares (PLS) was proposed and integrated with moving window (MW) PLS, a combination called the AO–MW–PLS method, to select appropriate wavebands. A model set that includes various wavebands equivalent to the optimal AO–MW–PLS waveband was proposed based on statistical considerations. The public region of all equivalent wavebands was itself one of the equivalent wavebands. The obtained public regions were 1540–1868 nm for TCP and 1114–1310 nm for Tris. The root-mean-square error and correlation coefficient for leave-one-out cross-validation were 0.046 mg mL−1 and 0.9998 for TCP, and 0.235 mg mL−1 and 0.9986 for Tris, respectively. All the models achieved highly accurate predictions, and the selected wavebands provided valuable references for designing specialized spectrometers. This study provides a valuable reference for further application of the proposed methods to TCP fermentation broth and to other fields of spectroscopic analysis.
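
    A sketch of the moving-window part of the waveband search only (the absorbance-optimization step is not reproduced): each contiguous waveband is scored by leave-one-out PLS RMSECV and the window with the lowest error is kept. Spectra and concentrations below are synthetic.

```python
# Moving-window PLS sketch: scan contiguous wavebands, score each by leave-one-out
# RMSECV of a PLS regression, and keep the best window. Data are synthetic stand-ins.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(4)
wavelengths = np.linspace(1000, 2000, 500)
conc = rng.uniform(0.5, 5.0, size=40)                        # TCP concentration, mg/mL (synthetic)
spectra = conc[:, None] * np.exp(-((wavelengths - 1700) / 120) ** 2) \
          + rng.normal(0, 0.01, size=(40, 500))

def rmsecv(X, y, n_comp=3):
    pred = cross_val_predict(PLSRegression(n_components=n_comp), X, y, cv=LeaveOneOut())
    return float(np.sqrt(np.mean((pred.ravel() - y) ** 2)))

window = 60                                                   # window width in variables
scores = [(rmsecv(spectra[:, i:i + window], conc), i) for i in range(0, 500 - window, 20)]
best_rmse, best_start = min(scores)
print(f"best waveband: {wavelengths[best_start]:.0f}-{wavelengths[best_start + window - 1]:.0f} nm, "
      f"RMSECV = {best_rmse:.3f} mg/mL")
```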

  19. A new cascade NN based method to short-term load forecast in deregulated electricity market

    International Nuclear Information System (INIS)

    Kouhi, Sajjad; Keynia, Farshid

    2013-01-01

    Highlights: • We propose a new hybrid cascaded NN-based method with wavelet transform (WT) for short-term load forecasting in a deregulated electricity market. • An efficient preprocessor consisting of normalization and shuffling of signals is presented. • In order to select the best inputs, a two-stage feature selection is presented. • A new cascaded structure consisting of three cascaded NNs is used as the forecaster. - Abstract: Short-term load forecasting (STLF) is a major topic in the efficient operation of power systems. The electricity load is a nonlinear signal with time-dependent behavior. The area of electricity load forecasting still has an essential need for more accurate and stable load forecast algorithms. To improve the accuracy of prediction, a new hybrid forecast strategy based on a cascaded neural network is proposed for STLF. This method consists of a wavelet transform, an intelligent two-stage feature selection, and a cascaded neural network. The feature selection is used to remove irrelevant and redundant inputs. The forecast engine is composed of a three-stage cascaded neural network (CNN) structure. This cascaded structure can efficiently extract the input/output mapping function of the nonlinear electricity load data. Adjustable parameters of the intelligent feature selection and the CNN are fine-tuned by a kind of cross-validation technique. The proposed STLF is tested on the PJM and New York electricity markets. It is concluded from the results that the proposed algorithm is a robust forecast method.
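
    The cross-validation-based tuning step can be illustrated as follows; a single MLP on lagged load values stands in for the paper's wavelet transform, two-stage feature selection and cascaded NN, and a time-series split keeps the validation folds strictly in the future. The load series is synthetic.

```python
# Sketch of cross-validated tuning for a short-term load forecaster. TimeSeriesSplit
# ensures the tuning never looks ahead; the MLP is a stand-in for the cascaded NN.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(5)
t = np.arange(2000)
load = 100 + 20 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 2, t.size)  # synthetic hourly load

lags = 24
X = np.column_stack([load[i:t.size - lags + i] for i in range(lags)])     # last 24 hours as inputs
y = load[lags:]                                                            # next-hour load as target

search = GridSearchCV(
    MLPRegressor(max_iter=2000, random_state=0),
    param_grid={"hidden_layer_sizes": [(10,), (30,), (30, 10)]},
    cv=TimeSeriesSplit(n_splits=5),
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)
print("best architecture:", search.best_params_, "MAE:", round(-search.best_score_, 2))
```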

  20. A QSAR Study of Environmental Estrogens Based on a Novel Variable Selection Method

    Directory of Open Access Journals (Sweden)

    Aiqian Zhang

    2012-05-01

    Full Text Available A large number of descriptors were employed to characterize the molecular structure of 53 natural, synthetic, and environmental chemicals which are suspected of disrupting endocrine functions by mimicking or antagonizing natural hormones and may thus pose a serious threat to the health of humans and wildlife. In this work, a robust quantitative structure-activity relationship (QSAR) model with a novel variable selection method has been proposed for the effective estrogens. The variable selection method is based on variable interaction (VSMVI) with leave-multiple-out cross-validation (LMOCV) to select the best subset. During variable selection, model construction and assessment, the Organization for Economic Co-operation and Development (OECD) principles for regulation of QSAR acceptability were fully considered, such as using an unambiguous multiple-linear regression (MLR) algorithm to build the model, using several validation methods to assess the performance of the model, defining the applicability domain, and analyzing the outliers with the results of molecular docking. The performance of the QSAR model indicates that VSMVI is an effective, feasible and practical tool for rapidly screening the best subset from a large pool of molecular descriptors.
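
    The VSMVI selection logic itself is not reproduced here; the sketch below shows only the leave-multiple-out cross-validation scoring that such a search relies on, ranking candidate descriptor subsets of an MLR model by an averaged cross-validated R² (used as a simple stand-in for Q²) on synthetic data.

```python
# Leave-multiple-out cross-validation (LMO-CV) sketch for ranking descriptor subsets of
# an MLR QSAR model. Candidate subsets are an arbitrary slice; data are synthetic.
import numpy as np
from itertools import combinations
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

rng = np.random.default_rng(6)
X = rng.normal(size=(53, 20))                        # 53 compounds x 20 molecular descriptors
y = X[:, [0, 3, 7]] @ np.array([1.2, -0.8, 0.5]) + rng.normal(0, 0.3, 53)  # synthetic activity

lmo = ShuffleSplit(n_splits=50, test_size=5, random_state=0)   # leave 5 out, 50 repeats

def lmo_q2(cols):
    model = LinearRegression()
    return cross_val_score(model, X[:, list(cols)], y, cv=lmo, scoring="r2").mean()

subsets = list(combinations(range(20), 3))[:100]     # a slice of candidate 3-descriptor subsets
best = max(subsets, key=lmo_q2)
print("best subset:", best, "LMO-CV score:", round(lmo_q2(best), 3))
```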

  1. A fast all-in-one method for automated post-processing of PIV data.

    Science.gov (United States)

    Garcia, Damien

    2011-05-01

    Post-processing of PIV (particle image velocimetry) data typically involves the following three stages: validation of the raw data, replacement of spurious and missing vectors, and some smoothing. A robust post-processing technique that carries out these steps simultaneously is proposed. The new all-in-one method (DCT-PLS), based on a penalized least squares approach (PLS), combines the use of the discrete cosine transform (DCT) and the generalized cross-validation, thus allowing fast unsupervised smoothing of PIV data. The DCT-PLS was compared with conventional methods, including the normalized median test, for post-processing of simulated and experimental raw PIV velocity fields. The DCT-PLS was shown to be more efficient than the usual methods, especially in the presence of clustered outliers. It was also demonstrated that the DCT-PLS can easily deal with a large amount of missing data. Because the proposed algorithm works in any dimension, the DCT-PLS is also suitable for post-processing of volumetric three-component PIV data.
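
    A simplified one-dimensional sketch of the core idea, for evenly spaced complete data and without the weighting, missing-data and robust-iteration machinery of the full DCT-PLS: smoothing is applied in the DCT domain and the smoothing parameter is chosen by generalized cross-validation.

```python
# Simplified 1-D DCT-based penalized least squares smoother with GCV-selected smoothing.
# The full DCT-PLS also handles weights, missing values, outliers and N dimensions.
import numpy as np
from scipy.fft import dct, idct

def smooth_gcv(y):
    n = y.size
    lam = -2.0 + 2.0 * np.cos(np.pi * np.arange(n) / n)   # eigenvalues of the difference operator
    dct_y = dct(y, norm="ortho")

    def gcv(log_s):
        gamma = 1.0 / (1.0 + 10.0 ** log_s * lam ** 2)
        resid = (gamma - 1.0) * dct_y                      # Parseval: DCT-domain norm = residual norm
        return n * np.sum(resid ** 2) / (n - gamma.sum()) ** 2

    log_s = min(np.linspace(-6, 6, 121), key=gcv)          # coarse 1-D search for the smoothing parameter
    gamma = 1.0 / (1.0 + 10.0 ** log_s * lam ** 2)
    return idct(gamma * dct_y, norm="ortho")

x = np.linspace(0, 10, 400)
noisy = np.sin(x) + np.random.default_rng(7).normal(0, 0.3, x.size)
smoothed = smooth_gcv(noisy)
print("residual std after smoothing:", np.std(smoothed - np.sin(x)).round(3))
```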

  2. A fast all-in-one method for automated post-processing of PIV data

    Science.gov (United States)

    Garcia, Damien

    2013-01-01

    Post-processing of PIV (particle image velocimetry) data typically involves the following three stages: validation of the raw data, replacement of spurious and missing vectors, and some smoothing. A robust post-processing technique that carries out these steps simultaneously is proposed. The new all-in-one method (DCT-PLS), based on a penalized least squares approach (PLS), combines the use of the discrete cosine transform (DCT) and the generalized cross-validation, thus allowing fast unsupervised smoothing of PIV data. The DCT-PLS was compared with conventional methods, including the normalized median test, for post-processing of simulated and experimental raw PIV velocity fields. The DCT-PLS was shown to be more efficient than the usual methods, especially in the presence of clustered outliers. It was also demonstrated that the DCT-PLS can easily deal with a large amount of missing data. Because the proposed algorithm works in any dimension, the DCT-PLS is also suitable for post-processing of volumetric three-component PIV data. PMID:24795497

  3. Highway Travel Time Prediction Using Sparse Tensor Completion Tactics and K-Nearest Neighbor Pattern Matching Method

    Directory of Open Access Journals (Sweden)

    Jiandong Zhao

    2018-01-01

    Full Text Available Remote transportation microwave sensor (RTMS) technology is being promoted for China’s highways. The distance between RTMSs is about 2 to 5 km, which leads to missing-data and data-sparseness problems. These two problems seriously restrict the accuracy of travel time prediction. To address the missing-data problem, a tensor completion method based on traffic multimode characteristics is proposed to recover the lost RTMS speed and volume data. To address the data-sparseness problem, virtual sensor nodes are set up between real RTMS nodes, and two-dimensional linear interpolation and a piecewise method are applied to estimate the average travel time between two nodes. Next, compared with the traditional K-nearest neighbor method, an optimal KNN method is proposed for travel time prediction. Optimization is made in three aspects. Firstly, the three original state vectors, that is, speed, volume, and time of day, are subdivided into seven periods. Secondly, the traffic congestion level is added as a new state vector. Thirdly, the cross-validation method is used to calibrate the K value to improve the adaptability of the KNN algorithm. Based on the data collected from the Jinggangao highway, all the algorithms are validated. The results show that the proposed method can improve data quality and the prediction precision of travel time.
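
    The K-calibration step can be sketched as a cross-validated grid search over the neighbour count; the state vectors below are synthetic placeholders rather than the Jinggangao highway data.

```python
# Sketch of the K-calibration step: the neighbour count of a KNN travel-time predictor
# is chosen by cross-validated grid search on placeholder state vectors.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(8)
n = 1500
state = np.column_stack([
    rng.uniform(20, 110, n),        # speed (km/h)
    rng.uniform(100, 2000, n),      # volume (veh/h)
    rng.integers(0, 7, n),          # time-of-day period (1 of 7)
    rng.integers(0, 4, n),          # congestion level
])
travel_time = 3600 * 10 / state[:, 0] + rng.normal(0, 20, n)   # synthetic 10-km link (s)

search = GridSearchCV(KNeighborsRegressor(weights="distance"),
                      param_grid={"n_neighbors": range(2, 31)},
                      cv=KFold(n_splits=5, shuffle=True, random_state=0),
                      scoring="neg_mean_absolute_error")
search.fit(state, travel_time)
print("calibrated K:", search.best_params_["n_neighbors"])
```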

  4. Nonparametric method for genomics-based prediction of performance of quantitative traits involving epistasis in plant breeding.

    Directory of Open Access Journals (Sweden)

    Xiaochun Sun

    Full Text Available Genomic selection (GS) procedures have proven useful in estimating breeding value and predicting phenotype with genome-wide molecular marker information. However, issues of high dimensionality, multicollinearity, and the inability to deal effectively with epistasis can jeopardize accuracy and predictive ability. We, therefore, propose a new nonparametric method, pRKHS, which combines the features of supervised principal component analysis (SPCA) and reproducing kernel Hilbert spaces (RKHS) regression, with versions for traits with no/low epistasis, pRKHS-NE, to high epistasis, pRKHS-E. Instead of assigning a specific relationship to represent the underlying epistasis, the method maps genotype to phenotype in a nonparametric way, thus requiring fewer genetic assumptions. SPCA decreases the number of markers needed for prediction by filtering out low-signal markers with the optimal marker set determined by cross-validation. Principal components are computed from the reduced marker matrix (called supervised principal components, SPC) and included in the smoothing spline ANOVA model as independent variables to fit the data. The new method was evaluated in comparison with current popular methods for practicing GS, specifically RR-BLUP, BayesA, BayesB, as well as a newer method by Crossa et al., RKHS-M, using both simulated and real data. Results demonstrate that pRKHS generally delivers greater predictive ability, particularly when epistasis impacts trait expression. Beyond prediction, the new method also facilitates inferences about the extent to which epistasis influences trait expression.

  5. Nonparametric method for genomics-based prediction of performance of quantitative traits involving epistasis in plant breeding.

    Science.gov (United States)

    Sun, Xiaochun; Ma, Ping; Mumm, Rita H

    2012-01-01

    Genomic selection (GS) procedures have proven useful in estimating breeding value and predicting phenotype with genome-wide molecular marker information. However, issues of high dimensionality, multicollinearity, and the inability to deal effectively with epistasis can jeopardize accuracy and predictive ability. We, therefore, propose a new nonparametric method, pRKHS, which combines the features of supervised principal component analysis (SPCA) and reproducing kernel Hilbert spaces (RKHS) regression, with versions for traits with no/low epistasis, pRKHS-NE, to high epistasis, pRKHS-E. Instead of assigning a specific relationship to represent the underlying epistasis, the method maps genotype to phenotype in a nonparametric way, thus requiring fewer genetic assumptions. SPCA decreases the number of markers needed for prediction by filtering out low-signal markers with the optimal marker set determined by cross-validation. Principal components are computed from reduced marker matrix (called supervised principal components, SPC) and included in the smoothing spline ANOVA model as independent variables to fit the data. The new method was evaluated in comparison with current popular methods for practicing GS, specifically RR-BLUP, BayesA, BayesB, as well as a newer method by Crossa et al., RKHS-M, using both simulated and real data. Results demonstrate that pRKHS generally delivers greater predictive ability, particularly when epistasis impacts trait expression. Beyond prediction, the new method also facilitates inferences about the extent to which epistasis influences trait expression.
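
    A rough analogue of the pRKHS pipeline (not the authors' implementation, which uses a smoothing spline ANOVA model): markers are screened by univariate association with the trait, PCA is applied to the retained markers to form supervised principal components, and a kernel ridge model (an RKHS regression) maps them to phenotype, with the screening cutoff chosen by cross-validation. Marker and phenotype data below are simulated.

```python
# Rough SPCA + RKHS-regression analogue of pRKHS: univariate marker screening, PCA on the
# retained markers, kernel ridge regression on the supervised PCs, all tuned by CV.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.decomposition import PCA
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(9)
markers = rng.integers(0, 3, size=(300, 1000)).astype(float)      # 300 lines x 1000 SNP markers
phenotype = markers[:, :10].sum(axis=1) + 0.5 * markers[:, 0] * markers[:, 1] \
            + rng.normal(0, 1.0, 300)                              # additive + epistatic signal

pipe = Pipeline([
    ("screen", SelectKBest(f_regression)),
    ("spc", PCA(n_components=5)),
    ("rkhs", KernelRidge(kernel="rbf", alpha=1.0)),
])
grid = {"screen__k": [50, 100, 200], "rkhs__gamma": [0.01, 0.1]}
search = GridSearchCV(pipe, grid, cv=KFold(5, shuffle=True, random_state=0), scoring="r2")
search.fit(markers, phenotype)
print("CV predictive ability (r2):", round(search.best_score_, 3), search.best_params_)
```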

  6. An ensemble method for predicting subnuclear localizations from primary protein structures.

    Directory of Open Access Journals (Sweden)

    Guo Sheng Han

    Full Text Available BACKGROUND: Predicting protein subnuclear localization is a challenging problem. Some previous works based on non-sequence information including Gene Ontology annotations and kernel fusion have respective limitations. The aim of this work is twofold: one is to propose a novel individual feature extraction method; another is to develop an ensemble method to improve prediction performance using comprehensive information represented in the form of high dimensional feature vector obtained by 11 feature extraction methods. METHODOLOGY/PRINCIPAL FINDINGS: A novel two-stage multiclass support vector machine is proposed to predict protein subnuclear localizations. It only considers those feature extraction methods based on amino acid classifications and physicochemical properties. In order to speed up our system, an automatic search method for the kernel parameter is used. The prediction performance of our method is evaluated on four datasets: Lei dataset, multi-localization dataset, SNL9 dataset and a new independent dataset. The overall accuracy of prediction for 6 localizations on Lei dataset is 75.2% and that for 9 localizations on SNL9 dataset is 72.1% in the leave-one-out cross validation, 71.7% for the multi-localization dataset and 69.8% for the new independent dataset, respectively. Comparisons with those existing methods show that our method performs better for both single-localization and multi-localization proteins and achieves more balanced sensitivities and specificities on large-size and small-size subcellular localizations. The overall accuracy improvements are 4.0% and 4.7% for single-localization proteins and 6.5% for multi-localization proteins. The reliability and stability of our classification model are further confirmed by permutation analysis. CONCLUSIONS: It can be concluded that our method is effective and valuable for predicting protein subnuclear localizations. A web server has been designed to implement the proposed method

  7. Mining Method

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Young Shik; Lee, Kyung Woon; Kim, Oak Hwan; Kim, Dae Kyung [Korea Institute of Geology Mining and Materials, Taejon (Korea, Republic of)

    1996-12-01

    The shrinking coal market has been forcing the coal industry to make exceptional rationalization and restructuring efforts since the end of the eighties. To the competition from crude oil and natural gas has been added the growing pressure from rising wages and rising production costs as the workings get deeper. To improve the competitive position of the coal mines against oil and gas through cost reduction, studies to improve the mining system have been carried out. To find the fields requiring improvement most, the technologies used in Tae Bak Colliery, which was selected as one of the long-running mines, were investigated and analyzed. The mining method appeared to be the field needing improvement most in order to reduce the production cost. The present method, the so-called inseam roadway caving method, is currently used to extract the steep and thick seam. However, this method has several drawbacks. To solve these problems, two mining methods are suggested, one for the long term and one for the short term. The inseam roadway caving method with long-hole blasting is a variant of the present inseam roadway caving method, modified by replacing timber sets with steel arch sets and the shovel loaders with chain conveyors. Long-hole blasting is introduced to promote caving. The pillar caving method with chock supports uses chock supports set in the cross-cut from the hanging wall to the footwall. Two single chain conveyors are needed. One is installed in front of the chock supports to clear coal from the cutting face. The other is installed behind the supports to transport caved coal from behind. This method is superior to the previous one in terms of safety from water inrushes, production rate and productivity. The only drawback is that it needs more investment. (author). 14 tabs., 34 figs.

  8. Projection Methods

    DEFF Research Database (Denmark)

    Wagner, Falko Jens; Poulsen, Mikael Zebbelin

    1999-01-01

    When trying to solve a DAE problem of high index with more traditional methods, it often causes instability in some of the variables, and finally leads to breakdown of convergence and integration of the solution. This is nicely shown in [ESF98, p. 152 ff.]. This chapter will introduce projection methods as a way of handling these special problems. It is assumed that we have methods for solving normal ODE systems and index-1 systems.

  9. Discipline methods

    OpenAIRE

    Maria Kikila; Ioannis Koutelekos

    2012-01-01

    Child discipline is one of the most important elements of successful parenting. Discipline is defined as the process that helps children learn appropriate behaviors and make good choices. Aim: The aim of the present study was to review the literature about discipline methods. The method of this study included bibliographic research in both the review and the research literature, mainly in the PubMed database, referring to discipline methods. Results: In the literature it is ci...

  10. Maintenance methods

    International Nuclear Information System (INIS)

    Sanchis, H.; Aucher, P.

    1990-01-01

    The maintenance method applied at the Hague is summarized. The method was developed in order to solve problems relating to: the different specialist fields, the need for homogeneity in the maintenance work, the equipment diversity, the increase of the materials used at the Hague's new facilities. The aim of the method is to create a knowhow formalism, to facilitate maintenance, to ensure the running of the operations and to improve the estimation of the maintenance cost. One of the method's difficulties is the demonstration of the profitability of the maintenance operations [fr

  11. An Assessment of Mean Areal Precipitation Methods on Simulated Stream Flow: A SWAT Model Performance Assessment

    Directory of Open Access Journals (Sweden)

    Sean Zeiger

    2017-06-01

    Full Text Available Accurate mean areal precipitation (MAP) estimates are essential input forcings for hydrologic models. However, the selection of the most accurate method to estimate MAP can be daunting because there are numerous methods to choose from (e.g., proximate gauge, direct weighted average, surface-fitting, and remotely sensed methods). Multiple methods (n = 19) were used to estimate MAP with precipitation data from 11 distributed monitoring sites and 4 remotely sensed data sets. Each method was validated against hydrologic-model-simulated stream flow using the Soil and Water Assessment Tool (SWAT). SWAT was validated using a split-site method and the observed stream flow data from five nested-scale gauging sites in a mixed-land-use watershed of the central USA. Cross-validation results showed the error associated with surface-fitting and remotely sensed methods ranging from −4.5 to −5.1% and −9.8 to −14.7%, respectively. Split-site validation results showed percent bias (PBIAS) values that ranged from −4.5 to −160%. Second-order polynomial functions especially overestimated precipitation and subsequent stream flow simulations (PBIAS = −160 in the headwaters). The results indicated that using an inverse-distance weighted, linear polynomial interpolation or multiquadric function method to estimate MAP may improve SWAT model simulations. Collectively, the results highlight the importance of spatially distributed observed hydroclimate data for precipitation and subsequent stream flow estimations. The MAP methods demonstrated in the current work can be used to reduce hydrologic model uncertainty caused by watershed physiographic differences.
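
    A minimal sketch of one of the simpler surface-fitting options, inverse-distance weighting: precipitation at each grid cell is a distance-weighted average of the gauge values, and MAP is the mean over cells. Gauge coordinates and depths below are placeholders, not the study's monitoring sites.

```python
# Minimal inverse-distance-weighting (IDW) sketch for mean areal precipitation (MAP).
import numpy as np

def idw(grid_xy, gauge_xy, gauge_p, power=2.0):
    # Pairwise distances grid-cell -> gauge, then distance-weighted average per cell.
    d = np.linalg.norm(grid_xy[:, None, :] - gauge_xy[None, :, :], axis=2)
    w = 1.0 / np.maximum(d, 1e-9) ** power          # avoid division by zero at a gauge
    return (w * gauge_p).sum(axis=1) / w.sum(axis=1)

gauge_xy = np.array([[2.0, 3.5], [7.1, 8.2], [4.4, 1.9], [9.0, 5.5]])   # gauge coordinates (km)
gauge_p = np.array([12.0, 18.5, 9.7, 15.2])                              # event totals (mm)
gx, gy = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
grid = np.column_stack([gx.ravel(), gy.ravel()])

map_estimate = idw(grid, gauge_xy, gauge_p).mean()
print(f"mean areal precipitation: {map_estimate:.1f} mm")
```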

  12. DDR: Efficient computational method to predict drug–target interactions using graph mining and machine learning approaches

    KAUST Repository

    Olayan, Rawan S.

    2017-11-23

    Motivation: Finding drug-target interactions (DTIs) computationally is a convenient strategy to identify new DTIs at low cost with reasonable accuracy. However, current DTI prediction methods suffer from a high false-positive prediction rate. Results: We developed DDR, a novel method that improves DTI prediction accuracy. DDR is based on the use of a heterogeneous graph that contains known DTIs with multiple similarities between drugs and multiple similarities between target proteins. DDR applies a non-linear similarity fusion method to combine different similarities. Before fusion, DDR performs a pre-processing step where a subset of similarities is selected in a heuristic process to obtain an optimized combination of similarities. Then, DDR applies a random forest model using different graph-based features extracted from the DTI heterogeneous graph. Using five repeats of 10-fold cross-validation, three testing setups, and the weighted average of area under the precision-recall curve (AUPR) scores, we show that DDR significantly reduces the AUPR score error relative to the next best state-of-the-art method for predicting DTIs by 34% when the drugs are new, by 23% when the targets are new, and by 34% when the drugs and the targets are known but not all DTIs between them are known. Using independent sources of evidence, we verify as correct 22 out of the top 25 DDR novel predictions. This suggests that DDR can be used as an efficient method to identify correct DTIs.
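
    DDR's similarity fusion and graph-feature construction are not reproduced here; the sketch shows only the evaluation protocol, a random forest scored by five repeats of stratified 10-fold cross-validation with the AUPR metric, on a placeholder feature matrix.

```python
# Sketch of the evaluation protocol: random forest + 5x repeated stratified 10-fold CV,
# scored by area under the precision-recall curve (AUPR). Features are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

rng = np.random.default_rng(10)
X = rng.normal(size=(2000, 30))                 # placeholder graph-based features per drug-target pair
y = (rng.random(2000) < 0.1).astype(int)        # interactions are rare (~10% positives)

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)
aupr = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                       X, y, cv=cv, scoring="average_precision")
print(f"mean AUPR over 5x10 folds: {aupr.mean():.3f}")
```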

  13. High-performance liquid chromatography method for the determination of hydrogen peroxide present or released in teeth bleaching kits and hair cosmetic products.

    Science.gov (United States)

    Gimeno, Pascal; Bousquet, Claudine; Lassu, Nelly; Maggio, Annie-Françoise; Civade, Corinne; Brenier, Charlotte; Lempereur, Laurent

    2015-03-25

    This manuscript presents an HPLC/UV method for the determination of hydrogen peroxide present or released in teeth bleaching products and hair products. The method is based on the oxidation of triphenylphosphine into triphenylphosphine oxide by hydrogen peroxide. The triphenylphosphine oxide formed is quantified by HPLC/UV. Validation data were obtained using the ISO 12787 standard approach, which is particularly suited to cases where reconstituted sample matrices cannot be prepared. For comparative purposes, hydrogen peroxide was also determined using ceric sulfate titrimetry for both types of products. For hair products, a cross-validation of the ceric titrimetric method and the HPLC/UV method against the cosmetic 82/434/EEC directive (official iodometric titration method) was performed. Results obtained for 6 commercialized teeth whitening products and 5 hair products point to similar hydrogen peroxide contents using either the HPLC/UV method or the ceric sulfate titrimetric method. For hair products, results were similar to the hydrogen peroxide content obtained with the cosmetic 82/434/EEC directive method; for the HPLC/UV method, mean recoveries obtained on spiked samples using the ISO 12787 standard range from 100% to 110%. Hydrogen peroxide contents higher than the regulated limit were found in some products. Copyright © 2015 Elsevier B.V. All rights reserved.

  14. Spectroscopic methods

    International Nuclear Information System (INIS)

    Ivanovich, M.; Murray, A.

    1992-01-01

    The principles involved in the interaction of nuclear radiation with matter are described, as are the principles behind methods of radiation detection. Different types of radiation detectors are described and methods of detection such as alpha, beta and gamma spectroscopy, neutron activation analysis are presented. Details are given of measurements of uranium-series disequilibria. (UK)

  15. Linear and nonlinear methods in modeling the aqueous solubility of organic compounds.

    Science.gov (United States)

    Catana, Cornel; Gao, Hua; Orrenius, Christian; Stouten, Pieter F W

    2005-01-01

    Solubility data for 930 diverse compounds have been analyzed using linear Partial Least Squares (PLS) and nonlinear PLS methods, Continuum Regression (CR), and Neural Networks (NN). 1D and 2D descriptors from the MOE package in combination with E-state or ISIS keys have been used. The best model was obtained using linear PLS for a combination of 22 MOE descriptors and 65 ISIS keys. It has a correlation coefficient (r2) of 0.935 and a root-mean-square error (RMSE) of 0.468 log molar solubility (log S(w)). The model, validated on a test set of 177 compounds not included in the training set, has an r2 of 0.911 and an RMSE of 0.475 log S(w). The descriptors were ranked according to their importance, and the 22 MOE descriptors were found at the top of the list. The CR model produced results as good as PLS, and because of the way in which cross-validation has been done it is expected to be a valuable tool in prediction besides the PLS model. The statistics obtained using nonlinear methods did not surpass those obtained with linear ones. The good statistics obtained for linear PLS and CR recommend these models for use in prediction when it is difficult or impossible to make experimental measurements, as well as for virtual screening, combinatorial library design, and efficient lead optimization.

  16. BacHbpred: Support Vector Machine Methods for the Prediction of Bacterial Hemoglobin-Like Proteins

    Directory of Open Access Journals (Sweden)

    MuthuKrishnan Selvaraj

    2016-01-01

    Full Text Available The recent upsurge in microbial genome data has revealed that hemoglobin-like (HbL) proteins may be widely distributed among bacteria and that some organisms may carry more than one HbL encoding gene. However, the discovery of HbL proteins has been limited to a small number of bacteria only. This study describes the prediction of HbL proteins and their domain classification using a machine learning approach. Support vector machine (SVM) models were developed for predicting HbL proteins based upon amino acid composition (AC), dipeptide composition (DC), a hybrid method (AC + DC), and position-specific scoring matrix (PSSM). In addition, we introduce for the first time a new prediction method based on max to min amino acid residue (MM) profiles. The average accuracy, standard deviation (SD), false positive rate (FPR), confusion matrix, and receiver operating characteristic (ROC) were analyzed. We also compared the performance of our proposed models in homology detection databases. The performance of the different approaches was estimated using fivefold cross-validation techniques. Prediction accuracy was further investigated through confusion matrix and ROC curve analysis. All experimental results indicate that the proposed BacHbpred can be a prospective predictor for the determination of HbL-related proteins. BacHbpred, a web tool, has been developed for HbL prediction.

  17. Characterization of mammographic masses based on level set segmentation with new image features and patient information

    International Nuclear Information System (INIS)

    Shi Jiazheng; Sahiner, Berkman; Chan Heangping; Ge Jun; Hadjiiski, Lubomir; Helvie, Mark A.; Nees, Alexis; Wu Yita; Wei Jun; Zhou Chuan; Zhang Yiheng; Cui Jing

    2008-01-01

    Computer-aided diagnosis (CAD) for characterization of mammographic masses as malignant or benign has the potential to assist radiologists in reducing the biopsy rate without increasing false negatives. The purpose of this study was to develop an automated method for mammographic mass segmentation and explore new image-based features in combination with patient information in order to improve the performance of mass characterization. The authors' previous CAD system, which used the active contour segmentation, and morphological, textural, and spiculation features, has achieved promising results in mass characterization. The new CAD system is based on the level set method and includes two new types of image features related to the presence of microcalcifications with the mass and abruptness of the mass margin, and patient age. A linear discriminant analysis (LDA) classifier with stepwise feature selection was used to merge the extracted features into a classification score. The classification accuracy was evaluated using the area under the receiver operating characteristic curve. The authors' primary data set consisted of 427 biopsy-proven masses (200 malignant and 227 benign) in 909 regions of interest (ROIs) (451 malignant and 458 benign) from multiple mammographic views. Leave-one-case-out resampling was used for training and testing. The new CAD system based on the level set segmentation and the new mammographic feature space achieved a view-based Az value of 0.83±0.01. The improvement compared to the previous CAD system was statistically significant (p=0.02). When patient age was included in the new CAD system, view-based and case-based Az values were 0.85±0.01 and 0.87±0.02, respectively. The study also demonstrated the consistency of the newly developed CAD system by evaluating the statistics of the weights of the LDA classifiers in leave-one-case-out classification. Finally, an independent test on the publicly available digital database for screening
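
    The leave-one-case-out design groups all ROIs (views) of one patient into the same held-out fold, which scikit-learn's LeaveOneGroupOut expresses directly; the LDA classifier and placeholder features below only illustrate the resampling scheme, not the authors' feature extraction.

```python
# Leave-one-case-out sketch: all ROIs of a case are held out together, so views of one
# mass never appear in both training and test sets. Features are placeholders.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(11)
n_rois, n_cases = 900, 420
case_id = rng.integers(0, n_cases, size=n_rois)          # several ROIs (views) per case
case_label = rng.integers(0, 2, size=n_cases)            # biopsy result per case
malignant = case_label[case_id]
features = rng.normal(size=(n_rois, 12)) + malignant[:, None] * 0.5

scores = cross_val_predict(LinearDiscriminantAnalysis(), features, malignant,
                           cv=LeaveOneGroupOut(), groups=case_id,
                           method="decision_function")
print(f"view-based Az (ROC AUC): {roc_auc_score(malignant, scores):.2f}")
```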

  18. A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data

    KAUST Repository

    Abusamra, Heba

    2013-05-01

    Microarray technology has enriched the study of gene expression in such a way that scientists are now able to measure the expression levels of thousands of genes in a single experiment. Microarray gene expression data gained great importance in recent years due to its role in disease diagnoses and prognoses, which helps to choose the appropriate treatment plan for patients. This technology has ushered in a new era in molecular classification. However, interpreting gene expression data remains a difficult problem and an active research area due to its intrinsic nature of “high dimensional, low sample size”. Such problems pose great challenges to existing classification methods. Thus, effective feature selection techniques are often needed in this case to help correctly classify different tumor types and consequently lead to a better understanding of genetic signatures as well as improve treatment strategies. This thesis presents a comparative study of state-of-the-art feature selection methods, classification methods, and the combination of them, based on gene expression data. We compared the efficiency of three different classification methods including: support vector machines, k-nearest neighbor and random forest, and eight different feature selection methods, including: information gain, twoing rule, sum minority, max minority, gini index, sum of variances, t-statistics, and one-dimension support vector machine. Five-fold cross validation was used to evaluate the classification performance. Two publicly available gene expression data sets of glioma were used for this study. Different experiments have been applied to compare the performance of the classification methods with and without performing feature selection. Results revealed the important role of feature selection in classifying gene expression data. By performing feature selection, the classification accuracy can be significantly boosted by using a small number of genes. The relationship of features selected in

  19. A comprehensive benchmark of kernel methods to extract protein-protein interactions from literature.

    Directory of Open Access Journals (Sweden)

    Domonkos Tikk

    Full Text Available The most important way of conveying new findings in biomedical research is scientific publication. Extraction of protein-protein interactions (PPIs) reported in scientific publications is one of the core topics of text mining in the life sciences. Recently, a new class of such methods has been proposed - convolution kernels that identify PPIs using deep parses of sentences. However, comparing published results of different PPI extraction methods is impossible due to the use of different evaluation corpora, different evaluation metrics, different tuning procedures, etc. In this paper, we study whether the reported performance metrics are robust across different corpora and learning settings and whether the use of deep parsing actually leads to an increase in extraction quality. Our ultimate goal is to identify the one method that performs best in real-life scenarios, where information extraction is performed on unseen text and not on specifically prepared evaluation data. We performed a comprehensive benchmarking of nine different methods for PPI extraction that use convolution kernels on rich linguistic information. Methods were evaluated on five different public corpora using cross-validation, cross-learning, and cross-corpus evaluation. Our study confirms that kernels using dependency trees generally outperform kernels based on syntax trees. However, our study also shows that only the best kernel methods can compete with a simple rule-based approach when the evaluation prevents information leakage between training and test corpora. Our results further reveal that the F-score of many approaches drops significantly if no corpus-specific parameter optimization is applied and that methods reaching a good AUC score often perform much worse in terms of F-score. We conclude that for most kernels no sensible estimation of PPI extraction performance on new text is possible, given the current heterogeneity in evaluation data. Nevertheless, our study

  20. A genotypic method for determining HIV-2 coreceptor usage enables epidemiological studies and clinical decision support.

    Science.gov (United States)

    Döring, Matthias; Borrego, Pedro; Büch, Joachim; Martins, Andreia; Friedrich, Georg; Camacho, Ricardo Jorge; Eberle, Josef; Kaiser, Rolf; Lengauer, Thomas; Taveira, Nuno; Pfeifer, Nico

    2016-12-20

    CCR5-coreceptor antagonists can be used for treating HIV-2 infected individuals. Before initiating treatment with coreceptor antagonists, viral coreceptor usage should be determined to ensure that the virus can use only the CCR5 coreceptor (R5) and cannot evade the drug by using the CXCR4 coreceptor (X4-capable). However, until now, no online tool for the genotypic identification of HIV-2 coreceptor usage had been available. Furthermore, there is a lack of knowledge on the determinants of HIV-2 coreceptor usage. Therefore, we developed a data-driven web service for the prediction of HIV-2 coreceptor usage from the V3 loop of the HIV-2 glycoprotein and used the tool to identify novel discriminatory features of X4-capable variants. Using 10 runs of tenfold cross validation, we selected a linear support vector machine (SVM) as the model for geno2pheno[coreceptor-hiv2], because it outperformed the other SVMs with an area under the ROC curve (AUC) of 0.95. We found that SVMs were highly accurate in identifying HIV-2 coreceptor usage, attaining sensitivities of 73.5% and specificities of 96% during tenfold nested cross validation. The predictive performance of SVMs was not significantly different (p value 0.37) from an existing rules-based approach. Moreover, geno2pheno[coreceptor-hiv2] achieved a predictive accuracy of 100% and outperformed the existing approach on an independent data set containing nine new isolates with corresponding phenotypic measurements of coreceptor usage. geno2pheno[coreceptor-hiv2] could not only reproduce the established markers of CXCR4-usage, but also revealed novel markers: the substitutions 27K, 15G, and 8S were significantly predictive of CXCR4 usage. Furthermore, SVMs trained on the amino-acid sequences of the V1 and V2 loops were also quite accurate in predicting coreceptor usage (AUCs of 0.84 and 0.65, respectively). In this study, we developed geno2pheno[coreceptor-hiv2], the first online tool for the prediction of HIV-2 coreceptor
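
    The tenfold nested cross-validation used to estimate such a classifier's performance can be sketched as follows: the SVM's C is tuned on inner folds while the AUC is estimated on outer folds, so tuning never sees the test data. The one-hot V3 encoding here is a random placeholder, not real sequence data.

```python
# Nested cross-validation sketch for a linear SVM coreceptor classifier: inner folds tune
# C, outer folds estimate AUC. The V3-loop encoding below is a synthetic placeholder.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

rng = np.random.default_rng(12)
X = rng.integers(0, 2, size=(400, 35 * 20)).astype(float)   # placeholder one-hot V3 encoding
y = rng.integers(0, 2, size=400)                            # 1 = X4-capable, 0 = R5

inner = GridSearchCV(SVC(kernel="linear"),
                     param_grid={"C": [0.01, 0.1, 1, 10]},
                     cv=StratifiedKFold(10, shuffle=True, random_state=1),
                     scoring="roc_auc")
outer_auc = cross_val_score(inner, X, y,
                            cv=StratifiedKFold(10, shuffle=True, random_state=2),
                            scoring="roc_auc")
print(f"nested-CV AUC: {outer_auc.mean():.2f} +/- {outer_auc.std():.2f}")
```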

  1. Method Mixins

    DEFF Research Database (Denmark)

    Ernst, Erik

    2002-01-01

    The procedure call mechanism has conquered the world of programming, with object-oriented method invocation being a procedure call in context of an object. This paper presents an alternative, method mixin invocations, that is optimized for flexible creation of composite behavior, where traditional... Method mixins use shared name spaces to transfer information between caller and callee, as opposed to traditional invocation which uses parameters and returned results. This relieves a caller from dependencies on the callee, and it allows direct transfer of information further down the call stack, e...

  2. Electrocardiogram ST-Segment Morphology Delineation Method Using Orthogonal Transformations.

    Directory of Open Access Journals (Sweden)

    Miha Amon

    Full Text Available Differentiation between ischaemic and non-ischaemic transient ST segment events of long-term ambulatory electrocardiograms is a persisting weakness in present ischaemia detection systems. Traditional ST segment level measuring is not a sufficiently precise technique due to the single point of measurement and severe noise which is often present. We developed a robust, noise-resistant, orthogonal-transformation-based delineation method, which allows tracing the shape of transient ST segment morphology changes from the entire ST segment in terms of diagnostic and morphologic feature-vector time series, and also allows further analysis. For these purposes, we developed a new Legendre Polynomials based Transformation (LPT) of the ST segment. Its basis functions have similar shapes to typical transient changes of ST segment morphology categories during myocardial ischaemia (level, slope and scooping), thus providing direct insight into the types of time domain morphology changes through the LPT feature-vector space. We also generated new Karhunen–Loève Transformation (KLT) ST segment basis functions using a robust covariance matrix constructed from the ST segment pattern vectors derived from the Long Term ST Database (LTST DB). As for the delineation of significant transient ischaemic and non-ischaemic ST segment episodes, we present a study on the representation of transient ST segment morphology categories, and an evaluation study on the classification power of the KLT- and LPT-based feature vectors to classify between ischaemic and non-ischaemic ST segment episodes of the LTST DB. Classification accuracy using the KLT and LPT feature vectors was 90% and 82%, respectively, when using the k-Nearest Neighbors (k = 3) classifier and 10-fold cross-validation. New sets of feature-vector time series for both transformations were derived for the records of the LTST DB, which is freely available on the PhysioNet website, and were contributed to the LTST DB. The

  3. Method Mixins

    DEFF Research Database (Denmark)

    Ernst, Erik

    2002-01-01

    The procedure call mechanism has conquered the world of programming, with object-oriented method invocation being a procedure call in context of an object. This paper presents an alternative, method mixin invocations, that is optimized for flexible creation of composite behavior, where traditional invocation is optimized for as-is reuse of existing behavior. Tight coupling reduces flexibility, and traditional invocation tightly couples transfer of information and transfer of control. Method mixins decouple these two kinds of transfer, thereby opening the doors for new kinds of abstraction and reuse. Method mixins use shared name spaces to transfer information between caller and callee, as opposed to traditional invocation which uses parameters and returned results. This relieves a caller from dependencies on the callee, and it allows direct transfer of information further down the call stack, e...

  4. Dosimetry methods

    DEFF Research Database (Denmark)

    McLaughlin, W.L.; Miller, A.; Kovacs, A.

    2003-01-01

    Chemical and physical radiation dosimetry methods, used for the measurement of absorbed dose mainly during the practical use of ionizing radiation, are discussed with respect to their characteristics and fields of application....

  5. Method Mixins

    DEFF Research Database (Denmark)

    Ernst, Erik

    2005-01-01

    The world of programming has been conquered by the procedure call mechanism, including object-oriented method invocation which is a procedure call in context of an object. This paper presents an alternative, method mixin invocations, that is optimized for flexible creation of composite behavior, ... the call stack, e.g., to a callee's callee. The mechanism has been implemented in the programming language gbeta. Variants of the mechanism could be added to almost any imperative programming language.

  6. Orthogonal analytical methods for botanical standardization: determination of green tea catechins by qNMR and LC-MS/MS.

    Science.gov (United States)

    Napolitano, José G; Gödecke, Tanja; Lankin, David C; Jaki, Birgit U; McAlpine, James B; Chen, Shao-Nong; Pauli, Guido F

    2014-05-01

    The development of analytical methods for parallel characterization of multiple phytoconstituents is essential to advance the quality control of herbal products. While chemical standardization is commonly carried out by targeted analysis using gas or liquid chromatography-based methods, more universal approaches based on quantitative (1)H NMR (qHNMR) measurements are being used increasingly in the multi-targeted assessment of these complex mixtures. The present study describes the development of a 1D qHNMR-based method for simultaneous identification and quantification of green tea constituents. This approach utilizes computer-assisted (1)H iterative Full Spin Analysis (HiFSA) and enables rapid profiling of seven catechins in commercial green tea extracts. The qHNMR results were cross-validated against quantitative profiles obtained with an orthogonal LC-MS/MS method. The relative strengths and weaknesses of both approaches are discussed, with special emphasis on the role of identical reference standards in qualitative and quantitative analyses. Copyright © 2013 Elsevier B.V. All rights reserved.

  7. Simultaneous Determination of Atorvastatin Calcium and Amlodipine Besylate by Spectrophotometry and Multivariate Calibration Methods in Pharmaceutical Formulations

    Directory of Open Access Journals (Sweden)

    Amir H. M. Sarrafi

    2011-01-01

    Full Text Available Resolution of a binary mixture of atorvastatin (ATV) and amlodipine (AML) with minimum sample pretreatment and without analyte separation has been successfully achieved using a rapid method based on partial least squares analysis of UV spectral data. Multivariate calibration modeling procedures, namely traditional partial least squares (PLS-2), interval partial least squares (iPLS) and synergy partial least squares (siPLS), were applied to select a spectral range that provided the lowest prediction error in comparison to the full-spectrum model. The simultaneous determination of both analytes was possible by PLS processing of the sample absorbance between 220 and 425 nm. The correlation coefficients (R) and root mean squared errors of cross-validation (RMSECV) for ATV and AML in synthetic mixtures were 0.9991, 0.9958 and 0.4538, 0.2411, respectively, in the best siPLS models. The optimized method has been used for the determination of ATV and AML in amostatin commercial tablets. The proposed methods are simple, fast and inexpensive, and do not need any separation or preparation steps.

  8. A Novel Hybrid Intelligent Indoor Location Method for Mobile Devices by Zones Using Wi-Fi Signals.

    Science.gov (United States)

    Castañón-Puga, Manuel; Salazar, Abby Stephanie; Aguilar, Leocundo; Gaxiola-Pacheco, Carelia; Licea, Guillermo

    2015-12-02

    The increasing use of mobile devices in indoor spaces brings challenges to location methods. This work presents a hybrid intelligent method based on data mining and Type-2 fuzzy logic to locate mobile devices in an indoor space by zones using Wi-Fi signals from selected access points (APs). This approach takes advantage of wireless local area networks (WLANs) over other types of architectures and implements the complete method in a mobile application using the developed tools. In addition, the proposed approach is validated with experimental data obtained from case studies and the cross-validation technique. For the purpose of generating the fuzzy rules that conform to the Takagi-Sugeno fuzzy system structure, a semi-supervised data mining technique called subtractive clustering is used. This algorithm finds centers of clusters from the radius map given by the collected signals from APs. Measurements of Wi-Fi signals can be noisy due to several factors mentioned in this work, so this method proposes the use of Type-2 fuzzy logic for modeling and dealing with such uncertain information.

  9. A Novel Hybrid Intelligent Indoor Location Method for Mobile Devices by Zones Using Wi-Fi Signals

    Directory of Open Access Journals (Sweden)

    Manuel Castañón–Puga

    2015-12-01

    Full Text Available The increasing use of mobile devices in indoor spaces brings challenges to location methods. This work presents a hybrid intelligent method based on data mining and Type-2 fuzzy logic to locate mobile devices in an indoor space by zones using Wi-Fi signals from selected access points (APs). This approach takes advantage of wireless local area networks (WLANs) over other types of architectures and implements the complete method in a mobile application using the developed tools. In addition, the proposed approach is validated with experimental data obtained from case studies and the cross-validation technique. For the purpose of generating the fuzzy rules that conform to the Takagi–Sugeno fuzzy system structure, a semi-supervised data mining technique called subtractive clustering is used. This algorithm finds centers of clusters from the radius map given by the collected signals from APs. Measurements of Wi-Fi signals can be noisy due to several factors mentioned in this work, so this method proposes the use of Type-2 fuzzy logic for modeling and dealing with such uncertain information.

  10. A new classification method for MALDI imaging mass spectrometry data acquired on formalin-fixed paraffin-embedded tissue samples.

    Science.gov (United States)

    Boskamp, Tobias; Lachmund, Delf; Oetjen, Janina; Cordero Hernandez, Yovany; Trede, Dennis; Maass, Peter; Casadonte, Rita; Kriegsmann, Jörg; Warth, Arne; Dienemann, Hendrik; Weichert, Wilko; Kriegsmann, Mark

    2017-07-01

    Matrix-assisted laser desorption/ionization imaging mass spectrometry (MALDI IMS) shows a high potential for applications in histopathological diagnosis, and in particular for supporting tumor typing and subtyping. The development of such applications requires the extraction of spectral fingerprints that are relevant for the given tissue and the identification of biomarkers associated with these spectral patterns. We propose a novel data analysis method based on the extraction of characteristic spectral patterns (CSPs) that allow automated generation of classification models for spectral data. Formalin-fixed paraffin embedded (FFPE) tissue samples from N=445 patients assembled on 12 tissue microarrays were analyzed. The method was applied to discriminate primary lung and pancreatic cancer, as well as adenocarcinoma and squamous cell carcinoma of the lung. A classification accuracy of 100% and 82.8%, resp., could be achieved on core level, assessed by cross-validation. The method outperformed the more conventional classification method based on the extraction of individual m/z values in the first application, while achieving a comparable accuracy in the second. LC-MS/MS peptide identification demonstrated that the spectral features present in selected CSPs correspond to peptides relevant for the respective classification. This article is part of a Special Issue entitled: MALDI Imaging, edited by Dr. Corinna Henkel and Prof. Peter Hoffmann. Copyright © 2016 Elsevier B.V. All rights reserved.

  11. The feasibility of using explicit method for linear correction of the particle size variation using NIR Spectroscopy combined with PLS2 regression method

    Science.gov (United States)

    Yulia, M.; Suhandy, D.

    2018-03-01

    NIR spectra obtained from a spectral data acquisition system contain both chemical information about the samples and physical information, such as particle size and bulk density. Several methods have been established for developing calibration models that can compensate for variations in sample physical information. One common approach is to include the physical information variation in the calibration model, either explicitly or implicitly. The objective of this study was to evaluate the feasibility of using the explicit method to compensate for the influence of different particle sizes of coffee powder on NIR calibration model performance. A total of 220 coffee powder samples with two different types of coffee (civet and non-civet) and two different particle sizes (212 and 500 µm) were prepared. Spectral data were acquired using an NIR spectrometer equipped with an integrating sphere for diffuse reflectance measurement. A discrimination method based on PLS-DA was applied, and the influence of particle size on the performance of PLS-DA was investigated. In the explicit method, the particle size is added directly as a predicted variable, resulting in an X block containing only the NIR spectra and a Y block containing both the particle size and the type of coffee. The explicit inclusion of the particle size into the calibration model is expected to improve the accuracy of coffee type determination. The results show that, using the explicit method, the quality of the developed calibration model for coffee type determination is slightly superior, with a coefficient of determination (R2) of 0.99 and a root mean square error of cross-validation (RMSECV) of 0.041. The performance of the PLS2 calibration model for coffee type determination with particle size compensation was quite good and able to predict the type of coffee at two different particle sizes with relatively high R2 pred values. The prediction also resulted in low bias and RMSEP values.
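
    The explicit strategy can be sketched with a PLS2 model (PLS with a multi-column Y): the Y block carries both the type of coffee and the particle size while the X block holds only the spectra, and the model is assessed by cross-validation. The spectra below are synthetic stand-ins, not the measured coffee data.

```python
# PLS2 sketch of the explicit strategy: Y holds the class indicator and the particle size,
# X holds only the NIR spectra; performance is checked by 5-fold cross-validation.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(13)
n = 220
coffee_type = rng.integers(0, 2, n)                     # 0 = non-civet, 1 = civet
particle_um = rng.choice([212.0, 500.0], n)             # two sieved particle sizes
wl = np.linspace(1000, 2500, 300)
spectra = (0.4 + 0.2 * coffee_type[:, None]) * np.exp(-((wl - 1900) / 250) ** 2) \
          + 0.0004 * particle_um[:, None] + rng.normal(0, 0.01, (n, 300))   # size shifts baseline

Y = np.column_stack([coffee_type, particle_um])
pred = cross_val_predict(PLSRegression(n_components=8), spectra, Y,
                         cv=KFold(5, shuffle=True, random_state=0))
acc = np.mean((pred[:, 0] > 0.5) == coffee_type)
rmsecv_size = np.sqrt(np.mean((pred[:, 1] - particle_um) ** 2))
print(f"type-of-coffee accuracy: {acc:.2f}, particle-size RMSECV: {rmsecv_size:.0f} um")
```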

  12. A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data of Glioma

    KAUST Repository

    Abusamra, Heba

    2013-11-01

    Microarray gene expression data have gained great importance in recent years due to their role in disease diagnosis and prognosis, which helps in choosing the appropriate treatment plan for patients. This technology has ushered in a new era of molecular classification. Interpreting gene expression data remains a difficult problem and an active research area due to their “high dimensional, low sample size” nature. Such problems pose great challenges to existing classification methods. Thus, effective feature selection techniques are often needed to correctly classify different tumor types and consequently lead to a better understanding of genetic signatures as well as improved treatment strategies. This paper presents a comparative study of state-of-the-art feature selection methods, classification methods, and combinations of the two, based on gene expression data. We compared the efficiency of three classification methods (support vector machines, k-nearest neighbor, and random forest) and eight feature selection methods (information gain, twoing rule, sum minority, max minority, Gini index, sum of variances, t-statistics, and one-dimensional support vector machine). Five-fold cross-validation was used to evaluate the classification performance. Two publicly available gene expression data sets of glioma were used in the experiments. Results revealed the important role of feature selection in classifying gene expression data: by performing feature selection, the classification accuracy can be significantly boosted using a small number of genes. The relationship among the features selected by the different methods is investigated, and the most frequently selected features in each fold, across all methods and both datasets, are evaluated.
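
    A brief sketch of this evaluation protocol in scikit-learn, pairing a univariate filter (an ANOVA F-score standing in for the t-statistic filter) with each classifier and scoring by five-fold cross-validation. The synthetic data matrix is a placeholder for the glioma expression sets, and the number of selected genes is an arbitrary assumption.

```python
# Pair a univariate filter with each classifier inside a pipeline so that feature
# selection is refit within every fold, then score by five-fold cross-validation.
# Synthetic data stands in for the glioma expression matrices; k=50 is arbitrary.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=50, n_features=2000, n_informative=30, random_state=0)
classifiers = {
    "SVM": SVC(kernel="linear"),
    "kNN": KNeighborsClassifier(n_neighbors=3),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, clf in classifiers.items():
    pipe = make_pipeline(SelectKBest(f_classif, k=50), clf)
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```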

  13. A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data of Glioma

    KAUST Repository

    Abusamra, Heba

    2013-01-01

    Microarray gene expression data have gained great importance in recent years due to their role in disease diagnosis and prognosis, which helps in choosing the appropriate treatment plan for patients. This technology has ushered in a new era of molecular classification. Interpreting gene expression data remains a difficult problem and an active research area due to their “high dimensional, low sample size” nature. Such problems pose great challenges to existing classification methods. Thus, effective feature selection techniques are often needed to correctly classify different tumor types and consequently lead to a better understanding of genetic signatures as well as improved treatment strategies. This paper presents a comparative study of state-of-the-art feature selection methods, classification methods, and combinations of the two, based on gene expression data. We compared the efficiency of three classification methods (support vector machines, k-nearest neighbor, and random forest) and eight feature selection methods (information gain, twoing rule, sum minority, max minority, Gini index, sum of variances, t-statistics, and one-dimensional support vector machine). Five-fold cross-validation was used to evaluate the classification performance. Two publicly available gene expression data sets of glioma were used in the experiments. Results revealed the important role of feature selection in classifying gene expression data: by performing feature selection, the classification accuracy can be significantly boosted using a small number of genes. The relationship among the features selected by the different methods is investigated, and the most frequently selected features in each fold, across all methods and both datasets, are evaluated.

  14. Ensemble Methods

    Science.gov (United States)

    Re, Matteo; Valentini, Giorgio

    2012-03-01

    Ensemble methods are statistical and computational learning procedures reminiscent of the human social learning behavior of seeking several opinions before making any crucial decision. The idea of combining the opinions of different "experts" to obtain an overall “ensemble” decision is rooted in our culture at least from the classical age of ancient Greece, and it was formalized during the Enlightenment with the Condorcet Jury Theorem [45], which proved that the judgment of a committee is superior to those of individuals, provided the individuals have reasonable competence. Ensembles are sets of learning machines that combine in some way their decisions, or their learning algorithms, or different views of data, or other specific characteristics to obtain more reliable and more accurate predictions in supervised and unsupervised learning problems [48,116]. A simple example is represented by the majority vote ensemble, by which the decisions of different learning machines are combined, and the class that receives the majority of “votes” (i.e., the class predicted by the majority of the learning machines) is the class predicted by the overall ensemble [158]. In the literature, a plethora of terms other than ensembles has been used, such as fusion, combination, aggregation, and committee, to indicate sets of learning machines that work together to solve a machine learning problem [19,40,56,66,99,108,123], but in this chapter we maintain the term ensemble in its widest meaning, in order to include the whole range of combination methods. Nowadays, ensemble methods represent one of the main current research lines in machine learning [48,116], and the interest of the research community on ensemble methods is witnessed by conferences and workshops specifically devoted to ensembles, first of all the multiple classifier systems (MCS) conference organized by Roli, Kittler, Windeatt, and other researchers of this area [14,62,85,149,173]. Several theories have been
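
    As a toy illustration of the majority-vote ensemble just described, the following sketch combines three different learners with hard voting in scikit-learn; the dataset and the choice of base learners are arbitrary stand-ins, not tied to the chapter.

```python
# Hard voting = plain majority vote: each learner casts one vote per sample and
# the most-voted class is the ensemble prediction.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
ensemble = VotingClassifier(
    estimators=[("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
                ("rf", RandomForestClassifier(random_state=0)),
                ("nb", GaussianNB())],
    voting="hard",  # combine the three decisions by majority vote
)
print("majority-vote CV accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```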

  15. Method Mixins

    DEFF Research Database (Denmark)

    Ernst, Erik

    2005-01-01

    The world of programming has been conquered by the procedure call mechanism, including object-oriented method invocation which is a procedure call in context of an object. This paper presents an alternative, method mixin invocations, that is optimized for flexible creation of composite behavior...... of abstraction and reuse. Method mixins use shared name spaces to transfer information between caller and callee, as opposed to traditional invocation which uses parameters and returned results. This relieves the caller from dependencies on the callee, and it allows direct transfer of information further down...... the call stack, e.g., to a callee's callee. The mechanism has been implemented in the programming language gbeta. Variants of the mechanism could be added to almost any imperative programming language....

  16. Statistical methods

    CERN Document Server

    Szulc, Stefan

    1965-01-01

    Statistical Methods provides a discussion of the principles of the organization and technique of research, with emphasis on its application to the problems in social statistics. This book discusses branch statistics, which aims to develop practical ways of collecting and processing numerical data and to adapt general statistical methods to the objectives in a given field.Organized into five parts encompassing 22 chapters, this book begins with an overview of how to organize the collection of such information on individual units, primarily as accomplished by government agencies. This text then

  17. Sieve methods

    CERN Document Server

    Halberstam, Heine

    2011-01-01

    Derived from the techniques of analytic number theory, sieve theory employs methods from mathematical analysis to solve number-theoretical problems. This text by a noted pair of experts is regarded as the definitive work on the subject. It formulates the general sieve problem, explores the theoretical background, and illustrates significant applications.""For years to come, Sieve Methods will be vital to those seeking to work in the subject, and also to those seeking to make applications,"" noted prominent mathematician Hugh Montgomery in his review of this volume for the Bulletin of the Ameri

  18. Characterization methods

    Energy Technology Data Exchange (ETDEWEB)

    Glass, J.T. [North Carolina State Univ., Raleigh (United States)

    1993-01-01

    Methods discussed in this compilation of notes and diagrams are Raman spectroscopy, scanning electron microscopy, transmission electron microscopy, and other surface analysis techniques (auger electron spectroscopy, x-ray photoelectron spectroscopy, electron energy loss spectroscopy, and scanning tunnelling microscopy). A comparative evaluation of different techniques is performed. In-vacuo and in-situ analyses are described.

  19. Digital Methods

    NARCIS (Netherlands)

    Rogers, R.

    2013-01-01

    In Digital Methods, Richard Rogers proposes a methodological outlook for social and cultural scholarly research on the Web that seeks to move Internet research beyond the study of online culture. It is not a toolkit for Internet research, or operating instructions for a software package; it deals

  20. Evaluation of spatial and spatiotemporal estimation methods in simulation of precipitation variability patterns

    Science.gov (United States)

    Bayat, Bardia; Zahraie, Banafsheh; Taghavi, Farahnaz; Nasseri, Mohsen

    2013-08-01

    Identification of spatial and spatiotemporal precipitation variations plays an important role in different hydrological applications such as missing data estimation. In this paper, the results of Bayesian maximum entropy (BME) and ordinary kriging (OK) are compared for modeling spatial and spatiotemporal variations of annual precipitation with and without incorporating elevation variations. The study area of this research is Namak Lake watershed located in the central part of Iran with an area of approximately 90,000 km2. The BME and OK methods have been used to model the spatial and spatiotemporal variations of precipitation in this watershed, and their performances have been evaluated using cross-validation statistics. The results of the case study have shown the superiority of BME over OK in both spatial and spatiotemporal modes. The results have shown that BME estimates are less biased and more accurate than OK. The improvements in the BME estimates are mostly related to incorporating hard and soft data in the estimation process, which resulted in more detailed and reliable results. Estimation error variance for BME results is less than OK estimations in the study area in both spatial and spatiotemporal modes.

  1. e-Bitter: Bitterant Prediction by the Consensus Voting From the Machine-Learning Methods.

    Science.gov (United States)

    Zheng, Suqing; Jiang, Mengying; Zhao, Chengwei; Zhu, Rui; Hu, Zhicheng; Xu, Yong; Lin, Fu

    2018-01-01

    In-silico bitterant prediction has received considerable attention due to the expensive and laborious experimental screening of bitterants. In this work, we collected a fully experimental dataset containing 707 bitterants and 592 non-bitterants, which is distinct from the fully or partially hypothetical non-bitterant datasets used in previous works. Based on this experimental dataset, we harnessed consensus votes from multiple machine-learning methods (e.g., deep learning) combined with molecular fingerprints to build bitter/bitterless classification models with five-fold cross-validation, which were further inspected by the Y-randomization test and applicability domain analysis. One of the best consensus models achieves an accuracy, precision, specificity, sensitivity, F1-score, and Matthews correlation coefficient (MCC) of 0.929, 0.918, 0.898, 0.954, 0.936, and 0.856, respectively, on our test set. For the automatic prediction of bitterants, a graphical program "e-Bitter" was developed so that users can obtain predictions with a simple mouse click. To the best of our knowledge, this is the first time a consensus model has been adopted for bitterant prediction, and the first free stand-alone software for experimental food scientists.
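
    A rough sketch of consensus voting over several fingerprint-based classifiers evaluated with five-fold cross-validation. The random binary matrix stands in for the molecular fingerprints, the three base learners are arbitrary choices, and none of this reproduces the e-Bitter models themselves.

```python
# Consensus voting: each fold trains several models, and the predicted class is the
# majority of their individual votes; performance is summarized by the MCC.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(1299, 256)).astype(float)   # stand-in fingerprints (707 + 592 molecules)
y = np.array([1] * 707 + [0] * 592)                      # 1 = bitterant, 0 = non-bitterant

models = [RandomForestClassifier(random_state=0),
          GradientBoostingClassifier(random_state=0),
          MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)]

mcc = []
for train, test in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    votes = np.mean([m.fit(X[train], y[train]).predict(X[test]) for m in models], axis=0)
    consensus = (votes >= 0.5).astype(int)                # majority of the three individual votes
    mcc.append(matthews_corrcoef(y[test], consensus))
print("consensus five-fold MCC:", np.mean(mcc))
```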

  2. e-Bitter: Bitterant Prediction by the Consensus Voting From the Machine-Learning Methods

    Directory of Open Access Journals (Sweden)

    Suqing Zheng

    2018-03-01

    Full Text Available In-silico bitterant prediction has received considerable attention due to the expensive and laborious experimental screening of bitterants. In this work, we collected a fully experimental dataset containing 707 bitterants and 592 non-bitterants, which is distinct from the fully or partially hypothetical non-bitterant datasets used in previous works. Based on this experimental dataset, we harnessed consensus votes from multiple machine-learning methods (e.g., deep learning) combined with molecular fingerprints to build bitter/bitterless classification models with five-fold cross-validation, which were further inspected by the Y-randomization test and applicability domain analysis. One of the best consensus models achieves an accuracy, precision, specificity, sensitivity, F1-score, and Matthews correlation coefficient (MCC) of 0.929, 0.918, 0.898, 0.954, 0.936, and 0.856, respectively, on our test set. For the automatic prediction of bitterants, a graphical program “e-Bitter” was developed so that users can obtain predictions with a simple mouse click. To the best of our knowledge, this is the first time a consensus model has been adopted for bitterant prediction, and the first free stand-alone software for experimental food scientists.

  3. Integrating Symbolic and Statistical Methods for Testing Intelligent Systems Applications to Machine Learning and Computer Vision

    Energy Technology Data Exchange (ETDEWEB)

    Jha, Sumit Kumar [University of Central Florida, Orlando; Pullum, Laura L [ORNL; Ramanathan, Arvind [ORNL

    2016-01-01

    Embedded intelligent systems ranging from tiny implantable biomedical devices to large swarms of autonomous unmanned aerial systems are becoming pervasive in our daily lives. While we depend on the flawless functioning of such intelligent systems, and often take their behavioral correctness and safety for granted, it is notoriously difficult to generate test cases that expose subtle errors in the implementations of machine learning algorithms. Hence, the validation of intelligent systems is usually achieved by studying their behavior on representative data sets, using methods such as cross-validation and bootstrapping. In this paper, we present a new testing methodology for studying the correctness of intelligent systems. Our approach uses symbolic decision procedures coupled with statistical hypothesis testing. We also use our algorithm to analyze the robustness of a human detection algorithm built using the OpenCV open-source computer vision library. We show that the human detection implementation can fail to detect humans in perturbed video frames even when the perturbations are so small that the corresponding frames look identical to the naked eye.

  4. EPMLR: sequence-based linear B-cell epitope prediction method using multiple linear regression.

    Science.gov (United States)

    Lian, Yao; Ge, Meng; Pan, Xian-Ming

    2014-12-19

    B-cell epitopes have been studied extensively due to their immunological applications, such as peptide-based vaccine development, antibody production, and disease diagnosis and therapy. Despite several decades of research, the accurate prediction of linear B-cell epitopes has remained a challenging task. In this work, based on the antigen's primary sequence information, a novel linear B-cell epitope prediction model was developed using multiple linear regression (MLR). A 10-fold cross-validation test on a large non-redundant dataset was performed to evaluate the performance of our model. To alleviate the problem caused by the noise of the negative dataset, 300 experiments utilizing 300 sub-datasets were performed. We achieved an overall sensitivity of 81.8%, precision of 64.1%, and area under the receiver operating characteristic curve (AUC) of 0.728. We have presented a reliable method for the identification of linear B-cell epitopes using the antigen's primary sequence information. Moreover, a web server, EPMLR, has been developed for linear B-cell epitope prediction: http://www.bioinfo.tsinghua.edu.cn/epitope/EPMLR/.
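
    A rough sketch of the underlying idea: regress epitope labels on encoded peptide windows, threshold the regression output, and score with 10-fold cross-validation. The one-hot encoding, window length, threshold, and random peptides are illustrative assumptions rather than the EPMLR settings.

```python
# Encode fixed-length peptide windows, fit multiple linear regression to 0/1 epitope
# labels, and call a window positive when the predicted value exceeds a threshold.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

AA = "ACDEFGHIKLMNPQRSTVWY"

def encode(window):                          # simple one-hot encoding of a peptide window
    x = np.zeros((len(window), 20))
    for i, aa in enumerate(window):
        x[i, AA.index(aa)] = 1.0
    return x.ravel()

rng = np.random.default_rng(0)
windows = ["".join(rng.choice(list(AA), 12)) for _ in range(400)]   # stand-in peptides
labels = rng.integers(0, 2, size=400)                               # 1 = epitope, 0 = non-epitope

X = np.array([encode(w) for w in windows])
sens = []
for train, test in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    score = LinearRegression().fit(X[train], labels[train]).predict(X[test])
    pred = (score > 0.5).astype(int)                                # arbitrary decision threshold
    tp = np.sum((pred == 1) & (labels[test] == 1))
    fn = np.sum((pred == 0) & (labels[test] == 1))
    sens.append(tp / max(tp + fn, 1))
print("10-fold sensitivity:", np.mean(sens))
```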

  5. Fast and low-cost method for VBES bathymetry generation in coastal areas

    Science.gov (United States)

    Sánchez-Carnero, N.; Aceña, S.; Rodríguez-Pérez, D.; Couñago, E.; Fraile, P.; Freire, J.

    2012-12-01

    Sea floor topography is key information in coastal area management. Nowadays, LiDAR and multibeam technologies provide accurate bathymetries in those areas; however, these methodologies are still too expensive for small customers (fishermen associations, small research groups) willing to keep a periodic surveillance of environmental resources. In this paper, we analyse a simple methodology for vertical beam echosounder (VBES) bathymetric data acquisition and postprocessing, using low-cost means and free customizable tools such as ECOSONS and gvSIG (which is compared with the industry-standard ArcGIS). Echosounder data were filtered, resampled, and interpolated (using kriging or radial basis functions). Moreover, the presented methodology includes two data correction processes: Monte Carlo simulation, used to reduce GPS errors, and manually applied bathymetric line transformations, both improving the obtained results. As an example, we present the bathymetry of the Ría de Cedeira (Galicia, NW Spain), a good testbed area for coastal bathymetry methodologies given its extent and rich topography. The statistical analysis, performed by direct ground-truthing, rendered an upper bound of 1.7 m error, at the 95% confidence level, and 0.7 m r.m.s. (cross-validation provided 30 cm and 25 cm, respectively). The methodology presented is fast and easy to implement, accurate outside transects (accuracy can be estimated), and can be used as a low-cost periodic monitoring method.

  6. Prediction of Nepsilon-acetylation on internal lysines implemented in Bayesian Discriminant Method.

    Science.gov (United States)

    Li, Ao; Xue, Yu; Jin, Changjiang; Wang, Minghui; Yao, Xuebiao

    2006-12-01

    Protein acetylation is an important and reversible post-translational modification (PTM), and it governs a variety of cellular dynamics and plasticity. Experimental identification of acetylation sites is labor-intensive and often limited by the availability of reagents such as acetyl-specific antibodies and optimization of enzymatic reactions. Computational analyses may facilitate the identification of potential acetylation sites and provide insights into further experimentation. In this manuscript, we present a novel protein acetylation prediction program named PAIL, prediction of acetylation on internal lysines, implemented in a BDM (Bayesian Discriminant Method) algorithm. The accuracies of PAIL are 85.13%, 87.97%, and 89.21% at low, medium, and high thresholds, respectively. Both Jack-Knife validation and n-fold cross-validation have been performed to show that PAIL is accurate and robust. Taken together, we propose that PAIL is a novel predictor for identification of protein acetylation sites and may serve as an important tool to study the function of protein acetylation. PAIL has been implemented in PHP and is freely available on a web server at: http://bioinformatics.lcd-ustc.org/pail.

  7. Prediction of Nε-acetylation on internal lysines implemented in Bayesian Discriminant Method

    Science.gov (United States)

    Li, Ao; Xue, Yu; Jin, Changjiang; Wang, Minghui; Yao, Xuebiao

    2007-01-01

    Protein acetylation is an important and reversible post-translational modification (PTM), and it governs a variety of cellular dynamics and plasticity. Experimental identification of acetylation sites is labor-intensive and often limited by the availability of reagents such as acetyl-specific antibodies and optimization of enzymatic reactions. Computational analyses may facilitate the identification of potential acetylation sites and provide insights into further experimentation. In this manuscript, we present a novel protein acetylation prediction program named PAIL, prediction of acetylation on internal lysines, implemented in a BDM (Bayesian Discriminant Method) algorithm. The accuracies of PAIL are 85.13%, 87.97% and 89.21% at low, medium and high thresholds, respectively. Both Jack-Knife validation and n-fold cross-validation have been performed to show that PAIL is accurate and robust. Taken together, we propose that PAIL is a novel predictor for identification of protein acetylation sites and may serve as an important tool to study the function of protein acetylation. PAIL has been implemented in PHP and is freely available on a web server at: http://bioinformatics.lcd-ustc.org/pail. PMID:17045240

  8. e-Bitter: Bitterant Prediction by the Consensus Voting From the Machine-learning Methods

    Science.gov (United States)

    Zheng, Suqing; Jiang, Mengying; Zhao, Chengwei; Zhu, Rui; Hu, Zhicheng; Xu, Yong; Lin, Fu

    2018-03-01

    In-silico bitterant prediction has received considerable attention due to the expensive and laborious experimental screening of bitterants. In this work, we collected a fully experimental dataset containing 707 bitterants and 592 non-bitterants, which is distinct from the fully or partially hypothetical non-bitterant datasets used in previous works. Based on this experimental dataset, we harnessed consensus votes from multiple machine-learning methods (e.g., deep learning) combined with molecular fingerprints to build bitter/bitterless classification models with five-fold cross-validation, which were further inspected by the Y-randomization test and applicability domain analysis. One of the best consensus models achieves an accuracy, precision, specificity, sensitivity, F1-score, and Matthews correlation coefficient (MCC) of 0.929, 0.918, 0.898, 0.954, 0.936, and 0.856, respectively, on our test set. For the automatic prediction of bitterants, a graphical program “e-Bitter” was developed so that users can obtain predictions with a simple mouse click. To the best of our knowledge, this is the first time a consensus model has been adopted for bitterant prediction, and the first free stand-alone software for experimental food scientists.

  9. A Method for Aileron Actuator Fault Diagnosis Based on PCA and PGC-SVM

    Directory of Open Access Journals (Sweden)

    Wei-Li Qin

    2016-01-01

    Full Text Available Aileron actuators are pivotal components of the aircraft flight control system. Thus, the fault diagnosis of aileron actuators is vital for enhancing reliability and fault-tolerant capability. This paper presents an aileron actuator fault diagnosis approach combining principal component analysis (PCA), grid search (GS), 10-fold cross-validation (CV), and one-versus-one support vector machine (SVM). This method is referred to as PGC-SVM and utilizes the direct drive valve input, force motor current, and displacement feedback signal to realize fault detection and location. First, several common faults of aileron actuators, which include force motor coil break, sensor coil break, cylinder leakage, and amplifier gain reduction, are extracted from the fault quadrantal diagram; the corresponding fault mechanisms are analyzed. Second, data feature extraction is performed with dimension reduction using PCA. Finally, the GS and CV algorithms are employed to train a one-versus-one SVM for fault classification, thus obtaining the optimal model parameters and assuring the generalization of the trained SVM, respectively. To verify the effectiveness of the proposed approach, four types of faults are introduced into the simulation model established by AMESim and Simulink. The results demonstrate desirable diagnostic performance that outperforms the traditional SVM.
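
    A compact sketch of the PGC-SVM recipe as described above: PCA for dimension reduction, then grid search with 10-fold cross-validation to tune an SVM whose multi-class handling is one-versus-one. The dataset, scaling step, component count, and parameter grid are stand-ins, not the paper's settings.

```python
# Pipeline: scaling -> PCA -> one-versus-one SVM, with (C, gamma) chosen by grid
# search under 10-fold cross-validation; a generic multi-class dataset stands in
# for the fault-labelled actuator signals.
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
pipe = Pipeline([("scale", StandardScaler()),
                 ("pca", PCA(n_components=5)),
                 ("svm", SVC(decision_function_shape="ovo"))])
grid = GridSearchCV(pipe,
                    {"svm__C": [0.1, 1, 10, 100], "svm__gamma": [0.001, 0.01, 0.1, 1]},
                    cv=10)
grid.fit(X, y)
print("best params:", grid.best_params_, "CV accuracy:", grid.best_score_)
```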

  10. Spatial Interpolation of Reference Evapotranspiration in India: Comparison of IDW and Kriging Methods

    Science.gov (United States)

    Hodam, Sanayanbi; Sarkar, Sajal; Marak, Areor G. R.; Bandyopadhyay, A.; Bhadra, A.

    2017-12-01

    In the present study, to understand the spatial distribution characteristics of ETo over India, spatial interpolation was performed on 32-year (1971-2002) monthly means from 131 India Meteorological Department stations uniformly distributed over the country, using two methods, namely, inverse distance weighted (IDW) interpolation and kriging. Kriging was found to be better for developing the monthly surfaces during cross-validation. However, in station-wise validation, IDW performed better than kriging in almost all the cases, and hence is recommended for spatial interpolation of ETo and its governing meteorological parameters. This study also checked whether direct kriging of FAO-56 Penman-Monteith (PM) (Allen et al. in Crop evapotranspiration—guidelines for computing crop water requirements, Irrigation and drainage paper 56, Food and Agriculture Organization of the United Nations (FAO), Rome, 1998) point ETo produced comparable results against ETo estimated with individually kriged weather parameters (indirect kriging). Indirect kriging performed marginally well compared to direct kriging. Point ETo values were extended to areal ETo values by IDW, and FAO-56 PM mean ETo maps for India were developed to obtain sufficiently accurate ETo estimates at unknown locations.
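
    A minimal inverse-distance-weighted interpolator with leave-one-station-out validation, mirroring the station-wise check mentioned above. The station coordinates and ETo values are synthetic placeholders, and the power parameter is an assumption.

```python
# IDW: each prediction is a distance-weighted average of the known station values;
# leave-one-station-out validation removes one station at a time and predicts it
# from the rest.
import numpy as np

def idw(xy_known, z_known, xy_query, power=2.0, eps=1e-12):
    d = np.linalg.norm(xy_known[None, :, :] - xy_query[:, None, :], axis=2)
    w = 1.0 / (d ** power + eps)
    return (w * z_known).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(0)
xy = rng.uniform(0, 1000, size=(131, 2))                    # 131 station locations (km)
eto = 3.0 + 0.002 * xy[:, 0] + rng.normal(0, 0.2, 131)      # synthetic monthly ETo (mm/day)

errors = []
for i in range(len(xy)):
    mask = np.arange(len(xy)) != i
    pred = idw(xy[mask], eto[mask], xy[i:i + 1])[0]
    errors.append(pred - eto[i])
print("leave-one-station-out RMSE:", np.sqrt(np.mean(np.square(errors))))
```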

  11. Benchmarking Methods and Data Sets for Ligand Enrichment Assessment in Virtual Screening

    Science.gov (United States)

    Xia, Jie; Tilahun, Ermias Lemma; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

    2014-01-01

    Retrospective small-scale virtual screening (VS) based on benchmarking data sets has been widely used to estimate ligand enrichments of VS approaches in prospective (i.e. real-world) efforts. However, the intrinsic differences between benchmarking sets and real screening chemical libraries can cause biased assessment. Herein, we summarize the history of benchmarking methods as well as data sets and highlight three main types of biases found in benchmarking sets, i.e. “analogue bias”, “artificial enrichment” and “false negative”. In addition, we introduce our recent algorithm to build maximum-unbiased benchmarking sets applicable to both ligand-based and structure-based VS approaches, and its implementations to three important human histone deacetylase (HDAC) isoforms, i.e. HDAC1, HDAC6 and HDAC8. The Leave-One-Out Cross-Validation (LOO CV) demonstrates that the benchmarking sets built by our algorithm are maximum-unbiased in terms of property matching, ROC curves and AUCs. PMID:25481478

  12. Benchmarking methods and data sets for ligand enrichment assessment in virtual screening.

    Science.gov (United States)

    Xia, Jie; Tilahun, Ermias Lemma; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

    2015-01-01

    Retrospective small-scale virtual screening (VS) based on benchmarking data sets has been widely used to estimate ligand enrichments of VS approaches in prospective (i.e. real-world) efforts. However, the intrinsic differences between benchmarking sets and real screening chemical libraries can cause biased assessment. Herein, we summarize the history of benchmarking methods as well as data sets and highlight three main types of biases found in benchmarking sets, i.e. "analogue bias", "artificial enrichment" and "false negative". In addition, we introduce our recent algorithm to build maximum-unbiased benchmarking sets applicable to both ligand-based and structure-based VS approaches, and its implementations to three important human histone deacetylase (HDAC) isoforms, i.e. HDAC1, HDAC6 and HDAC8. The leave-one-out cross-validation (LOO CV) demonstrates that the benchmarking sets built by our algorithm are maximum-unbiased as measured by property matching, ROC curves and AUCs. Copyright © 2014 Elsevier Inc. All rights reserved.

  13. Quality assurance using outlier detection on an automatic segmentation method for the cerebellar peduncles

    Science.gov (United States)

    Li, Ke; Ye, Chuyang; Yang, Zhen; Carass, Aaron; Ying, Sarah H.; Prince, Jerry L.

    2016-03-01

    Cerebellar peduncles (CPs) are white matter tracts connecting the cerebellum to other brain regions. Automatic segmentation methods of the CPs have been proposed for studying their structure and function. Usually the performance of these methods is evaluated by comparing segmentation results with manual delineations (ground truth). However, when a segmentation method is run on new data (for which no ground truth exists) it is highly desirable to efficiently detect and assess algorithm failures so that these cases can be excluded from scientific analysis. In this work, two outlier detection methods aimed to assess the performance of an automatic CP segmentation algorithm are presented. The first one is a univariate non-parametric method using a box-whisker plot. We first categorize automatic segmentation results of a dataset of diffusion tensor imaging (DTI) scans from 48 subjects as either a success or a failure. We then design three groups of features from the image data of nine categorized failures for failure detection. Results show that most of these features can efficiently detect the true failures. The second method—supervised classification—was employed on a larger DTI dataset of 249 manually categorized subjects. Four classifiers—linear discriminant analysis (LDA), logistic regression (LR), support vector machine (SVM), and random forest classification (RFC)—were trained using the designed features and evaluated using a leave-one-out cross validation. Results show that the LR performs worst among the four classifiers and the other three perform comparably, which demonstrates the feasibility of automatically detecting segmentation failures using classification methods.
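
    Two small sketches corresponding to the two strategies above: the univariate box-whisker rule flags a case whose feature value falls outside [Q1 - 1.5 IQR, Q3 + 1.5 IQR], and a supervised failure classifier is scored with leave-one-out cross-validation. The feature values, labels, and the choice of logistic regression are simulated assumptions for illustration only.

```python
# (1) Box-whisker (IQR) rule for flagging suspect segmentations from one feature.
# (2) Leave-one-out cross-validation of a supervised failure classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
feature = np.r_[rng.normal(0.8, 0.05, 45), [0.35, 0.40, 0.98]]   # e.g. a per-case QA feature
q1, q3 = np.percentile(feature, [25, 75])
iqr = q3 - q1
outliers = (feature < q1 - 1.5 * iqr) | (feature > q3 + 1.5 * iqr)
print("flagged cases:", np.where(outliers)[0])

X = rng.normal(size=(249, 6))                 # stand-in QA features for 249 scans
y = rng.integers(0, 2, size=249)              # 1 = segmentation failure
print("LOO accuracy:", cross_val_score(LogisticRegression(max_iter=1000), X, y,
                                       cv=LeaveOneOut()).mean())
```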

  14. A multi-label learning based kernel automatic recommendation method for support vector machine.

    Science.gov (United States)

    Zhang, Xueying; Song, Qinbao

    2015-01-01

    Choosing an appropriate kernel is very important and critical when classifying a new problem with a Support Vector Machine. So far, more attention has been paid to constructing new kernels and choosing suitable parameter values for a specific kernel function, but less to kernel selection. Furthermore, most current kernel selection methods focus on seeking the best kernel with the highest classification accuracy via cross-validation; they are time-consuming and ignore the differences among the number of support vectors and the CPU time of SVMs with different kernels. Considering the tradeoff between classification success ratio and CPU time, there may be multiple kernel functions performing equally well on the same classification problem. Aiming to automatically select appropriate kernel functions for a given data set, we propose a multi-label learning based kernel recommendation method built on the data characteristics. For each data set, the meta-knowledge data base is first created by extracting the feature vector of data characteristics and identifying the corresponding applicable kernel set. Then the kernel recommendation model is constructed on the generated meta-knowledge data base with the multi-label classification method. Finally, the appropriate kernel functions are recommended to a new data set by the recommendation model according to the characteristics of the new data set. Extensive experiments over 132 UCI benchmark data sets, with five different types of data set characteristics, eleven typical kernels (Linear, Polynomial, Radial Basis Function, Sigmoidal function, Laplace, Multiquadric, Rational Quadratic, Spherical, Spline, Wave and Circular), and five multi-label classification methods demonstrate that, compared with the existing kernel selection methods and the most widely used RBF kernel function, SVM with the kernel function recommended by our proposed method achieved the highest classification performance.

  15. A novel method to discover fluoroquinolone antibiotic resistance (qnr) genes in fragmented nucleotide sequences

    Directory of Open Access Journals (Sweden)

    Boulund Fredrik

    2012-12-01

    Full Text Available Abstract Background: Broad-spectrum fluoroquinolone antibiotics are central in modern health care and are used to treat and prevent a wide range of bacterial infections. The recently discovered qnr genes provide a mechanism of resistance with the potential to rapidly spread between bacteria using horizontal gene transfer. As for many antibiotic resistance genes present in pathogens today, qnr genes are hypothesized to originate from environmental bacteria. The vast amount of data generated by shotgun metagenomics can therefore be used to explore the diversity of qnr genes in more detail. Results: In this paper we describe a new method to identify qnr genes in nucleotide sequence data. We show, using cross-validation, that the method has a high statistical power of correctly classifying sequences from novel classes of qnr genes, even for fragments as short as 100 nucleotides. Based on sequences from public repositories, the method was able to identify all previously reported plasmid-mediated qnr genes. In addition, several fragments from novel putative qnr genes were identified in metagenomes. The method was also able to annotate 39 chromosomal variants, of which 11 have not previously been reported in the literature. Conclusions: The method described in this paper significantly improves the sensitivity and specificity of identification and annotation of qnr genes in nucleotide sequence data. The predicted novel putative qnr genes in the metagenomic data support the hypothesis of a large and uncharacterized diversity within this family of resistance genes in environmental bacterial communities. An implementation of the method is freely available at http://bioinformatics.math.chalmers.se/qnr/.

  16. Survey and Zoning of Soil Physical and Chemical Properties Using Geostatistical Methods in GIS (Case Study: Miankangi Region in Sistan)

    Directory of Open Access Journals (Sweden)

    M. Hashemi

    2017-02-01

    Full Text Available Introduction: In order to provide a database, it is essential to have access to accurate information on soil spatial variation for sustainable soil management, such as the proper application of fertilizers. Spatial variations in soil properties are common, and understanding these changes, particularly in agricultural lands, is important for careful planning and land management. Materials and Methods: To this end, in winter 1391, 189 undisturbed soil samples (0-30 cm depth) in a regular lattice with a spacing of 500 m were gathered from the surface of Miankangi land, Sistan plain, and their physical and chemical properties were studied. The land area of the region is about 4,500 hectares; the average elevation of the studied area is 489.2 meters above sea level, with different land uses. Soil texture was measured by the hydrometer method (11); EC, pH (39), calcium carbonate equivalent (37), and the saturation percentage of the soils were also determined. Kriging, Co-Kriging, Inverse Distance Weighting and Local Polynomial Interpolation techniques were evaluated to produce soil characteristic zoning maps of the study area and to select the best geostatistical method. Cross-validation techniques and the Root Mean Square Error (RMSE) were used. Results and Discussion: Normality test results showed that all of the soil properties except calcium carbonate and soil clay content had normal distributions. In addition, the results of the correlation test showed that the soil saturation percentage was positively correlated with silt content (r=0.43 and p

  17. New theory of discriminant analysis after R. Fisher advanced research by the feature selection method for microarray data

    CERN Document Server

    Shinmura, Shuichi

    2016-01-01

    This is the first book to compare eight LDFs on different types of datasets, such as Fisher's iris data, medical data with collinearities, Swiss banknote data that is a linearly separable dataset (LSD), student pass/fail determination using student attributes, 18 pass/fail determinations using exam scores, Japanese automobile data, and six microarray datasets (the datasets) that are LSD. We developed the 100-fold cross-validation for the small sample method (Method 1) instead of the LOO method. We proposed a simple model selection procedure to choose the best model having the minimum M2, and Revised IP-OLDF based on the MNM criterion was found to be better than the other M2s on the above datasets. We compared two statistical LDFs and six MP-based LDFs. Those were Fisher's LDF, logistic regression, three SVMs, Revised IP-OLDF, and another two OLDFs. Only a hard-margin SVM (H-SVM) and Revised IP-OLDF could discriminate LSD theoretically (Problem 2). We solved the defect of the generalized inverse matrices (Problem 3). For ...

  18. Introducing conjoint analysis method into delayed lotteries studies: its validity and time stability are higher than in adjusting.

    Science.gov (United States)

    Białek, Michał; Markiewicz, Łukasz; Sawicki, Przemysław

    2015-01-01

    Delayed lotteries are much more common in everyday life than pure lotteries. Usually, we need to wait to find out the outcome of a risky decision (e.g., investing in a stock market, engaging in a relationship). However, most research has studied time discounting and probability discounting in isolation, using methodologies designed specifically to track changes in one parameter. The most commonly used method is adjusting, but its reported validity and time stability in research on discounting are suboptimal. The goal of this study was to introduce a novel method for analyzing delayed lotteries, conjoint analysis, which is hypothetically more suitable for analyzing individual preferences in this area. A set of two studies compared conjoint analysis with adjusting. The results suggest that individual parameters of discounting strength estimated with conjoint analysis have higher predictive value (Studies 1 and 2) and are more stable over time (Study 2) compared to adjusting. We discuss these findings, despite the exploratory character of the reported studies, by suggesting that future research on delayed lotteries should be cross-validated using both methods.

  19. Introducing conjoint analysis method into delayed lotteries studies: Its validity and time stability are higher than in adjusting

    Directory of Open Access Journals (Sweden)

    Michal eBialek

    2015-01-01

    Full Text Available Delayed lotteries are much more common in everyday life than pure lotteries. Usually, we need to wait to find out the outcome of a risky decision (e.g., investing in a stock market, engaging in a relationship). However, most research has studied time discounting and probability discounting in isolation, using methodologies designed specifically to track changes in one parameter. The most commonly used method is adjusting, but its reported validity and time stability in research on discounting are suboptimal. The goal of this study was to introduce a novel method for analyzing delayed lotteries - conjoint analysis - which is hypothetically more suitable for analyzing individual preferences in this area. A set of two studies compared conjoint analysis with adjusting. The results suggest that individual parameters of discounting strength estimated with conjoint analysis have higher predictive value (Study 1 & 2) and are more stable over time (Study 2) compared to adjusting. We discuss these findings, despite the exploratory character of the reported studies, by suggesting that future research on delayed lotteries should be cross-validated using both methods.

  20. Comparison of Three Supervised Learning Methods for Digital Soil Mapping: Application to a Complex Terrain in the Ecuadorian Andes

    Directory of Open Access Journals (Sweden)

    Martin Hitziger

    2014-01-01

    Full Text Available A digital soil mapping approach is applied to a complex, mountainous terrain in the Ecuadorian Andes. Relief features are derived from a digital elevation model and used as predictors for the topsoil texture classes sand, silt, and clay. The performance of three statistical learning methods is compared: linear regression, random forest, and stochastic gradient boosting of regression trees. In linear regression, a stepwise backward variable selection procedure is applied and overfitting is controlled by minimizing Mallows' Cp. For random forest and boosting, the effect of predictor selection and tuning procedures is assessed. 100-fold repetitions of a 5-fold cross-validation of the selected modelling procedures are employed for validation, uncertainty assessment, and method comparison. Absolute assessment of model performance is achieved by comparing the prediction error of the selected method with that of the mean. Boosting performs best, providing predictions that are reliably better than the mean. The median reduction of the root mean square error is around 5%. Elevation is the most important predictor. All models clearly distinguish ridges and slopes. The predicted texture patterns are interpreted as the result of catena sequences (eluviation of fine particles on slope shoulders) and landslides (mixing of mineral soil horizons on slopes).
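
    A condensed sketch of the validation design: repeat a 5-fold cross-validation and compare each learner's RMSE with that of simply predicting the mean. Synthetic regression data stands in for the terrain predictors and texture fractions, and the number of repetitions is reduced from the 100 used in the study.

```python
# Repeat a 5-fold cross-validation (10 repetitions here, to keep the run short) and
# express each model's median RMSE as a reduction relative to predicting the mean.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = make_regression(n_samples=300, n_features=12, noise=20.0, random_state=0)
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
baseline_rmse = np.std(y)          # roughly the error of always predicting the mean

for name, model in [("linear regression", LinearRegression()),
                    ("random forest", RandomForestRegressor(random_state=0)),
                    ("boosting", GradientBoostingRegressor(random_state=0))]:
    rmse = -cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
    reduction = 100.0 * (1.0 - np.median(rmse) / baseline_rmse)
    print(f"{name}: median RMSE reduction vs. the mean = {reduction:.1f}%")
```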

  1. Missing in space: an evaluation of imputation methods for missing data in spatial analysis of risk factors for type II diabetes.

    Science.gov (United States)

    Baker, Jannah; White, Nicole; Mengersen, Kerrie

    2014-11-20

    Spatial analysis is increasingly important for identifying modifiable geographic risk factors for disease. However, spatial health data from surveys are often incomplete, ranging from missing data for only a few variables to missing data for many variables. For spatial analyses of health outcomes, selection of an appropriate imputation method is critical in order to produce the most accurate inferences. We present a cross-validation approach to select between three imputation methods for health survey data with correlated lifestyle covariates, using as a case study type II diabetes mellitus (DM II) risk across 71 Queensland Local Government Areas (LGAs). We compare the accuracy of mean imputation to imputation using multivariate normal and conditional autoregressive prior distributions. The choice of imputation method depends upon the application, and the best method is not necessarily the most complex one. Mean imputation was selected as the most accurate method in this application. Selecting an appropriate imputation method for health survey data, after accounting for spatial correlation and correlation between covariates, allows more complete analysis of geographic risk factors for disease with more confidence in the results to inform public policy decision-making.
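
    A hedged sketch of the cross-validation idea for choosing an imputer: hide a random subset of observed entries, impute them with each candidate method, and keep the method with the smallest error on the held-out entries. Mean imputation is compared with a generic iterative multivariate imputer; the spatial CAR-prior imputation of the paper is not reproduced, and the data are simulated.

```python
# Mask some observed values, impute with each candidate, and compare the error on
# the held-out entries; the smallest error points to the preferred imputer.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer

rng = np.random.default_rng(0)
n, p = 71, 5                                    # 71 areas, 5 correlated lifestyle covariates
latent = rng.normal(size=(n, 1))
X_full = latent + 0.5 * rng.normal(size=(n, p))

mask = rng.random((n, p)) < 0.15                # hide 15% of the observed entries
X_obs = X_full.copy()
X_obs[mask] = np.nan

for name, imputer in [("mean", SimpleImputer(strategy="mean")),
                      ("iterative", IterativeImputer(random_state=0))]:
    X_imp = imputer.fit_transform(X_obs)
    rmse = np.sqrt(np.mean((X_imp[mask] - X_full[mask]) ** 2))
    print(f"{name} imputation RMSE on held-out entries: {rmse:.3f}")
```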

  2. Comparing the applicability of some geostatistical methods to predict the spatial distribution of topsoil Calcium Carbonate in part of farmland of Zanjan Province

    Science.gov (United States)

    Sarmadian, Fereydoon; Keshavarzi, Ali

    2010-05-01

    Most soils in Iran are located in arid and semi-arid regions and have a high pH (more than 7) and a high amount of calcium carbonate, which leads to their calcification. In calcareous soils, plant growth and production are difficult. Much of this problem is related to the high pH and high concentration of calcium ions, which cause fixation and unavailability of pH-dependent elements, especially phosphorus and some micronutrients such as Fe, Zn, Mn and Cu. Predicting soil calcium carbonate in non-sampled areas and mapping its variability are very important for the sustainable management of soil fertility. Therefore, this research aimed to evaluate and analyze the spatial variability of topsoil calcium carbonate as an aspect of soil fertility and plant nutrition, to compare geostatistical methods such as kriging and co-kriging, and to map topsoil calcium carbonate. For the geostatistical analysis, sampling was done with a stratified random method, and soil samples from 0 to 15 cm depth were collected with an auger at 23 locations. In the co-kriging method, salinity data were used as the auxiliary variable. For comparing and evaluating the geostatistical methods, cross-validation with the RMSE statistic was used. The results showed that the co-kriging method had the highest correlation coefficient and the lowest RMSE, and was more accurate than kriging for predicting calcium carbonate content in non-sampled areas.

  3. Chromatographic methods

    International Nuclear Information System (INIS)

    Marhol, M.; Stary, J.

    1975-01-01

    The characteristics of chromatographic separation are given and the methods are listed. Methods and data on materials used in partition, adsorption, precipitation and ion exchange chromatography are listed, and the conditions under which ion partition takes place are described. Special attention is devoted to ion exchange chromatography, where tables show how the partition coefficients of different ions vary with the concentration of agents and how equilibrium sorption on different materials varies with the solution pH. A theoretical analysis is given and the properties of the most widely used ion exchangers are listed. Experimental conditions and apparatus used for each type of chromatography are listed. (L.K.)

  4. SELF REPORT ASSESSMENT OF ANXIETY - A CROSS VALIDATION OF THE LEHRER WOOLFOLK ANXIETY SYMPTOM QUESTIONNAIRE IN 3 POPULATIONS

    NARCIS (Netherlands)

    SCHOLING, A; EMMELKAMP, PMG

    This study was meant to investigate the psychometric properties and clinical utility of the Lehrer Woolfolk Anxiety Symptom Questionnaire (LWASQ), an instrument for assessment of somatic, behavioral and cognitive aspects of anxiety. Confirmatory factor analysis on data from social phobics (n = 108),

  5. Structure-independent cross-validation between residual dipolar couplings originating from internal and external orienting media

    International Nuclear Information System (INIS)

    Barbieri, Renato; Bertini, Ivano; Lee, Yong-Min; Luchinat, Claudio; Velders, Aldrik H.

    2002-01-01

    Lanthanide-substituted calcium binding proteins are known to partially orient in high magnetic fields. Orientation provides residual dipolar couplings (rdc's). Two of these systems, Tm3+- and Dy3+-substituted calbindin D9k, dissolved in an external orienting medium (nonionic liquid crystalline phase), provide rdc values which are the sum of those induced by the lanthanides and by the liquid crystalline phase on the native calcium binding protein. This structure-independent check shows the innocence of the orienting medium with respect to the structure of the protein in solution. Furthermore, the simultaneous use of lanthanide substitution and external orienting media provides a further effective tool to control and tune the orientation tensor.

  6. Sex and age specific prediction formulas for estimating body composition from bioelectrical impedance : a cross-validation study

    NARCIS (Netherlands)

    Deurenberg, P.; van der Kooy, K; Leenen, R; Weststrate, J A; Seidell, J C

    In 827 male and female subjects, with a large variation in body composition and an age range of 7-83 years, body composition was measured by densitometry, anthropometry and bioelectrical impedance. The relationship between densitometrically determined fat free mass (FFM) with body impedance (R),

  7. Two-receiver measurements of phase velocity: cross-validation of ambient-noise and earthquake-based observations

    NARCIS (Netherlands)

    Kästle, Emanuel D.; Soomro, Riaz; Weemstra, C.; Boschi, Lapo; Meier, Thomas

    2016-01-01

    Phase velocities derived from ambient-noise cross-correlation are compared with phase velocities calculated from cross-correlations of waveform recordings of teleseismic earthquakes whose epicentres are approximately on the station–station great circle. The comparison is conducted both for Rayleigh

  8. Cross-validation of biomarkers for the early differential diagnosis and prognosis of dementia in a clinical setting

    International Nuclear Information System (INIS)

    Perani, Daniela; Cerami, Chiara; Caminiti, Silvia Paola; Santangelo, Roberto; Coppi, Elisabetta; Ferrari, Laura; Magnani, Giuseppe; Pinto, Patrizia; Passerini, Gabriella; Falini, Andrea; Iannaccone, Sandro; Cappa, Stefano Francesco; Comi, Giancarlo; Gianolli, Luigi

    2016-01-01

    The aim of this study was to evaluate the supportive role of molecular and structural biomarkers (CSF protein levels, FDG PET and MRI) in the early differential diagnosis of dementia in a large sample of patients with neurodegenerative dementia, and in determining the risk of disease progression in subjects with mild cognitive impairment (MCI). We evaluated the supportive role of CSF Aβ42, t-Tau, p-Tau levels, conventional brain MRI and visual assessment of FDG PET SPM t-maps in the early diagnosis of dementia and the evaluation of MCI progression. Diagnosis based on molecular biomarkers showed the best fit with the final diagnosis at a long follow-up. FDG PET SPM t-maps had the highest diagnostic accuracy in Alzheimer's disease and in the differential diagnosis of non-Alzheimer's disease dementias. The p-tau/Aβ42 ratio was the only CSF biomarker providing a significant classification rate for Alzheimer's disease. An Alzheimer's disease-positive metabolic pattern as shown by FDG PET SPM in MCI was the best predictor of conversion to Alzheimer's disease. In this clinical setting, FDG PET SPM t-maps and the p-tau/Aβ42 ratio improved clinical diagnostic accuracy, supporting the importance of these biomarkers in the emerging diagnostic criteria for Alzheimer's disease dementia. FDG PET using SPM t-maps had the highest predictive value by identifying hypometabolic patterns in different neurodegenerative dementias and normal brain metabolism in MCI, confirming its additional crucial exclusionary role. (orig.)

  9. Cross-validation of biomarkers for the early differential diagnosis and prognosis of dementia in a clinical setting

    Energy Technology Data Exchange (ETDEWEB)

    Perani, Daniela [Vita-Salute San Raffaele University, Milan (Italy); San Raffaele Scientific Institute, Division of Neuroscience, Milan (Italy); San Raffaele Hospital, Nuclear Medicine Unit, Milan (Italy); Cerami, Chiara [Vita-Salute San Raffaele University, Milan (Italy); San Raffaele Scientific Institute, Division of Neuroscience, Milan (Italy); San Raffaele Hospital, Clinical Neuroscience Department, Milan (Italy); Caminiti, Silvia Paola [Vita-Salute San Raffaele University, Milan (Italy); San Raffaele Scientific Institute, Division of Neuroscience, Milan (Italy); Santangelo, Roberto; Coppi, Elisabetta; Ferrari, Laura; Magnani, Giuseppe [San Raffaele Hospital, Department of Neurology, Milan (Italy); Pinto, Patrizia [Papa Giovanni XXIII Hospital, Department of Neurology, Bergamo (Italy); Passerini, Gabriella [Servizio di Medicina di Laboratorio OSR, Milan (Italy); Falini, Andrea [Vita-Salute San Raffaele University, Milan (Italy); San Raffaele Scientific Institute, Division of Neuroscience, Milan (Italy); San Raffaele Hospital, CERMAC - Department of Neuroradiology, Milan (Italy); Iannaccone, Sandro [San Raffaele Hospital, Clinical Neuroscience Department, Milan (Italy); Cappa, Stefano Francesco [San Raffaele Scientific Institute, Division of Neuroscience, Milan (Italy); IUSS Pavia, Pavia (Italy); Comi, Giancarlo [Vita-Salute San Raffaele University, Milan (Italy); San Raffaele Hospital, Department of Neurology, Milan (Italy); Gianolli, Luigi [San Raffaele Hospital, Nuclear Medicine Unit, Milan (Italy)

    2016-03-15

    The aim of this study was to evaluate the supportive role of molecular and structural biomarkers (CSF protein levels, FDG PET and MRI) in the early differential diagnosis of dementia in a large sample of patients with neurodegenerative dementia, and in determining the risk of disease progression in subjects with mild cognitive impairment (MCI). We evaluated the supportive role of CSF Aβ{sub 42}, t-Tau, p-Tau levels, conventional brain MRI and visual assessment of FDG PET SPM t-maps in the early diagnosis of dementia and the evaluation of MCI progression. Diagnosis based on molecular biomarkers showed the best fit with the final diagnosis at a long follow-up. FDG PET SPM t-maps had the highest diagnostic accuracy in Alzheimer's disease and in the differential diagnosis of non-Alzheimer's disease dementias. The p-tau/Aβ{sub 42} ratio was the only CSF biomarker providing a significant classification rate for Alzheimer's disease. An Alzheimer's disease-positive metabolic pattern as shown by FDG PET SPM in MCI was the best predictor of conversion to Alzheimer's disease. In this clinical setting, FDG PET SPM t-maps and the p-tau/Aβ{sub 42} ratio improved clinical diagnostic accuracy, supporting the importance of these biomarkers in the emerging diagnostic criteria for Alzheimer's disease dementia. FDG PET using SPM t-maps had the highest predictive value by identifying hypometabolic patterns in different neurodegenerative dementias and normal brain metabolism in MCI, confirming its additional crucial exclusionary role. (orig.)

  10. Cross-validation and refinement of the Stoffenmanager as a first tier exposure assessment tool for REACH

    NARCIS (Netherlands)

    Schinkel, J.; Fransman, W.; Heussen, H.; Kromhout, H.; Marquart, H.; Tielemans, E.

    2010-01-01

    Objectives: For regulatory risk assessment under REACH a tiered approach is proposed in which the first tier models should provide a conservative exposure estimate that can discriminate between scenarios which are of concern and those which are not. The Stoffenmanager is mentioned as a first tier

  11. Cross-validation and refinement of the Stoffenmanager as a first tier exposure assessment tool for REACH.

    NARCIS (Netherlands)

    Schinkel, J.; Fransman, W.; Heussen, H.; Kromhout, H.; Marquart, H.; Tielemans, E.

    2010-01-01

    OBJECTIVES: For regulatory risk assessment under REACH a tiered approach is proposed in which the first tier models should provide a conservative exposure estimate that can discriminate between scenarios which are of concern and those which are not. The Stoffenmanager is mentioned as a first tier

  12. Numerical methods

    CERN Document Server

    Dahlquist, Germund

    1974-01-01

    ""Substantial, detailed and rigorous . . . readers for whom the book is intended are admirably served."" - MathSciNet (Mathematical Reviews on the Web), American Mathematical Society.Practical text strikes fine balance between students' requirements for theoretical treatment and needs of practitioners, with best methods for large- and small-scale computing. Prerequisites are minimal (calculus, linear algebra, and preferably some acquaintance with computer programming). Text includes many worked examples, problems, and an extensive bibliography.

  13. ETHNOPRED: a novel machine learning method for accurate continental and sub-continental ancestry identification and population stratification correction

    Science.gov (United States)

    2013-01-01

    Background: Population stratification is a systematic difference in allele frequencies between subpopulations. This can lead to spurious association findings in the case–control genome wide association studies (GWASs) used to identify single nucleotide polymorphisms (SNPs) associated with disease-linked phenotypes. Methods such as self-declared ancestry, ancestry informative markers, genomic control, structured association, and principal component analysis are used to assess and correct population stratification, but each has limitations. We provide an alternative technique to address population stratification. Results: We propose a novel machine learning method, ETHNOPRED, which uses the genotype and ethnicity data from the HapMap project to learn ensembles of disjoint decision trees, capable of accurately predicting an individual’s continental and sub-continental ancestry. To predict an individual’s continental ancestry, ETHNOPRED produced an ensemble of 3 decision trees involving a total of 10 SNPs, with 10-fold cross-validation accuracy of 100% using the HapMap II dataset. We extended this model to involve 29 disjoint decision trees over 149 SNPs, and showed that this ensemble has an accuracy of ≥ 99.9%, even if some of those 149 SNP values were missing. On an independent dataset, predominantly of Caucasian origin, our continental classifier showed 96.8% accuracy and improved genomic control’s λ from 1.22 to 1.11. We next used the HapMap III dataset to learn classifiers to distinguish European subpopulations (North-Western vs. Southern), East Asian subpopulations (Chinese vs. Japanese), African subpopulations (Eastern vs. Western), North American subpopulations (European vs. Chinese vs. African vs. Mexican vs. Indian), and Kenyan subpopulations (Luhya vs. Maasai). In these cases, ETHNOPRED produced ensembles of 3, 39, 21, 11, and 25 disjoint decision trees, respectively involving 31, 502, 526, 242 and 271 SNPs, with 10-fold cross validation accuracy of
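
    An illustrative sketch of the "ensembles of disjoint decision trees" idea: each tree is grown only on SNPs that no earlier tree has used, and predictions are combined by majority vote, which is what makes the ensemble tolerant of missing SNPs. Tree depth, ensemble size, and the synthetic genotype matrix are assumptions, not the ETHNOPRED configuration.

```python
# Grow each tree on features unused by earlier trees, combine by majority vote,
# and score the ensemble with 10-fold cross-validation on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

def fit_disjoint_trees(X, y, n_trees=5, max_depth=3):
    trees, used = [], set()
    for _ in range(n_trees):
        avail = np.array([j for j in range(X.shape[1]) if j not in used])
        t = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X[:, avail], y)
        used |= {int(avail[j]) for j in t.tree_.feature if j >= 0}   # mark this tree's SNPs as used
        trees.append((t, avail))
    return trees

def predict_vote(trees, X):
    votes = np.mean([t.predict(X[:, cols]) for t, cols in trees], axis=0)
    return (votes >= 0.5).astype(int)

X, y = make_classification(n_samples=400, n_features=200, n_informative=40, random_state=0)
acc = []
for tr, te in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
    trees = fit_disjoint_trees(X[tr], y[tr])
    acc.append(np.mean(predict_vote(trees, X[te]) == y[te]))
print("10-fold CV accuracy:", np.mean(acc))
```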

  14. A computational method based on the integration of heterogeneous networks for predicting disease-gene associations.

    Directory of Open Access Journals (Sweden)

    Xingli Guo

    Full Text Available The identification of disease-causing genes is a fundamental challenge in human health and of great importance in improving medical care, and provides a better understanding of gene functions. Recent computational approaches based on the interactions among human proteins and disease similarities have shown their power in tackling the issue. In this paper, a novel systematic and global method that integrates two heterogeneous networks for prioritizing candidate disease-causing genes is provided, based on the observation that genes causing the same or similar diseases tend to lie close to one another in a network of protein-protein interactions. In this method, the association score function between a query disease and a candidate gene is defined as the weighted sum of all the association scores between similar diseases and neighbouring genes. Moreover, the topological correlation of these two heterogeneous networks can be incorporated into the definition of the score function, and finally an iterative algorithm is designed for this issue. This method was tested with 10-fold cross-validation on all 1,126 diseases that have at least one known causal gene, and it ranked the correct gene as one of the top ten in 622 of the 1,428 cases, significantly outperforming a state-of-the-art method called PRINCE. The results brought about by this method were applied to study three multi-factorial disorders: breast cancer, Alzheimer disease and diabetes mellitus type 2, and some suggestions of novel causal genes and candidate disease-causing subnetworks were provided for further investigation.

  15. Sampling methods

    International Nuclear Information System (INIS)

    Loughran, R.J.; Wallbrink, P.J.; Walling, D.E.; Appleby, P.G.

    2002-01-01

    Methods for the collection of soil samples to determine levels of 137Cs and other fallout radionuclides, such as excess 210Pb and 7Be, will depend on the purposes (aims) of the project, site and soil characteristics, analytical capacity, the total number of samples that can be analysed and the sample mass required. The latter two will depend partly on detector type and capabilities. A variety of field methods have been developed for different field conditions and circumstances over the past twenty years, many of them inherited or adapted from soil science and sedimentology. The use of 137Cs in erosion studies has been widely developed, while the application of fallout 210Pb and 7Be is still developing. Although it is possible to measure these nuclides simultaneously, it is common for experiments to be designed around the use of 137Cs alone. Caesium studies typically involve comparison of the inventories found at eroded or sedimentation sites with that of a 'reference' site. An accurate characterization of the depth distribution of these fallout nuclides is often required in order to apply and/or calibrate the conversion models. However, depending on the tracer involved, the depth distribution, and thus the sampling resolution required to define it, differs. For example, a depth resolution of 1 cm is often adequate when using 137Cs. However, fallout 210Pb and 7Be commonly have very strong surface maxima that decrease exponentially with depth, and fine depth increments are required at or close to the soil surface. Consequently, different depth incremental sampling methods are required when using different fallout radionuclides. Geomorphic investigations also frequently require determination of the depth distribution of fallout nuclides on slopes and depositional sites as well as their total inventories.

  16. Differentiating invasive and pre-invasive lung cancer by quantitative analysis of histopathologic images

    Science.gov (United States)

    Zhou, Chuan; Sun, Hongliu; Chan, Heang-Ping; Chughtai, Aamer; Wei, Jun; Hadjiiski, Lubomir; Kazerooni, Ella

    2018-02-01

    We are developing an automated radiopathomics method for diagnosis of lung nodule subtypes. In this study, we investigated the feasibility of using quantitative methods to analyze the tumor nuclei and cytoplasm in pathologic whole-slide images for the classification of pathologic subtypes of invasive and pre-invasive nodules. We developed a multiscale blob detection method with watershed transform (MBD-WT) to segment the tumor cells. Pathomic features were extracted to characterize the size, morphology, sharpness, and gray-level variation in each segmented nucleus and the heterogeneity patterns of tumor nuclei and cytoplasm. With permission of the National Lung Screening Trial (NLST) project, a data set containing 90 digital haematoxylin and eosin (HE) whole-slide images from 48 cases was used in this study. The 48 cases contain 77 regions of invasive subtypes and 43 regions of pre-invasive subtypes outlined by a pathologist on the HE images, using the pathological tumor region description provided by NLST as reference. A logistic regression model (LRM) was built using leave-one-case-out resampling and receiver operating characteristic (ROC) analysis for classification of invasive and pre-invasive subtypes. With 11 selected features, the LRM achieved a test area under the ROC curve (AUC) value of 0.91 ± 0.03. The results demonstrated that the pathologic invasiveness of lung adenocarcinomas could be categorized with high accuracy using pathomics analysis.
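
    The evaluation protocol described here, a logistic regression model assessed by leave-one-case-out resampling with ROC analysis, can be sketched as below with scikit-learn. The feature matrix and labels are synthetic stand-ins for the pathomic features, and plain per-sample leave-one-out is used as a simplification of holding out all regions of a case together.

```python
# Minimal sketch: leave-one-case-out evaluation of a logistic regression
# classifier with ROC analysis (synthetic features, not the pathomic
# features used in the record).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(48, 11))          # 48 cases, 11 selected features
y = rng.integers(0, 2, size=48)        # 1 = invasive, 0 = pre-invasive (hypothetical)

scores = np.zeros_like(y, dtype=float)
for train_idx, test_idx in LeaveOneOut().split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores[test_idx] = model.predict_proba(X[test_idx])[:, 1]

print("Leave-one-case-out AUC:", roc_auc_score(y, scores))
```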

  17. Decontaminating method

    International Nuclear Information System (INIS)

    Furukawa, Toshiharu; Shibuya, Kiichiro.

    1985-01-01

    Purpose: To provide a method of eliminating radioactive contamination that allows easy treatment of the decontaminated liquid wastes and grinding materials. Method: Organic grinding materials, such as fine walnut shell pieces, cause no secondary contamination since they are softer than inorganic grinding materials, are less liable to be pulverized upon collision with the surface being treated, can be reused, and produce no fine scattered powder. In addition, they can be disposed of by burning. The organic grinding material and water are sprayed by a nozzle onto the surface to be treated, and the decontaminated liquid wastes are separated by filtering into solid components, mainly composed of the organic grinding material, and liquid components, mainly composed of water. The separated solid components are recovered in a storage tank for reuse as grinding material and, after repeated use, are subjected to burning treatment. The water, on the other hand, is recovered into a storage tank and, after repeated use, is purified by passing through an ion-exchange resin-packed column and decontaminated before discharge. (Horiuchi, T.)

  18. SVMTriP: a method to predict antigenic epitopes using support vector machine to integrate tri-peptide similarity and propensity.

    Directory of Open Access Journals (Sweden)

    Bo Yao

    Full Text Available Identifying protein surface regions preferentially recognizable by antibodies (antigenic epitopes is at the heart of new immuno-diagnostic reagent discovery and vaccine design, and computational methods for antigenic epitope prediction provide crucial means to serve this purpose. Many linear B-cell epitope prediction methods were developed, such as BepiPred, ABCPred, AAP, BCPred, BayesB, BEOracle/BROracle, and BEST, towards this goal. However, effective immunological research demands more robust performance of the prediction method than what the current algorithms could provide. In this work, a new method to predict linear antigenic epitopes is developed; Support Vector Machine has been utilized by combining the Tri-peptide similarity and Propensity scores (SVMTriP. Applied to non-redundant B-cell linear epitopes extracted from IEDB, SVMTriP achieves a sensitivity of 80.1% and a precision of 55.2% with a five-fold cross-validation. The AUC value is 0.702. The combination of similarity and propensity of tri-peptide subsequences can improve the prediction performance for linear B-cell epitopes. Moreover, SVMTriP is capable of recognizing viral peptides from a human protein sequence background. A web server based on our method is constructed for public use. The server and all datasets used in the current study are available at http://sysbio.unl.edu/SVMTriP.
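
    Sensitivity and precision under five-fold cross-validation, as reported for SVMTriP, can be estimated from pooled out-of-fold predictions. The sketch below assumes scikit-learn and uses random feature vectors in place of the tri-peptide similarity and propensity encoding.

```python
# Minimal sketch: five-fold cross-validated sensitivity and precision of an
# SVM classifier (random features stand in for the tri-peptide encoding).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import recall_score, precision_score

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 40))       # hypothetical epitope feature vectors
y = rng.integers(0, 2, size=300)     # 1 = epitope, 0 = non-epitope

pred = cross_val_predict(SVC(kernel="rbf", C=1.0), X, y, cv=5)
print("sensitivity:", recall_score(y, pred))
print("precision  :", precision_score(y, pred))
```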

  19. A systematic study of genome context methods: calibration, normalization and combination

    Directory of Open Access Journals (Sweden)

    Dale Joseph M

    2010-10-01

    Full Text Available Abstract Background Genome context methods have been introduced in the last decade as automatic methods to predict functional relatedness between genes in a target genome using the patterns of existence and relative locations of the homologs of those genes in a set of reference genomes. Much work has been done in the application of these methods to different bioinformatics tasks, but few papers present a systematic study of the methods and their combination necessary for their optimal use. Results We present a thorough study of the four main families of genome context methods found in the literature: phylogenetic profile, gene fusion, gene cluster, and gene neighbor. We find that for most organisms the gene neighbor method outperforms the phylogenetic profile method by as much as 40% in sensitivity, being competitive with the gene cluster method at low sensitivities. Gene fusion is generally the worst performing of the four methods. A thorough exploration of the parameter space for each method is performed and results across different target organisms are presented. We propose the use of normalization procedures as those used on microarray data for the genome context scores. We show that substantial gains can be achieved from the use of a simple normalization technique. In particular, the sensitivity of the phylogenetic profile method is improved by around 25% after normalization, resulting, to our knowledge, on the best-performing phylogenetic profile system in the literature. Finally, we show results from combining the various genome context methods into a single score. When using a cross-validation procedure to train the combiners, with both original and normalized scores as input, a decision tree combiner results in gains of up to 20% with respect to the gene neighbor method. Overall, this represents a gain of around 15% over what can be considered the state of the art in this area: the four original genome context methods combined using a

  20. CarcinoPred-EL: Novel models for predicting the carcinogenicity of chemicals using molecular fingerprints and ensemble learning methods.

    Science.gov (United States)

    Zhang, Li; Ai, Haixin; Chen, Wen; Yin, Zimo; Hu, Huan; Zhu, Junfeng; Zhao, Jian; Zhao, Qi; Liu, Hongsheng

    2017-05-18

    Carcinogenicity refers to a highly toxic end point of certain chemicals, and has become an important issue in the drug development process. In this study, three novel ensemble classification models, namely Ensemble SVM, Ensemble RF, and Ensemble XGBoost, were developed to predict carcinogenicity of chemicals using seven types of molecular fingerprints and three machine learning methods based on a dataset containing 1003 diverse compounds with rat carcinogenicity. Among these three models, Ensemble XGBoost is found to be the best, giving an average accuracy of 70.1 ± 2.9%, sensitivity of 67.0 ± 5.0%, and specificity of 73.1 ± 4.4% in five-fold cross-validation and an accuracy of 70.0%, sensitivity of 65.2%, and specificity of 76.5% in external validation. In comparison with some recent methods, the ensemble models outperform some machine learning-based approaches and yield equal accuracy and higher specificity but lower sensitivity than rule-based expert systems. It is also found that the ensemble models could be further improved if more data were available. As an application, the ensemble models are employed to discover potential carcinogens in the DrugBank database. The results indicate that the proposed models are helpful in predicting the carcinogenicity of chemicals. A web server called CarcinoPred-EL has been built for these models ( http://ccsipb.lnu.edu.cn/toxicity/CarcinoPred-EL/ ).
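
    The general idea of the record, combining several base learners into an ensemble and judging it by five-fold cross-validation, can be illustrated with a soft-voting classifier in scikit-learn. This is a generic stand-in: the fingerprints and labels are synthetic, and a gradient-boosting classifier is used in place of XGBoost.

```python
# Minimal sketch: five-fold cross-validation of a soft-voting ensemble
# (generic stand-in for the fingerprint-based ensembles in the record).
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.integers(0, 2, size=(200, 128)).astype(float)   # hypothetical binary fingerprints
y = rng.integers(0, 2, size=200)                        # 1 = carcinogen, 0 = non-carcinogen

ensemble = VotingClassifier(
    estimators=[("svm", SVC(probability=True)),
                ("rf", RandomForestClassifier(n_estimators=100)),
                ("gb", GradientBoostingClassifier())],
    voting="soft")
acc = cross_val_score(ensemble, X, y, cv=5, scoring="accuracy")
print(f"5-fold CV accuracy: {acc.mean():.3f} +/- {acc.std():.3f}")
```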

  1. A Bipartite Network-based Method for Prediction of Long Non-coding RNA–protein Interactions

    Directory of Open Access Journals (Sweden)

    Mengqu Ge

    2016-02-01

    Full Text Available As one large class of non-coding RNAs (ncRNAs, long ncRNAs (lncRNAs have gained considerable attention in recent years. Mutations and dysfunction of lncRNAs have been implicated in human disorders. Many lncRNAs exert their effects through interactions with the corresponding RNA-binding proteins. Several computational approaches have been developed, but only few are able to perform the prediction of these interactions from a network-based point of view. Here, we introduce a computational method named lncRNA–protein bipartite network inference (LPBNI. LPBNI aims to identify potential lncRNA–interacting proteins, by making full use of the known lncRNA–protein interactions. Leave-one-out cross validation (LOOCV test shows that LPBNI significantly outperforms other network-based methods, including random walk (RWR and protein-based collaborative filtering (ProCF. Furthermore, a case study was performed to demonstrate the performance of LPBNI using real data in predicting potential lncRNA–interacting proteins.

  2. Prediction of gas chromatography/electron capture detector retention times of chlorinated pesticides, herbicides, and organohalides by multivariate chemometrics methods

    International Nuclear Information System (INIS)

    Ghasemi, Jahanbakhsh; Asadpour, Saeid; Abdolmaleki, Azizeh

    2007-01-01

    A quantitative structure-retention relationship (QSRR) study has been carried out on the gas chromatography/electron capture detector (GC/ECD) retention times (tR) of 38 diverse chlorinated pesticides, herbicides, and organohalides using molecular structural descriptors. Modeling of the retention times of these compounds as a function of theoretically derived descriptors was established by multiple linear regression (MLR) and partial least squares (PLS) regression. Stepwise regression in SPSS was used to select the variables that resulted in the best-fitted models. Appropriate models with low standard errors and high correlation coefficients were obtained. Three types of molecular descriptors, electronic, steric and thermodynamic, were used to develop a quantitative relationship between the retention times and structural properties. MLR and PLS analyses were carried out to derive the best QSRR models. After variable selection, the MLR and PLS methods were used with leave-one-out cross-validation to build the regression models. The predictive quality of the QSRR models was tested on an external prediction set of 12 compounds randomly chosen from the 38 compounds. The PLS regression method was used to model the structure-retention relationships more accurately. However, the results surprisingly showed more or less the same quality for MLR and PLS modeling according to the squared regression coefficients R², which were 0.951 and 0.948 for MLR and PLS, respectively.
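
    Comparing MLR and PLS retention-time models under leave-one-out cross-validation, as done for the QSRR models here, takes only a few lines with scikit-learn. The descriptor matrix and simulated retention times below are synthetic stand-ins for the 38-compound dataset.

```python
# Minimal sketch: comparing MLR and PLS retention-time models with
# leave-one-out cross-validation (synthetic descriptors, not the QSRR data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import r2_score

rng = np.random.default_rng(4)
X = rng.normal(size=(38, 6))                                    # 38 compounds, 6 descriptors
t_r = X @ rng.normal(size=6) + rng.normal(scale=0.3, size=38)   # simulated retention times

loo = LeaveOneOut()
for name, model in [("MLR", LinearRegression()), ("PLS", PLSRegression(n_components=3))]:
    pred = cross_val_predict(model, X, t_r, cv=loo)
    print(name, "LOO-CV R2:", round(r2_score(t_r, pred.ravel()), 3))
```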

  3. A method to assess the influence of individual player performance distribution on match outcome in team sports.

    Science.gov (United States)

    Robertson, Sam; Gupta, Ritu; McIntosh, Sam

    2016-10-01

    This study developed a method to determine whether the distribution of individual player performances can be modelled to explain match outcome in team sports, using Australian Rules football as an example. Player-recorded values (converted to a percentage of team total) in 11 commonly reported performance indicators were obtained for all regular season matches played during the 2014 Australian Football League season, with team totals also recorded. Multiple features relating to heuristically determined percentiles for each performance indicator were then extracted for each team and match, along with the outcome (win/loss). A generalised estimating equation model comprising eight key features was developed, explaining match outcome at a median accuracy of 63.9% under 10-fold cross-validation. Lower 75th, 90th and 95th percentile values for team goals and higher 25th and 50th percentile values for disposals were linked with winning. Lower 95th and higher 25th percentile values for Inside 50s and Marks, respectively, were also important contributors. These results provide evidence supporting team strategies which aim to obtain an even spread of goal scorers in Australian Rules football. The method developed in this investigation could be used to quantify the importance of individual contributions to overall team performance in team sports.

  4. Study of the method of water-injected meat identifying based on low-field nuclear magnetic resonance

    Science.gov (United States)

    Xu, Jianmei; Lin, Qing; Yang, Fang; Zheng, Zheng; Ai, Zhujun

    2018-01-01

    The aim of this study was to apply the low-field nuclear magnetic resonance technique to study the regular variation of the transverse relaxation spectral parameters of water-injected meat with the proportion of injected water. On this basis, one-way ANOVA and discriminant analysis were used to analyse how well these parameters distinguish the water-injection proportion, and a model for identifying water-injected meat was established. The results show that, except for T21b, T22e and T23b, the parameters of the T2 relaxation spectrum changed regularly with the change of the water-injection proportion. The ability of different parameters to distinguish the water-injection proportion differed. With S, P22 and T23m as the prediction variables, a Fisher model and a Bayes model were established by discriminant analysis, allowing qualitative and quantitative classification of water-injected meat. The correct discrimination rates of the validation and cross-validation were both 88%, and the model was stable.

  5. Classification of edible oils and modeling of their physico-chemical properties by chemometric methods using mid-IR spectroscopy

    Science.gov (United States)

    Luna, Aderval S.; da Silva, Arnaldo P.; Ferré, Joan; Boqué, Ricard

    This research work describes two studies for the classification and characterization of edible oils and their quality parameters through Fourier transform mid-infrared spectroscopy (FT-mid-IR) together with chemometric methods. The discrimination of canola, sunflower, corn and soybean oils was investigated using SVM-DA, SIMCA and PLS-DA. Using FT-mid-IR, DPLS was able to classify 100% of the samples from the validation set, but SIMCA and SVM-DA were not. The quality parameters of the edible oils, refractive index and relative density, were obtained from reference methods. Prediction models for the FT-mid-IR spectra were calculated for these quality parameters using partial least squares (PLS) and support vector machines (SVM). Several preprocessing alternatives (first derivative, multiplicative scatter correction, mean centering, and standard normal variate) were investigated. The best results for the refractive index, as well as for the relative density, were achieved with SVM, except when the preprocessing combination of mean centering and first derivative was used. For both quality parameters, the best figures of merit, expressed as the root mean square error of cross-validation (RMSECV) and of prediction (RMSEP), were equal to 0.0001.

  6. Spatial interpolation and radiological mapping of ambient gamma dose rate by using artificial neural networks and fuzzy logic methods.

    Science.gov (United States)

    Yeşilkanat, Cafer Mert; Kobya, Yaşar; Taşkın, Halim; Çevik, Uğur

    2017-09-01

    The aim of this study was to determine the spatial risk dispersion of the ambient gamma dose rate (AGDR) by using both artificial neural network (ANN) and fuzzy logic (FL) methods, compare the performances of the methods, make dose estimations for intermediate stations with no previous measurements and create dose-rate risk maps of the study area. In order to determine the dose distribution using artificial neural networks, two main network types and five different network structures were used: feed-forward ANNs, namely the multi-layer perceptron (MLP), radial basis function neural network (RBFNN) and quantile regression neural network (QRNN), and recurrent ANNs, namely Jordan networks (JN) and Elman networks (EN). In the evaluation of the estimation performance obtained for the test data, all models appear to give similar results. According to the cross-validation results obtained for explaining the AGDR distribution, Pearson's r coefficients were calculated as 0.94, 0.91, 0.89, 0.91, 0.91 and 0.92 and RMSE values were calculated as 34.78, 43.28, 63.92, 44.86, 46.77 and 37.92 for MLP, RBFNN, QRNN, JN, EN and FL, respectively. In addition, spatial risk maps showing the distribution of AGDR over the study area were created by all models and the results were compared with the geological, topological and soil structure. Copyright © 2017 Elsevier Ltd. All rights reserved.

  7. Computer-Aided Diagnosis for Breast Ultrasound Using Computerized BI-RADS Features and Machine Learning Methods.

    Science.gov (United States)

    Shan, Juan; Alam, S Kaisar; Garra, Brian; Zhang, Yingtao; Ahmed, Tahira

    2016-04-01

    This work identifies effective computable features from the Breast Imaging Reporting and Data System (BI-RADS) to develop a computer-aided diagnosis (CAD) system for breast ultrasound. Computerized features corresponding to ultrasound BI-RADS categories were designed and tested using a database of 283 pathology-proven benign and malignant lesions. Features were selected based on classification performance using a "bottom-up" approach for different machine learning methods, including decision tree, artificial neural network, random forest and support vector machine. Using 10-fold cross-validation on the database of 283 cases, the highest area under the receiver operating characteristic (ROC) curve (AUC) was 0.84, from a support vector machine with 77.7% overall accuracy; the highest overall accuracy, 78.5%, was from a random forest with an AUC of 0.83. Lesion margin and orientation were optimum features common to all of the machine learning methods. These features can be used in CAD systems to help distinguish benign from worrisome lesions. Copyright © 2016 World Federation for Ultrasound in Medicine & Biology. All rights reserved.

  8. WELDING METHOD

    Science.gov (United States)

    Cornell, A.A.; Dunbar, J.V.; Ruffner, J.H.

    1959-09-29

    A semi-automatic method is described for the weld joining of pipes and fittings which utilizes the inert gas-shielded consumable-electrode electric arc welding technique. It comprises laying down the root pass at a first peripheral velocity and thereafter laying down the filler passes over the root pass necessary to complete the weld by revolving the pipes and fittings at a second peripheral velocity different from the first, maintaining the welding head in a fixed position with respect to the direction of revolution, while the longitudinal axis of the welding head is disposed angularly in the direction of revolution at amounts between twenty minutes and about four degrees from the first position.

  9. Casting methods

    Science.gov (United States)

    Marsden, Kenneth C.; Meyer, Mitchell K.; Grover, Blair K.; Fielding, Randall S.; Wolfensberger, Billy W.

    2012-12-18

    A casting device includes a covered crucible having a top opening and a bottom orifice, a lid covering the top opening, a stopper rod sealing the bottom orifice, and a reusable mold having at least one chamber, a top end of the chamber being open to and positioned below the bottom orifice and a vacuum tap into the chamber being below the top end of the chamber. A casting method includes charging a crucible with a solid material and covering the crucible, heating the crucible, melting the material, evacuating a chamber of a mold to less than 1 atm absolute through a vacuum tap into the chamber, draining the melted material into the evacuated chamber, solidifying the material in the chamber, and removing the solidified material from the chamber without damaging the chamber.

  10. Radiochemical methods

    International Nuclear Information System (INIS)

    Geary, W.J.

    1986-01-01

    This little volume is one of an extended series of basic textbooks on analytical chemistry produced by the Analytical Chemistry by Open Learning project in the UK. Prefatory sections explain its mission and how to use the Open Learning format. Seventeen specific sections organized into five chapters begin with a general discussion of nuclear properties, types, and laws of nuclear decay, and proceed to specific discussions of three published papers (reproduced in their entirety) giving examples of the radiochemical methods discussed in the previous chapter. Each section begins with an overview, contains one or more practical problems (called self-assessment questions or SAQs), and concludes with a summary and a list of objectives for the student. Following the main body are answers to the SAQs, and several tables of physical constants, SI prefixes, etc. A periodic table graces the inside back cover.

  11. [Establishment of the Mathematical Model for PMI Estimation Using FTIR Spectroscopy and Data Mining Method].

    Science.gov (United States)

    Wang, L; Qin, X C; Lin, H C; Deng, K F; Luo, Y W; Sun, Q R; Du, Q X; Wang, Z Y; Tuo, Y; Sun, J H

    2018-02-01

    To analyse the relationship between the Fourier transform infrared (FTIR) spectrum of rat spleen tissue and the postmortem interval (PMI) for PMI estimation using FTIR spectroscopy combined with data mining methods. Rats were sacrificed by cervical dislocation, and the cadavers were kept at 20 ℃. The FTIR spectra of the rats' spleen tissues were measured at different time points. After pretreatment, the data were analysed by data mining methods. The absorption peak intensity of the rat spleen tissue spectrum changed with the PMI, while the absorption peak positions were unchanged. The results of principal component analysis (PCA) showed that the cumulative contribution rate of the first three principal components was 96%. There was an obvious clustering tendency for the spectral samples at each time point. Partial least squares discriminant analysis (PLS-DA) and support vector machine classification (SVMC) effectively divided the spectral samples with different PMIs into four categories (0-24 h, 48-72 h, 96-120 h and 144-168 h). The determination coefficient (R²) of the PMI estimation model established by PLS regression analysis was 0.96, and the root mean square error of calibration (RMSEC) and root mean square error of cross-validation (RMSECV) were 9.90 h and 11.39 h, respectively. In the prediction set, the R² was 0.97, and the root mean square error of prediction (RMSEP) was 10.49 h. The FTIR spectrum of rat spleen tissue can be effectively analysed qualitatively and quantitatively by the combination of FTIR spectroscopy and data mining methods, and classification and PLS regression models can be established for PMI estimation. Copyright© by the Editorial Department of Journal of Forensic Medicine.

  12. [Application of optimized parameters SVM based on photoacoustic spectroscopy method in fault diagnosis of power transformer].

    Science.gov (United States)

    Zhang, Yu-xin; Cheng, Zhi-feng; Xu, Zheng-ping; Bai, Jing

    2015-01-01

    In order to solve problems such as complex operation, carrier-gas consumption and a long test period in the traditional power transformer fault diagnosis approach based on dissolved gas analysis (DGA), this paper proposes a new method that detects the content of five characteristic gases in transformer oil (CH4, C2H2, C2H4, C2H6 and H2) based on photoacoustic spectroscopy and calculates the three ratios C2H2/C2H4, CH4/H2 and C2H4/C2H6. The support vector machine model was constructed using a cross-validation method over five support vector machine types and four kernel functions, and heuristic algorithms were used to optimize the penalty factor c and kernel parameter g, so as to establish the SVM model with the highest fault diagnosis accuracy and the fastest computing speed. Two types of heuristic algorithms, particle swarm optimization and the genetic algorithm, were comparatively studied for accuracy and optimization speed. The simulation results show that the SVM model composed of C-SVC, the RBF kernel function and the genetic algorithm obtains 97.5% accuracy on the test sample set and 98.3333% accuracy on the training sample set, and the genetic algorithm was about two times faster than particle swarm optimization in computing speed. The method described in this paper has many advantages, such as simple operation, non-contact measurement, no carrier-gas consumption, a short test period, and high stability and sensitivity; the results show that it can replace traditional transformer fault diagnosis by gas chromatography and meets the actual project needs in transformer fault diagnosis.
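
    The record tunes the SVM penalty factor c and kernel parameter g with heuristic search guided by cross-validation. As a simpler stand-in for the genetic algorithm, the sketch below runs a cross-validated grid search over C and gamma with scikit-learn on synthetic gas-ratio features.

```python
# Minimal sketch: cross-validated search over the SVM penalty factor C and
# RBF kernel parameter gamma (a plain grid search stands in for the genetic
# algorithm used in the record; the gas-ratio data here are synthetic).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(5)
X = rng.random(size=(150, 3))          # hypothetical three-ratio features
y = rng.integers(0, 4, size=150)       # hypothetical fault classes

search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": np.logspace(-2, 4, 7), "gamma": np.logspace(-4, 1, 6)},
    cv=5, scoring="accuracy")
search.fit(X, y)
print("best C, gamma:", search.best_params_, "CV accuracy:", round(search.best_score_, 3))
```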

  13. Moment methods and Lanczos methods

    International Nuclear Information System (INIS)

    Whitehead, R.R.

    1980-01-01

    In contrast to many of the speakers at this conference I am less interested in average properties of nuclei than in detailed spectroscopy. I will try to show, however, that the two are very closely connected and that shell-model calculations may be used to give a great deal of information not normally associated with the shell model. It has been demonstrated clearly to us that the level spacing fluctuations in nuclear spectra convey very little physical information. This is true when the fluctuations are averaged over the entire spectrum but not if one's interest is in the lowest few states, whose spacings are relatively large. If one wishes to calculate a ground state (say) accurately, that is with an error much smaller than the excitation energy of the first excited state, very high moments, μ_n with n ≈ 200, are needed. As I shall show, we use such moments as a matter of course, albeit without actually calculating them; in fact I will try to show that, if at all possible, the actual calculation of moments is to be avoided like the plague. At the heart of the new shell-model methods embodied in the Glasgow shell-model program and one or two similar ones is the so-called Lanczos method and this, it turns out, has many deep and subtle connections with the mathematical theory of moments. It is these connections that I will explore here.

  14. A PLS-based extractive spectrophotometric method for simultaneous determination of carbamazepine and carbamazepine-10,11-epoxide in plasma and comparison with HPLC

    Science.gov (United States)

    Hemmateenejad, Bahram; Rezaei, Zahra; Khabnadideh, Soghra; Saffari, Maryam

    2007-11-01

    Carbamazepine (CBZ) undergoes enzymatic biotransformation through epoxidation with the formation of its metabolite, carbamazepine-10,11-epoxide (CBZE). A simple chemometrics-assisted spectrophotometric method has been proposed for simultaneous determination of CBZ and CBZE in plasma. A liquid extraction procedure was used to separate the analytes from plasma, and the UV absorbance spectra of the resulting solutions were subjected to partial least squares (PLS) regression. The optimum number of PLS latent variables was selected according to the PRESS values of leave-one-out cross-validation. An HPLC method was also employed for comparison. The respective mean recoveries for the analysis of CBZ and CBZE in synthetic mixtures were 102.57 (±0.25)% and 103.00 (±0.09)% for PLS, and 99.40 (±0.15)% and 102.20 (±0.02)% for HPLC. The concentrations of CBZ and CBZE were also determined in five patients using the PLS and HPLC methods. The results showed that the data obtained by PLS were comparable with those obtained by the HPLC method.
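
    Selecting the number of PLS latent variables by the PRESS values of leave-one-out cross-validation can be sketched as follows; the spectra and concentrations are synthetic stand-ins for the CBZ/CBZE calibration data.

```python
# Minimal sketch: choosing the number of PLS latent variables by the
# leave-one-out PRESS criterion (synthetic spectra, not the CBZ/CBZE data).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(6)
X = rng.normal(size=(30, 50))                                        # 30 mixtures x 50 wavelengths
y = X[:, :5] @ rng.normal(size=5) + rng.normal(scale=0.1, size=30)   # simulated concentration

loo = LeaveOneOut()
for n_comp in range(1, 8):
    press = 0.0
    for tr, te in loo.split(X):
        model = PLSRegression(n_components=n_comp).fit(X[tr], y[tr])
        press += ((model.predict(X[te]).ravel() - y[te]) ** 2).sum()
    print(f"components={n_comp}  PRESS={press:.3f}")
```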

  15. On method

    Directory of Open Access Journals (Sweden)

    Frederik Kortlandt

    2018-01-01

    Full Text Available The basis of linguistic reconstruction is the comparative method, which starts from the assumption that there is “a stronger affinity, both in the roots of verbs and in the forms of grammar, than could possibly have been produced by accident”, implying the existence of a common source (thus Sir William Jones in 1786). It follows that there must be a possible sequence of developments from the reconstructed system to the attested data. These developments must have been either phonetically regular or analogical. The latter type of change requires a model and a motivation. A theory which does not account for the data in terms of sound laws and well-motivated analogical changes is not a linguistic reconstruction but philosophical speculation. The pre-laryngealist idea that any Proto-Indo-European long vowel became acute in Balto-Slavic is a typical example of philosophical speculation contradicted by the comparative evidence. Other examples are spontaneous glottalization (Jasanoff’s “acute assignment”, unattested anywhere in the world), Jasanoff’s trimoraic long vowels, Eichner’s law, Osthoff’s law, and Szemerényi’s law, which is an instance of circular reasoning. The Balto-Slavic acute continues the Proto-Indo-European laryngeals and the glottalic feature of the traditional Proto-Indo-European “unaspirated voiced” obstruents (Winter’s law). My reconstruction of Proto-Indo-European glottalic obstruents is based on direct evidence from Indo-Iranian, Armenian, Baltic and Germanic and indirect evidence from Indo-Iranian, Greek, Latin and Slavic.

  16. A support vector machine designed to identify breasts at high risk using multi-probe generated REIS signals: a preliminary assessment

    Science.gov (United States)

    Gur, David; Zheng, Bin; Lederman, Dror; Dhurjaty, Sreeram; Sumkin, Jules; Zuley, Margarita

    2010-02-01

    A new resonance-frequency based electronic impedance spectroscopy (REIS) system with multiple probes, including one central probe and six external probes designed to contact the breast skin in a circular form at a radius of 60 millimeters from the central ("nipple") probe, has been assembled and installed in our breast imaging facility. We are conducting a prospective clinical study to test the performance of this REIS system in identifying younger women (detection of a highly suspicious breast lesion and 50 were determined negative during mammography screening. REIS output signal sweeps that we used to compute an initial feature included both amplitude and phase information representing differences between corresponding (matched) EIS signal values acquired from the left and right breasts. A genetic algorithm was applied to reduce the feature set and optimize a support vector machine (SVM) to classify the REIS examinations into "biopsy recommended" and "non-biopsy recommended" groups. Using the leave-one-case-out testing method, the classification performance as measured by the area under the receiver operating characteristic (ROC) curve was 0.816 ± 0.042. This pilot analysis suggests that the new multi-probe-based REIS system could potentially be used as a risk stratification tool to identify pre-screened young women who are at higher risk of having or developing breast cancer.

  17. Applying a radiomics approach to predict prognosis of lung cancer patients

    Science.gov (United States)

    Emaminejad, Nastaran; Yan, Shiju; Wang, Yunzhi; Qian, Wei; Guan, Yubao; Zheng, Bin

    2016-03-01

    Radiomics is an emerging technology for decoding tumor phenotype based on quantitative analysis of image features computed from radiographic images. In this study, we applied the radiomics concept to investigate the association among the CT image features of lung tumors, which were either quantitatively computed or subjectively rated by radiologists, and two genomic biomarkers, namely protein expression of the excision repair cross-complementing 1 (ERCC1) gene and of a regulatory subunit of ribonucleotide reductase (RRM1), in predicting disease-free survival (DFS) of lung cancer patients after surgery. An image dataset involving 94 patients was used. Among them, 20 had cancer recurrence within 3 years, while 74 patients remained disease-free. After tumor segmentation, 35 image features were computed from the CT images. Using the Weka data mining software package, we selected 10 non-redundant image features. Applying a SMOTE algorithm to generate synthetic data to balance the case numbers in the two DFS ("yes" and "no") groups and a leave-one-case-out training/testing method, we optimized and compared a number of machine learning classifiers using (1) quantitative image (QI) features, (2) subjectively rated (SR) features, and (3) genomic biomarkers (GB). Data analyses showed relatively low correlation among the QI, SR and GB prediction results (with Pearson correlation coefficients < 0.5). Among them, using QI yielded the highest performance.
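
    Balancing the two DFS groups with SMOTE inside a leave-one-case-out loop, so that synthetic minority samples are generated from the training cases only, can be sketched as below. It assumes the imbalanced-learn package is available, and random features stand in for the quantitative image features.

```python
# Minimal sketch: SMOTE oversampling applied inside a leave-one-case-out
# loop so that synthetic minority samples are built from training data only
# (random features stand in for the quantitative CT image features; assumes
# the imbalanced-learn package is installed).
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(7)
X = rng.normal(size=(94, 10))                         # 94 patients, 10 image features
y = np.array([1] * 20 + [0] * 74)                     # 20 recurrences, 74 disease-free

correct = 0
for tr, te in LeaveOneOut().split(X):
    X_bal, y_bal = SMOTE(random_state=0).fit_resample(X[tr], y[tr])
    clf = SVC(kernel="rbf").fit(X_bal, y_bal)
    correct += int(clf.predict(X[te])[0] == y[te][0])
print("leave-one-case-out accuracy:", correct / len(y))
```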

  18. ESLpred2: improved method for predicting subcellular localization of eukaryotic proteins

    Directory of Open Access Journals (Sweden)

    Raghava Gajendra PS

    2008-11-01

    Full Text Available Abstract Background The expansion of raw protein sequence databases in the post-genomic era and the availability of fresh annotated sequences for major localizations particularly motivated us to introduce a new, improved version of our previously developed eukaryotic subcellular localization prediction method, "ESLpred". Since the subcellular localization of a protein offers essential clues about its functioning, the availability of a localization predictor would definitely aid and expedite protein deciphering studies. However, the robustness of a predictor is highly dependent on the quality of the dataset and the extracted protein attributes; hence, it becomes imperative to improve the performance of the presently available method using the latest dataset and crucial input features. Results Here, we describe the improvement in prediction performance obtained for our popular ESLpred method using new crucial features as input to a Support Vector Machine (SVM). In addition, a recently available, highly non-redundant dataset encompassing three kingdom-specific protein sequence sets (1198 fungal, 2597 animal and 491 plant sequences) was also included in the present study. First, using the evolutionary information in the form of profile composition along with whole and N-terminal sequence composition as an input feature vector of 440 dimensions, overall accuracies of 72.7, 75.8 and 74.5% were achieved, respectively, after five-fold cross-validation. Further enhancement in performance was observed when similarity-search-based results were coupled with whole and N-terminal sequence composition along with profile composition, yielding overall accuracies of 75.9, 80.8 and 76.6%, respectively; the best accuracies reported to date on the same datasets. Conclusion These results provide confidence about the reliability and accurate prediction of the SVM modules generated in the present study using sequence and profile compositions along with similarity search

  19. Automated lesion detection on MRI scans using combined unsupervised and supervised methods

    International Nuclear Information System (INIS)

    Guo, Dazhou; Fridriksson, Julius; Fillmore, Paul; Rorden, Christopher; Yu, Hongkai; Zheng, Kang; Wang, Song

    2015-01-01

    Accurate and precise detection of brain lesions on MR images (MRI) is paramount for accurately relating lesion location to impaired behavior. In this paper, we present a novel method to automatically detect brain lesions from a T1-weighted 3D MRI. The proposed method combines the advantages of both unsupervised and supervised methods. First, unsupervised methods perform a unified segmentation normalization to warp images from the native space into a standard space and to generate probability maps for different tissue types, e.g., gray matter, white matter and fluid. This allows us to construct an initial lesion probability map by comparing the normalized MRI to healthy control subjects. Then, we perform non-rigid and reversible atlas-based registration to refine the probability maps of gray matter, white matter, external CSF, ventricle, and lesions. These probability maps are combined with the normalized MRI to construct three types of features, with which we use supervised methods to train three support vector machine (SVM) classifiers for a combined classifier. Finally, the combined classifier is used to accomplish lesion detection. We tested this method using T1-weighted MRIs from 60 in-house stroke patients. Using leave-one-out cross validation, the proposed method can achieve an average Dice coefficient of 73.1 % when compared to lesion maps hand-delineated by trained neurologists. Furthermore, we tested the proposed method on the T1-weighted MRIs in the MICCAI BRATS 2012 dataset. The proposed method can achieve an average Dice coefficient of 66.5 % in comparison to the expert annotated tumor maps provided in MICCAI BRATS 2012 dataset. In addition, on these two test datasets, the proposed method shows competitive performance to three state-of-the-art methods, including Stamatakis et al., Seghier et al., and Sanjuan et al. In this paper, we introduced a novel automated procedure for lesion detection from T1-weighted MRIs by combining both an unsupervised and a
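
    The Dice coefficient used to compare automated lesion maps with hand-delineated ones has a direct expression, 2|A∩B|/(|A|+|B|); a small NumPy helper for binary masks is sketched below.

```python
# Minimal sketch: Dice coefficient between a predicted and a reference
# binary lesion mask (2 * |A intersect B| / (|A| + |B|)).
import numpy as np

def dice_coefficient(pred: np.ndarray, ref: np.ndarray) -> float:
    """Both inputs are boolean (or 0/1) arrays of identical shape."""
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    intersection = np.logical_and(pred, ref).sum()
    denom = pred.sum() + ref.sum()
    return 2.0 * intersection / denom if denom else 1.0

# toy 2D example
a = np.zeros((5, 5), dtype=bool); a[1:4, 1:4] = True
b = np.zeros((5, 5), dtype=bool); b[2:5, 2:5] = True
print(round(dice_coefficient(a, b), 3))   # overlap of 4 voxels out of 9 + 9
```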

  20. Development of nondestructive detection method for adulterated powder products using Raman spectroscopy and partial least squares regression

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Sang Dae; Lohumi, Santosh; Cho, Byoung Kwan [Dept. of Biosystems Machinery Engineering, Chungnam National University, Daejeon (Korea, Republic of); Kim, Moon Sung [United States Department of Agriculture Agricultural Research Service, Washington (United States); Lee, Soo Hee [Life and Technology Co.,Ltd., Hwasung (Korea, Republic of)

    2014-08-15

    This study was conducted to develop a non-destructive detection method for adulterated powder products using Raman spectroscopy and partial least squares regression (PLSR). Garlic and ginger powder, which are used as natural seasonings and in health supplement foods, were selected for this experiment. Samples were adulterated with corn starch in concentrations of 5-35%. PLSR models for the adulterated garlic and ginger powders were developed and their performances evaluated using cross-validation. The R²c and SEC of the optimal PLSR models were 0.99 and 2.16 for the garlic powder samples, and 0.99 and 0.84 for the ginger samples, respectively. The variable importance in projection (VIP) score is a useful and simple tool for evaluating the importance of each variable in a PLSR model. After pre-selection based on the VIP scores, the Raman spectral data were reduced by one third. New PLSR models, based on the reduced number of wavelengths selected by the VIP score technique, gave good predictions for the adulterated garlic and ginger powder samples.
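
    Variable importance in projection (VIP) scores can be computed from the weights, scores and y-loadings of a fitted PLS model. The sketch below uses one common formulation with scikit-learn's PLSRegression on synthetic spectra; the attribute names are those of scikit-learn and may differ in other chemometrics software.

```python
# Minimal sketch: variable importance in projection (VIP) scores computed
# from a fitted PLS regression model (synthetic spectra; attribute names
# follow scikit-learn's PLSRegression and may differ in other libraries).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(8)
X = rng.normal(size=(60, 200))                                   # 60 samples x 200 Raman shifts
y = X[:, 10] * 2.0 + X[:, 50] + rng.normal(scale=0.1, size=60)   # simulated adulteration level

pls = PLSRegression(n_components=3).fit(X, y)
T = pls.transform(X)                                 # scores, shape (n, A)
W = pls.x_weights_                                   # weights, shape (p, A)
Q = pls.y_loadings_                                  # y-loadings, shape (1, A)

ss = (T ** 2).sum(axis=0) * (Q ** 2).ravel()         # variance explained per component
W_norm = W / np.linalg.norm(W, axis=0)
vip = np.sqrt(W.shape[0] * (W_norm ** 2 @ ss) / ss.sum())
print("top 5 variables by VIP:", np.argsort(vip)[::-1][:5])
```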

  1. An entropy regularization method applied to the identification of wave distribution function for an ELF hiss event

    Science.gov (United States)

    Prot, Olivier; SantolíK, OndřEj; Trotignon, Jean-Gabriel; Deferaudy, Hervé

    2006-06-01

    An entropy regularization algorithm (ERA) has been developed to compute the wave-energy density from electromagnetic field measurements. It is based on the wave distribution function (WDF) concept. To assess its suitability and efficiency, the algorithm is applied to experimental data that have already been analyzed using other inversion techniques. The FREJA satellite data used consist of six spectral matrices corresponding to six time-frequency points of an ELF hiss-event spectrogram. The WDF analysis is performed on these six points and the results are compared with those obtained previously. A statistical stability analysis confirms the stability of the solutions. The WDF computation is fast and requires no prespecified parameters. The regularization parameter has been chosen in accordance with Morozov's discrepancy principle. The generalized cross-validation and L-curve criteria are then tentatively used to provide a fully data-driven method. However, these criteria fail to determine a suitable value of the regularization parameter. Although the entropy regularization leads to solutions that agree fairly well with those already published, some differences are observed, and these are discussed in detail. The main advantage of the ERA is that it returns the WDF with the largest entropy and avoids the use of a priori models, which sometimes seem more accurate but without any justification.
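
    Generalized cross-validation, one of the criteria mentioned here for picking the regularization parameter, is easiest to see in a plain ridge (Tikhonov) setting: GCV(λ) = n·‖(I − A(λ))y‖² / tr(I − A(λ))², where A(λ) is the influence matrix. The sketch below evaluates this criterion on a small synthetic problem; it illustrates the criterion only, not the WDF inversion itself.

```python
# Minimal sketch: generalized cross-validation (GCV) for choosing the
# Tikhonov/ridge regularization parameter on a small synthetic inverse
# problem (illustrates the criterion, not the WDF inversion).
import numpy as np

rng = np.random.default_rng(9)
n, p = 40, 25
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + rng.normal(scale=0.5, size=n)

def gcv(lam: float) -> float:
    # influence (hat) matrix of ridge regression for this lambda
    A = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - A @ y
    return n * float(resid @ resid) / (n - np.trace(A)) ** 2

lambdas = np.logspace(-4, 2, 25)
best = min(lambdas, key=gcv)
print("lambda selected by GCV:", best)
```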

  2. A novel hybrid method of beta-turn identification in protein using binary logistic regression and neural network.

    Science.gov (United States)

    Asghari, Mehdi Poursheikhali; Hayatshahi, Sayyed Hamed Sadat; Abdolmaleki, Parviz

    2012-01-01

    From both the structural and functional points of view, β-turns play important biological roles in proteins. In the present study, a novel two-stage hybrid procedure has been developed to identify β-turns in proteins. Binary logistic regression was initially used, for the first time, to select significant sequence parameters for the identification of β-turns using a re-substitution test procedure. The sequence parameters consisted of 80 amino acid positional occurrences and 20 amino acid percentages in the sequence. Among these parameters, the most significant ones selected by the binary logistic regression model were the percentages of Gly and Ser and the occurrence of Asn in position i+2, respectively. These significant parameters have the highest effect on the constitution of a β-turn sequence. A neural network model was then constructed and fed with the parameters selected by binary logistic regression to build a hybrid predictor. The networks were trained and tested on a non-homologous dataset of 565 protein chains. Applying a nine-fold cross-validation test on the dataset, the network reached an overall accuracy (Qtotal) of 74%, which is comparable with the results of other β-turn prediction methods. In conclusion, this study proves that the parameter selection ability of binary logistic regression together with the prediction capability of neural networks leads to the development of more precise models for identifying β-turns in proteins.

  3. Fractal dimension to classify the heart sound recordings with KNN and fuzzy c-mean clustering methods

    Science.gov (United States)

    Juniati, D.; Khotimah, C.; Wardani, D. E. K.; Budayasa, K.

    2018-01-01

    Heart abnormalities can be detected from heart sounds. A heart sound can be heard directly with a stethoscope or indirectly with a phonocardiograph, a machine for recording heart sounds. This paper presents the implementation of fractal dimension theory to classify phonocardiograms into a normal heart sound, a murmur, or an extrasystole. The main algorithm used to calculate the fractal dimension was Higuchi's algorithm. There were two steps in classifying the phonocardiograms: feature extraction and classification. For feature extraction, we used the Discrete Wavelet Transform to decompose the heart sound signal into several sub-bands depending on the selected level. After the decomposition process, the signal was processed using the Fast Fourier Transform (FFT) to determine the spectral frequency. The fractal dimension of the FFT output was calculated using Higuchi's algorithm. The classification of the fractal dimensions of all phonocardiograms was done with the KNN and fuzzy c-means clustering methods. Based on the results, the best accuracy obtained was 86.17%, achieved with feature extraction by DWT decomposition at level 3, a kmax value of 50, 5-fold cross-validation, and 5 neighbors in the K-NN algorithm. Meanwhile, for fuzzy c-means clustering, the accuracy was 78.56%.
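
    Higuchi's algorithm estimates the fractal dimension of a one-dimensional signal from the average length of subsampled curves at increasing delays k. A sketch of the algorithm is given below; it is applied to synthetic noise rather than to wavelet- and FFT-processed heart-sound recordings, and kmax = 50 matches the value reported in the record.

```python
# Minimal sketch: Higuchi's algorithm for the fractal dimension of a 1-D
# signal (illustrative; applied here to synthetic noise rather than to
# wavelet/FFT-processed heart-sound recordings).
import numpy as np

def higuchi_fd(x: np.ndarray, kmax: int = 50) -> float:
    x = np.asarray(x, dtype=float)
    n = len(x)
    log_k, log_l = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):
            idx = np.arange(m, n, k)                 # subsampled curve x[m], x[m+k], ...
            if len(idx) < 2:
                continue
            dist = np.abs(np.diff(x[idx])).sum()
            norm = (n - 1) / ((len(idx) - 1) * k)    # normalization factor
            lengths.append(dist * norm / k)
        log_k.append(np.log(1.0 / k))
        log_l.append(np.log(np.mean(lengths)))
    slope, _ = np.polyfit(log_k, log_l, 1)           # FD = slope of log L(k) vs log(1/k)
    return slope

print(round(higuchi_fd(np.random.default_rng(10).normal(size=2000), kmax=50), 2))
```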

  4. runjags: An R Package Providing Interface Utilities, Model Templates, Parallel Computing Methods and Additional Distributions for MCMC Models in JAGS

    Directory of Open Access Journals (Sweden)

    Matthew J. Denwood

    2016-07-01

    Full Text Available The runjags package provides a set of interface functions to facilitate running Markov chain Monte Carlo models in JAGS from within R. Automated calculation of appropriate convergence and sample length diagnostics, user-friendly access to commonly used graphical outputs and summary statistics, and parallelized methods of running JAGS are provided. Template model specifications can be generated using a standard lme4-style formula interface to assist users less familiar with the BUGS syntax. Automated simulation study functions are implemented to facilitate model performance assessment, as well as drop-k type cross-validation studies, using high performance computing clusters such as those provided by parallel. A module extension for JAGS is also included within runjags, providing the Pareto family of distributions and a series of minimally-informative priors including the DuMouchel and half-Cauchy priors. This paper outlines the primary functions of this package, and gives an illustration of a simulation study to assess the sensitivity of two equivalent model formulations to different prior distributions.

  5. Development of nondestructive detection method for adulterated powder products using Raman spectroscopy and partial least squares regression

    International Nuclear Information System (INIS)

    Lee, Sang Dae; Lohumi, Santosh; Cho, Byoung Kwan; Kim, Moon Sung; Lee, Soo Hee

    2014-01-01

    This study was conducted to develop a non-destructive detection method for adulterated powder products using Raman spectroscopy and partial least squares regression (PLSR). Garlic and ginger powder, which are used as natural seasonings and in health supplement foods, were selected for this experiment. Samples were adulterated with corn starch in concentrations of 5-35%. PLSR models for the adulterated garlic and ginger powders were developed and their performances evaluated using cross-validation. The R²c and SEC of the optimal PLSR models were 0.99 and 2.16 for the garlic powder samples, and 0.99 and 0.84 for the ginger samples, respectively. The variable importance in projection (VIP) score is a useful and simple tool for evaluating the importance of each variable in a PLSR model. After pre-selection based on the VIP scores, the Raman spectral data were reduced by one third. New PLSR models, based on the reduced number of wavelengths selected by the VIP score technique, gave good predictions for the adulterated garlic and ginger powder samples.

  6. An Analysis of Dynamic Instability on TC-Like Vortex Using the Regularization-Based Eigenmode Linear Superposition Method

    Directory of Open Access Journals (Sweden)

    Shuang Liu

    2018-01-01

    Full Text Available In this paper, the eigenmode linear superposition (ELS) method based on regularization is used to discuss the distributions of all eigenmodes and the role of their instability in the intensity and structure change of a TC-like vortex. Results show that the regularization approach can overcome the ill-posed problem that occurs in solving the mode weight coefficients when the ELS method is applied to analyze the impacts of dynamic instability on the intensity and structure change of a TC-like vortex. The generalized cross-validation (GCV) method and the L-curve method are used to determine the regularization parameters, and the results of the two approaches are compared. It is found that the results based on the GCV method are closer to the given initial condition in the solution of the inverse problem of the vortex system. Then, the instability characteristics of the hollow vortex taken as the basic state are examined based on the linear barotropic shallow water equations. It is shown that the wavenumber distribution of system instability obtained from the ELS method is well consistent with that of the numerical analysis based on normal modes. On the other hand, the evolution of the hollow vortex is discussed using the product of each eigenmode and its corresponding weight coefficient. Results show that the intensity and structure change of the system are mainly affected by the dynamic instability in the early stage of disturbance development, and the most unstable mode has a dominant role in the growth rate and the horizontal distribution of intense disturbance in the near-core region. Moreover, the wave structure of the most unstable mode possesses typical characteristics of mixed vortex Rossby-inertio-gravity waves (VRIGWs).

  7. Rapid discrimination between buffalo and cow milk and detection of adulteration of buffalo milk with cow milk using synchronous fluorescence spectroscopy in combination with multivariate methods.

    Science.gov (United States)

    Durakli Velioglu, Serap; Ercioglu, Elif; Boyaci, Ismail Hakki

    2017-05-01

    This research paper describes the potential of synchronous fluorescence (SF) spectroscopy for authentication of buffalo milk, a favourable raw material in the production of some premium dairy products. Like many other high-priced foodstuffs, buffalo milk is subject to fraudulent activities. The current methods widely used for the detection of adulteration of buffalo milk have various disadvantages, making them unattractive for routine analysis. Thus, the aim of the present study was to assess the potential of SF spectroscopy in combination with multivariate methods for rapid discrimination between buffalo and cow milk and detection of the adulteration of buffalo milk with cow milk. SF spectra of cow and buffalo milk samples were recorded over the 400-550 nm excitation range with Δλ of 10-100 nm, in steps of 10 nm. The data obtained for Δλ = 10 nm were used to classify the samples using principal component analysis (PCA) and to detect the adulteration level of buffalo milk with cow milk using partial least squares (PLS) methods. Successful discrimination of the samples and detection of adulteration of buffalo milk with a limit of detection (LOD) of 6% were achieved with models having root mean square error of calibration (RMSEC), root mean square error of cross-validation (RMSECV) and root mean square error of prediction (RMSEP) values of 2, 7, and 4%, respectively. The results reveal the potential of SF spectroscopy for rapid authentication of buffalo milk.

  8. Quantitative analysis of drug distribution by ambient mass spectrometry imaging method with signal extinction normalization strategy and inkjet-printing technology.

    Science.gov (United States)

    Luo, Zhigang; He, Jingjing; He, Jiuming; Huang, Lan; Song, Xiaowei; Li, Xin; Abliz, Zeper

    2018-03-01

    Quantitative mass spectrometry imaging (MSI) is a robust approach that provides both quantitative and spatial information for drug candidate research. However, because of complicated signal suppression and interference, acquiring accurate quantitative information from MSI data remains a challenge, especially for whole-body tissue samples. Ambient MSI techniques using spray-based ionization appear to be ideal for pharmaceutical quantitative MSI analysis. However, they are more challenging, as they involve almost no sample preparation and are more susceptible to ion suppression/enhancement. Herein, based on our previously developed air flow-assisted desorption electrospray ionization (AFADESI)-MSI technology, an ambient quantitative MSI method was introduced by integrating inkjet-printing technology with normalization of the signal extinction coefficient (SEC) using the target compound itself. The method utilized a single calibration curve to quantify multiple tissue types. Basic blue 7 and an antitumor drug candidate (S-(+)-deoxytylophorinidine, CAT) were chosen to initially validate the feasibility and reliability of the quantitative MSI method. Rat tissue sections (heart, kidney, and brain) from animals administered CAT were then analyzed. The quantitative MSI results were cross-validated against LC-MS/MS data from the same tissues. The consistency suggests that the approach can rapidly obtain quantitative MSI data without introducing interference into the in-situ environment of the tissue sample, and has the potential to provide a high-throughput, economical and reliable approach for drug discovery and development. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. An optimized data fusion method and its application to improve lateral boundary conditions in winter for Pearl River Delta regional PM2.5 modeling, China

    Science.gov (United States)

    Huang, Zhijiong; Hu, Yongtao; Zheng, Junyu; Zhai, Xinxin; Huang, Ran

    2018-05-01

    Lateral boundary conditions (LBCs) are essential for chemical transport models to simulate regional transport; however they often contain large uncertainties. This study proposes an optimized data fusion approach to reduce the bias of LBCs by fusing gridded model outputs, from which the daughter domain's LBCs are derived, with ground-level measurements. The optimized data fusion approach follows the framework of a previous interpolation-based fusion method but improves it by using a bias kriging method to correct the spatial bias in gridded model outputs. Cross-validation shows that the optimized approach better estimates fused fields in areas with a large number of observations compared to the previous interpolation-based method. The optimized approach was applied to correct LBCs of PM2.5 concentrations for simulations in the Pearl River Delta (PRD) region as a case study. Evaluations show that the LBCs corrected by data fusion improve in-domain PM2.5 simulations in terms of the magnitude and temporal variance. Correlation increases by 0.13-0.18 and fractional bias (FB) decreases by approximately 3%-15%. This study demonstrates the feasibility of applying data fusion to improve regional air quality modeling.
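
    As a rough illustration of the bias-correction-and-cross-validation idea (not the study's kriging implementation), the sketch below interpolates station biases linearly and scores the fused field by leave-one-station-out cross-validation on synthetic data.

        # Sketch of station-based bias correction of a gridded model field with
        # leave-one-station-out cross-validation (linear interpolation stands in
        # for the kriging step; all data are synthetic).
        import numpy as np
        from scipy.interpolate import griddata

        rng = np.random.default_rng(1)
        xy = rng.uniform(0.0, 100.0, size=(25, 2))                       # station coordinates
        obs = 40.0 + 10.0 * np.sin(xy[:, 0] / 20.0) + rng.normal(0.0, 2.0, 25)   # observed PM2.5
        model = obs + 8.0 + rng.normal(0.0, 3.0, 25)                     # biased model output at stations

        errors = []
        for i in range(len(xy)):                                         # leave one station out
            keep = np.arange(len(xy)) != i
            bias = model[keep] - obs[keep]
            b = griddata(xy[keep], bias, xy[i:i + 1], method="linear")   # interpolate bias to held-out site
            if np.isnan(b[0]):                                           # fall back near the domain edge
                b = griddata(xy[keep], bias, xy[i:i + 1], method="nearest")
            errors.append((model[i] - b[0]) - obs[i])                    # fused value minus observation
        print("leave-one-out RMSE of the fused field:", float(np.sqrt(np.mean(np.square(errors)))))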

  10. A MACHINE-LEARNING METHOD TO INFER FUNDAMENTAL STELLAR PARAMETERS FROM PHOTOMETRIC LIGHT CURVES

    International Nuclear Information System (INIS)

    Miller, A. A.; Bloom, J. S.; Richards, J. W.; Starr, D. L.; Lee, Y. S.; Butler, N. R.; Tokarz, S.; Smith, N.; Eisner, J. A.

    2015-01-01

    A fundamental challenge for wide-field imaging surveys is obtaining follow-up spectroscopic observations: there are >10^9 photometrically cataloged sources, yet modern spectroscopic surveys are limited to ∼a few × 10^6 targets. As we approach the Large Synoptic Survey Telescope era, new algorithmic solutions are required to cope with the data deluge. Here we report the development of a machine-learning framework capable of inferring fundamental stellar parameters (T_eff, log g, and [Fe/H]) using photometric-brightness variations and color alone. A training set is constructed from a systematic spectroscopic survey of variables with Hectospec/Multi-Mirror Telescope. In sum, the training set includes ∼9000 spectra, for which stellar parameters are measured using the SEGUE Stellar Parameters Pipeline (SSPP). We employed the random forest algorithm to perform a non-parametric regression that predicts T_eff, log g, and [Fe/H] from photometric time-domain observations. Our final optimized model produces a cross-validated rms error (RMSE) of 165 K, 0.39 dex, and 0.33 dex for T_eff, log g, and [Fe/H], respectively. Examining the subset of sources for which the SSPP measurements are most reliable, the RMSE reduces to 125 K, 0.37 dex, and 0.27 dex, respectively, comparable to what is achievable via low-resolution spectroscopy. For variable stars this represents a ≈12%-20% improvement in RMSE relative to models trained with single-epoch photometric colors. As an application of our method, we estimate stellar parameters for ∼54,000 known variables. We argue that this method may convert photometric time-domain surveys into pseudo-spectrographic engines, enabling the construction of extremely detailed maps of the Milky Way, its structure, and history.
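
    A minimal sketch of a cross-validated random forest regression of this kind is shown below; the feature matrix, labels and hyperparameters are simulated and assumed for illustration and do not reproduce the paper's pipeline.

        # Sketch of a cross-validated random forest regression (synthetic features/labels).
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import KFold, cross_val_predict

        rng = np.random.default_rng(0)
        X = rng.normal(size=(1500, 60))                                  # light-curve features per star
        teff = 5000.0 + 800.0 * X[:, 0] + rng.normal(0.0, 150.0, 1500)   # hypothetical labels in K

        rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=2, random_state=0, n_jobs=-1)
        pred = cross_val_predict(rf, X, teff, cv=KFold(n_splits=10, shuffle=True, random_state=0))
        rmse = float(np.sqrt(np.mean((pred - teff) ** 2)))
        print(f"cross-validated RMSE: {rmse:.0f} K")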

  11. Optical coherence elastography (OCE) as a method for identifying benign and malignant prostate biopsies

    Science.gov (United States)

    Li, Chunhui; Guan, Guangying; Ling, Yuting; Lang, Stephen; Wang, Ruikang K.; Huang, Zhihong; Nabi, Ghulam

    2015-03-01

    Objectives. Prostate cancer is the most frequently diagnosed malignancy in men. Digital rectal examination (DRE) - a clinical tool based on the alteration of the mechanical properties of tissues due to cancer - has traditionally been used for screening prostate cancer. Essentially, DRE estimates the relative stiffness of cancerous and normal prostate tissue. Optical coherence elastography (OCE) is a new optical imaging technique capable of providing cross-sectional imaging of tissue microstructure as well as elastograms in vivo and in real time. In this preliminary study, OCE was used on human prostate biopsies ex vivo, and the images acquired were compared with those obtained using standard histopathologic methods. Methods. 120 prostate biopsies were obtained by TRUS-guided needle biopsy procedures from 9 patients with clinically suspected cancer of the prostate. The biopsies were approximately 0.8 mm in diameter and 12 mm in length, and prepared in formalin solution. Quantitative assessment of the biopsy samples using OCE was obtained in kilopascals (kPa) before histopathologic evaluation. The results obtained from OCE and standard histopathologic evaluation were compared, providing cross-validation. Sensitivity, specificity, and positive and negative predictive values were calculated for OCE (histopathology was the reference standard). Results. OCE could provide quantitative elasticity properties of prostate biopsies within benign prostate tissue, prostatic intraepithelial neoplasia, atypical hyperplasia and malignant prostate cancer. The data analysed showed that the sensitivity and specificity of OCE for PCa detection were 1 and 0.91, respectively. PCa had significantly higher stiffness values compared to benign tissues, with a trend of increasing stiffness with increasing malignancy. Conclusions. Using OCE, microscopic-resolution elastography is promising in the diagnosis of human prostatic diseases. Further studies using this technique to improve the

  12. Diagnosis of human malignancies using laser-induced breakdown spectroscopy in combination with chemometric methods

    Science.gov (United States)

    Chen, Xue; Li, Xiaohui; Yu, Xin; Chen, Deying; Liu, Aichun

    2018-01-01

    Diagnosis of malignancies is a challenging clinical issue. In this work, we present quick and robust diagnosis and discrimination of lymphoma and multiple myeloma (MM) using laser-induced breakdown spectroscopy (LIBS) conducted on human serum samples, in combination with chemometric methods. The serum samples collected from lymphoma and MM cancer patients and healthy controls were deposited on filter papers and ablated with a pulsed 1064 nm Nd:YAG laser. 24 atomic lines of Ca, Na, K, H, O, and N were selected for malignancy diagnosis. Principal component analysis (PCA), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and k nearest neighbors (kNN) classification were applied to build the malignancy diagnosis and discrimination models. The performances of the models were evaluated using 10-fold cross validation. The discrimination accuracy, confusion matrix and receiver operating characteristic (ROC) curves were obtained. The values of area under the ROC curve (AUC), sensitivity and specificity at the cut-points were determined. The kNN model exhibits the best performances with overall discrimination accuracy of 96.0%. Distinct discrimination between malignancies and healthy controls has been achieved with AUC, sensitivity and specificity for healthy controls all approaching 1. For lymphoma, the best discrimination performance values are AUC = 0.990, sensitivity = 0.970 and specificity = 0.956. For MM, the corresponding values are AUC = 0.986, sensitivity = 0.892 and specificity = 0.994. The results show that the serum-LIBS technique can serve as a quick, less invasive and robust method for diagnosis and discrimination of human malignancies.
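
    The evaluation scheme described above can be sketched as follows, assuming simulated line-intensity features; the classifier settings and fold count are illustrative only.

        # Sketch of kNN classification with stratified 10-fold cross-validation,
        # reporting accuracy and ROC AUC (simulated line-intensity features).
        import numpy as np
        from sklearn.metrics import accuracy_score, roc_auc_score
        from sklearn.model_selection import StratifiedKFold, cross_val_predict
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)
        y = np.array([0] * 60 + [1] * 60)                                # 0 = healthy, 1 = malignancy
        X = rng.normal(size=(120, 24)) + 0.8 * y[:, None]                # 24 atomic line intensities

        clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
        proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]
        print("accuracy:", round(accuracy_score(y, proba > 0.5), 3))
        print("ROC AUC :", round(roc_auc_score(y, proba), 3))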

  13. GSHSite: exploiting an iteratively statistical method to identify s-glutathionylation sites with substrate specificity.

    Directory of Open Access Journals (Sweden)

    Yi-Ju Chen

    Full Text Available S-glutathionylation, the covalent attachment of a glutathione (GSH) to the sulfur atom of cysteine, is a selective and reversible protein post-translational modification (PTM) that regulates protein activity, localization, and stability. Despite its implication in the regulation of protein functions and cell signaling, the substrate specificity of cysteine S-glutathionylation remains unknown. Based on a total of 1783 experimentally identified S-glutathionylation sites from mouse macrophages, this work presents an informatics investigation of S-glutathionylation sites including structural factors such as the flanking amino acid composition and the accessible surface area (ASA). TwoSampleLogo analysis shows that positively charged amino acids flanking the S-glutathionylated cysteine may influence the formation of S-glutathionylation in a closed three-dimensional environment. A statistical method is further applied to iteratively detect conserved substrate motifs with statistical significance. A support vector machine (SVM) is then applied to generate a predictive model considering the substrate motifs. According to five-fold cross-validation, the SVMs trained with substrate motifs achieve enhanced sensitivity, specificity, and accuracy, and provide promising performance on an independent test set. The effectiveness of the proposed method is demonstrated by the correct identification of previously reported S-glutathionylation sites of mouse thioredoxin (TXN) and human protein tyrosine phosphatase 1b (PTP1B). Finally, the constructed models are adopted to implement an effective web-based tool, named GSHSite (http://csb.cse.yzu.edu.tw/GSHSite/), for identifying uncharacterized GSH substrate sites on protein sequences.

  14. Classification of forest development stages from national low-density lidar datasets: a comparison of machine learning methods

    Directory of Open Access Journals (Sweden)

    R. Valbuena

    2016-02-01

    Full Text Available The area-based method has become a widespread approach in airborne laser scanning (ALS), being mainly employed for the estimation of continuous variables describing forest attributes: biomass, volume, density, etc. However, to date, classification methods based on machine learning, which are fairly common in other remote sensing fields, such as land use / land cover classification using multispectral sensors, have been largely overlooked in forestry applications of ALS. In this article, we wish to draw attention to statistical methods predicting discrete responses, for supervised classification of ALS datasets. A wide spectrum of approaches is reviewed: discriminant analysis (DA) using various classifiers – maximum likelihood, minimum volume ellipsoid, naïve Bayes –, support vector machines (SVM), artificial neural networks (ANN), random forest (RF) and nearest neighbour (NN) methods. They are compared in the context of a classification of forest areas into development classes (DC) used in practical silvicultural management in Finland, using the low-density national ALS dataset. We observed that RF and NN had the most balanced error matrices, with cross-validated predictions that were mainly unbiased for all DCs. Although overall accuracies were higher for SVM and ANN, their results were very dissimilar across DCs, and they can therefore only be advantageous if certain DCs are targeted. DA methods underperformed in comparison to the other alternatives, and were only advantageous for the detection of seedling stands. These results show that, besides the well-demonstrated capacity of ALS for quantifying forest stocks, there is a great deal of potential for predicting categorical variables in general, and forest types in particular. In conclusion, we consider that the presented methodology should also be adapted to the types of forest classes that are relevant to Mediterranean ecosystems, opening a range of possibilities for future research, in which
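
    A compact sketch of comparing such classifiers with cross-validated predictions and confusion matrices is given below; the data, class structure and model settings are synthetic assumptions, not the Finnish ALS dataset.

        # Sketch comparing several classifiers with cross-validated predictions and
        # per-class confusion matrices (synthetic multi-class data).
        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import accuracy_score, confusion_matrix
        from sklearn.model_selection import StratifiedKFold, cross_val_predict
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        y = rng.integers(0, 4, size=400)                                 # four development classes
        X = rng.normal(size=(400, 12)) + np.eye(4)[y] @ rng.normal(size=(4, 12))

        models = {
            "DA (LDA)": LinearDiscriminantAnalysis(),
            "SVM": SVC(kernel="rbf", C=1.0, gamma="scale"),
            "RF": RandomForestClassifier(n_estimators=300, random_state=0),
            "NN (kNN)": KNeighborsClassifier(n_neighbors=7),
        }
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
        for name, model in models.items():
            pred = cross_val_predict(model, X, y, cv=cv)
            print(name, "overall accuracy:", round(accuracy_score(y, pred), 3))
            print(confusion_matrix(y, pred))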

  15. Communication: On the consistency of approximate quantum dynamics simulation methods for vibrational spectra in the condensed phase.

    Science.gov (United States)

    Rossi, Mariana; Liu, Hanchao; Paesani, Francesco; Bowman, Joel; Ceriotti, Michele

    2014-11-14

    Including quantum mechanical effects on the dynamics of nuclei in the condensed phase is challenging, because the complexity of exact methods grows exponentially with the number of quantum degrees of freedom. Efforts to circumvent these limitations can be traced back to two approaches: methods that treat a small subset of the degrees of freedom with rigorous quantum mechanics, considering the rest of the system as a static or classical environment, and methods that treat the whole system quantum mechanically, but using approximate dynamics. Here, we perform a systematic comparison between these two philosophies for the description of quantum effects in vibrational spectroscopy, taking the Embedded Local Monomer model and a mixed quantum-classical model as representatives of the first family of methods, and centroid molecular dynamics and thermostatted ring polymer molecular dynamics as examples of the latter. We use as benchmarks D2O doped with HOD and pure H2O at three distinct thermodynamic state points (ice Ih at 150 K, and the liquid at 300 K and 600 K), modeled with the simple q-TIP4P/F potential energy and dipole moment surfaces. With few exceptions the different techniques yield IR absorption frequencies that are consistent with one another within a few tens of cm⁻¹. Comparison with classical molecular dynamics demonstrates the importance of nuclear quantum effects up to the highest temperature, and a detailed discussion of the discrepancies between the various methods lets us draw some (circumstantial) conclusions about the impact of the very different approximations that underlie them. Such cross-validation between radically different approaches could indicate a way forward to further improve the state of the art in simulations of condensed-phase quantum dynamics.

  16. A comparative study of fuzzy target selection methods in direct marketing

    NARCIS (Netherlands)

    Costa Sousa, da J.M.; Kaymak, U.; Madeira, S.

    2002-01-01

    Target selection in direct marketing is an important data mining problem for which fuzzy modeling can be used. The paper compares several fuzzy modeling techniques applied to target selection based on recency, frequency and monetary value measures. The comparison uses cross validation applied to

  17. Identification and Severity Determination of Wheat Stripe Rust and Wheat Leaf Rust Based on Hyperspectral Data Acquired Using a Black-Paper-Based Measuring Method

    Science.gov (United States)

    Ruan, Liu; Wang, Rui; Liu, Qi; Ma, Zhanhong; Li, Xiaolong; Cheng, Pei; Wang, Haiguang

    2016-01-01

    It is important to implement detection and assessment of plant diseases based on remotely sensed data for disease monitoring and control. Hyperspectral data of healthy leaves, leaves in incubation period and leaves in diseased period of wheat stripe rust and wheat leaf rust were collected under in-field conditions using a black-paper-based measuring method developed in this study. After data preprocessing, the models to identify the diseases were built using distinguished partial least squares (DPLS) and support vector machine (SVM), and the disease severity inversion models of stripe rust and the disease severity inversion models of leaf rust were built using quantitative partial least squares (QPLS) and support vector regression (SVR). All the models were validated by using leave-one-out cross validation and external validation. The diseases could be discriminated using both distinguished partial least squares and support vector machine with the accuracies of more than 99%. For each wheat rust, disease severity levels were accurately retrieved using both the optimal QPLS models and the optimal SVR models with the coefficients of determination (R2) of more than 0.90 and the root mean square errors (RMSE) of less than 0.15. The results demonstrated that identification and severity evaluation of stripe rust and leaf rust at the leaf level could be implemented based on the hyperspectral data acquired using the developed method. A scientific basis was provided for implementing disease monitoring by using aerial and space remote sensing technologies. PMID:27128464
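
    A minimal sketch of the leave-one-out-validated severity regression step is given below, here with SVR on simulated features; the study's preprocessing and model settings are not reproduced.

        # Sketch of severity regression with leave-one-out cross-validation (synthetic data).
        import numpy as np
        from sklearn.metrics import mean_squared_error, r2_score
        from sklearn.model_selection import LeaveOneOut, cross_val_predict
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVR

        rng = np.random.default_rng(0)
        severity = rng.uniform(0.0, 1.0, 60)                             # fraction of leaf area diseased
        spectra = rng.normal(size=(60, 80)) + 2.0 * severity[:, None]    # preprocessed hyperspectral features

        svr = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.01))
        pred = cross_val_predict(svr, spectra, severity, cv=LeaveOneOut())
        print("R2  :", round(r2_score(severity, pred), 3))
        print("RMSE:", round(float(np.sqrt(mean_squared_error(severity, pred))), 3))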

  18. Analytical method development of nifedipine and its degradants binary mixture using high performance liquid chromatography through a quality by design approach

    Science.gov (United States)

    Choiri, S.; Ainurofiq, A.; Ratri, R.; Zulmi, M. U.

    2018-03-01

    Nifedipine (NIF) is a photo-labile drug that easily degrades when exposed to sunlight. This research aimed to develop an analytical method using high-performance liquid chromatography, implementing a quality by design approach to obtain an effective, efficient, and validated analytical method for NIF and its degradants. A 2² full factorial design with a curvature (center point) was applied to optimize the analytical conditions for NIF and its degradants. Mobile phase composition (MPC) and flow rate (FR) were the factors examined for their effect on the system suitability parameters. The selected condition was validated by cross-validation using a leave-one-out technique. Alteration of the MPC significantly affected retention time. Furthermore, an increase in FR reduced the tailing factor. In addition, the interaction of both factors increased the number of theoretical plates and the resolution of NIF and its degradants. The selected analytical condition for NIF and its degradants was validated over the range 1 – 16 µg/mL and showed good linearity, precision, and accuracy, and was efficient, with an analysis time within 10 min.
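
    The 2² factorial analysis can be sketched as below; the coded levels and response values are hypothetical and only illustrate how main effects and the interaction would be estimated.

        # Sketch of a 2x2 full factorial design with a centre point: fit the two main
        # effects and their interaction for one response (all numbers are hypothetical).
        import itertools
        import numpy as np

        runs = list(itertools.product((-1, +1), (-1, +1))) + [(0, 0)]    # coded MPC, FR levels + centre
        resolution = np.array([1.8, 2.4, 2.1, 3.0, 2.3])                 # hypothetical responses

        X = np.array([[1.0, m, f, m * f] for m, f in runs])              # intercept, MPC, FR, MPC*FR
        coef, *_ = np.linalg.lstsq(X, resolution, rcond=None)
        for term, c in zip(("intercept", "MPC", "FR", "MPC*FR"), coef):
            print(f"{term:9s} coefficient: {c:+.3f}")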

  19. Identification and Severity Determination of Wheat Stripe Rust and Wheat Leaf Rust Based on Hyperspectral Data Acquired Using a Black-Paper-Based Measuring Method.

    Science.gov (United States)

    Wang, Hui; Qin, Feng; Ruan, Liu; Wang, Rui; Liu, Qi; Ma, Zhanhong; Li, Xiaolong; Cheng, Pei; Wang, Haiguang

    2016-01-01

    It is important to implement detection and assessment of plant diseases based on remotely sensed data for disease monitoring and control. Hyperspectral data of healthy leaves, leaves in incubation period and leaves in diseased period of wheat stripe rust and wheat leaf rust were collected under in-field conditions using a black-paper-based measuring method developed in this study. After data preprocessing, the models to identify the diseases were built using distinguished partial least squares (DPLS) and support vector machine (SVM), and the disease severity inversion models of stripe rust and the disease severity inversion models of leaf rust were built using quantitative partial least squares (QPLS) and support vector regression (SVR). All the models were validated by using leave-one-out cross validation and external validation. The diseases could be discriminated using both distinguished partial least squares and support vector machine with the accuracies of more than 99%. For each wheat rust, disease severity levels were accurately retrieved using both the optimal QPLS models and the optimal SVR models with the coefficients of determination (R2) of more than 0.90 and the root mean square errors (RMSE) of less than 0.15. The results demonstrated that identification and severity evaluation of stripe rust and leaf rust at the leaf level could be implemented based on the hyperspectral data acquired using the developed method. A scientific basis was provided for implementing disease monitoring by using aerial and space remote sensing technologies.

  20. [Differentiation Study of Chinese Medical Syndrome Typing for Diarrhea-predominant Irritable Bowel Syndrome Based on Information of Four Chinese Medical Diagnostic Methods and Brain-gut Peptides].

    Science.gov (United States)

    Wu, Hao-meng; Xu, Zhi-wei; Ao, Hai-qing; Shi, Ya-fei; Hu, Hai-yan; Ji, Yun-peng

    2015-10-01

    To establish discriminant functions for diarrhea-predominant irritable bowel syndrome (IBS-D) by studying it from a quantitative diagnosis angle, hoping to reduce the interference of subjective factors in diagnosing and differentially diagnosing Chinese medical syndromes of IBS-D. A Chinese medical clinical epidemiological survey was carried out in 439 IBS-D patients using the Clinical Information Collection Table of IBS. Initial syndromes were obtained by cluster analysis. They were analyzed using step-by-step discrimination, taking information from the four Chinese medical diagnostic methods and serum brain-gut peptides (BGP) as variables. Clustering results were Gan stagnation Pi deficiency syndrome (GSPDS), Pi-Wei weakness syndrome (PWWS), Gan stagnation qi stasis syndrome (GSQSS), Pi-Shen yang deficiency syndrome (PSYDS), Pi-Wei damp-heat syndrome (PWDHS), and cold-damp disturbing Pi syndrome (CDDPS). Of these, GSPDS was the most often seen, with an effective percentage of 34.2%, while CDDPS was the least often seen, with an effective percentage of 5.5%. A total of 5 discriminant functions for GSPDS, PWWS, GSQSS, PSYDS, and PWDHS were obtained by the step-by-step discrimination method. The retrospective misjudgment rate was 4.1% (16/390), while the cross-validation misjudgment rate was 15.4% (60/390). The establishment of discriminant functions is of value in objectively diagnosing and differentially diagnosing Chinese medical syndromes of IBS-D.

  1. An unbiased method to build benchmarking sets for ligand-based virtual screening and its application to GPCRs.

    Science.gov (United States)

    Xia, Jie; Jin, Hongwei; Liu, Zhenming; Zhang, Liangren; Wang, Xiang Simon

    2014-05-27

    Benchmarking data sets have become common in recent years for the purpose of virtual screening, though the main focus had been placed on the structure-based virtual screening (SBVS) approaches. Due to the lack of crystal structures, there is great need for unbiased benchmarking sets to evaluate various ligand-based virtual screening (LBVS) methods for important drug targets such as G protein-coupled receptors (GPCRs). To date these ready-to-apply data sets for LBVS are fairly limited, and the direct usage of benchmarking sets designed for SBVS could bring the biases to the evaluation of LBVS. Herein, we propose an unbiased method to build benchmarking sets for LBVS and validate it on a multitude of GPCRs targets. To be more specific, our methods can (1) ensure chemical diversity of ligands, (2) maintain the physicochemical similarity between ligands and decoys, (3) make the decoys dissimilar in chemical topology to all ligands to avoid false negatives, and (4) maximize spatial random distribution of ligands and decoys. We evaluated the quality of our Unbiased Ligand Set (ULS) and Unbiased Decoy Set (UDS) using three common LBVS approaches, with Leave-One-Out (LOO) Cross-Validation (CV) and a metric of average AUC of the ROC curves. Our method has greatly reduced the "artificial enrichment" and "analogue bias" of a published GPCRs benchmarking set, i.e., GPCR Ligand Library (GLL)/GPCR Decoy Database (GDD). In addition, we addressed an important issue about the ratio of decoys per ligand and found that for a range of 30 to 100 it does not affect the quality of the benchmarking set, so we kept the original ratio of 39 from the GLL/GDD.
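
    The leave-one-out evaluation loop (not the decoy-construction procedure itself) can be sketched as follows on synthetic descriptor vectors; the similarity-based scorer is an illustrative stand-in for any LBVS method.

        # Sketch of the leave-one-out evaluation loop only: each ligand in turn is scored
        # against the remaining ligands and all decoys by nearest-reference similarity,
        # and a ROC AUC is computed per query (synthetic descriptor vectors).
        import numpy as np
        from sklearn.metrics import roc_auc_score
        from sklearn.metrics.pairwise import cosine_similarity

        rng = np.random.default_rng(0)
        ligands = rng.normal(loc=0.5, size=(30, 64))                     # descriptors of known actives
        decoys = rng.normal(loc=0.0, size=(30 * 39, 64))                 # ~39 decoys per ligand

        aucs = []
        for i in range(len(ligands)):                                    # leave one ligand out as query
            refs = np.delete(ligands, i, axis=0)
            pool = np.vstack([ligands[i:i + 1], decoys])                 # held-out ligand + decoys
            scores = cosine_similarity(pool, refs).max(axis=1)           # best similarity to a reference
            labels = np.r_[1, np.zeros(len(decoys))]
            aucs.append(roc_auc_score(labels, scores))
        print("average leave-one-out AUC:", round(float(np.mean(aucs)), 3))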

  2. An UPLC-MS/MS method for highly sensitive high-throughput analysis of phytohormones in plant tissues

    Directory of Open Access Journals (Sweden)

    Balcke Gerd Ulrich

    2012-11-01

    Full Text Available Abstract Background Phytohormones are the key metabolites participating in the regulation of multiple functions of the plant organism. Among them, jasmonates, as well as abscisic and salicylic acids, are responsible for triggering and modulating plant reactions targeted against pathogens and herbivores, as well as resistance to abiotic stress (drought, UV-irradiation and mechanical wounding). These factors induce dramatic changes in phytohormone biosynthesis and transport leading to rapid local and systemic stress responses. Understanding the underlying mechanisms is of principal interest for scientists working in various areas of plant biology. However, highly sensitive, precise and high-throughput methods for quantification of these phytohormones in small samples of plant tissues are still missing. Results Here we present an LC-MS/MS method for fast and highly sensitive determination of jasmonates, abscisic and salicylic acids. A single-step sample preparation procedure based on mixed-mode solid phase extraction was efficiently combined with essential improvements in mobile phase composition, yielding higher efficiency of chromatographic separation and MS sensitivity. This strategy resulted in a dramatic increase in overall sensitivity, allowing successful determination of phytohormones in small (less than 50 mg of fresh weight) tissue samples. The method was completely validated in terms of analyte recovery, sensitivity, linearity and precision. Additionally, it was cross-validated with a well-established GC-MS-based procedure and its applicability to a variety of plant species and organs was verified. Conclusion The method can be applied for the analyses of target phytohormones in small tissue samples obtained from any plant species and/or plant part relying on any commercially available (even less sensitive) tandem mass spectrometry instrumentation.

  3. A noninvasive method for coronary artery diseases diagnosis using a clinically-interpretable fuzzy rule-based system

    Directory of Open Access Journals (Sweden)

    Hamid Reza Marateb

    2015-01-01

    Full Text Available Background: Coronary heart diseases/coronary artery diseases (CHDs/CAD), the most common form of cardiovascular disease (CVD), are a major cause of death and disability in developing/developed countries. CAD risk factors could be detected by physicians to prevent CAD occurrence in the near future. Invasive coronary angiography, a current diagnosis method, is costly and associated with morbidity and mortality in CAD patients. The aim of this study was to design a computer-based noninvasive CAD diagnosis system with clinically interpretable rules. Materials and Methods: In this study, the Cleveland CAD dataset from the University of California, Irvine (UCI) was used. The interval-scale variables were discretized, with cut points taken from the literature. A fuzzy rule-based system was then formulated based on a neuro-fuzzy classifier (NFC) whose learning procedure was sped up by the scaled conjugate gradient algorithm. Two feature selection (FS) methods, multiple logistic regression (MLR) and sequential FS, were used to reduce the required attributes. The performance of the NFC (without/with FS) was then assessed in a hold-out validation framework. Further cross-validation was performed on the best classifier. Results: In this dataset, 16 complete attributes along with the binary CHD diagnosis (gold standard) for 272 subjects (68% male) were analyzed. MLR + NFC showed the best performance. Its overall sensitivity, specificity, accuracy, type I error (α) and statistical power were 79%, 89%, 84%, 0.1 and 79%, respectively. The selected features were "age and ST/heart rate slope categories," "exercise-induced angina status," fluoroscopy, and thallium-201 stress scintigraphy results. Conclusion: The proposed method showed "substantial agreement" with the gold standard. This algorithm is thus a promising tool for screening CAD patients.

  4. Computerized detection of noncalcified plaques in coronary CT angiography: Evaluation of topological soft gradient prescreening method and luminal analysis

    Energy Technology Data Exchange (ETDEWEB)

    Wei, Jun, E-mail: jvwei@umich.edu; Zhou, Chuan; Chan, Heang-Ping; Chughtai, Aamer; Agarwal, Prachi; Kuriakose, Jean; Hadjiiski, Lubomir; Patel, Smita; Kazerooni, Ella [Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109 (United States)

    2014-08-15

    Purpose: The buildup of noncalcified plaques (NCPs) that are vulnerable to rupture in coronary arteries is a risk for myocardial infarction. Interpretation of coronary CT angiography (cCTA) to search for NCP is a challenging task for radiologists due to the low CT number of NCP, the large number of coronary arteries, and multiple phase CT acquisition. The authors conducted a preliminary study to develop machine learning method for automated detection of NCPs in cCTA. Methods: With IRB approval, a data set of 83 ECG-gated contrast enhanced cCTA scans with 120 NCPs was collected retrospectively from patient files. A multiscale coronary artery response and rolling balloon region growing (MSCAR-RBG) method was applied to each cCTA volume to extract the coronary arterial trees. Each extracted vessel was reformatted to a straightened volume composed of cCTA slices perpendicular to the vessel centerline. A topological soft-gradient (TSG) detection method was developed to prescreen for NCP candidates by analyzing the 2D topological features of the radial gradient field surface along the vessel wall. The NCP candidates were then characterized by a luminal analysis that used 3D geometric features to quantify the shape information and gray-level features to evaluate the density of the NCP candidates. With machine learning techniques, useful features were identified and combined into an NCP score to differentiate true NCPs from false positives (FPs). To evaluate the effectiveness of the image analysis methods, the authors performed tenfold cross-validation with the available data set. Receiver operating characteristic (ROC) analysis was used to assess the classification performance of individual features and the NCP score. The overall detection performance was estimated by free response ROC (FROC) analysis. Results: With our TSG prescreening method, a prescreening sensitivity of 92.5% (111/120) was achieved with a total of 1181 FPs (14.2 FPs/scan). On average, six features

  5. Computerized detection of noncalcified plaques in coronary CT angiography: Evaluation of topological soft gradient prescreening method and luminal analysis

    International Nuclear Information System (INIS)

    Wei, Jun; Zhou, Chuan; Chan, Heang-Ping; Chughtai, Aamer; Agarwal, Prachi; Kuriakose, Jean; Hadjiiski, Lubomir; Patel, Smita; Kazerooni, Ella

    2014-01-01

    Purpose: The buildup of noncalcified plaques (NCPs) that are vulnerable to rupture in coronary arteries is a risk for myocardial infarction. Interpretation of coronary CT angiography (cCTA) to search for NCP is a challenging task for radiologists due to the low CT number of NCP, the large number of coronary arteries, and multiple phase CT acquisition. The authors conducted a preliminary study to develop machine learning method for automated detection of NCPs in cCTA. Methods: With IRB approval, a data set of 83 ECG-gated contrast enhanced cCTA scans with 120 NCPs was collected retrospectively from patient files. A multiscale coronary artery response and rolling balloon region growing (MSCAR-RBG) method was applied to each cCTA volume to extract the coronary arterial trees. Each extracted vessel was reformatted to a straightened volume composed of cCTA slices perpendicular to the vessel centerline. A topological soft-gradient (TSG) detection method was developed to prescreen for NCP candidates by analyzing the 2D topological features of the radial gradient field surface along the vessel wall. The NCP candidates were then characterized by a luminal analysis that used 3D geometric features to quantify the shape information and gray-level features to evaluate the density of the NCP candidates. With machine learning techniques, useful features were identified and combined into an NCP score to differentiate true NCPs from false positives (FPs). To evaluate the effectiveness of the image analysis methods, the authors performed tenfold cross-validation with the available data set. Receiver operating characteristic (ROC) analysis was used to assess the classification performance of individual features and the NCP score. The overall detection performance was estimated by free response ROC (FROC) analysis. Results: With our TSG prescreening method, a prescreening sensitivity of 92.5% (111/120) was achieved with a total of 1181 FPs (14.2 FPs/scan). On average, six features

  6. Fuzzy method for pre-diagnosis of breast cancer from the Fine Needle Aspirate analysis

    Directory of Open Access Journals (Sweden)

    Sizilio Gláucia RMA

    2012-11-01

    Full Text Available Abstract Background Across the globe, breast cancer is one of the leading causes of death among women and, currently, Fine Needle Aspirate (FNA) with visual interpretation is the easiest and fastest biopsy technique for the diagnosis of this deadly disease. Unfortunately, the ability of this method to diagnose cancer correctly when the disease is present varies greatly, from 65% to 98%. This article introduces a method to assist in the diagnosis and second opinion of breast cancer from the analysis of descriptors extracted from smears of breast mass obtained by FNA, with the use of computational intelligence resources - in this case, fuzzy logic. Methods For data acquisition of FNA, the Wisconsin Diagnostic Breast Cancer Data (WDBC), from the University of California at Irvine (UCI) Machine Learning Repository, available on the internet through the UCI domain, was used. The knowledge acquisition process was carried out by the extraction and analysis of numerical data of the WDBC and by interviews and discussions with medical experts. The PDM-FNA-Fuzzy was developed in four steps: 1) Fuzzification Stage; 2) Rules Base; 3) Inference Stage; and 4) Defuzzification Stage. Performance cross-validation was used in the tests, with three databases with gold pattern clinical cases randomly extracted from the WDBC. The final validation was performed by medical specialists in pathology, mastology and general practice, and with gold pattern clinical cases, i.e. with known and clinically confirmed diagnosis. Results The Fuzzy Method developed provides breast cancer pre-diagnosis with 98.59% sensitivity (correct pre-diagnosis of malignancies) and 85.43% specificity (correct pre-diagnosis of benign cases). Due to the high sensitivity presented, these results are considered satisfactory, both by the opinion of medical specialists in the aforementioned areas and by comparison with other studies involving breast cancer diagnosis using FNA. Conclusions This paper presents an

  7. Identifying essential genes in bacterial metabolic networks with machine learning methods

    Science.gov (United States)

    2010-01-01

    Background Identifying essential genes in bacteria supports the identification of potential drug targets and an understanding of the minimal requirements for a synthetic cell. However, experimentally assaying the essentiality of their coding genes is resource intensive and not feasible for all bacterial organisms, in particular if they are infective. Results We developed a machine learning technique to identify essential genes, using the experimental data of genome-wide knock-out screens from one bacterial organism to infer essential genes of another related bacterial organism. We used a broad variety of topological features, sequence characteristics and co-expression properties potentially associated with essentiality, such as flux deviations, centrality, codon frequencies of the sequences, co-regulation and phyletic retention. An organism-wise cross-validation on bacterial species yielded reliable results with good accuracies (area under the receiver operating characteristic curve of 75%-81%). Finally, it was applied to drug target predictions for Salmonella typhimurium. We compared our predictions to the viability of experimental knock-outs of S. typhimurium and identified 35 enzymes that are highly relevant to be considered as potential drug targets. Specifically, we detected promising drug targets in the non-mevalonate pathway. Conclusions Using elaborated features characterizing network topology, sequence information and microarray data makes it possible to predict essential genes from a bacterial reference organism for a related query organism without any knowledge about the essentiality of genes of the query organism. In general, such a method is beneficial for inferring drug targets when experimental data about genome-wide knockout screens are not available for the investigated organism. PMID:20438628
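
    Organism-wise cross-validation of this kind corresponds to a leave-one-group-out split, sketched below on synthetic features and labels; the classifier choice is an assumption for the example.

        # Sketch of organism-wise cross-validation: all genes of one organism are held out
        # and their essentiality is predicted from the remaining organisms (synthetic data).
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import LeaveOneGroupOut

        rng = np.random.default_rng(0)
        organism = rng.integers(0, 3, size=900)                          # three related organisms
        X = rng.normal(size=(900, 20))                                   # topology/sequence/expression features
        y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 1.0, 900) > 1.0).astype(int)   # essential or not

        clf = RandomForestClassifier(n_estimators=300, random_state=0)
        for train, test in LeaveOneGroupOut().split(X, y, groups=organism):
            clf.fit(X[train], y[train])
            auc = roc_auc_score(y[test], clf.predict_proba(X[test])[:, 1])
            print(f"held-out organism {organism[test][0]}: AUC = {auc:.2f}")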

  8. Refinement of a Method for Identifying Probable Archaeological Sites from Remotely Sensed Data

    Science.gov (United States)

    Tilton, James C.; Comer, Douglas C.; Priebe, Carey E.; Sussman, Daniel; Chen, Li

    2012-01-01

    To facilitate locating archaeological sites before they are compromised or destroyed, we are developing approaches for generating maps of probable archaeological sites, through detecting subtle anomalies in vegetative cover, soil chemistry, and soil moisture by analyzing remotely sensed data from multiple sources. We previously reported some success in this effort with a statistical analysis of slope, radar, and Ikonos data (including tasseled cap and NDVI transforms) with Student's t-test. We report here on new developments in our work, performing an analysis of 8-band multispectral Worldview-2 data. The Worldview-2 analysis begins by computing medians and median absolute deviations for the pixels in various annuli around each site of interest on the 28 band difference ratios. We then use principal components analysis followed by linear discriminant analysis to train a classifier that assigns a posterior probability that a location is an archaeological site. We tested the procedure using leave-one-out cross validation with a second leave-one-out step to choose parameters on a 9,859 × 23,000 subset of the WorldView-2 data over the western portion of Ft. Irwin, CA, USA. We used 100 known non-sites and trained one classifier for lithic sites (n=33) and one classifier for habitation sites (n=16). We then analyzed convex combinations of scores from the Archaeological Predictive Model (APM) and our scores. We found that the combined scores had a higher area under the ROC curve than either individual method, indicating that including WorldView-2 data in the analysis improved the predictive power of the provided APM.
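
    The nested leave-one-out procedure (an outer loop for error estimation, an inner loop to pick the number of principal components) can be sketched as follows on synthetic data; the feature dimensions and component grid are illustrative assumptions.

        # Sketch of PCA followed by LDA with an outer leave-one-out loop and an inner
        # leave-one-out loop to choose the number of components (synthetic features).
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import GridSearchCV, LeaveOneOut, cross_val_score
        from sklearn.pipeline import make_pipeline

        rng = np.random.default_rng(0)
        y = np.array([1] * 33 + [0] * 100)                               # known sites vs. non-sites
        X = rng.normal(size=(133, 28)) + 0.6 * y[:, None]                # 28 band-difference-ratio features

        pipe = make_pipeline(PCA(), LinearDiscriminantAnalysis())
        inner = GridSearchCV(pipe, {"pca__n_components": [3, 5, 8, 12]}, cv=LeaveOneOut())
        outer_scores = cross_val_score(inner, X, y, cv=LeaveOneOut())    # nested leave-one-out accuracy
        print("nested leave-one-out accuracy:", round(float(outer_scores.mean()), 3))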

  9. Seasonal forecasting of hydrological drought in the Limpopo Basin: a comparison of statistical methods

    Science.gov (United States)

    Seibert, Mathias; Merz, Bruno; Apel, Heiko

    2017-03-01

    The Limpopo Basin in southern Africa is prone to droughts which affect the livelihood of millions of people in South Africa, Botswana, Zimbabwe and Mozambique. Seasonal drought early warning is thus vital for the whole region. In this study, the predictability of hydrological droughts during the main runoff period from December to May is assessed using statistical approaches. Three methods (multiple linear models, artificial neural networks, random forest regression trees) are compared in terms of their ability to forecast streamflow with up to 12 months of lead time. The following four main findings result from the study. 1. There are stations in the basin at which standardised streamflow is predictable with lead times up to 12 months. The results show high inter-station differences of forecast skill but reach a coefficient of determination as high as 0.73 (cross validated). 2. A large range of potential predictors is considered in this study, comprising well-established climate indices, customised teleconnection indices derived from sea surface temperatures and antecedent streamflow as a proxy of catchment conditions. El Niño and customised indices, representing sea surface temperature in the Atlantic and Indian oceans, prove to be important teleconnection predictors for the region. Antecedent streamflow is a strong predictor in small catchments (with median 42 % explained variance), whereas teleconnections exert a stronger influence in large catchments. 3. Multiple linear models show the best forecast skill in this study and the greatest robustness compared to artificial neural networks and random forest regression trees, despite their capabilities to represent nonlinear relationships. 4. Employed in early warning, the models can be used to forecast a specific drought level. Even if the coefficient of determination is low, the forecast models have a skill better than a climatological forecast, which is shown by analysis of receiver operating characteristics

  10. CrossLink: a novel method for cross-condition classification of cancer subtypes.

    Science.gov (United States)

    Ma, Chifeng; Sastry, Konduru S; Flore, Mario; Gehani, Salah; Al-Bozom, Issam; Feng, Yusheng; Serpedin, Erchin; Chouchane, Lotfi; Chen, Yidong; Huang, Yufei

    2016-08-22

    We considered the prediction of cancer classes (e.g. subtypes) using patient gene expression profiles that contain both systematic and condition-specific biases when compared with the training reference dataset. The conventional normalization-based approaches cannot guarantee that the gene signatures in the reference and prediction datasets always have the same distribution for all different conditions as the class-specific gene signatures change with the condition. Therefore, the trained classifier would work well under one condition but not under another. To address the problem of current normalization approaches, we propose a novel algorithm called CrossLink (CL). CL recognizes that there is no universal, condition-independent normalization mapping of signatures. In contrast, it exploits the fact that the signature is unique to its associated class under any condition and thus employs an unsupervised clustering algorithm to discover this unique signature. We assessed the performance of CL for cross-condition predictions of PAM50 subtypes of breast cancer by using a simulated dataset modeled after TCGA BRCA tumor samples with a cross-validation scheme, and datasets with known and unknown PAM50 classification. CL achieved prediction accuracy >73 %, highest among other methods we evaluated. We also applied the algorithm to a set of breast cancer tumors derived from Arabic population to assign a PAM50 classification to each tumor based on their gene expression profiles. A novel algorithm CrossLink for cross-condition prediction of cancer classes was proposed. In all test datasets, CL showed robust and consistent improvement in prediction performance over other state-of-the-art normalization and classification algorithms.

  11. On the importance of methods in hydrological modelling. Perspectives from a case study

    Science.gov (United States)

    Fenicia, Fabrizio; Kavetski, Dmitri

    2017-04-01

    The hydrological community generally appreciates that developing any non-trivial hydrological model requires a multitude of modelling choices. These choices may range from a (seemingly) straightforward application of mass conservation, to the (often) guesswork-like selection of constitutive functions, parameter values, etc. The application of a model itself requires a myriad of methodological choices - the selection of numerical solvers, objective functions for model calibration, validation approaches, performance metrics, etc. Not unreasonably, hydrologists embarking on ever ambitious projects prioritize hydrological insight over the morass of methodological choices. Perhaps to emphasize "ideas" over "methods", some journals have even reduced the fontsize of the methodology sections of its articles. However, the very nature of modelling is that seemingly routine methodological choices can significantly affect the conclusions of case studies and investigations - making it dangerous to skimp over methodological details in an enthusiastic rush towards the next great hydrological idea. This talk shares modelling insights from a hydrological study of a 300 km2 catchment in Luxembourg, where the diversity of hydrograph dynamics observed at 10 locations begs the question of whether external forcings or internal catchment properties act as dominant controls on streamflow generation. The hydrological insights are fascinating (at least to us), but in this talk we emphasize the impact of modelling methodology on case study conclusions and recommendations. How did we construct our prior set of hydrological model hypotheses? What numerical solver was implemented and why was an objective function based on Bayesian theory deployed? And what would have happened had we omitted model cross-validation, or not used a systematic hypothesis testing approach?

  12. Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method.

    Science.gov (United States)

    Nielsen, Morten; Lundegaard, Claus; Lund, Ole

    2007-07-04

    Antigen presenting cells (APCs) sample the extra cellular space and present peptides from here to T helper cells, which can be activated if the peptides are of foreign origin. The peptides are presented on the surface of the cells in complex with major histocompatibility class II (MHC II) molecules. Identification of peptides that bind MHC II molecules is thus a key step in rational vaccine design and developing methods for accurate prediction of the peptide:MHC interactions play a central role in epitope discovery. The MHC class II binding groove is open at both ends making the correct alignment of a peptide in the binding groove a crucial part of identifying the core of an MHC class II binding motif. Here, we present a novel stabilization matrix alignment method, SMM-align, that allows for direct prediction of peptide:MHC binding affinities. The predictive performance of the method is validated on a large MHC class II benchmark data set covering 14 HLA-DR (human MHC) and three mouse H2-IA alleles. The predictive performance of the SMM-align method was demonstrated to be superior to that of the Gibbs sampler, TEPITOPE, SVRMHC, and MHCpred methods. Cross validation between peptide data set obtained from different sources demonstrated that direct incorporation of peptide length potentially results in over-fitting of the binding prediction method. Focusing on amino terminal peptide flanking residues (PFR), we demonstrate a consistent gain in predictive performance by favoring binding registers with a minimum PFR length of two amino acids. Visualizing the binding motif as obtained by the SMM-align and TEPITOPE methods highlights a series of fundamental discrepancies between the two predicted motifs. For the DRB1*1302 allele for instance, the TEPITOPE method favors basic amino acids at most anchor positions, whereas the SMM-align method identifies a preference for hydrophobic or neutral amino acids at the anchors. The SMM-align method was shown to outperform other

  13. Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method

    Directory of Open Access Journals (Sweden)

    Lund Ole

    2007-07-01

    Full Text Available Abstract Background Antigen presenting cells (APCs) sample the extracellular space and present peptides from here to T helper cells, which can be activated if the peptides are of foreign origin. The peptides are presented on the surface of the cells in complex with major histocompatibility class II (MHC II) molecules. Identification of peptides that bind MHC II molecules is thus a key step in rational vaccine design and developing methods for accurate prediction of the peptide:MHC interactions play a central role in epitope discovery. The MHC class II binding groove is open at both ends making the correct alignment of a peptide in the binding groove a crucial part of identifying the core of an MHC class II binding motif. Here, we present a novel stabilization matrix alignment method, SMM-align, that allows for direct prediction of peptide:MHC binding affinities. The predictive performance of the method is validated on a large MHC class II benchmark data set covering 14 HLA-DR (human MHC) and three mouse H2-IA alleles. Results The predictive performance of the SMM-align method was demonstrated to be superior to that of the Gibbs sampler, TEPITOPE, SVRMHC, and MHCpred methods. Cross validation between peptide data sets obtained from different sources demonstrated that direct incorporation of peptide length potentially results in over-fitting of the binding prediction method. Focusing on amino terminal peptide flanking residues (PFR), we demonstrate a consistent gain in predictive performance by favoring binding registers with a minimum PFR length of two amino acids. Visualizing the binding motif as obtained by the SMM-align and TEPITOPE methods highlights a series of fundamental discrepancies between the two predicted motifs. For the DRB1*1302 allele for instance, the TEPITOPE method favors basic amino acids at most anchor positions, whereas the SMM-align method identifies a preference for hydrophobic or neutral amino acids at the anchors. Conclusion

  14. Design and implementation of new design of numerical experiments for non linear models; Conception et mise en oeuvre de nouvelles methodes d'elaboration de plans d'experiences pour l'apprentissage de modeles non lineaires

    Energy Technology Data Exchange (ETDEWEB)

    Gazut, St

    2007-03-15

    This thesis addresses the problem of the construction of surrogate models in numerical simulation. Whenever numerical experiments are costly, the simulation model is complex and difficult to use. It is then important to select the numerical experiments as efficiently as possible in order to minimize their number. In statistics, the selection of experiments is known as optimal experimental design. In the context of numerical simulation, where no measurement uncertainty is present, we describe an alternative approach based on statistical learning theory and re-sampling techniques. The surrogate models are constructed using neural networks and the generalization error is estimated by leave-one-out, cross-validation and bootstrap. It is shown that the bootstrap can control over-fitting and extends the concept of leverage to surrogate models that are non-linear in their parameters. The thesis describes an iterative method called LDR (Learner Disagreement from experiment Re-sampling), based on active learning using several surrogate models constructed on bootstrap samples. The method consists of adding new experiments where the predictors constructed from bootstrap samples disagree most. We compare the LDR method with other methods of experimental design such as D-optimal selection. (author)
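
    A toy sketch of the committee-disagreement idea behind LDR is given below; the one-dimensional simulator, surrogate type and committee size are illustrative assumptions, not the thesis's setup.

        # Sketch of the committee-disagreement idea: surrogates trained on bootstrap
        # resamples, next experiment placed where their predictions spread most
        # (a cheap analytic function stands in for the costly simulator).
        import numpy as np
        from sklearn.neural_network import MLPRegressor

        def simulator(x):                                                # stand-in for an expensive code
            return np.sin(3.0 * x) + 0.3 * x

        rng = np.random.default_rng(0)
        X = rng.uniform(-2.0, 2.0, size=(8, 1))                          # initial design
        y = simulator(X).ravel()
        candidates = np.linspace(-2.0, 2.0, 201).reshape(-1, 1)

        for step in range(5):
            preds = []
            for b in range(20):                                          # bootstrap committee
                idx = rng.integers(0, len(X), len(X))
                m = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=b)
                m.fit(X[idx], y[idx])
                preds.append(m.predict(candidates))
            disagreement = np.std(preds, axis=0)                         # committee spread per candidate
            x_new = candidates[int(np.argmax(disagreement))]             # most-disputed point
            X = np.vstack([X, x_new[None, :]])
            y = np.append(y, simulator(x_new)[0])
            print(f"step {step}: new experiment at x = {x_new[0]:+.2f}")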

  15. A PRISMA-Driven Systematic Review of Predictive Equations for Assessing Fat and Fat-Free Mass in Healthy Children and Adolescents Using Multicomponent Molecular Models as the Reference Method

    Directory of Open Access Journals (Sweden)

    Analiza M. Silva

    2013-01-01

    Full Text Available Simple methods to assess both fat (FM) and fat-free mass (FFM) are required in paediatric populations. Several bioelectrical impedance instruments (BIAs) and anthropometric equations have been developed using different criterion methods (multicomponent models) for assessing FM and FFM. Through childhood, FFM density increases while FFM hydration decreases until reaching adult values. Therefore, multicomponent models should be used as the gold standard method for developing simple techniques, because two-compartment (2C) models rely on the assumed adult values of FFM density and hydration (1.1 g/cm3 and 73.2%, respectively). This study will review BIA and/or anthropometric-based equations for assessing body composition in paediatric populations. We reviewed English language articles from MEDLINE (1985–2012) with the selection of predictive equations developed for assessing FM and FFM using three-compartment (3C) and 4C models as the criterion. Search terms included children, adolescent, childhood, adolescence, 4C model, 3C model, multicomponent model, equation, prediction, DXA, BIA, resistance, anthropometry, skinfold, FM, and FFM. A total of 14 studies (33 equations) were selected, with the majority developed using DXA as the criterion method and with a limited number of studies providing cross-validation results. Overall, the selected equations are useful for epidemiological studies, but some concerns still arise on an individual basis.

  16. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

    Directory of Open Access Journals (Sweden)

    Santana Isabel

    2011-08-01

    Full Text Available Abstract Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but presently has limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve the accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non-parametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p Conclusions When taking into account sensitivity, specificity and overall classification accuracy, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested in the prediction of dementia using several neuropsychological tests. These methods may be used to improve the accuracy, sensitivity and specificity of dementia predictions from neuropsychological testing.

  17. Evaluation of normalization methods for cDNA microarray data by k-NN classification

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Wei; Xing, Eric P; Myers, Connie; Mian, Saira; Bissell, Mina J

    2004-12-17

    Non-biological factors give rise to unwanted variations in cDNA microarray data. There are many normalization methods designed to remove such variations. However, to date there have been few published systematic evaluations of these techniques for removing variations arising from dye biases in the context of downstream, higher-order analytical tasks such as classification. Ten location normalization methods that adjust spatial- and/or intensity-dependent dye biases, and three scale methods that adjust scale differences were applied, individually and in combination, to five distinct, published, cancer biology-related cDNA microarray data sets. Leave-one-out cross-validation (LOOCV) classification error was employed as the quantitative end-point for assessing the effectiveness of a normalization method. In particular, a known classifier, k-nearest neighbor (k-NN), was estimated from data normalized using a given technique, and the LOOCV error rate of the ensuing model was computed. We found that k-NN classifiers are sensitive to dye biases in the data. Using NONRM and GMEDIAN as baseline methods, our results show that single-bias-removal techniques which remove either spatial-dependent dye bias (referred later as spatial effect) or intensity-dependent dye bias (referred later as intensity effect) moderately reduce LOOCV classification errors; whereas double-bias-removal techniques which remove both spatial- and intensity effect reduce LOOCV classification errors even further. Of the 41 different strategies examined, three two-step processes, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, all of which removed intensity effect globally and spatial effect locally, appear to reduce LOOCV classification errors most consistently and effectively across all data sets. We also found that the investigated scale normalization methods do not reduce LOOCV classification error. Using LOOCV error of k-NNs as the evaluation criterion, three double
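
    The evaluation end-point described above, leave-one-out cross-validated k-NN error, can be sketched as follows; the expression matrix and the per-array median-centring "normalization" are toy stand-ins, not the study's normalization methods.

        # Sketch of using leave-one-out cross-validated k-NN error to compare
        # normalization schemes (a toy per-array median-centring on synthetic data).
        import numpy as np
        from sklearn.model_selection import LeaveOneOut, cross_val_score
        from sklearn.neighbors import KNeighborsClassifier

        rng = np.random.default_rng(0)
        y = np.array([0] * 20 + [1] * 20)                                # two tumour classes
        expr = rng.normal(size=(40, 500)) + 0.3 * y[:, None]             # expression matrix
        expr = expr + rng.normal(size=(40, 1))                           # array-specific (dye-like) bias

        def median_center(M):                                            # toy normalization
            return M - np.median(M, axis=1, keepdims=True)

        knn = KNeighborsClassifier(n_neighbors=3)
        for name, data in (("raw", expr), ("median-centred", median_center(expr))):
            acc = cross_val_score(knn, data, y, cv=LeaveOneOut()).mean()
            print(f"{name:14s} LOOCV error = {1.0 - acc:.3f}")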

  18. Rapid method for the simultaneous detection of boar taint compounds by means of solid phase microextraction coupled to gas chromatography/mass spectrometry.

    Science.gov (United States)

    Verplanken, Kaat; Wauters, Jella; Van Durme, Jim; Claus, Dirk; Vercammen, Joeri; De Saeger, Sarah; Vanhaecke, Lynn

    2016-09-02

    Because of animal welfare issues, the voluntary ban on surgical castration of male piglets, starting January 2018, was announced in a European Treaty. One viable alternative is the fattening of entire male pigs. However, this can cause negative consumer reactions due to the occurrence of boar taint and possibly lead to severe economic losses in pig husbandry. In this study, headspace solid phase microextraction (HS-SPME) coupled to GC-MS was used in the development and optimization of a candidate method for fast and accurate detection of the boar taint compounds. Remarkably fast extraction (45 s) of the boar taint compounds from adipose tissue was achieved by singeing the fat with a soldering iron while released volatiles were extracted in situ using HS-SPME. The obtained method showed good performance characteristics after validation according to CD 2002/657/EC and ISO/IEC 17025 guidelines. Moreover, cross-validation with an in-house UHPLC-HR-Orbitrap-MS method showed good agreement between an in-laboratory method and the new candidate method for the fast extraction and detection of skatole and androstenone, which emphasizes the accuracy of this new SPME-GC-MS method. Threshold detection of the boar taint compounds on a portable GC-MS could not be achieved. However, despite the lack of sensitivity obtained on the latter instrument, a very fast method with a run-to-run time of 3.5 min for the detection of the boar taint compounds was developed. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. TEMPTING system: a hybrid method of rule and machine learning for temporal relation extraction in patient discharge summaries.

    Science.gov (United States)

    Chang, Yung-Chun; Dai, Hong-Jie; Wu, Johnny Chi-Yang; Chen, Jian-Ming; Tsai, Richard Tzong-Han; Hsu, Wen-Lian

    2013-12-01

    Patient discharge summaries provide detailed medical information about individuals who have been hospitalized. To make a precise and legitimate assessment of the abundant data, a proper time layout of the sequence of relevant events should be compiled and used to drive a patient-specific timeline, which could further assist medical personnel in making clinical decisions. The process of identifying the chronological order of entities is called temporal relation extraction. In this paper, we propose a hybrid method to identify appropriate temporal links between a pair of entities. The method combines two approaches: one is rule-based and the other is based on the maximum entropy model. We develop an integration algorithm to fuse the results of the two approaches. All rules and the integration algorithm are formally stated so that one can easily reproduce the system and results. To optimize the system's configuration, we used the 2012 i2b2 challenge TLINK track dataset and applied threefold cross validation to the training set. Then, we evaluated its performance on the training and test datasets. The experiment results show that the proposed TEMPTING (TEMPoral relaTion extractING) system (ranked seventh) achieved an F-score of 0.563, which was at least 30% better than that of the baseline system, which randomly selects TLINK candidates from all pairs and assigns the TLINK types. The TEMPTING system using the hybrid method also outperformed the stage-based TEMPTING system. Its F-scores were 3.51% and 0.97% better than those of the stage-based system on the training set and test set, respectively. Copyright © 2013 Elsevier Inc. All rights reserved.

  20. Oral cancer prognosis based on clinicopathologic and genomic markers using a hybrid of feature selection and machine learning methods

    Science.gov (United States)

    2013-01-01

    Background Machine learning techniques are becoming useful as an alternative approach to conventional medical diagnosis or prognosis as they are good for handling noisy and incomplete data, and significant results can be attained despite a small sample size. Traditionally, clinicians make prognostic decisions based on clinicopathologic markers. However, it is not easy for even the most skilful clinician to arrive at an accurate prognosis using these markers alone. Thus, there is a need to use genomic markers to improve the accuracy of prognosis. The main aim of this research is to apply a hybrid of feature selection and machine learning methods in oral cancer prognosis based on the parameters of the correlation of clinicopathologic and genomic markers. Results In the first stage of this research, five feature selection methods were proposed and tested on the oral cancer prognosis dataset. In the second stage, models built with the features selected by each feature selection method were tested on the proposed classifiers. Four types of classifiers were chosen, namely ANFIS, artificial neural network, support vector machine and logistic regression. K-fold cross-validation was implemented for all classifiers due to the small sample size. The hybrid model of ReliefF-GA-ANFIS with the 3-input features of drink, invasion and p63 achieved the best accuracy (accuracy = 93.81%; AUC = 0.90) for oral cancer prognosis. Conclusions The results revealed that the prognosis is superior when both clinicopathologic and genomic markers are present. The selected features can be investigated further to validate their potential as a significant prognostic signature in oral cancer studies. PMID:23725313
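
    The two-stage pattern, a feature selection step wrapped around a classifier and scored by k-fold cross-validation, can be sketched generically. ReliefF, GA and ANFIS are not part of scikit-learn, so a mutual-information filter and an SVM serve as stand-ins below; the data are synthetic.

```python
# Hedged sketch: feature selection inside a cross-validated pipeline, so each fold
# selects its features on its own training split only.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=60, n_features=30, n_informative=5, random_state=0)

model = make_pipeline(SelectKBest(mutual_info_classif, k=3), SVC())
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
print("cross-validated accuracy:", cross_val_score(model, X, y, cv=cv).mean())
```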

  1. A meta-analysis based method for prioritizing candidate genes involved in a pre-specific function

    Directory of Open Access Journals (Sweden)

    Jingjing Zhai

    2016-12-01

    Full Text Available The identification of genes associated with a given biological function in plants remains a challenge, although network-based gene prioritization algorithms have been developed for Arabidopsis thaliana and many non-model plant species. Nevertheless, these network-based gene prioritization algorithms have encountered several problems; one in particular is that of unsatisfactory prediction accuracy due to limited network coverage, varying link quality, and/or uncertain network connectivity. Thus a model that integrates complementary biological data may be expected to increase the prediction accuracy of gene prioritization. Towards this goal, we developed a novel gene prioritization method named RafSee, to rank candidate genes using a random forest algorithm that integrates sequence, evolutionary, and epigenetic features of plants. Subsequently, we proposed an integrative approach named RAP (Rank Aggregation-based data fusion for gene Prioritization), in which an order statistics-based meta-analysis is used to aggregate the ranks of the network-based gene prioritization method and RafSee, for accurately prioritizing candidate genes involved in a pre-specific biological function. Finally, we showcased the utility of RAP by prioritizing 380 flowering-time genes in Arabidopsis. The ‘leave-one-out’ cross-validation experiment showed that RafSee could work as a complement to a current state-of-the-art network-based gene prioritization system (AraNet v2). Moreover, RAP ranked 53.68% (204/380) of flowering-time genes higher than AraNet v2, resulting in a 39.46% improvement in terms of the first-quartile rank. Further evaluations also showed that RAP was effective in prioritizing genes related to different abiotic stresses. To enhance the usability of RAP for Arabidopsis and non-model plant species, an R package implementing the method is freely available at http://bioinfo.nwafu.edu.cn/software.
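
    A minimal illustration of rank fusion (much simpler than the order-statistics aggregation used by RAP): two candidate rankings are combined by average rank. The gene names and ranks are made up for the example.

```python
# Hedged sketch: Borda-style fusion of two gene rankings by average rank,
# a simplified stand-in for order-statistics-based rank aggregation.
import pandas as pd

network_rank = {"GENE_A": 1, "GENE_B": 4, "GENE_C": 2, "GENE_D": 3}   # e.g. network-based method
feature_rank = {"GENE_A": 3, "GENE_B": 1, "GENE_C": 2, "GENE_D": 4}   # e.g. random-forest method

ranks = pd.DataFrame({"network": network_rank, "features": feature_rank})
ranks["fused"] = ranks.mean(axis=1).rank(method="min")
print(ranks.sort_values("fused"))
```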

  2. Development and experimental test of support vector machines virtual screening method for searching Src inhibitors from large compound libraries

    Directory of Open Access Journals (Sweden)

    Han Bucong

    2012-11-01

    Full Text Available Abstract Background Src plays various roles in tumour progression, invasion, metastasis, angiogenesis and survival. It is one of the multiple targets of multi-target kinase inhibitors in clinical uses and trials for the treatment of leukemia and other cancers. These successes and appearances of drug resistance in some patients have raised significant interest and efforts in discovering new Src inhibitors. Various in-silico methods have been used in some of these efforts. It is desirable to explore additional in-silico methods, particularly those capable of searching large compound libraries at high yields and reduced false-hit rates. Results We evaluated support vector machines (SVM) as virtual screening tools for searching Src inhibitors from large compound libraries. SVM trained and tested by 1,703 inhibitors and 63,318 putative non-inhibitors correctly identified 93.53%~ 95.01% inhibitors and 99.81%~ 99.90% non-inhibitors in 5-fold cross validation studies. SVM trained by 1,703 inhibitors reported before 2011 and 63,318 putative non-inhibitors correctly identified 70.45% of the 44 inhibitors reported since 2011, and predicted as inhibitors 44,843 (0.33%) of 13.56M PubChem, 1,496 (0.89%) of 168 K MDDR, and 719 (7.73%) of 9,305 MDDR compounds similar to the known inhibitors. Conclusions SVM showed comparable yield and reduced false hit rates in searching large compound libraries compared to the similarity-based and other machine-learning VS methods developed from the same set of training compounds and molecular descriptors. We tested three virtual hits of the same novel scaffold from in-house chemical libraries not reported as Src inhibitor, one of which showed moderate activity. SVM may be potentially explored for searching Src inhibitors from large compound libraries at low false-hit rates.

  3. Development and experimental test of support vector machines virtual screening method for searching Src inhibitors from large compound libraries.

    Science.gov (United States)

    Han, Bucong; Ma, Xiaohua; Zhao, Ruiying; Zhang, Jingxian; Wei, Xiaona; Liu, Xianghui; Liu, Xin; Zhang, Cunlong; Tan, Chunyan; Jiang, Yuyang; Chen, Yuzong

    2012-11-23

    Src plays various roles in tumour progression, invasion, metastasis, angiogenesis and survival. It is one of the multiple targets of multi-target kinase inhibitors in clinical uses and trials for the treatment of leukemia and other cancers. These successes and appearances of drug resistance in some patients have raised significant interest and efforts in discovering new Src inhibitors. Various in-silico methods have been used in some of these efforts. It is desirable to explore additional in-silico methods, particularly those capable of searching large compound libraries at high yields and reduced false-hit rates. We evaluated support vector machines (SVM) as virtual screening tools for searching Src inhibitors from large compound libraries. SVM trained and tested by 1,703 inhibitors and 63,318 putative non-inhibitors correctly identified 93.53%~ 95.01% inhibitors and 99.81%~ 99.90% non-inhibitors in 5-fold cross validation studies. SVM trained by 1,703 inhibitors reported before 2011 and 63,318 putative non-inhibitors correctly identified 70.45% of the 44 inhibitors reported since 2011, and predicted as inhibitors 44,843 (0.33%) of 13.56M PubChem, 1,496 (0.89%) of 168 K MDDR, and 719 (7.73%) of 9,305 MDDR compounds similar to the known inhibitors. SVM showed comparable yield and reduced false hit rates in searching large compound libraries compared to the similarity-based and other machine-learning VS methods developed from the same set of training compounds and molecular descriptors. We tested three virtual hits of the same novel scaffold from in-house chemical libraries not reported as Src inhibitor, one of which showed moderate activity. SVM may be potentially explored for searching Src inhibitors from large compound libraries at low false-hit rates.
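
    The cross-validation statistics quoted in this record, the fraction of inhibitors and of non-inhibitors recovered in 5-fold cross-validation, correspond to per-class recall, which can be computed as in the sketch below. Random features stand in for molecular descriptors, and the class sizes are scaled far down from the actual libraries.

```python
# Hedged sketch: per-class recall of an SVM screen under 5-fold cross-validation,
# on a deliberately imbalanced synthetic data set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=100, weights=[0.97, 0.03], random_state=0)
# y == 1 plays the role of "inhibitor", y == 0 of "putative non-inhibitor".

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
inhib_recall, noninhib_recall = [], []
for train, test in cv.split(X, y):
    pred = SVC(class_weight="balanced").fit(X[train], y[train]).predict(X[test])
    inhib_recall.append(np.mean(pred[y[test] == 1] == 1))
    noninhib_recall.append(np.mean(pred[y[test] == 0] == 0))

print("inhibitors correctly identified    :", np.mean(inhib_recall))
print("non-inhibitors correctly identified:", np.mean(noninhib_recall))
```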

  4. TH-CD-202-06: A Method for Characterizing and Validating Dynamic Lung Density Change During Quiet Respiration

    Energy Technology Data Exchange (ETDEWEB)

    Dou, T [University of California, Los Angeles, Los Angeles, CA (United States); Ruan, D [UCLA School of Medicine, Los Angeles, CA (United States); Heinrich, M [Institute of Medical Informatics, University of Lubeck, Lubeck, Schleswig-Holstein (Germany); Low, D [UCLA, Los Angeles, CA (United States)

    2016-06-15

    Purpose: To obtain a functional relationship that calibrates the lung tissue density change under free-breathing conditions by correlating Jacobian values to Hounsfield units. Methods: Free-breathing lung computed tomography images were acquired using a fast helical CT protocol, where 25 scans were acquired per patient. Using a state-of-the-art deformable registration algorithm, a set of deformation vector fields (DVF) was generated to provide spatial mapping from the reference image geometry to the other free-breathing scans. These DVFs were used to generate Jacobian maps, which estimate voxelwise volume change. Subsequently, the set of 25 corresponding Jacobian values and voxel intensities in Hounsfield units (HU) was collected and linear regression was performed based on the mass conservation relationship to correlate the volume change to density change. Based on the resulting fitting coefficients, the tissues were classified into parenchymal (Type I), vascular (Type II), and soft tissue (Type III) types. These coefficients modeled the voxelwise density variation during quiet breathing. The accuracy of the proposed method was assessed using the mean absolute difference in HU between the CT scan intensities and the model-predicted values. In addition, validation experiments employing a leave-five-out method were performed to evaluate the model accuracy. Results: The computed mean model errors were 23.30±9.54 HU, 29.31±10.67 HU, and 35.56±20.56 HU for regions I, II, and III, respectively. The cross-validation experiments averaged over 100 trials had mean errors of 30.02 ± 1.67 HU over the entire lung. These mean values were comparable with the estimated CT image background noise. Conclusion: The reported validation experiment statistics confirmed the lung density modeling during free breathing. The proposed technique is general and could be applied to a wide range of problem scenarios where accurate dynamic lung density information is needed.
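
    The calibration step, regressing each voxel's Hounsfield units against its Jacobian-derived volume change across the 25 scans, amounts to a per-voxel linear fit, sketched below on synthetic arrays that stand in for the registered free-breathing scans.

```python
# Hedged sketch: per-voxel linear regression of CT intensity (HU) against Jacobian volume change.
import numpy as np

n_scans, n_voxels = 25, 1000
rng = np.random.default_rng(0)
jacobian = rng.uniform(0.8, 1.2, size=(n_scans, n_voxels))     # voxelwise volume change per scan
true_slope = rng.uniform(-900, -100, size=n_voxels)            # density falls as volume grows
hu = true_slope * (jacobian - 1.0) + rng.normal(0, 20, size=(n_scans, n_voxels)) - 700

slopes, intercepts = np.empty(n_voxels), np.empty(n_voxels)
for v in range(n_voxels):
    slopes[v], intercepts[v] = np.polyfit(jacobian[:, v] - 1.0, hu[:, v], deg=1)

predicted = slopes * (jacobian - 1.0) + intercepts
print("mean absolute model error (HU):", np.mean(np.abs(predicted - hu)))
```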

  5. A new and fast image feature selection method for developing an optimal mammographic mass detection scheme.

    Science.gov (United States)

    Tan, Maxine; Pu, Jiantao; Zheng, Bin

    2014-08-01

    Selecting optimal features from a large image feature pool remains a major challenge in developing computer-aided detection (CAD) schemes of medical images. The objective of this study is to investigate a new approach to significantly improve the efficacy of image feature selection and classifier optimization in developing a CAD scheme of mammographic masses. An image dataset including 1600 regions of interest (ROIs), in which 800 are positive (depicting malignant masses) and 800 are negative (depicting CAD-generated false positive regions), was used in this study. After segmentation of each suspicious lesion by a multilayer topographic region growth algorithm, 271 features were computed in different feature categories including shape, texture, contrast, isodensity, spiculation, local topological features, as well as the features related to the presence and location of fat and calcifications. Besides computing features from the original images, the authors also computed new texture features from the dilated lesion segments. In order to select optimal features from this initial feature pool and build a highly performing classifier, the authors examined and compared four feature selection methods to optimize an artificial neural network (ANN) based classifier, namely: (1) Phased Searching with NEAT in a Time-Scaled Framework, (2) a sequential floating forward selection (SFFS) method, (3) a genetic algorithm (GA), and (4) a sequential forward selection (SFS) method. Performances of the four approaches were assessed using a tenfold cross-validation method. Among these four methods, SFFS had the highest efficacy: it took only 3%-5% of the computational time of the GA approach and yielded the highest performance level, with an area under the receiver operating characteristic curve (AUC) of 0.864 ± 0.034. The results also demonstrated that, except when using GA, including the new texture features computed from the dilated mass segments improved the AUC results of the ANNs optimized
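
    Of the four strategies compared, plain sequential forward selection is the simplest to sketch. Recent scikit-learn versions provide SequentialFeatureSelector, which wraps a classifier and a cross-validation loop; a small MLP stands in for the ANN here, and all sizes are scaled down for speed.

```python
# Hedged sketch: sequential forward feature selection driven by cross-validated performance,
# followed by a tenfold cross-validated AUC on the selected features. Data are synthetic.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=20, n_informative=6, random_state=0)

ann = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000, random_state=0))
sfs = SequentialFeatureSelector(ann, n_features_to_select=5, direction="forward", cv=5)
sfs.fit(X, y)

selected = sfs.get_support(indices=True)
auc = cross_val_score(ann, X[:, selected], y, cv=10, scoring="roc_auc").mean()
print("selected features:", selected)
print("tenfold cross-validated AUC:", round(auc, 3))
```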

  6. Model's sparse representation based on reduced mixed GMsFE basis methods

    Energy Technology Data Exchange (ETDEWEB)

    Jiang, Lijian, E-mail: ljjiang@hnu.edu.cn [Institute of Mathematics, Hunan University, Changsha 410082 (China); Li, Qiuqi, E-mail: qiuqili@hnu.edu.cn [College of Mathematics and Econometrics, Hunan University, Changsha 410082 (China)

    2017-06-01

    In this paper, we propose a model's sparse representation based on reduced mixed generalized multiscale finite element (GMsFE) basis methods for elliptic PDEs with random inputs. A typical application for the elliptic PDEs is the flow in heterogeneous random porous media. The mixed generalized multiscale finite element method (GMsFEM) is one of the accurate and efficient approaches to solve the flow problem on a coarse grid and obtain the velocity with local mass conservation. When the inputs of the PDEs are parameterized by random variables, the GMsFE basis functions usually depend on the random parameters. This leads to a large number of degrees of freedom for the mixed GMsFEM and substantially impacts the computational efficiency. In order to overcome this difficulty, we develop reduced mixed GMsFE basis methods such that the multiscale basis functions are independent of the random parameters and span a low-dimensional space. To this end, a greedy algorithm is used to find a set of optimal samples from a training set scattered in the parameter space. Reduced mixed GMsFE basis functions are constructed based on the optimal samples using two optimal sampling strategies: basis-oriented cross-validation and proper orthogonal decomposition. Although the dimension of the space spanned by the reduced mixed GMsFE basis functions is much smaller than the dimension of the original full-order model, the online computation still depends on the number of coarse degrees of freedom. To significantly improve the online computation, we integrate the reduced mixed GMsFE basis methods with sparse tensor approximation and obtain a sparse representation for the model's outputs. The sparse representation is very efficient for evaluating the model's outputs for many instances of parameters. To illustrate the efficacy of the proposed methods, we present a few numerical examples for elliptic PDEs with multiscale and random inputs. In particular, a two-phase flow model in
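
    One of the two sampling strategies named above, proper orthogonal decomposition, is easy to illustrate in isolation: collect solution snapshots, take their SVD, and keep the leading modes as a reduced basis. The snapshots below are random low-rank placeholders rather than actual flow solutions.

```python
# Hedged sketch: proper orthogonal decomposition (POD) of a snapshot matrix via the SVD,
# keeping enough leading modes to capture most of the snapshot energy.
import numpy as np

rng = np.random.default_rng(0)
# 40 synthetic snapshots (500 unknowns each) with an underlying rank of about 5 plus noise.
snapshots = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 40)) + 0.01 * rng.normal(size=(500, 40))

U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.999)) + 1   # modes needed for 99.9% of the energy
reduced_basis = U[:, :r]
print("reduced basis dimension:", r)
```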

  7. Application of a hierarchical enzyme classification method reveals the role of gut microbiome in human metabolism.

    Science.gov (United States)

    Mohammed, Akram; Guda, Chittibabu

    2015-01-01

    Enzymes are known as the molecular machines that drive the metabolism of an organism; hence identification of the full enzyme complement of an organism is essential to build the metabolic blueprint of that species as well as to understand the interplay of multiple species in an ecosystem. Experimental characterization of the enzymatic reactions of all enzymes in a genome is a tedious and expensive task. The problem is more pronounced in the metagenomic samples where even the species are not adequately cultured or characterized. Enzymes encoded by the gut microbiota play an essential role in the host metabolism; thus, warranting the need to accurately identify and annotate the full enzyme complements of species in the genomic and metagenomic projects. To fulfill this need, we develop and apply a method called ECemble, an ensemble approach to identify enzymes and enzyme classes and study the human gut metabolic pathways. ECemble method uses an ensemble of machine-learning methods to accurately model and predict enzymes from protein sequences and also identifies the enzyme classes and subclasses at the finest resolution. A tenfold cross-validation result shows accuracy between 97 and 99% at different levels in the hierarchy of enzyme classification, which is superior to comparable methods. We applied ECemble to predict the entire complements of enzymes from ten sequenced proteomes including the human proteome. We also applied this method to predict enzymes encoded by the human gut microbiome from gut metagenomic samples, and to study the role played by the microbe-derived enzymes in the human metabolism. After mapping the known and predicted enzymes to canonical human pathways, we identified 48 pathways that have at least one bacteria-encoded enzyme, which demonstrates the complementary role of gut microbiome in human gut metabolism. These pathways are primarily involved in metabolizing dietary nutrients such as carbohydrates, amino acids, lipids, cofactors and

  8. Application of a hierarchical enzyme classification method reveals the role of gut microbiome in human metabolism

    Science.gov (United States)

    2015-01-01

    Background Enzymes are known as the molecular machines that drive the metabolism of an organism; hence identification of the full enzyme complement of an organism is essential to build the metabolic blueprint of that species as well as to understand the interplay of multiple species in an ecosystem. Experimental characterization of the enzymatic reactions of all enzymes in a genome is a tedious and expensive task. The problem is more pronounced in the metagenomic samples where even the species are not adequately cultured or characterized. Enzymes encoded by the gut microbiota play an essential role in the host metabolism; thus, warranting the need to accurately identify and annotate the full enzyme complements of species in the genomic and metagenomic projects. To fulfill this need, we develop and apply a method called ECemble, an ensemble approach to identify enzymes and enzyme classes and study the human gut metabolic pathways. Results ECemble method uses an ensemble of machine-learning methods to accurately model and predict enzymes from protein sequences and also identifies the enzyme classes and subclasses at the finest resolution. A tenfold cross-validation result shows accuracy between 97 and 99% at different levels in the hierarchy of enzyme classification, which is superior to comparable methods. We applied ECemble to predict the entire complements of enzymes from ten sequenced proteomes including the human proteome. We also applied this method to predict enzymes encoded by the human gut microbiome from gut metagenomic samples, and to study the role played by the microbe-derived enzymes in the human metabolism. After mapping the known and predicted enzymes to canonical human pathways, we identified 48 pathways that have at least one bacteria-encoded enzyme, which demonstrates the complementary role of gut microbiome in human gut metabolism. These pathways are primarily involved in metabolizing dietary nutrients such as carbohydrates, amino acids, lipids

  9. Accuracy assessment of high resolution satellite imagery orientation by leave-one-out method

    Science.gov (United States)

    Brovelli, Maria Antonia; Crespi, Mattia; Fratarcangeli, Francesca; Giannone, Francesca; Realini, Eugenio

    Interest in high-resolution satellite imagery (HRSI) is spreading in several application fields, at both scientific and commercial levels. Fundamental and critical goals for the geometric use of this kind of imagery are their orientation and orthorectification, processes able to georeference the imagery and correct the geometric deformations they undergo during acquisition. In order to exploit the actual potentialities of orthorectified imagery in Geomatics applications, the definition of a methodology to assess the spatial accuracy achievable from oriented imagery is a crucial topic. In this paper we propose a new method for accuracy assessment based on Leave-One-Out Cross-Validation (LOOCV), a model validation method already applied in different fields such as machine learning, bioinformatics and, generally, any other field requiring an evaluation of the performance of a learning algorithm (e.g. geostatistics), but never applied to HRSI orientation accuracy assessment. The proposed method exhibits interesting features able to overcome the most notable drawbacks of the commonly used method (Hold-Out Validation — HOV), which is based on partitioning the known ground points into two sets: the first is used in the orientation-orthorectification model (GCPs — Ground Control Points) and the second is used to validate the model itself (CPs — Check Points). In fact, the HOV is generally not reliable and is not applicable when only a small number of ground points is available. To test the proposed method we implemented a new routine that performs the LOOCV in the software SISAR, developed by the Geodesy and Geomatics Team at the Sapienza University of Rome to perform the rigorous orientation of HRSI; this routine was tested on some EROS-A and QuickBird images. Moreover, these images were also oriented using the widely recognized commercial software OrthoEngine v. 10 (included in the Geomatica suite by PCI), manually performing the LOOCV
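
    A reduced illustration of LOOCV for accuracy assessment (the paper applies it to a rigorous sensor-orientation model; here a simple affine image-to-ground transform stands in): each ground point is left out in turn, the transform is re-estimated from the remaining points, and the prediction error at the held-out point is accumulated into an RMSE.

```python
# Hedged sketch: leave-one-out accuracy assessment of an affine image-to-ground transform,
# a stand-in for a rigorous orientation model estimated from ground control points.
import numpy as np

rng = np.random.default_rng(0)
n_points = 20
image_xy = rng.uniform(0, 10000, size=(n_points, 2))                  # image coordinates (pixels)
true_affine = np.array([[0.5, 0.01, 1000.0], [-0.02, 0.5, 2000.0]])   # assumed ground-truth mapping
ground_en = image_xy @ true_affine[:, :2].T + true_affine[:, 2] + rng.normal(0, 0.5, (n_points, 2))

def fit_affine(img, gnd):
    A = np.hstack([img, np.ones((len(img), 1))])     # design matrix [x, y, 1]
    coef, *_ = np.linalg.lstsq(A, gnd, rcond=None)   # least-squares affine parameters
    return coef

residuals = []
for i in range(n_points):
    keep = np.arange(n_points) != i
    coef = fit_affine(image_xy[keep], ground_en[keep])
    pred = np.append(image_xy[i], 1.0) @ coef
    residuals.append(np.linalg.norm(pred - ground_en[i]))

print("LOOCV horizontal RMSE:", np.sqrt(np.mean(np.square(residuals))))
```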

  10. Optimizing methods for linking cinematic features to fMRI data.

    Science.gov (United States)

    Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia

    2015-04-15

    One of the challenges of naturalistic neurosciences using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than of story-driven films, new methods need to be developed for the analysis of less story-driven contents. To optimize the linkage between our fMRI data collected during viewing of a deliberately non-narrative silent film 'At Land' by Maya Deren (1944) and its annotated content, we combined the method of elastic-net regularization with model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time-series were fitted with the time-series of a total of 36 binary-valued and one real-valued tactile annotation of film features. Elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression in order to avoid over-fitting due to the multicollinearity of regressors; the results were compared against both the partial least-squares (PLS) regression and the un-regularized full-model regression. A non-parametric permutation testing scheme was applied to evaluate the statistical significance of the regression. We found a statistically significant correlation between the annotation model and 9 of the 40 ICs. Regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net-based regression to be more sensitive than PLS and un-regularized regression, since it detected a larger number of significant ICs and ROIs. Along with the ISC ranking methods, our regression analysis proved
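
    Elastic-net regression with a cross-validated penalty, as used here to relate the annotation regressors to IC/ROI time-series, follows a standard pattern; the sketch below uses scikit-learn's ElasticNetCV on synthetic regressors and a synthetic signal rather than the study's data.

```python
# Hedged sketch: elastic-net regression of one ROI/IC time-series on stimulus annotation
# regressors, with the penalty strength and mixing parameter chosen by cross-validation.
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
n_timepoints, n_annotations = 300, 37
annotations = rng.integers(0, 2, size=(n_timepoints, n_annotations)).astype(float)
roi_signal = annotations[:, :5] @ rng.normal(1.0, 0.2, 5) + rng.normal(0, 1.0, n_timepoints)

model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5).fit(annotations, roi_signal)
print("chosen alpha:", model.alpha_, " l1_ratio:", model.l1_ratio_)
print("non-zero annotation weights:", np.flatnonzero(model.coef_))
```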

  11. Do qualitative methods validate choice experiment-results? A case study on the economic valuation of peatland restoration in Central Kalimantan, Indonesia

    Energy Technology Data Exchange (ETDEWEB)

    Schaafsma, M.; Van Beukering, P.J.H.; Davies, O.; Oskolokaite, I.

    2009-05-15

    This study explores the benefits of combining independent results of qualitative focus group discussions (FGD) with a quantitative choice experiment (CE) in a developing country context. The assessment addresses the compensation needed by local communities in Central Kalimantan to cooperate in peatland restoration programs by using a CE combined with a series of FGD to validate and explain the CE results. The main conclusion of this study is that a combination of qualitative and quantitative methods is necessary to assess the economic value of ecological services in monetary terms and to better understand the underlying attitudes and motives that drive these outcomes. The FGD not only cross-validate the results of the CE, but also help to interpret the differences in preferences of respondents arising from environmental awareness and ecosystem characteristics. The FGD confirm that the CE results provide accurate information for ecosystem valuation. In addition to the advantages of FGD listed in the literature, this study finds that FGD make it possible to identify the specific terms and conditions on which respondents will accept land-use change scenarios. The results show that FGD may help to address the problems regarding the distribution of costs and benefits over time that neo-classical economic theory poses for the interpretation of economic valuation results, through the demands it puts on the rationality of trade-offs and the required calculations.

  12. Enhancing the discrimination accuracy between metastases, gliomas and meningiomas on brain MRI by volumetric textural features and ensemble pattern recognition methods.

    Science.gov (United States)

    Georgiadis, Pantelis; Cavouras, Dionisis; Kalatzis, Ioannis; Glotsos, Dimitris; Athanasiadis, Emmanouil; Kostopoulos, Spiros; Sifaki, Koralia; Malamas, Menelaos; Nikiforidis, George; Solomou, Ekaterini

    2009-01-01

    Three-dimensional (3D) texture analysis of volumetric brain magnetic resonance (MR) images has been identified as an important indicator for discriminating among different brain pathologies. The purpose of this study was to evaluate the efficiency of 3D textural features using a pattern recognition system in the task of discriminating benign, malignant and metastatic brain tissues on T1 postcontrast MR imaging (MRI) series. The dataset consisted of 67 brain MRI series obtained from patients with verified and untreated intracranial tumors. The pattern recognition system was designed as an ensemble classification scheme employing a support vector machine classifier, specially modified in order to integrate the least-squares feature transformation logic in its kernel function. The latter, in conjunction with using 3D textural features, boosted the performance of the system in discriminating metastatic, malignant and benign brain tumors with 77.14%, 89.19% and 93.33% accuracy, respectively. The method was evaluated using an external cross-validation process; thus, results might be considered indicative of the generalization performance of the system to "unseen" cases. The proposed system might be used as an assisting tool for brain tumor characterization on volumetric MRI series.

  13. Performance comparison of deep learning and segmentation-based radiomic methods in the task of distinguishing benign and malignant breast lesions on DCE-MRI

    Science.gov (United States)

    Antropova, Natasha; Huynh, Benjamin; Giger, Maryellen

    2017-03-01

    Intuitive segmentation-based CADx/radiomic features, calculated from the lesion segmentations of dynamic contrast-enhanced magnetic resonance images (DCE-MRIs), have been utilized in the task of distinguishing between malignant and benign lesions. Additionally, transfer learning with pre-trained deep convolutional neural networks (CNNs) allows for an alternative method of radiomics extraction, where the features are derived directly from the image data. However, the comparison of computer-extracted segmentation-based and CNN features in MRI breast lesion characterization has not yet been conducted. In our study, we used a DCE-MRI database of 640 breast cases - 191 benign and 449 malignant. Thirty-eight segmentation-based features were extracted automatically using our quantitative radiomics workstation. Also, 2D ROIs were selected around each lesion on the DCE-MRIs and directly input into a pre-trained CNN AlexNet, yielding CNN features. Each method was investigated separately and in combination in terms of performance in the task of distinguishing between benign and malignant lesions. Area under the ROC curve (AUC) served as the figure of merit. Both methods yielded promising classification performance with round-robin cross-validated AUC values of 0.88 (se = 0.01) and 0.76 (se = 0.02) for segmentation-based and deep learning methods, respectively. Combining the two methods enhanced the performance in malignancy assessment, resulting in an AUC value of 0.91 (se = 0.01), a statistically significant improvement over the performance of the CNN method alone.

  14. Variable importance and prediction methods for longitudinal problems with missing variables.

    Directory of Open Access Journals (Sweden)

    Iván Díaz

    Full Text Available We present prediction and variable importance (VIM) methods for longitudinal data sets containing continuous and binary exposures subject to missingness. We demonstrate the use of these methods for prognosis of medical outcomes of severe trauma patients, a field in which current medical practice involves rules of thumb and scoring methods that only use a few variables and ignore the dynamic and high-dimensional nature of trauma recovery. Well-principled prediction and VIM methods can provide a tool to make care decisions informed by the patient's high-dimensional physiological and clinical history. Our VIM parameters are analogous to slope coefficients in adjusted regressions, but are not dependent on a specific statistical model, nor do they require a certain functional form of the prediction regression to be estimated. In addition, they can be causally interpreted under causal and statistical assumptions as the expected outcome under time-specific clinical interventions, related to changes in the mean of the outcome if each individual experiences a specified change in the variable (keeping other variables in the model fixed). Better yet, the targeted MLE used is doubly robust and locally efficient. Because the proposed VIM does not constrain the prediction model fit, we use a very flexible ensemble learner (the SuperLearner), which returns a linear combination of a list of user-given algorithms. Not only is such a prediction algorithm intuitively appealing, it has theoretical justification as being asymptotically equivalent to the oracle selector. The results of the analysis show effects whose size and significance would not have been found using a parametric approach (such as stepwise regression or the LASSO). In addition, the procedure is even more compelling as the predictor on which it is based showed significant improvements in cross-validated fit, for instance the area under the curve (AUC) for a receiver-operator curve (ROC). Thus, given that (1) our VIM

  15. Prediction of interactions between viral and host proteins using supervised machine learning methods.

    Directory of Open Access Journals (Sweden)

    Ranjan Kumar Barman

    Full Text Available BACKGROUND: Viral-host protein-protein interaction plays a vital role in pathogenesis, since it defines viral infection of the host and regulation of the host proteins. Identification of key viral-host protein-protein interactions (PPIs) has great implications for therapeutics. METHODS: In this study, a systematic attempt has been made to predict viral-host PPIs by integrating different features, including domain-domain association, network topology and sequence information, using viral-host PPIs from VirusMINT. Three well-known supervised machine learning methods, namely SVM, Naïve Bayes and Random Forest, which are commonly used in the prediction of PPIs, were employed to evaluate performance based on five-fold cross-validation. RESULTS: Out of 44 descriptors, the best features were found to be domain-domain association and the methionine, serine and valine amino acid composition of viral proteins. In this study, the SVM-based method achieved a better sensitivity of 67% than Naïve Bayes (37.49%) and Random Forest (55.66%). However, the specificity of Naïve Bayes was the highest (99.52%) as compared with SVM (74%) and Random Forest (89.08%). Overall, the SVM and Random Forest achieved accuracies of 71% and 72.41%, respectively. The proposed SVM-based method was evaluated on a blind dataset and attained a sensitivity of 64%, specificity of 83%, and accuracy of 74%. In addition, unknown potential targets of hepatitis B virus-human and hepatitis E virus-human PPIs have been predicted through the proposed SVM model and validated by gene ontology enrichment analysis. Our proposed model shows that hepatitis B virus "C protein" binds to a membrane docking protein, while "X protein" and "P protein" interact with cell-killing and metabolic process proteins, respectively. CONCLUSION: The proposed method can predict large-scale interspecies viral-human PPIs. The nature and function of unknown viral proteins (HBV and HEV, interacting partners of host

  16. Improved method for prioritization of disease associated lncRNAs based on ceRNA theory and functional genomics data.

    Science.gov (United States)

    Wang, Peng; Guo, Qiuyan; Gao, Yue; Zhi, Hui; Zhang, Yan; Liu, Yue; Zhang, Jizhou; Yue, Ming; Guo, Maoni; Ning, Shangwei; Zhang, Guangmei; Li, Xia

    2017-01-17

    Although several computational models that predict disease-associated lncRNAs (long non-coding RNAs) exist, only a limited number of disease-associated lncRNAs are known. In this study, we mapped lncRNAs to their functional genomics context using competing endogenous RNA (ceRNA) theory. Based on the criterion that similar lncRNAs are likely involved in similar diseases, we proposed a disease lncRNA prioritization method, DisLncPri, to identify novel disease-lncRNA associations. Using a leave-one-out cross-validation (LOOCV) strategy, DisLncPri achieved reliable area under the curve (AUC) values of 0.89 and 0.87 for the LncRNADisease and Lnc2Cancer datasets, which further improved to 0.90 and 0.89 by integrating a multiple rank fusion strategy. We found that DisLncPri had the highest rank enrichment score and AUC value in comparison to several other methods for case studies of Alzheimer's disease, ovarian cancer, pancreatic cancer and gastric cancer. Several novel lncRNAs in the top ranks of these diseases were found to be newly verified by relevant databases or reported in recent studies. Prioritization of lncRNAs from a microarray (GSE53622) of oesophageal cancer patients highlighted ENSG00000226029 (ranked 2nd), a previously unidentified lncRNA, as a potential prognostic biomarker. Our analysis thus indicates that DisLncPri is an excellent tool for identifying lncRNAs that could be novel biomarkers and therapeutic targets in a variety of human diseases.

  17. IntNetLncSim: an integrative network analysis method to infer human lncRNA functional similarity.

    Science.gov (United States)

    Cheng, Liang; Shi, Hongbo; Wang, Zhenzhen; Hu, Yang; Yang, Haixiu; Zhou, Chen; Sun, Jie; Zhou, Meng

    2016-07-26

    Increasing evidence indicates that long non-coding RNAs (lncRNAs) are involved in various biological processes and complex diseases by communicating with each other and with mRNAs/miRNAs. Exploiting interactions between lncRNAs and mRNAs/miRNAs to infer lncRNA functional similarity (LFS) is an effective way to explore the function of lncRNAs and predict novel lncRNA-disease associations. In this article, we proposed an integrative framework, IntNetLncSim, to infer LFS by modeling the information flow in an integrated network that comprises both lncRNA-related transcriptional and post-transcriptional information. The performance of IntNetLncSim was evaluated by investigating the relationship of LFS with the similarity of lncRNA-related mRNA sets (LmRSets) and miRNA sets (LmiRSets). As a result, LFS by IntNetLncSim was significantly positively correlated with the LmRSet (Pearson correlation r2 = 0.8424) and LmiRSet (Pearson correlation r2 = 0.2601). Particularly, the performance of IntNetLncSim is superior to several previous methods. In the case of applying the LFS to identify novel lncRNA-disease relationships, we achieved an area under the ROC curve of 0.7300 on experimentally verified lncRNA-disease associations based on leave-one-out cross-validation. Furthermore, highly-ranked lncRNA-disease associations confirmed by literature mining demonstrated the excellent performance of IntNetLncSim. Finally, a web-accessible system was provided for querying LFS and potential lncRNA-disease relationships: http://www.bio-bigdata.com/IntNetLncSim.

  18. A new method using multiphoton imaging and morphometric analysis for differentiating chromophobe renal cell carcinoma and oncocytoma kidney tumors

    Science.gov (United States)

    Wu, Binlin; Mukherjee, Sushmita; Jain, Manu

    2016-03-01

    Distinguishing chromophobe renal cell carcinoma (chRCC) from oncocytoma on hematoxylin and eosin images may be difficult and require time-consuming ancillary procedures. Multiphoton microscopy (MPM), an optical imaging modality, was used to rapidly generate sub-cellular histological resolution images from formalin-fixed unstained tissue sections from chRCC and oncocytoma. Tissues were excited using a 780 nm wavelength and emission signals (including second harmonic generation and autofluorescence) were collected in different channels between 390 nm and 650 nm. Granular structure in the cell cytoplasm was observed in both chRCC and oncocytoma. Quantitative morphometric analysis was conducted to distinguish chRCC and oncocytoma. To perform the analysis, cytoplasm and granules in tumor cells were segmented from the images. Their area and fluorescence intensity were measured in different channels. Multiple features were measured to quantify the morphological and fluorescence properties. A linear support vector machine (SVM) was used for classification. Re-substitution validation, cross-validation and the receiver operating characteristic (ROC) curve were implemented to evaluate the efficacy of the SVM classifier. A wrapper feature selection algorithm was used to select the optimal features that provided the best predictive performance in separating the two tissue types (classes). Statistical measures such as sensitivity, specificity, accuracy and area under the curve (AUC) of the ROC were calculated to evaluate the efficacy of the classification. Over 80% accuracy was achieved as the predictive performance. This method, if validated on a larger and more diverse sample set, may serve as an automated rapid diagnostic tool to differentiate between chRCC and oncocytoma. An advantage of such automated methods is that they are free from investigator bias and variability.

  19. Early identification of posttraumatic stress following military deployment: Application of machine learning methods to a prospective study of Danish soldiers.

    Science.gov (United States)

    Karstoft, Karen-Inge; Statnikov, Alexander; Andersen, Søren B; Madsen, Trine; Galatzer-Levy, Isaac R

    2015-09-15

    Pre-deployment identification of soldiers at risk for long-term posttraumatic stress psychopathology after homecoming is important to guide decisions about deployment. Early post-deployment identification can direct early interventions to those in need and thereby prevent the development of chronic psychopathology. Both hold significant public health benefits given the large numbers of deployed soldiers, but neither has so far been achieved. Here, we aim to assess the potential for pre- and early post-deployment prediction of resilience or posttraumatic stress development in soldiers by application of machine learning (ML) methods. ML feature selection and prediction algorithms were applied to a prospective cohort of 561 Danish soldiers deployed to Afghanistan in 2009 to identify unique risk indicators and forecast long-term posttraumatic stress responses. Robust pre- and early post-deployment risk indicators were identified, including individual PTSD symptoms as well as the total level of PTSD symptoms, previous trauma and treatment, negative emotions, and thought suppression. The predictive performance of these risk indicators combined was assessed by cross-validation. Together, these indicators forecasted long-term posttraumatic stress responses with high accuracy (pre-deployment: AUC = 0.84 (95% CI = 0.81-0.87), post-deployment: AUC = 0.88 (95% CI = 0.85-0.91)). This study utilized a previously collected data set and was therefore not designed to exhaust the potential of ML methods. Further, the study relied solely on self-reported measures. Pre-deployment and early post-deployment identification of risk for long-term posttraumatic psychopathology are feasible and could greatly reduce the public health costs of war. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. DBAC: A simple prediction method for protein binding hot spots based on burial levels and deeply buried atomic contacts

    Science.gov (United States)

    2011-01-01

    Background A protein binding hot spot is a cluster of residues in the interface that are energetically important for the binding of the protein with its interaction partner. Identifying protein binding hot spots can give useful information to protein engineering and drug design, and can also deepen our understanding of protein-protein interaction. These residues are usually buried inside the interface with very low solvent accessible surface area (SASA). Thus SASA is widely used as an outstanding feature in hot spot prediction by many computational methods. However, SASA is not capable of distinguishing slightly buried residues, of which most are non-hot spots, from deeply buried ones that are usually inside a hot spot. Results We propose a new descriptor called "burial level" for characterizing residues, atoms and atomic contacts. Specifically, burial level captures the depth to which residues are buried. We identify different kinds of deeply buried atomic contacts (DBAC) at different burial levels that are directly broken in alanine substitution. We use their numbers as input for an SVM to classify between hot spot and non-hot spot residues. We achieve an F-measure of 0.6237 under leave-one-out cross-validation on a data set containing 258 mutations. This performance is better than that of other computational methods. Conclusions Our results show that hot spot residues tend to be deeply buried in the interface, not just to have a low SASA value. This indicates that a high burial level is not only a necessary but also a more sufficient condition than a low SASA for a residue to be a hot spot residue. We find that those deeply buried atoms become increasingly more important as their burial levels rise. This work also confirms the contribution of deeply buried interfacial atomic contacts to the energy of protein binding hot spots. PMID:21689480

  1. Comparison of two methods forecasting binding rate of plasma protein.

    Science.gov (United States)

    Hongjiu, Liu; Yanrong, Hu

    2014-01-01

    By introducing descriptors calculated from the molecular structure, the binding rates to plasma protein (BRPP) of seventy diverse drugs are modeled with a quantitative structure-activity relationship (QSAR) technique. Two algorithms, a heuristic algorithm (HA) and support vector machine (SVM), are used to establish linear and nonlinear models to forecast BRPP. Empirical analysis shows good performance for both HA and SVM, with cross-validation correlation coefficients (R2cv) of 0.80 and 0.83, respectively. Comparing HA with SVM, SVM was found to be more stable and more robust in forecasting BRPP.
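
    The cross-validated correlation coefficient reported for such QSAR models can be computed generically as below; the descriptors and binding rates are synthetic placeholders, and an epsilon-SVR with an RBF kernel is assumed rather than the authors' exact SVM setup.

```python
# Hedged sketch: cross-validated R^2 (often reported as Q^2 or R^2cv) of an SVR-based QSAR model.
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(70, 6))     # 70 drugs, 6 structural descriptors
binding_rate = 0.5 + 0.1 * descriptors[:, 0] - 0.05 * descriptors[:, 1] + rng.normal(0, 0.05, 70)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
predicted = cross_val_predict(model, descriptors, binding_rate, cv=5)
print("cross-validated R^2:", round(r2_score(binding_rate, predicted), 3))
```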

  2. Classification-based comparison of pre-processing methods for interpretation of mass spectrometry generated clinical datasets

    Directory of Open Access Journals (Sweden)

    Hoefsloot Huub CJ

    2009-05-01

    Full Text Available Abstract Background Mass spectrometry is increasingly being used to discover proteins or protein profiles associated with disease. Experimental design of mass-spectrometry studies has come under close scrutiny and the importance of strict protocols for sample collection is now understood. However, the question of how best to process the large quantities of data generated is still unanswered. Main challenges for the analysis are the choice of proper pre-processing and classification methods. While these two issues have been investigated in isolation, we propose to use the classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods. Results Two in-house generated clinical SELDI-TOF MS datasets are used in this study as an example of high throughput mass-spectrometry data. We perform a systematic comparison of two commonly used pre-processing methods as implemented in Ciphergen ProteinChip Software and in the Cromwell package. With respect to reproducibility, Ciphergen and Cromwell pre-processing are largely comparable. We find that the overlap between peaks detected by either Ciphergen ProteinChip Software or Cromwell is large. This is especially the case for the more stringent peak detection settings. Moreover, similarity of the estimated intensities between matched peaks is high. We evaluate the pre-processing methods using five different classification methods. Classification is done in a double cross-validation protocol using repeated random sampling to obtain an unbiased estimate of classification accuracy. No pre-processing method significantly outperforms the other for all peak detection settings evaluated. Conclusion We use classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods. Both pre-processing methods lead to similar classification results on an ovarian cancer and a Gaucher disease dataset. However, the settings for pre
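
    The double cross-validation protocol mentioned in this record, an inner loop that tunes the classifier and an outer loop of repeated random splits that estimates accuracy without optimistic bias, can be written generically as below; the data and classifier are placeholders.

```python
# Hedged sketch: double (nested) cross-validation with repeated random outer splits.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=100, n_informative=10, random_state=0)

inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)            # tunes C per outer split
outer = StratifiedShuffleSplit(n_splits=20, test_size=0.3, random_state=0)   # repeated random sampling
scores = cross_val_score(inner, X, y, cv=outer)
print("nested-CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```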

  3. A Generally Applicable Computer Algorithm Based on the Group Additivity Method for the Calculation of Seven Molecular Descriptors: Heat of Combustion, LogPO/W, LogS, Refractivity, Polarizability, Toxicity and LogBB of Organic Compounds; Scope and Limits of Applicability

    Directory of Open Access Journals (Sweden)

    Rudolf Naef

    2015-10-01

    Full Text Available A generally applicable computer algorithm for the calculation of the seven molecular descriptors heat of combustion, logP(octanol/water), logS (water solubility), molar refractivity, molecular polarizability, aqueous toxicity (protozoan growth inhibition) and logBB (log(c_blood/c_brain)) is presented. The method, an extendable form of the group-additivity method, is based on the complete break-down of the molecules into their constituting atoms and their immediate neighbourhood. The contribution of the resulting atom groups to the descriptor values is calculated using the Gauss-Seidel fitting method, based on experimental data gathered from the literature. The plausibility of the method was tested for each descriptor by means of a k-fold cross-validation procedure, demonstrating good to excellent predictive power for the former six descriptors and low reliability for the logBB predictions. The goodness of fit (Q2) and the standard deviation of the 10-fold cross-validation calculation were >0.9999 and 25.2 kJ/mol, respectively (based on N = 1965 test compounds), for the heat of combustion, 0.9451 and 0.51 (N = 2640) for logP, 0.8838 and 0.74 (N = 1419) for logS, 0.9987 and 0.74 (N = 4045) for the molar refractivity, 0.9897 and 0.77 (N = 308) for the molecular polarizability, 0.8404 and 0.42 (N = 810) for the toxicity and 0.4709 and 0.53 (N = 383) for logBB. The latter descriptor, revealing a very low Q2 for the test molecules (R2 was 0.7068 and the standard deviation 0.38 for N = 413 training molecules), is included as an example to show the limits of the group-additivity method. An eighth molecular descriptor, the heat of formation, was indirectly calculated from the heat of combustion data and correlated with published experimental heat of formation data with a correlation coefficient R2 of 0.9974 (N = 2031).

  4. Interventions of the nursing diagnosis „Acute Pain“ – Evaluation of patients' experiences after total hip arthroplasty compared with the nursing record by using Q-DIO-Pain: a mixed methods study

    Science.gov (United States)

    Zanon, David C; Gralher, Dieter; Müller-Staub, Maria

    2017-01-01

    Background: Pain affects patients' rehabilitation after hip replacement surgery. Aim: The study aim was to compare patients' responses, on their received pain relieving nursing interventions after hip replacement surgery, with the documented interventions in their nursing records. Method: A mixed methods design was applied. In order to evaluate quantitative data the instrument „Quality of Diagnoses, Interventions and Outcomes“ (Q-DIO) was further developed to measure pain interventions in nursing records (Q-DIO-Pain). Patients (n = 37) answered a survey on the third postoperative day. The patients' survey findings were then compared with the Q-DIO-Pain results and cross-validated by qualitative interviews. Results: The most reported pain level was „no pain“ (NRS 0 – 10 Points). However, 17 – 50 % of patients reported pain levels of three or higher and 11 – 22 % of five or higher in situations of motion / ambulation. A significant match between patients' findings and Q-DIO-Pain results was found for the intervention „helping to adapt medications“ (n = 32, ICC = 0.111, p = 0.042, CI 95 % 2-sided). Otherwise no significant matches were found. Interviews with patients and nurses confirmed that far more pain-relieving interventions affecting „Acute Pain“ were carried out, than were documented. Conclusions: Based on the results, pain assessments and effective pain-relieving interventions, especially before or after motion / ambulation should be improved and documented. It is recommended to implement a nursing standard for pain control.

  5. Ensemble Data Mining Methods

    Data.gov (United States)

    National Aeronautics and Space Administration — Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve...

  6. BDF-methods

    DEFF Research Database (Denmark)

    Hostrup, Astrid Kuijers

    1999-01-01

    An introduction to BDF-methods is given. The use of these methods on differential algebraic equations (DAEs) of different index is presented, with respect to the order, stability and convergence of the BDF-methods.

  7. Uranium price forecasting methods

    International Nuclear Information System (INIS)

    Fuller, D.M.

    1994-01-01

    This article reviews a number of forecasting methods that have been applied to uranium prices and compares their relative strengths and weaknesses. The methods reviewed are: (1) judgemental methods, (2) technical analysis, (3) time-series methods, (4) fundamental analysis, and (5) econometric methods. Historically, none of these methods has performed very well, but a well-thought-out model is still useful as a basis from which to adjust to new circumstances and try again

  8. Data-driven method based on particle swarm optimization and k-nearest neighbor regression for estimating capacity of lithium-ion battery

    International Nuclear Information System (INIS)

    Hu, Chao; Jain, Gaurav; Zhang, Puqiang; Schmidt, Craig; Gomadam, Parthasarathy; Gorka, Tom

    2014-01-01

    Highlights: • We develop a data-driven method for the battery capacity estimation. • Five charge-related features that are indicative of the capacity are defined. • The kNN regression model captures the dependency of the capacity on the features. • Results with 10 years’ continuous cycling data verify the effectiveness of the method. - Abstract: Reliability of lithium-ion (Li-ion) rechargeable batteries used in implantable medical devices has been recognized as of high importance from a broad range of stakeholders, including medical device manufacturers, regulatory agencies, physicians, and patients. To ensure Li-ion batteries in these devices operate reliably, it is important to be able to assess the battery health condition by estimating the battery capacity over the life-time. This paper presents a data-driven method for estimating the capacity of Li-ion battery based on the charge voltage and current curves. The contributions of this paper are three-fold: (i) the definition of five characteristic features of the charge curves that are indicative of the capacity, (ii) the development of a non-linear kernel regression model, based on the k-nearest neighbor (kNN) regression, that captures the complex dependency of the capacity on the five features, and (iii) the adaptation of particle swarm optimization (PSO) to finding the optimal combination of feature weights for creating a kNN regression model that minimizes the cross validation (CV) error in the capacity estimation. Verification with 10 years’ continuous cycling data suggests that the proposed method is able to accurately estimate the capacity of Li-ion battery throughout the whole life-time
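
    The core loop described in the highlights, weighting the charge-curve features, running k-NN regression, and keeping the weights that minimize the cross-validation error, can be sketched with a crude random search standing in for particle swarm optimization; the features and capacities below are synthetic.

```python
# Hedged sketch: feature-weighted k-NN regression for capacity estimation, with weights chosen
# to minimize 5-fold cross-validation error (random search as a stand-in for PSO).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 5))   # five charge-curve features per measured cycle
capacity = 1.0 - 0.05 * features[:, 0] + 0.02 * features[:, 1] + rng.normal(0, 0.01, 200)

def cv_error(weights):
    """Mean-squared cross-validation error of k-NN regression on weight-scaled features."""
    scores = cross_val_score(KNeighborsRegressor(n_neighbors=5), features * weights, capacity,
                             cv=5, scoring="neg_mean_squared_error")
    return -scores.mean()

best_w, best_err = None, np.inf
for _ in range(200):                   # crude random search over the feature-weight space
    w = rng.uniform(0, 1, size=5)
    err = cv_error(w)
    if err < best_err:
        best_w, best_err = w, err

print("best feature weights:", np.round(best_w, 2), " CV MSE:", round(best_err, 6))
```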

  9. Methods in aquatic bacteriology

    National Research Council Canada - National Science Library

    Austin, B

    1988-01-01

    .... Within these sections detailed chapters consider sampling methods, determination of biomass, isolation methods, identification, the bacterial microflora of fish, invertebrates, plants and the deep...

  10. A comparison of methods to handle skew distributed cost variables in the analysis of the resource consumption in schizophrenia treatment.

    Science.gov (United States)

    Kilian, Reinhold; Matschinger, Herbert; Löeffler, Walter; Roick, Christiane; Angermeyer, Matthias C

    2002-03-01

    Transformation of the dependent cost variable is often used to solve the problems of heteroscedasticity and skewness in linear ordinary least square regression of health service cost data. However, transformation may cause difficulties in the interpretation of regression coefficients and the retransformation of predicted values. The study compares the advantages and disadvantages of different methods to estimate regression based cost functions using data on the annual costs of schizophrenia treatment. Annual costs of psychiatric service use and clinical and socio-demographic characteristics of the patients were assessed for a sample of 254 patients with a diagnosis of schizophrenia (ICD-10 F 20.0) living in Leipzig. The clinical characteristics of the participants were assessed by means of the BPRS 4.0, the GAF, and the CAN for service needs. Quality of life was measured by WHOQOL-BREF. A linear OLS regression model with non-parametric standard errors, a log-transformed OLS model and a generalized linear model with a log-link and a gamma distribution were used to estimate service costs. For the estimation of robust non-parametric standard errors, the variance estimator by White and a bootstrap estimator based on 2000 replications were employed. Models were evaluated by the comparison of the R2 and the root mean squared error (RMSE). RMSE of the log-transformed OLS model was computed with three different methods of bias-correction. The 95% confidence intervals for the differences between the RMSE were computed by means of bootstrapping. A split-sample-cross-validation procedure was used to forecast the costs for the one half of the sample on the basis of a regression equation computed for the other half of the sample. All three methods showed significant positive influences of psychiatric symptoms and met psychiatric service needs on service costs. Only the log- transformed OLS model showed a significant negative impact of age, and only the GLM shows a significant
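
    A minimal sketch of the three modelling strategies compared above (plain OLS, log-transformed OLS with Duan's smearing retransformation, and a gamma GLM with a log link), run on synthetic right-skewed cost data; scikit-learn's GammaRegressor stands in for the GLM and the split-sample validation is reduced to a single train/test split.

        import numpy as np
        from sklearn.linear_model import LinearRegression, GammaRegressor
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(1)

        # Synthetic right-skewed "annual cost" data driven by two covariates.
        n = 254
        X = rng.normal(size=(n, 2))
        mu = np.exp(7.0 + 0.4 * X[:, 0] + 0.3 * X[:, 1])
        y = rng.gamma(shape=2.0, scale=mu / 2.0)          # gamma-distributed positive costs

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

        def rmse(pred, obs):
            return float(np.sqrt(np.mean((pred - obs) ** 2)))

        # 1) Plain OLS on the raw costs.
        ols = LinearRegression().fit(X_tr, y_tr)
        rmse_ols = rmse(ols.predict(X_te), y_te)

        # 2) OLS on log(costs) with Duan's smearing factor for retransformation.
        log_ols = LinearRegression().fit(X_tr, np.log(y_tr))
        smearing = np.mean(np.exp(np.log(y_tr) - log_ols.predict(X_tr)))
        rmse_log = rmse(np.exp(log_ols.predict(X_te)) * smearing, y_te)

        # 3) Gamma GLM; scikit-learn's GammaRegressor uses a log link.
        glm = GammaRegressor(alpha=0.0, max_iter=1000).fit(X_tr, y_tr)
        rmse_glm = rmse(glm.predict(X_te), y_te)

        print(f"RMSE  OLS: {rmse_ols:.0f}  log-OLS: {rmse_log:.0f}  Gamma GLM: {rmse_glm:.0f}")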

  11. A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases

    Directory of Open Access Journals (Sweden)

    Karp Peter D

    2004-06-01

    probability threshold of 0.9 during cross-validation using known reactions in computationally-predicted pathway databases. After applying our method to 513 pathway holes in 333 pathways from three Pathway/Genome databases, we increased the number of complete pathways by 42%. We made putative assignments to 46% of the holes, including annotation of 17 sequences of previously unknown function. Conclusions Our pathway hole filler can be used not only to increase the utility of Pathway/Genome databases to both experimental and computational researchers, but also to improve predictions of protein function.

  12. Transport equation solving methods

    International Nuclear Information System (INIS)

    Granjean, P.M.

    1984-06-01

    This work is mainly devoted to Csub(N) and Fsub(N) methods. CN method: starting from a lemma stated by Placzek, an equivalence is established between two problems: the first one is defined in a finite medium bounded by a surface S, the second one is defined in the whole space. In the first problem the angular flux on the surface S is shown to be the solution of an integral equation. This equation is solved by Galerkin's method. The Csub(N) method is applied here to one-velocity problems: in plane geometry, slab albedo and transmission with Rayleigh scattering, calculation of the extrapolation length; in cylindrical geometry, albedo and extrapolation length calculation with linear scattering. Fsub(N) method: the basic integral transport equation of the Csub(N) method is integrated on Case's elementary distributions; another integral transport equation is obtained: this equation is solved by a collocation method. The plane problems solved by the Csub(N) method are also solved by the Fsub(N) method. The Fsub(N) method is extended to any polynomial scattering law. Some simple spherical problems are also studied. Chandrasekhar's method, collision probability method, Case's method are presented for comparison with Csub(N) and Fsub(N) methods. This comparison shows the respective advantages of the two methods: a) fast convergence and possible extension to various geometries for Csub(N) method; b) easy calculations and easy extension to polynomial scattering for Fsub(N) method [fr

  13. 1HNMR-Based metabolomic profiling method to develop plasma biomarkers for sensitivity to chronic heat stress in growing pigs.

    Directory of Open Access Journals (Sweden)

    Samir Dou

    Full Text Available The negative impact of heat stress (HS) on production performance in pig farming is of particular concern. Novel diagnostic methods are needed to predict the robustness of pigs to HS. Our study aimed to assess the reliability of the blood metabolome for predicting the sensitivity to chronic HS of 10 F1 (Large White × Creole) sire families (SF) reared in temperate (TEMP) and in tropical (TROP) regions (n = 56±5 offspring/region/SF). Live body weight (BW) and rectal temperature (RT) were recorded at 23 weeks of age. Average daily feed intake (ADFI) and average daily gain were calculated from weeks 11 to 23 of age, together with feed conversion ratio. Plasma metabolome profiles were obtained by Nuclear Magnetic Resonance spectroscopy (1HNMR) from blood samples collected at week 23 in TEMP. The sensitivity to hot climatic conditions of each SF was estimated by computing a composite index of sensitivity (Isens) derived from a linear combination of t statistics applied to familial BW, ADFI and RT in TEMP and TROP climates. A model of prediction of sensitivity was established with sparse Partial Least Square Discriminant Analysis (sPLS-DA) between the two most robust SF (n = 102) and the two most sensitive ones (n = 121), using individual metabolomic profiles measured in TEMP. The sPLS-DA selected 29 buckets that enabled 78% prediction accuracy by cross-validation. On the basis of this training, we predicted the proportion of sensitive pigs within the 6 remaining families (n = 337). This proportion was defined as the predicted membership of families in the sensitive category. The positive correlation between this proportion and Isens (r = 0.97, P < 0.01) suggests that the plasma metabolome can be used to predict the sensitivity of pigs to a hot climate.
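
    A minimal sketch of a cross-validated PLS-DA classifier on synthetic spectral buckets; scikit-learn has no sparse PLS-DA, so a plain PLSRegression fitted to a binary label and thresholded at 0.5 stands in for the sPLS-DA used in the study, and all sample sizes and bucket counts are illustrative.

        import numpy as np
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.model_selection import StratifiedKFold

        rng = np.random.default_rng(2)

        # Synthetic NMR buckets: 223 animals x 150 buckets, binary robust/sensitive label.
        X = rng.normal(size=(223, 150))
        y = (X[:, :5].sum(axis=1) + rng.normal(scale=1.0, size=223) > 0).astype(int)

        accs = []
        for tr, te in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
            pls = PLSRegression(n_components=2).fit(X[tr], y[tr])
            pred = (pls.predict(X[te]).ravel() > 0.5).astype(int)   # threshold the PLS score
            accs.append(np.mean(pred == y[te]))

        print("cross-validated accuracy:", round(float(np.mean(accs)), 3))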

  14. Comparison of standard resampling methods for performance estimation of artificial neural network ensembles

    OpenAIRE

    Green, Michael; Ohlsson, Mattias

    2007-01-01

    Estimation of the generalization performance for classification within the medical applications domain is always an important task. In this study we focus on artificial neural network ensembles as the machine learning technique. We present a numerical comparison between five common resampling techniques: k-fold cross validation (CV), holdout using three cutoffs, and bootstrap, on five different data sets. The results show that CV together with holdout 0.25 and 0.50 are the best resampl...
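
    A minimal sketch comparing the three families of resampling estimators discussed above (k-fold CV, holdout, and out-of-bag bootstrap) for a single MLP classifier on synthetic data; the single MLP stands in for the paper's neural network ensembles, and the 0.25 cutoff is used for the holdout split.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score, train_test_split
        from sklearn.neural_network import MLPClassifier
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(3)
        X, y = make_classification(n_samples=400, n_features=20, random_state=0)

        def make_model():
            return MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)

        # 1) k-fold cross-validation.
        cv_auc = cross_val_score(make_model(), X, y, cv=5, scoring="roc_auc").mean()

        # 2) Holdout with a 0.25 test fraction.
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
        holdout_auc = roc_auc_score(y_te, make_model().fit(X_tr, y_tr).predict_proba(X_te)[:, 1])

        # 3) Bootstrap: train on a resample, test on the out-of-bag cases.
        boot_aucs = []
        for _ in range(25):
            idx = rng.integers(0, len(y), size=len(y))
            oob = np.setdiff1d(np.arange(len(y)), idx)
            m = make_model().fit(X[idx], y[idx])
            boot_aucs.append(roc_auc_score(y[oob], m.predict_proba(X[oob])[:, 1]))

        print(f"CV: {cv_auc:.3f}  holdout: {holdout_auc:.3f}  bootstrap: {np.mean(boot_aucs):.3f}")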

  15. Computer-aided breast MR image feature analysis for prediction of tumor response to chemotherapy

    International Nuclear Information System (INIS)

    Aghaei, Faranak; Tan, Maxine; Liu, Hong; Zheng, Bin; Hollingsworth, Alan B.; Qian, Wei

    2015-01-01

    Purpose: To identify a new clinical marker based on quantitative kinetic image features analysis and assess its feasibility to predict tumor response to neoadjuvant chemotherapy. Methods: The authors assembled a dataset involving breast MR images acquired from 68 cancer patients before undergoing neoadjuvant chemotherapy. Among them, 25 patients had complete response (CR) and 43 had partial and nonresponse (NR) to chemotherapy based on the response evaluation criteria in solid tumors. The authors developed a computer-aided detection scheme to segment breast areas and tumors depicted on the breast MR images and computed a total of 39 kinetic image features from both tumor and background parenchymal enhancement regions. The authors then applied and tested two approaches to classify between CR and NR cases. The first one analyzed each individual feature and applied a simple feature fusion method that combines classification results from multiple features. The second approach tested an attribute selected classifier that integrates an artificial neural network (ANN) with a wrapper subset evaluator, which was optimized using a leave-one-case-out validation method. Results: In the pool of 39 features, 10 yielded relatively higher classification performance with the areas under receiver operating characteristic curves (AUCs) ranging from 0.61 to 0.78 to classify between CR and NR cases. Using a feature fusion method, the maximum AUC = 0.85 ± 0.05. Using the ANN-based classifier, AUC value significantly increased to 0.96 ± 0.03 (p < 0.01). Conclusions: This study demonstrated that quantitative analysis of kinetic image features computed from breast MR images acquired prechemotherapy has potential to generate a useful clinical marker in predicting tumor response to chemotherapy

  16. Computer-aided breast MR image feature analysis for prediction of tumor response to chemotherapy

    Energy Technology Data Exchange (ETDEWEB)

    Aghaei, Faranak; Tan, Maxine; Liu, Hong; Zheng, Bin, E-mail: Bin.Zheng-1@ou.edu [School of Electrical and Computer Engineering, University of Oklahoma, Norman, Oklahoma 73019 (United States); Hollingsworth, Alan B. [Mercy Women’s Center, Mercy Health Center, Oklahoma City, Oklahoma 73120 (United States); Qian, Wei [Department of Electrical and Computer Engineering, University of Texas, El Paso, Texas 79968 (United States)

    2015-11-15

    Purpose: To identify a new clinical marker based on quantitative kinetic image features analysis and assess its feasibility to predict tumor response to neoadjuvant chemotherapy. Methods: The authors assembled a dataset involving breast MR images acquired from 68 cancer patients before undergoing neoadjuvant chemotherapy. Among them, 25 patients had complete response (CR) and 43 had partial and nonresponse (NR) to chemotherapy based on the response evaluation criteria in solid tumors. The authors developed a computer-aided detection scheme to segment breast areas and tumors depicted on the breast MR images and computed a total of 39 kinetic image features from both tumor and background parenchymal enhancement regions. The authors then applied and tested two approaches to classify between CR and NR cases. The first one analyzed each individual feature and applied a simple feature fusion method that combines classification results from multiple features. The second approach tested an attribute selected classifier that integrates an artificial neural network (ANN) with a wrapper subset evaluator, which was optimized using a leave-one-case-out validation method. Results: In the pool of 39 features, 10 yielded relatively higher classification performance with the areas under receiver operating characteristic curves (AUCs) ranging from 0.61 to 0.78 to classify between CR and NR cases. Using a feature fusion method, the maximum AUC = 0.85 ± 0.05. Using the ANN-based classifier, AUC value significantly increased to 0.96 ± 0.03 (p < 0.01). Conclusions: This study demonstrated that quantitative analysis of kinetic image features computed from breast MR images acquired prechemotherapy has potential to generate a useful clinical marker in predicting tumor response to chemotherapy.

  17. Body composition estimation from selected slices: equations computed from a new semi-automatic thresholding method developed on whole-body CT scans

    Directory of Open Access Journals (Sweden)

    Alizé Lacoste Jeanson

    2017-05-01

    Full Text Available Background Estimating volumes and masses of total body components is important for the study and treatment monitoring of nutrition and nutrition-related disorders, cancer, joint replacement, energy-expenditure and exercise physiology. While several equations have been offered for estimating total body components from MRI slices, no reliable and tested method exists for CT scans. For the first time, body composition data was derived from 41 high-resolution whole-body CT scans. From these data, we defined equations for estimating volumes and masses of total body AT and LT from corresponding tissue areas measured in selected CT scan slices. Methods We present a new semi-automatic approach to defining the density cutoff between adipose tissue (AT) and lean tissue (LT) in such material. An intra-class correlation coefficient (ICC) was used to validate the method. The equations for estimating the whole-body composition volume and mass from areas measured in selected slices were modeled with ordinary least squares (OLS) linear regressions and support vector machine regression (SVMR). Results and Discussion The best predictive equation for total body AT volume was based on the AT area of a single slice located between the 4th and 5th lumbar vertebrae (L4-L5) and produced lower prediction errors (|PE| = 1.86 liters, %PE = 8.77) than previous equations also based on CT scans. The LT area of the mid-thigh provided the lowest prediction errors (|PE| = 2.52 liters, %PE = 7.08) for estimating whole-body LT volume. We also present equations to predict total body AT and LT masses from a slice located at L4-L5 that resulted in reduced error compared with the previously published equations based on CT scans. The multislice SVMR predictor gave the theoretical upper limit for prediction precision of volumes and cross-validated the results.

  18. Profiling plasma extracellular vesicle by pluronic block-copolymer based enrichment method unveils features associated with breast cancer aggression, metastasis and invasion.

    Science.gov (United States)

    Zhong, Zhenyu; Rosenow, Matthew; Xiao, Nick; Spetzler, David

    2018-01-01

    Extracellular vesicle (EV)-based liquid biopsies have been proposed to be a readily obtainable biological substrate recently for both profiling and diagnostics purposes. Development of a fast and reliable preparation protocol to enrich such small particles could accelerate the discovery of informative, disease-related biomarkers. Though multiple EV enrichment protocols are available, in terms of efficiency, reproducibility and simplicity, precipitation-based methods are most amenable to studies with large numbers of subjects. However, the selectivity of the precipitation becomes critical. Here, we present a simple plasma EV enrichment protocol based on pluronic block copolymer. The enriched plasma EV was able to be verified by multiple platforms. Our results showed that the particles enriched from plasma by the copolymer were EV size vesicles with membrane structure; proteomic profiling showed that EV-related proteins were significantly enriched, while high-abundant plasma proteins were significantly reduced in comparison to other precipitation-based enrichment methods. Next-generation sequencing confirmed the existence of various RNA species that have been observed in EVs from previous studies. Small RNA sequencing showed enriched species compared to the corresponding plasma. Moreover, plasma EVs enriched from 20 advanced breast cancer patients and 20 age-matched non-cancer controls were profiled by semi-quantitative mass spectrometry. Protein features were further screened by EV proteomic profiles generated from four breast cancer cell lines, and then selected in cross-validation models. A total of 60 protein features that highly contributed in model prediction were identified. Interestingly, a large portion of these features were associated with breast cancer aggression, metastasis as well as invasion, consistent with the advanced clinical stage of the patients. In summary, we have developed a plasma EV enrichment method with improved precipitation selectivity

  19. Geospatial distribution modeling and determining suitability of groundwater quality for irrigation purpose using geospatial methods and water quality index (WQI) in Northern Ethiopia

    Science.gov (United States)

    Gidey, Amanuel

    2018-06-01

    Determining the suitability and vulnerability of groundwater quality for irrigation use is an essential early warning and first step toward careful management of groundwater resources to diminish the impacts on irrigation. This study was conducted to determine the overall suitability of groundwater quality for irrigation use and to generate their spatial distribution maps in Elala catchment, Northern Ethiopia. Thirty-nine groundwater samples were collected to analyze and map the water quality variables. Atomic absorption spectrophotometer, ultraviolet spectrophotometer, titration and calculation methods were used for laboratory groundwater quality analysis. Arc GIS, geospatial analysis tools, semivariogram model types and interpolation methods were used to generate geospatial distribution maps. Twelve and eight water quality variables were used to produce weighted overlay and irrigation water quality index models, respectively. Root-mean-square error, mean square error, absolute square error, mean error, root-mean-square standardized error, and measured values versus predicted values were used for cross-validation. The overall weighted overlay model result showed that 146 km2 areas are highly suitable, 135 km2 moderately suitable and 60 km2 area unsuitable for irrigation use. The result of irrigation water quality index confirms 10.26% with no restriction, 23.08% with low restriction, 20.51% with moderate restriction, 15.38% with high restriction and 30.76% with the severe restriction for irrigation use. GIS and irrigation water quality index are better methods for irrigation water resources management to achieve a full yield irrigation production to improve food security and to sustain it for a long period, to avoid the possibility of increasing environmental problems for the future generation.
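
    A minimal sketch of the leave-one-out cross-validation commonly used to judge spatial interpolation quality, reporting the mean error and root-mean-square error; the semivariogram-based interpolation of the study is replaced here by a simple inverse-distance-weighted predictor, and the 39 well locations and values are synthetic.

        import numpy as np

        rng = np.random.default_rng(4)

        # Synthetic stand-in for 39 sampled wells: (x, y) coordinates and a quality variable.
        coords = rng.uniform(0, 20, size=(39, 2))
        values = 5 + 0.3 * coords[:, 0] + rng.normal(scale=0.5, size=39)

        def idw(train_xy, train_v, target_xy, power=2.0):
            """Inverse-distance-weighted prediction at a single target location."""
            d = np.linalg.norm(train_xy - target_xy, axis=1)
            w = 1.0 / np.maximum(d, 1e-9) ** power
            return float(np.sum(w * train_v) / np.sum(w))

        # Leave-one-out cross-validation: predict each well from the remaining 38.
        errors = []
        for i in range(len(values)):
            mask = np.arange(len(values)) != i
            errors.append(idw(coords[mask], values[mask], coords[i]) - values[i])

        errors = np.asarray(errors)
        print("mean error:", round(errors.mean(), 3),
              " RMSE:", round(float(np.sqrt((errors ** 2).mean())), 3))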

  20. A simple method for measuring signs of 1HN chemical shift differences between ground and excited protein states

    Energy Technology Data Exchange (ETDEWEB)

    Bouvignies, Guillaume; Korzhnev, Dmitry M.; Neudecker, Philipp; Hansen, D. Flemming [University of Toronto, Departments of Molecular Genetics, Biochemistry and Chemistry (Canada); Cordes, Matthew H. J. [University of Arizona, Department of Chemistry and Biochemistry (United States); Kay, Lewis E., E-mail: kay@pound.med.utoronto.c [University of Toronto, Departments of Molecular Genetics, Biochemistry and Chemistry (Canada)

    2010-06-15

    NMR relaxation dispersion spectroscopy is a powerful method for studying protein conformational dynamics whereby visible, ground and invisible, excited conformers interconvert on the millisecond time-scale. In addition to providing kinetics and thermodynamics parameters of the exchange process, the CPMG dispersion experiment also allows extraction of the absolute values of the chemical shift differences between interconverting states, |Δω-tilde|, opening the way for structure determination of excited state conformers. Central to the goal of structural analysis is the availability of the chemical shifts of the excited state that can only be obtained once the signs of Δω-tilde are known. Herein we describe a very simple method for determining the signs of 1HN Δω-tilde values based on a comparison of peak positions in the directly detected dimensions of a pair of 1HN-15N correlation maps recorded at different static magnetic fields. The utility of the approach is demonstrated for three proteins that undergo millisecond time-scale conformational rearrangements. Although the method provides fewer signs than previously published techniques it does have a number of strengths: (1) Data sets needed for analysis are typically available from other experiments, such as those required for measuring signs of 15N Δω-tilde values, thus requiring no additional experimental time, (2) acquisition times in the critical detection dimension can be as long as necessary and (3) the signs obtained can be used to cross-validate those from other approaches.

  1. Automated Tree Crown Delineation and Biomass Estimation from Airborne LiDAR data: A Comparison of Statistical and Machine Learning Methods

    Science.gov (United States)

    Gleason, C. J.; Im, J.

    2011-12-01

    Airborne LiDAR remote sensing has been used effectively in assessing forest biomass because of its canopy penetrating effects and its ability to accurately describe the canopy surface. Current research in assessing biomass using airborne LiDAR focuses on either the individual tree as a base unit of study or statistical representations of a small aggregation of trees (i.e., plot level), and both methods usually rely on regression against field data to model the relationship between the LiDAR-derived data (e.g., volume) and biomass. This study estimates biomass for mixed forests and coniferous plantations (Picea Abies) within Heiberg Memorial Forest, Tully, NY, at both the plot and individual tree level. Plots are regularly spaced with a radius of 13m, and field data include diameter at breast height (dbh), tree height, and tree species. Field data collection and LiDAR data acquisition were seasonally coincident and both obtained in August of 2010. Resulting point cloud density was >5pts/m2. LiDAR data were processed to provide a canopy height surface, and a combination of watershed segmentation, active contouring, and genetic algorithm optimization was applied to delineate individual trees from the surface. This updated delineation method was shown to be more accurate than traditional watershed segmentation. Once trees had been delineated, four biomass estimation models were applied and compared: support vector regression (SVR), linear mixed effects regression (LME), random forest (RF), and Cubist regression. Candidate variables to be used in modeling were derived from the LiDAR surface, and include metrics of height, width, and volume per delineated tree footprint. Previously published allometric equations provided field estimates of biomass to inform the regressions and calculate their accuracy via leave-one-out cross validation. This study found that for forests such as found in the study area, aggregation of individual trees to form a plot-based estimate of
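
    A minimal sketch of a leave-one-out cross-validation comparison of regression models for per-tree biomass on synthetic crown metrics; SVR and random forest match the models named above, gradient boosting stands in for Cubist, and the linear mixed-effects model is omitted because it needs an explicit grouping structure.

        import numpy as np
        from sklearn.model_selection import LeaveOneOut, cross_val_predict
        from sklearn.svm import SVR
        from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

        rng = np.random.default_rng(5)

        # Synthetic per-tree LiDAR metrics: height (m), crown width (m), crown volume (m^3).
        n = 120
        X = np.column_stack([
            rng.uniform(5, 30, n),
            rng.uniform(1, 8, n),
            rng.uniform(2, 200, n),
        ])
        # Illustrative "allometric" biomass driven by the three metrics plus noise.
        biomass = 2.0 * X[:, 0] + 5.0 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(scale=10, size=n)

        models = {
            "SVR": SVR(C=100.0),
            "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
            "GradientBoosting": GradientBoostingRegressor(random_state=0),  # stand-in for Cubist
        }

        for name, model in models.items():
            pred = cross_val_predict(model, X, biomass, cv=LeaveOneOut())
            rmse = np.sqrt(np.mean((pred - biomass) ** 2))
            print(f"{name:>16}: leave-one-out RMSE = {rmse:.1f}")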

  2. Predicting response before initiation of neoadjuvant chemotherapy in breast cancer using new methods for the analysis of dynamic contrast enhanced MRI (DCE MRI) data

    Science.gov (United States)

    DeGrandchamp, Joseph B.; Whisenant, Jennifer G.; Arlinghaus, Lori R.; Abramson, V. G.; Yankeelov, Thomas E.; Cárdenas-Rodríguez, Julio

    2016-03-01

    The pharmacokinetic parameters derived from dynamic contrast enhanced (DCE) MRI have shown promise as biomarkers for tumor response to therapy. However, standard methods of analyzing DCE MRI data (Tofts model) require high temporal resolution, high signal-to-noise ratio (SNR), and the Arterial Input Function (AIF). Such models produce reliable biomarkers of response only when a therapy has a large effect on the parameters. We recently reported a method that solves the limitations, the Linear Reference Region Model (LRRM). Similar to other reference region models, the LRRM needs no AIF. Additionally, the LRRM is more accurate and precise than standard methods at low SNR and slow temporal resolution, suggesting LRRM-derived biomarkers could be better predictors. Here, the LRRM, Non-linear Reference Region Model (NRRM), Linear Tofts model (LTM), and Non-linear Tofts Model (NLTM) were used to estimate the RKtrans between muscle and tumor (or the Ktrans for Tofts) and the tumor kep,TOI for 39 breast cancer patients who received neoadjuvant chemotherapy (NAC). These parameters and the receptor statuses of each patient were used to construct cross-validated predictive models to classify patients as complete pathological responders (pCR) or non-complete pathological responders (non-pCR) to NAC. Model performance was evaluated using area under the ROC curve (AUC). The AUC for receptor status alone was 0.62, while the best performance using predictors from the LRRM, NRRM, LTM, and NLTM were AUCs of 0.79, 0.55, 0.60, and 0.59 respectively. This suggests that the LRRM can be used to predict response to NAC in breast cancer.

  3. Pharmacophore Modelling and 4D-QSAR Study of Ruthenium(II) Arene Complexes as Anticancer Agents (Inhibitors) by Electron Conformational- Genetic Algorithm Method.

    Science.gov (United States)

    Yavuz, Sevtap Caglar; Sabanci, Nazmiye; Saripinar, Emin

    2018-01-01

    The EC-GA method was employed in this study as a 4D-QSAR method for the identification of the pharmacophore (Pha) of ruthenium(II) arene complex derivatives and quantitative prediction of activity. The arrangement of the computed geometric and electronic parameters for atoms and bonds of each compound occurring in a matrix is known as the electron-conformational matrix of congruity (ECMC). It contains the data from HF/3-21G level calculations. Compounds were represented by a group of conformers for each compound rather than a single conformation, known as the fourth dimension, to generate the model. ECMCs were compared within a certain range of tolerance values by using the EMRE program, and the responsible pharmacophore group for ruthenium(II) arene complex derivatives was found. For selecting the sub-parameter which had the most effect on activity in the series and for the calculation of theoretical activity values, the non-linear least square method and genetic algorithm included in the EMRE program were used. In addition, compounds were classified into training and test sets and the accuracy of the models was tested statistically by cross-validation. The model for the training and test sets attained by the optimum 10 parameters gave highly satisfactory results, with R²(training) = 0.817, q² = 0.718, SE(training) = 0.066, q²(ext1) = 0.867, q²(ext2) = 0.849, q²(ext3) = 0.895, CCC(tr) = 0.895, CCC(test) = 0.930 and CCC(all) = 0.905. Since there is no 4D-QSAR research on metal-based organic complexes in the literature, this study is original and gives a powerful tool for the design of novel and selective ruthenium(II) arene complexes.

  4. Advanced differential quadrature methods

    CERN Document Server

    Zong, Zhi

    2009-01-01

    Modern Tools to Perform Numerical DifferentiationThe original direct differential quadrature (DQ) method has been known to fail for problems with strong nonlinearity and material discontinuity as well as for problems involving singularity, irregularity, and multiple scales. But now researchers in applied mathematics, computational mechanics, and engineering have developed a range of innovative DQ-based methods to overcome these shortcomings. Advanced Differential Quadrature Methods explores new DQ methods and uses these methods to solve problems beyond the capabilities of the direct DQ method.After a basic introduction to the direct DQ method, the book presents a number of DQ methods, including complex DQ, triangular DQ, multi-scale DQ, variable order DQ, multi-domain DQ, and localized DQ. It also provides a mathematical compendium that summarizes Gauss elimination, the Runge-Kutta method, complex analysis, and more. The final chapter contains three codes written in the FORTRAN language, enabling readers to q...

  5. Inflow Turbulence Generation Methods

    Science.gov (United States)

    Wu, Xiaohua

    2017-01-01

    Research activities on inflow turbulence generation methods have been vigorous over the past quarter century, accompanying advances in eddy-resolving computations of spatially developing turbulent flows with direct numerical simulation, large-eddy simulation (LES), and hybrid Reynolds-averaged Navier-Stokes-LES. The weak recycling method, rooted in scaling arguments on the canonical incompressible boundary layer, has been applied to supersonic boundary layer, rough surface boundary layer, and microscale urban canopy LES coupled with mesoscale numerical weather forecasting. Synthetic methods, originating from analytical approximation to homogeneous isotropic turbulence, have branched out into several robust methods, including the synthetic random Fourier method, synthetic digital filtering method, synthetic coherent eddy method, and synthetic volume forcing method. This article reviews major progress in inflow turbulence generation methods with an emphasis on fundamental ideas, key milestones, representative applications, and critical issues. Directions for future research in the field are also highlighted.

  6. Calculation of Five Thermodynamic Molecular Descriptors by Means of a General Computer Algorithm Based on the Group-Additivity Method: Standard Enthalpies of Vaporization, Sublimation and Solvation, and Entropy of Fusion of Ordinary Organic Molecules and Total Phase-Change Entropy of Liquid Crystals.

    Science.gov (United States)

    Naef, Rudolf; Acree, William E

    2017-06-25

    The calculation of the standard enthalpies of vaporization, sublimation and solvation of organic molecules is presented using a common computer algorithm on the basis of a group-additivity method. The same algorithm is also shown to enable the calculation of their entropy of fusion as well as the total phase-change entropy of liquid crystals. The present method is based on the complete breakdown of the molecules into their constituting atoms and their immediate neighbourhood; the respective calculations of the contribution of the atomic groups by means of the Gauss-Seidel fitting method is based on experimental data collected from literature. The feasibility of the calculations for each of the mentioned descriptors was verified by means of a 10-fold cross-validation procedure proving the good to high quality of the predicted values for the three mentioned enthalpies and for the entropy of fusion, whereas the predictive quality for the total phase-change entropy of liquid crystals was poor. The goodness of fit (Q²) and the standard deviation (σ) of the cross-validation calculations for the five descriptors were as follows: 0.9641 and 4.56 kJ/mol (N = 3386 test molecules) for the enthalpy of vaporization, 0.8657 and 11.39 kJ/mol (N = 1791) for the enthalpy of sublimation, 0.9546 and 4.34 kJ/mol (N = 373) for the enthalpy of solvation, 0.8727 and 17.93 J/mol/K (N = 2637) for the entropy of fusion and 0.5804 and 32.79 J/mol/K (N = 2643) for the total phase-change entropy of liquid crystals. The large discrepancy between the results of the two closely related entropies is discussed in detail. Molecules for which both the standard enthalpies of vaporization and sublimation were calculable enabled the estimation of their standard enthalpy of fusion by simple subtraction of the former from the latter enthalpy. For 990 of them the experimental enthalpy-of-fusion values are also known, allowing their comparison with predictions, yielding a correlation coefficient R
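
    A minimal sketch of the group-additivity idea on synthetic data: each molecule is represented by a vector of atom-group counts, the group contributions are fitted by least squares, and 10-fold cross-validation yields the Q² and σ figures of merit quoted above; the group counts, contributions and property values are all illustrative.

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.model_selection import KFold, cross_val_predict

        rng = np.random.default_rng(6)

        # Synthetic data: 500 molecules, each described by counts of 15 atom groups.
        counts = rng.integers(0, 6, size=(500, 15)).astype(float)
        true_contrib = rng.uniform(1.0, 8.0, size=15)          # kJ/mol per group (illustrative)
        dH_vap = counts @ true_contrib + rng.normal(scale=3.0, size=500)

        # Pure additivity: no intercept, one fitted contribution per group.
        model = LinearRegression(fit_intercept=False)
        pred = cross_val_predict(model, counts, dH_vap,
                                 cv=KFold(n_splits=10, shuffle=True, random_state=0))

        residuals = dH_vap - pred
        q2 = 1.0 - np.sum(residuals ** 2) / np.sum((dH_vap - dH_vap.mean()) ** 2)
        sigma = residuals.std(ddof=1)
        print(f"Q^2 = {q2:.4f}, sigma = {sigma:.2f} kJ/mol")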

  7. Calculation of Five Thermodynamic Molecular Descriptors by Means of a General Computer Algorithm Based on the Group-Additivity Method: Standard Enthalpies of Vaporization, Sublimation and Solvation, and Entropy of Fusion of Ordinary Organic Molecules and Total Phase-Change Entropy of Liquid Crystals

    Directory of Open Access Journals (Sweden)

    Rudolf Naef

    2017-06-01

    Full Text Available The calculation of the standard enthalpies of vaporization, sublimation and solvation of organic molecules is presented using a common computer algorithm on the basis of a group-additivity method. The same algorithm is also shown to enable the calculation of their entropy of fusion as well as the total phase-change entropy of liquid crystals. The present method is based on the complete breakdown of the molecules into their constituting atoms and their immediate neighbourhood; the respective calculations of the contribution of the atomic groups by means of the Gauss-Seidel fitting method is based on experimental data collected from literature. The feasibility of the calculations for each of the mentioned descriptors was verified by means of a 10-fold cross-validation procedure proving the good to high quality of the predicted values for the three mentioned enthalpies and for the entropy of fusion, whereas the predictive quality for the total phase-change entropy of liquid crystals was poor. The goodness of fit (Q²) and the standard deviation (σ) of the cross-validation calculations for the five descriptors was as follows: 0.9641 and 4.56 kJ/mol (N = 3386 test molecules) for the enthalpy of vaporization, 0.8657 and 11.39 kJ/mol (N = 1791) for the enthalpy of sublimation, 0.9546 and 4.34 kJ/mol (N = 373) for the enthalpy of solvation, 0.8727 and 17.93 J/mol/K (N = 2637) for the entropy of fusion and 0.5804 and 32.79 J/mol/K (N = 2643) for the total phase-change entropy of liquid crystals. The large discrepancy between the results of the two closely related entropies is discussed in detail. Molecules for which both the standard enthalpies of vaporization and sublimation were calculable, enabled the estimation of their standard enthalpy of fusion by simple subtraction of the former from the latter enthalpy. For 990 of them the experimental enthalpy-of-fusion values are also known, allowing their comparison with predictions, yielding a correlation

  8. Methods of nonlinear analysis

    CERN Document Server

    Bellman, Richard Ernest

    1970-01-01

    In this book, we study theoretical and practical aspects of computing methods for mathematical modelling of nonlinear systems. A number of computing techniques are considered, such as methods of operator approximation with any given accuracy; operator interpolation techniques including a non-Lagrange interpolation; methods of system representation subject to constraints associated with concepts of causality, memory and stationarity; methods of system representation with an accuracy that is the best within a given class of models; methods of covariance matrix estimation; methods for low-rank mat

  9. Consumer Behavior Research Methods

    DEFF Research Database (Denmark)

    Chrysochou, Polymeros

    2017-01-01

    This chapter starts by distinguishing consumer behavior research methods based on the type of data used, being either secondary or primary. Most consumer behavior research studies phenomena that require researchers to enter the field and collect data on their own, and therefore the chapter emphasizes the discussion of primary research methods. Based on the nature of the data, primary research methods are further distinguished into qualitative and quantitative. The chapter describes the most important and popular qualitative and quantitative methods. It concludes with an overall evaluation of the methods and how to improve quality in consumer behavior research methods.

  10. A multifactorial analysis of obesity as CVD risk factor: Use of neural network based methods in a nutrigenetics context

    Directory of Open Access Journals (Sweden)

    Valavanis Ioannis K

    2010-09-01

    Full Text Available Abstract Background Obesity is a multifactorial trait, which comprises an independent risk factor for cardiovascular disease (CVD). The aim of the current work is to study the complex etiology beneath obesity and identify genetic variations and/or factors related to nutrition that contribute to its variability. To this end, a set of more than 2300 white subjects who participated in a nutrigenetics study was used. For each subject a total of 63 factors describing genetic variants related to CVD (24 in total), gender, and nutrition (38 in total; e.g. average daily intake in calories and cholesterol) were measured. Each subject was categorized according to body mass index (BMI) as normal (BMI ≤ 25) or overweight (BMI > 25). Two artificial neural network (ANN) based methods were designed and used towards the analysis of the available data. These corresponded to i) a multi-layer feed-forward ANN combined with a parameter decreasing method (PDM-ANN), and ii) a multi-layer feed-forward ANN trained by a hybrid method (GA-ANN) which combines genetic algorithms and the popular back-propagation training algorithm. Results PDM-ANN and GA-ANN were comparatively assessed in terms of their ability to identify the most important factors among the initial 63 variables describing genetic variations, nutrition and gender, able to classify a subject into one of the BMI related classes: normal and overweight. The methods were designed and evaluated using appropriate training and testing sets provided by 3-fold Cross Validation (3-CV) resampling. Classification accuracy, sensitivity, specificity and area under receiver operating characteristics curve were utilized to evaluate the resulting predictive ANN models. The most parsimonious set of factors was obtained by the GA-ANN method and included gender, six genetic variations and 18 nutrition-related variables. The corresponding predictive model was characterized by a mean accuracy equal to 61.46% in the 3-CV testing sets

  11. A multifactorial analysis of obesity as CVD risk factor: use of neural network based methods in a nutrigenetics context.

    Science.gov (United States)

    Valavanis, Ioannis K; Mougiakakou, Stavroula G; Grimaldi, Keith A; Nikita, Konstantina S

    2010-09-08

    Obesity is a multifactorial trait, which comprises an independent risk factor for cardiovascular disease (CVD). The aim of the current work is to study the complex etiology beneath obesity and identify genetic variations and/or factors related to nutrition that contribute to its variability. To this end, a set of more than 2300 white subjects who participated in a nutrigenetics study was used. For each subject a total of 63 factors describing genetic variants related to CVD (24 in total), gender, and nutrition (38 in total), e.g. average daily intake in calories and cholesterol, were measured. Each subject was categorized according to body mass index (BMI) as normal (BMI ≤ 25) or overweight (BMI > 25). Two artificial neural network (ANN) based methods were designed and used towards the analysis of the available data. These corresponded to i) a multi-layer feed-forward ANN combined with a parameter decreasing method (PDM-ANN), and ii) a multi-layer feed-forward ANN trained by a hybrid method (GA-ANN) which combines genetic algorithms and the popular back-propagation training algorithm. PDM-ANN and GA-ANN were comparatively assessed in terms of their ability to identify the most important factors among the initial 63 variables describing genetic variations, nutrition and gender, able to classify a subject into one of the BMI related classes: normal and overweight. The methods were designed and evaluated using appropriate training and testing sets provided by 3-fold Cross Validation (3-CV) resampling. Classification accuracy, sensitivity, specificity and area under receiver operating characteristics curve were utilized to evaluate the resulted predictive ANN models. The most parsimonious set of factors was obtained by the GA-ANN method and included gender, six genetic variations and 18 nutrition-related variables. The corresponding predictive model was characterized by a mean accuracy equal of 61.46% in the 3-CV testing sets. The ANN based methods revealed factors

  12. A method for managing re-identification risk from small geographic areas in Canada

    Directory of Open Access Journals (Sweden)

    Neisa Angelica

    2010-04-01

    Full Text Available Abstract Background A common disclosure control practice for health datasets is to identify small geographic areas and either suppress records from these small areas or aggregate them into larger ones. A recent study provided a method for deciding when an area is too small based on the uniqueness criterion. The uniqueness criterion stipulates that the area is no longer too small when the proportion of unique individuals on the relevant variables (the quasi-identifiers) approaches zero. However, using a uniqueness value of zero is quite a stringent threshold, and is only suitable when the risks from data disclosure are quite high. Other uniqueness thresholds that have been proposed for health data are 5% and 20%. Methods We estimated uniqueness for urban Forward Sortation Areas (FSAs) by using the 2001 long form Canadian census data representing 20% of the population. We then constructed two logistic regression models to predict when the uniqueness is greater than the 5% and 20% thresholds, and validated their predictive accuracy using 10-fold cross-validation. Predictor variables included the population size of the FSA and the maximum number of possible values on the quasi-identifiers (the number of equivalence classes). Results All model parameters were significant and the models had very high prediction accuracy, with specificity above 0.9, and sensitivity at 0.87 and 0.74 for the 5% and 20% threshold models respectively. The application of the models was illustrated with an analysis of the Ontario newborn registry and an emergency department dataset. At the higher thresholds considerably fewer records compared to the 0% threshold would be considered to be in small areas and therefore undergo disclosure control actions. We have also included concrete guidance for data custodians in deciding which one of the three uniqueness thresholds to use (0%, 5%, 20%), depending on the mitigating controls that the data recipients have in place, the
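
    A minimal sketch, on synthetic areas, of a logistic model that predicts whether uniqueness exceeds a threshold from the population size and the number of equivalence classes, validated by 10-fold cross-validation with sensitivity and specificity; the relationship used to generate the labels here is purely illustrative and not the study's fitted model.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import StratifiedKFold, cross_val_predict
        from sklearn.metrics import recall_score

        rng = np.random.default_rng(8)

        # Synthetic areas: population size and number of possible quasi-identifier combinations.
        n = 1000
        pop = rng.integers(500, 50_000, size=n).astype(float)
        eq_classes = rng.integers(50, 20_000, size=n).astype(float)

        # Illustrative label: uniqueness tends to rise with equivalence classes per resident.
        ratio = eq_classes / pop
        exceeds_5pct = (ratio + rng.normal(scale=0.2, size=n) > 0.5).astype(int)

        X = np.column_stack([np.log(pop), np.log(eq_classes)])
        model = LogisticRegression()

        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
        pred = cross_val_predict(model, X, exceeds_5pct, cv=cv)

        print("sensitivity:", round(recall_score(exceeds_5pct, pred, pos_label=1), 2),
              " specificity:", round(recall_score(exceeds_5pct, pred, pos_label=0), 2))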

  13. Comparison of catchment grouping methods for flow duration curve estimation at ungauged sites in France

    Directory of Open Access Journals (Sweden)

    E. Sauquet

    2011-08-01

    physiographic and/or climatic variables and the two parameters of the EOF model. Results on percentile estimation in cross validation show that a significant benefit is obtained by defining homogeneous regions before developing regressions, particularly when grouping methods make use of hydrogeological information.

  14. Comparison of machine learning methods for classifying mediastinal lymph node metastasis of non-small cell lung cancer from 18F-FDG PET/CT images.

    Science.gov (United States)

    Wang, Hongkai; Zhou, Zongwei; Li, Yingci; Chen, Zhonghua; Lu, Peiou; Wang, Wenzhi; Liu, Wanyu; Yu, Lijuan

    2017-12-01

    This study aimed to compare one state-of-the-art deep learning method and four classical machine learning methods for classifying mediastinal lymph node metastasis of non-small cell lung cancer (NSCLC) from 18F-FDG PET/CT images. Another objective was to compare the discriminative power of the recently popular PET/CT texture features with the widely used diagnostic features such as tumor size, CT value, SUV, image contrast, and intensity standard deviation. The four classical machine learning methods included random forests, support vector machines, adaptive boosting, and artificial neural networks. The deep learning method was the convolutional neural network (CNN). The five methods were evaluated using 1397 lymph nodes collected from PET/CT images of 168 patients, with corresponding pathology analysis results as gold standard. The comparison was conducted using 10 times 10-fold cross-validation based on the criteria of sensitivity, specificity, accuracy (ACC), and area under the ROC curve (AUC). For each classical method, different input features were compared to select the optimal feature set. Based on the optimal feature set, the classical methods were compared with CNN, as well as with human doctors from our institute. For the classical methods, the diagnostic features resulted in 81~85% ACC and 0.87~0.92 AUC, which were significantly higher than the results of texture features. CNN's sensitivity, specificity, ACC, and AUC were 84%, 88%, 86%, and 0.91, respectively. There was no significant difference between the results of CNN and the best classical method. The sensitivity, specificity, and ACC of human doctors were 73%, 90%, and 82%, respectively. All the five machine learning methods had higher sensitivities but lower specificities than human doctors. The present study shows that the performance of CNN is not significantly different from the best classical methods and human doctors for classifying mediastinal lymph node metastasis of NSCLC from PET/CT images
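
    A minimal sketch of the 10 times 10-fold cross-validation comparison of the four classical learners on synthetic node features; the sensitivity, specificity, accuracy and AUC are computed with scikit-learn scorers, and all feature counts and class balances are illustrative.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
        from sklearn.svm import SVC
        from sklearn.neural_network import MLPClassifier
        from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
        from sklearn.metrics import make_scorer, recall_score

        # Synthetic stand-in for the lymph node features (imbalanced benign/metastatic classes).
        X, y = make_classification(n_samples=400, n_features=8, weights=[0.7, 0.3], random_state=0)

        classifiers = {
            "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
            "SVM": SVC(probability=True, random_state=0),
            "AdaBoost": AdaBoostClassifier(random_state=0),
            "ANN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
        }

        scoring = {
            "sens": make_scorer(recall_score, pos_label=1),   # sensitivity
            "spec": make_scorer(recall_score, pos_label=0),   # specificity
            "acc": "accuracy",
            "auc": "roc_auc",
        }

        cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
        for name, clf in classifiers.items():
            res = cross_validate(clf, X, y, cv=cv, scoring=scoring)
            print(f"{name:>12}: "
                  f"sens={res['test_sens'].mean():.2f} spec={res['test_spec'].mean():.2f} "
                  f"acc={res['test_acc'].mean():.2f} auc={res['test_auc'].mean():.2f}")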

  15. Association between mammogram density and background parenchymal enhancement of breast MRI

    Science.gov (United States)

    Aghaei, Faranak; Danala, Gopichandh; Wang, Yunzhi; Zarafshani, Ali; Qian, Wei; Liu, Hong; Zheng, Bin

    2018-02-01

    Breast density has been widely considered an important risk factor for breast cancer. The purpose of this study is to examine the association between mammogram density results and background parenchymal enhancement (BPE) of breast MRI. A dataset involving breast MR images was acquired from 65 high-risk women. Based on mammography density (BIRADS) results, the dataset was divided into two groups of low and high breast density cases. The low-density group has 15 cases with mammographic density BIRADS 1 and 2, while the high-density group includes 50 cases, which were rated by radiologists as mammographic density BIRADS 3 and 4. A computer-aided detection (CAD) scheme was applied to segment and register breast regions depicted on sequential images of breast MRI scans. The CAD scheme computed 20 global BPE features from the two breast regions combined, separately from the left and right breast regions, as well as from the bilateral difference between left and right breast regions. An image feature selection method, namely the CFS method, was applied to remove the most redundant features and select optimal features from the initial feature pool. Then, a logistic regression classifier was built using the optimal features to predict the mammogram density from the BPE features. Using a leave-one-case-out validation method, the classifier yielded an accuracy of 82% and an area under the ROC curve of AUC = 0.81 +/- 0.09. Also, the box-plot based analysis shows a negative association between mammogram density results and BPE features in the MRI images. This study demonstrated a negative association between mammogram density and BPE of breast MRI images.
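
    A minimal sketch of a leave-one-case-out evaluation of a logistic regression classifier on synthetic BPE-like features; the correlation-based (CFS) feature selection step is omitted for brevity, and the 65-case dimensioning simply mirrors the study size.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import LeaveOneOut
        from sklearn.metrics import roc_auc_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(7)

        # Synthetic stand-in: 65 cases x 20 global BPE features, binary density label.
        X = rng.normal(size=(65, 20))
        y = (0.8 * X[:, 0] - 0.6 * X[:, 1] + rng.normal(scale=1.0, size=65) > 0).astype(int)

        clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

        # Leave-one-case-out: each case is scored by a model trained on the remaining 64.
        scores = np.empty(len(y))
        for tr, te in LeaveOneOut().split(X):
            scores[te] = clf.fit(X[tr], y[tr]).predict_proba(X[te])[:, 1]

        preds = (scores > 0.5).astype(int)
        print("accuracy:", round(float(np.mean(preds == y)), 3),
              " AUC:", round(roc_auc_score(y, scores), 3))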

  16. A two-stage method for microcalcification cluster segmentation in mammography by deformable models

    International Nuclear Information System (INIS)

    Arikidis, N.; Kazantzi, A.; Skiadopoulos, S.; Karahaliou, A.; Costaridou, L.; Vassiou, K.

    2015-01-01

    Purpose: Segmentation of microcalcification (MC) clusters in x-ray mammography is a difficult task for radiologists. Accurate segmentation is a prerequisite for quantitative image analysis of MC clusters and subsequent feature extraction and classification in computer-aided diagnosis schemes. Methods: In this study, a two-stage semiautomated segmentation method of MC clusters is investigated. The first stage is targeted to accurate and time efficient segmentation of the majority of the particles of a MC cluster, by means of a level set method. The second stage is targeted to shape refinement of selected individual MCs, by means of an active contour model. Both methods are applied in the framework of a rich scale-space representation, provided by the wavelet transform at integer scales. Segmentation reliability of the proposed method in terms of inter- and intraobserver agreements was evaluated in a case sample of 80 MC clusters originating from the Digital Database for Screening Mammography, corresponding to 4 morphology types (punctate: 22, fine linear branching: 16, pleomorphic: 18, and amorphous: 24) of MC clusters, assessing radiologists' segmentations quantitatively by two distance metrics (Hausdorff distance, HDISTcluster, and average of minimum distance, AMINDISTcluster) and the area overlap measure (AOMcluster). The effect of the proposed segmentation method on MC cluster characterization accuracy was evaluated in a case sample of 162 pleomorphic MC clusters (72 malignant and 90 benign). Ten MC cluster features, targeted to capture morphologic properties of individual MCs in a cluster (area, major length, perimeter, compactness, and spread), were extracted and a correlation-based feature selection method yielded a feature subset to feed into a support vector machine classifier. Classification performance of the MC cluster features was estimated by means of the area under the receiver operating characteristic curve (Az ± Standard Error) utilizing tenfold cross-validation

  17. Dissolution Methods Database

    Data.gov (United States)

    U.S. Department of Health & Human Services — For a drug product that does not have a dissolution test method in the United States Pharmacopeia (USP), the FDA Dissolution Methods Database provides information on...

  18. The three circle method

    International Nuclear Information System (INIS)

    Garncarek, Z.

    1989-01-01

    The three circle method in its general form is presented. The method is especially useful for investigation of shapes of agglomerations of objects. An example of its applications to investigation of galaxies distribution is given. 17 refs. (author)

  19. Design Methods in Practice

    DEFF Research Database (Denmark)

    Jensen, Torben Elgaard; Andreasen, Mogens Myrup

    2010-01-01

    The paper challenges the dominant and widespread view that a good design method will guarantee a systematic approach as well as certain results. First, it explores the substantial differences between on the one hand the conception of methods implied in Pahl & Beitz's widely recognized text book on engineering design, and on the other hand the understanding of method use, which has emerged from micro-sociological studies of practice (ethnomethodology). Second, it reviews a number of case studies conducted by engineering students, who were instructed to investigate the actual use of design methods in Danish companies. The paper concludes that design methods in practice deviate substantially from Pahl & Beitz's description of method use: The object and problems, which are the starting points for method use, are more contested and less given than generally assumed; The steps of methods are often...

  20. Advances in Numerical Methods

    CERN Document Server

    Mastorakis, Nikos E

    2009-01-01

    Features contributions that are focused on significant aspects of current numerical methods and computational mathematics. This book carries chapters on advanced methods and various variations of known techniques that can solve difficult scientific problems efficiently.

  1. Basic Finite Element Method

    International Nuclear Information System (INIS)

    Lee, Byeong Hae

    1992-02-01

    This book gives descriptions of the basic finite element method, covering basic finite element concepts and data, the black box, writing of data, the definition of vectors and matrices, matrix multiplication, matrix addition and the unit matrix, the concept of the stiffness matrix relating spring force and displacement, the governing equation of an elastic body, the finite element method itself, Fortran methods and programming (composition of the computer, order of programming, data cards and Fortran cards), finite element programs and application to non-elastic problems.

  2. Conformable variational iteration method

    Directory of Open Access Journals (Sweden)

    Omer Acan

    2017-02-01

    Full Text Available In this study, we introduce the conformable variational iteration method based on the newly defined fractional derivative called the conformable fractional derivative. This new method is applied to two fractional-order ordinary differential equations. To illustrate the solutions obtained with this method, linear homogeneous and non-linear non-homogeneous fractional ordinary differential equations are selected. The obtained results are compared with the exact solutions, and their graphs are plotted to demonstrate the efficiency and accuracy of the method.

  3. VALUATION METHODS- LITERATURE REVIEW

    OpenAIRE

    Dorisz Talas

    2015-01-01

    This paper is a theoretical overview of commonly used valuation methods with the help of which the value of a firm or its equity is calculated. Many experts (including Aswath Damodaran, Guochang Zhang and CA Hozefa Natalwala) classify the methods. The basic models are based on discounted cash flows. The main method uses the free cash flow for valuation, but there are some newer methods that reveal and correct the weaknesses of the traditional models. The valuation of flexibility of managemen...

  4. Mixed methods research.

    Science.gov (United States)

    Halcomb, Elizabeth; Hickman, Louise

    2015-04-08

    Mixed methods research involves the use of qualitative and quantitative data in a single research project. It represents an alternative methodological approach, combining qualitative and quantitative research approaches, which enables nurse researchers to explore complex phenomena in detail. This article provides a practical overview of mixed methods research and its application in nursing, to guide the novice researcher considering a mixed methods research project.

  5. Possibilities of roentgenological method

    International Nuclear Information System (INIS)

    Sivash, Eh.S.; Sal'man, M.M.

    1980-01-01

    Literature and experimental data on the capabilities of roentgenologic investigations using an electron-optical amplifier, X-ray television and roentgen cinematography are summarized. Different methods of studying the gastrointestinal tract are compared. The advantage of the roentgenologic method over the endoscopic method after stomach resection is shown [ru

  6. The Generalized Sturmian Method

    DEFF Research Database (Denmark)

    Avery, James Emil

    2011-01-01

    generations of researchers were left to work out how to achieve this ambitious goal for molecular systems of ever-increasing size. This book focuses on non-mainstream methods to solve the molecular electronic Schrödinger equation. Each method is based on a set of core ideas and this volume aims to explain these ideas clearly so that they become more accessible. By bringing together these non-standard methods, the book intends to inspire graduate students, postdoctoral researchers and academics to think of novel approaches. Is there a method out there that we have not thought of yet? Can we design a new method...

  7. Mimetic discretization methods

    CERN Document Server

    Castillo, Jose E

    2013-01-01

    To help solve physical and engineering problems, mimetic or compatible algebraic discretization methods employ discrete constructs to mimic the continuous identities and theorems found in vector calculus. Mimetic Discretization Methods focuses on the recent mimetic discretization method co-developed by the first author. Based on the Castillo-Grone operators, this simple mimetic discretization method is invariably valid for spatial dimensions no greater than three. The book also presents a numerical method for obtaining corresponding discrete operators that mimic the continuum differential and

  8. DOE methods compendium

    International Nuclear Information System (INIS)

    Leasure, C.S.

    1992-01-01

    The Department of Energy (DOE) has established an analytical methods compendium development program to integrate its environmental analytical methods. This program is administered through DOE's Laboratory Management Division (EM-563). The primary objective of this program is to assemble a compendium of analytical chemistry methods of known performance for use by all DOE Environmental Restoration and Waste Management programs. This compendium will include methods for sampling, field screening, fixed analytical laboratory and mobile analytical laboratory analyses. It will also include specific guidance on the proper selection of appropriate sampling and analytical methods to meet specific analytical requirements

  9. A new method for class prediction based on signed-rank algorithms applied to Affymetrix® microarray experiments

    Directory of Open Access Journals (Sweden)

    Vassal Aurélien

    2008-01-01

    Abstract Background The huge amount of data generated by DNA chips is a powerful basis to classify various pathologies. However, constant evolution of microarray technology makes it difficult to mix data from different chip types for class prediction of limited sample populations. Affymetrix® technology provides both a quantitative fluorescence signal and a decision (detection call: absent or present) based on signed-rank algorithms applied to several hybridization repeats of each gene, with a per-chip normalization. We developed a new prediction method for class belonging based on the detection call only, from recent Affymetrix chip types. Biological data were obtained by hybridization on U133A, U133B and U133Plus 2.0 microarrays of purified normal B cells and cells from three independent groups of multiple myeloma (MM) patients. Results After a call-based data reduction step to filter out non class-discriminative probe sets, the gene list obtained was reduced to a predictor, with correction for multiple testing, by iterative deletion of probe sets that sequentially improve inter-class comparisons and their significance. The error rate of the method was determined using leave-one-out and 5-fold cross-validation. It was successfully applied to (i) determine a sex predictor with the normal donor group, classifying gender with no error in all patient groups except for male MM samples with a Y chromosome deletion, (ii) predict the immunoglobulin light and heavy chains expressed by the malignant myeloma clones of the validation group, and (iii) predict sex and light and heavy chain nature for every new patient. Finally, this method was shown to be powerful when compared to the popular classification method Prediction Analysis of Microarray (PAM). Conclusion This normalization-free method is routinely used for quality control and correction of collection errors in patient reports to clinicians. It can be easily extended to multiple class prediction suitable with
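
    As a rough illustration of the error-rate estimation described above, the sketch below runs leave-one-out and 5-fold cross-validation on a generic classifier with scikit-learn. The random data and the nearest-centroid classifier are placeholders, not the call-based predictor of the paper.

      # Sketch: estimating a classifier's error rate with leave-one-out and
      # 5-fold cross-validation. The data and classifier are synthetic stand-ins,
      # so the printed error rates sit near chance level.
      import numpy as np
      from sklearn.model_selection import LeaveOneOut, StratifiedKFold, cross_val_score
      from sklearn.neighbors import NearestCentroid

      rng = np.random.default_rng(0)
      X = rng.normal(size=(40, 200))          # 40 samples x 200 probe sets (synthetic)
      y = np.repeat([0, 1], 20)               # two classes, e.g. normal vs. myeloma

      clf = NearestCentroid()
      loo_acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
      kf_acc = cross_val_score(clf, X, y,
                               cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0)).mean()

      print(f"leave-one-out error rate: {1 - loo_acc:.2f}")
      print(f"5-fold CV error rate:     {1 - kf_acc:.2f}")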

  10. Multiple triangulation and collaborative research using qualitative methods to explore decision making in pre-hospital emergency care

    Directory of Open Access Journals (Sweden)

    Maxine Johnson

    2017-01-01

    Abstract Background Paramedics make important and increasingly complex decisions at scene about patient care. Patient safety implications of influences on decision making in the pre-hospital setting were previously under-researched. Cutting edge perspectives advocate exploring the whole system rather than individual influences on patient safety. Ethnography (the study of people and cultures) has been acknowledged as a suitable method for identifying health care issues as they occur within the natural context. In this paper we compare multiple methods used in a multi-site, qualitative study that aimed to identify system influences on decision making. Methods The study was conducted in three NHS Ambulance Trusts in England and involved researchers from each Trust working alongside academic researchers. Exploratory interviews with key informants, e.g. managers (n = 16), and document review provided contextual information. Between October 2012 and July 2013 researchers observed 34 paramedic shifts and ten paramedics provided additional accounts via audio-recorded ‘digital diaries’ (155 events). Three staff focus groups (total n = 21) and three service user focus groups (total n = 23) explored a range of experiences and perceptions. Data collection and analysis were carried out by academic and ambulance service researchers as well as service users. Workshops were held at each site to elicit feedback on the findings and facilitate prioritisation of issues identified. Results The use of a multi-method qualitative approach allowed cross-validation of important issues for ambulance service staff and service users. A key factor in successful implementation of the study was establishing good working relationships with academic and ambulance service teams. Enrolling at least one research lead at each site facilitated the recruitment process as well as study progress. Active involvement with the study allowed ambulance service researchers and service

  11. Methods for assessing geodiversity

    Science.gov (United States)

    Zwoliński, Zbigniew; Najwer, Alicja; Giardino, Marco

    2017-04-01

    The accepted systematics of geodiversity assessment methods will be presented in three categories: qualitative, quantitative and qualitative-quantitative. Qualitative methods are usually descriptive methods that are suited to nominal and ordinal data. Quantitative methods use a different set of parameters and indicators to determine the characteristics of geodiversity in the area being researched. Qualitative-quantitative methods are a good combination of the collection of quantitative data (i.e. digital) and cause-effect data (i.e. relational and explanatory). It seems that at the current stage of the development of geodiversity research methods, qualitative-quantitative methods are the most advanced and best assess the geodiversity of the study area. Their particular advantage is the integration of data from different sources and with different substantive content. Among the distinguishing features of the quantitative and qualitative-quantitative methods for assessing geodiversity are their wide use within geographic information systems, both at the stage of data collection and data integration, as well as numerical processing and their presentation. The unresolved problem for these methods, however, is the possibility of their validation. It seems that currently the best method of validation is direct field confrontation. Looking to the next few years, the development of qualitative-quantitative methods connected with cognitive issues should be expected, oriented towards ontology and the Semantic Web.

  12. Automatic staging of bladder cancer on CT urography

    Science.gov (United States)

    Garapati, Sankeerth S.; Hadjiiski, Lubomir M.; Cha, Kenny H.; Chan, Heang-Ping; Caoili, Elaine M.; Cohan, Richard H.; Weizer, Alon; Alva, Ajjai; Paramagul, Chintana; Wei, Jun; Zhou, Chuan

    2016-03-01

    Correct staging of bladder cancer is crucial for the decision of neoadjuvant chemotherapy treatment and minimizing the risk of under- or over-treatment. Subjectivity and variability of clinicians in utilizing available diagnostic information may lead to inaccuracy in staging bladder cancer. An objective decision support system that merges the information in a predictive model based on statistical outcomes of previous cases and machine learning may assist clinicians in making more accurate and consistent staging assessments. In this study, we developed a preliminary method to stage bladder cancer. With IRB approval, 42 bladder cancer cases with CTU scans were collected from patient files. The cases were classified into two classes based on pathological stage T2, which is the clinical decision threshold for neoadjuvant chemotherapy treatment (i.e. treatment for stage >= T2). There were 21 cancers below stage T2 and 21 cancers at stage T2 or above. All 42 lesions were automatically segmented using our auto-initialized cascaded level sets (AI-CALS) method. Morphological features were extracted and then selected and merged by a linear discriminant analysis (LDA) classifier. A leave-one-case-out resampling scheme was used to train and test the classifier using the 42 lesions. The classification accuracy was quantified using the area under the ROC curve (Az). The average training Az was 0.97 and the test Az was 0.85. The classifier consistently selected the lesion volume, a gray level feature and a contrast feature. This predictive model shows promise for assisting in assessing the bladder cancer stage.
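
    The following is a minimal sketch of the evaluation scheme described above: a leave-one-case-out loop around an LDA classifier, scored by the area under the ROC curve (Az). The synthetic feature matrix stands in for the morphological features extracted from the segmented lesions; no claim is made about the actual feature selection used in the study.

      # Sketch: leave-one-case-out evaluation of an LDA classifier with AUC (Az).
      # Features are synthetic stand-ins, not lesion measurements.
      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.model_selection import LeaveOneOut
      from sklearn.metrics import roc_auc_score

      rng = np.random.default_rng(1)
      X = rng.normal(size=(42, 6))            # 42 lesions x 6 features (synthetic)
      y = np.repeat([0, 1], 21)               # 0 = below stage T2, 1 = stage >= T2
      X[y == 1] += 0.8                        # give the classes some separation

      scores = np.empty(len(y))
      for train, test in LeaveOneOut().split(X):
          lda = LinearDiscriminantAnalysis().fit(X[train], y[train])
          scores[test] = lda.decision_function(X[test])

      print("test Az (ROC AUC):", round(roc_auc_score(y, scores), 3))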

  13. Cross-Validation of the Spanish HP-Version of the Jefferson Scale of Empathy Confirmed with Some Cross-Cultural Differences.

    Science.gov (United States)

    Alcorta-Garza, Adelina; San-Martín, Montserrat; Delgado-Bolton, Roberto; Soler-González, Jorge; Roig, Helena; Vivanco, Luis

    2016-01-01

    Medical educators agree that empathy is essential for physicians' professionalism. The Health Professional Version of the Jefferson Scale of Empathy (JSE-HP) was developed in response to a need for a psychometrically sound instrument to measure empathy in the context of patient care. Although extensive support for its validity and reliability is available, the authors recognize the necessity to examine psychometrics of the JSE-HP in different socio-cultural contexts to assure the psychometric soundness of this instrument. The first aim of this study was to confirm its psychometric properties in the cross-cultural context of Spain and Latin American countries. The second aim was to measure the influence of social and cultural factors on the development of medical empathy in health practitioners. The original English version of the JSE-HP was translated into International Spanish using back-translation procedures. The Spanish version of the JSE-HP was administered to 896 physicians from Spain and 13 Latin American countries. Data were subjected to exploratory factor analysis using principal component analysis (PCA) with oblique rotation (promax) to allow for correlation among the resulting factors, followed by a second analysis, using confirmatory factor analysis (CFA). Two theoretical models, one based on the English JSE-HP and another on the first Spanish student version of the JSE (JSE-S), were tested. Demographic variables were compared using group comparisons. A total of 715 (80%) surveys were returned fully completed. Cronbach's alpha coefficient of the JSE for the entire sample was 0.84. The psychometric properties of the Spanish JSE-HP matched those of the original English JSE-HP. However, the Spanish JSE-S model proved more appropriate than the original English model for the sample in this study. Group comparisons among physicians classified by gender, medical specialties, cultural and cross-cultural backgrounds yielded statistically significant differences. The cross-cultural differences described could open gates for further lines of medical education research.
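
    For readers unfamiliar with the reliability coefficient quoted above, this is a minimal sketch of how Cronbach's alpha is computed from an item-response matrix. The simulated Likert responses are placeholders, not the JSE-HP survey data.

      # Sketch: Cronbach's alpha for a set of scale items, the reliability
      # coefficient reported for the Spanish JSE-HP (alpha = 0.84).
      import numpy as np

      def cronbach_alpha(items: np.ndarray) -> float:
          """items: respondents x items matrix of scores."""
          k = items.shape[1]
          item_vars = items.var(axis=0, ddof=1).sum()
          total_var = items.sum(axis=1).var(ddof=1)
          return k / (k - 1) * (1 - item_vars / total_var)

      rng = np.random.default_rng(2)
      latent = rng.normal(size=(715, 1))        # shared "empathy" factor (synthetic)
      responses = np.clip(np.rint(4 + latent + rng.normal(scale=1.0, size=(715, 20))), 1, 7)
      print("Cronbach's alpha:", round(cronbach_alpha(responses), 2))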

  14. Cross-validation of the factorial structure of the Neighborhood Environment Walkability Scale (NEWS) and its abbreviated form (NEWS-A)

    Science.gov (United States)

    The Neighborhood Environment Walkability Scale (NEWS) and its abbreviated form (NEWS-A) assess perceived environmental attributes believed to influence physical activity. A multilevel confirmatory factor analysis (MCFA) conducted on a sample from Seattle, WA, showed that, at the respondent level, th...

  15. Statistical parametric maps of 18F-FDG PET and 3-D autoradiography in the rat brain: a cross-validation study

    Energy Technology Data Exchange (ETDEWEB)

    Prieto, Elena; Marti-Climent, Josep M. [Clinica Universidad de Navarra, Nuclear Medicine Department, Pamplona (Spain); Collantes, Maria; Molinet, Francisco [Center for Applied Medical Research (CIMA) and Clinica Universidad de Navarra, Small Animal Imaging Research Unit, Pamplona (Spain); Delgado, Mercedes; Garcia-Garcia, Luis; Pozo, Miguel A. [Universidad Complutense de Madrid, Brain Mapping Unit, Madrid (Spain); Juri, Carlos [Center for Applied Medical Research (CIMA), Movement Disorders Group, Neurosciences Division, Pamplona (Spain); Clinica Universidad de Navarra, Department of Neurology and Neurosurgery, Pamplona (Spain); Centro de Investigacion Biomedica en Red sobre Enfermedades Neurodegenerativas (CIBERNED), Pamplona (Spain); Pontificia Universidad Catolica de Chile, Department of Neurology, Santiago (Chile); Fernandez-Valle, Maria E. [Universidad Complutense de Madrid, MRI Research Center, Madrid (Spain); Gago, Belen [Center for Applied Medical Research (CIMA), Movement Disorders Group, Neurosciences Division, Pamplona (Spain); Centro de Investigacion Biomedica en Red sobre Enfermedades Neurodegenerativas (CIBERNED), Pamplona (Spain); Obeso, Jose A. [Center for Applied Medical Research (CIMA), Movement Disorders Group, Neurosciences Division, Pamplona (Spain); Clinica Universidad de Navarra, Department of Neurology and Neurosurgery, Pamplona (Spain); Centro de Investigacion Biomedica en Red sobre Enfermedades Neurodegenerativas (CIBERNED), Pamplona (Spain); Penuelas, Ivan [Clinica Universidad de Navarra, Nuclear Medicine Department, Pamplona (Spain); Center for Applied Medical Research (CIMA) and Clinica Universidad de Navarra, Small Animal Imaging Research Unit, Pamplona (Spain)

    2011-12-15

    Although specific positron emission tomography (PET) scanners have been developed for small animals, spatial resolution remains one of the most critical technical limitations, particularly in the evaluation of the rodent brain. The purpose of the present study was to examine the reliability of voxel-based statistical analysis (Statistical Parametric Mapping, SPM) applied to 18F-fluorodeoxyglucose (FDG) PET images of the rat brain, acquired on a small animal PET not specifically designed for rodents. The gold standard for the validation of the PET results was the autoradiography of the same animals acquired under the same physiological conditions, reconstructed as a 3-D volume and analysed using SPM. Eleven rats were studied under two different conditions: conscious or under inhalatory anaesthesia during 18F-FDG uptake. All animals were studied in vivo under both conditions in a dedicated small animal Philips MOSAIC PET scanner and magnetic resonance images were obtained for subsequent spatial processing. Then, rats were randomly assigned to a conscious or anaesthetized group for postmortem autoradiography, and slices from each animal were aligned and stacked to create a 3-D autoradiographic volume. Finally, differences in 18F-FDG uptake between conscious and anaesthetized states were assessed from PET and autoradiography data by SPM analysis and results were compared. SPM results of PET and 3-D autoradiography are in good agreement and led to the detection of consistent cortical differences between the conscious and anaesthetized groups, particularly in the bilateral somatosensory cortices. However, SPM analysis of 3-D autoradiography also highlighted differences in the thalamus that were not detected with PET. This study demonstrates that any difference detected with SPM analysis of MOSAIC PET images of rat brain is detected also by the gold standard autoradiographic technique, confirming that this methodology provides reliable results, although partial volume effects might make it difficult to detect slight differences in small regions. (orig.)

  16. Cross-validation of IASI/MetOp derived tropospheric δD with TES and ground-based FTIR observations

    Science.gov (United States)

    Lacour, J.-L.; Clarisse, L.; Worden, J.; Schneider, M.; Barthlott, S.; Hase, F.; Risi, C.; Clerbaux, C.; Hurtmans, D.; Coheur, P.-F.

    2015-03-01

    The Infrared Atmospheric Sounding Interferometer (IASI) flying onboard MetOpA and MetOpB is able to capture fine isotopic variations of the HDO to H2O ratio (δD) in the troposphere. Such observations at the high spatio-temporal resolution of the sounder are of great interest to improve our understanding of the mechanisms controlling humidity in the troposphere. In this study we aim to empirically assess the validity of our error estimation previously evaluated theoretically. To achieve this, we compare IASI-retrieved δD profiles with other available profiles of δD, from the TES infrared sounder onboard AURA and from three ground-based FTIR stations produced within the MUSICA project: the NDACC (Network for the Detection of Atmospheric Composition Change) sites Kiruna and Izaña, and the TCCON site Karlsruhe, which in addition to near-infrared TCCON spectra also records mid-infrared spectra. We describe the achievable level of agreement between the different retrievals and show that these theoretical errors are in good agreement with empirical differences. The comparisons are made at different locations from tropical to Arctic latitudes, above sea and above land. Generally IASI and TES are similarly sensitive to δD in the free troposphere, which allows their measurements to be compared directly. At tropical latitudes where IASI's sensitivity is lower than that of TES, we show that the agreement improves when taking into account the sensitivity of IASI in the TES retrieval. For the IASI-FTIR comparison, only direct comparisons are performed because the sensitivity profiles of the two observing systems do not allow their differences in sensitivity to be taken into account. We identify a quasi-negligible bias (-3‰) in the free troposphere between IASI-retrieved δD and the bias-corrected TES retrievals, but an important bias, reaching -47‰, relative to the ground-based FTIR. We also suggest that model-satellite observation comparisons could be optimized with IASI thanks to its high spatial and temporal sampling.

  17. Sentimentality and Nostalgia in Elderly People in Bulgaria and Greece - Cross-Validity of the Questionnaire SNEP and Cross-Cultural Comparison.

    Science.gov (United States)

    Stoyanova, Stanislava Yordanova; Giannouli, Vaitsa; Gergov, Teodor Krasimirov

    2017-03-01

    Sentimentality and nostalgia are two similar psychological constructs, which play an important role in the emotional lives of elderly people who are usually focused on the past. There are two objectives of this study - making cross-cultural comparison of sentimentality and nostalgia among Bulgarian and Greek elderly people using a questionnaire, and establishing the psychometric properties of this questionnaire among Greek elderly people. Sentimentality and nostalgia in elderly people in Bulgaria and Greece were studied by means of Sentimentality and Nostalgia in Elderly People questionnaire (SNEP), created by Gergov and Stoyanova (2013). For the Greek version, one factor structure without sub-scales is proposed, while for the Bulgarian version of SNEP the factor structure had four sub-scales, besides the total score. Together with some similarities (medium level of nostalgia and sentimentality being widespread), the elderly people in Bulgaria and Greece differed cross-culturally in their sentimentality and nostalgia related to the past in direction of more increased sentimentality and nostalgia in the Bulgarian sample. Some gender and age differences revealed that the oldest male Bulgarians were the most sentimental. The psychometric properties of this questionnaire were examined for the first time in a Greek sample of elders and a trend was found for stability of sentimentality and nostalgia in elderly people that could be studied further in longitudinal studies.

  18. Fine-mapping and cross-validation of QTLs linked to fatty acid composition in multiple independent interspecific crosses of oil palm.

    Science.gov (United States)

    Ting, Ngoot-Chin; Yaakub, Zulkifli; Kamaruddin, Katialisa; Mayes, Sean; Massawe, Festo; Sambanthamurthi, Ravigadevi; Jansen, Johannes; Low, Leslie Eng Ti; Ithnin, Maizura; Kushairi, Ahmad; Arulandoo, Xaviar; Rosli, Rozana; Chan, Kuang-Lim; Amiruddin, Nadzirah; Sritharan, Kandha; Lim, Chin Ching; Nookiah, Rajanaidu; Amiruddin, Mohd Din; Singh, Rajinder

    2016-04-14

    The commercial oil palm (Elaeis guineensis Jacq.) produces a mesocarp oil (commonly called 'palm oil') with approximately equal proportions of saturated and unsaturated fatty acids (FAs). An increase in unsaturated FAs content or iodine value (IV) as a measure of the degree of unsaturation would help to open up new markets for the oil. One way to manipulate the fatty acid composition (FAC) in palm oil is through introgression of favourable alleles from the American oil palm, E. oleifera, which has a more unsaturated oil. In this study, a segregating E. oleifera x E. guineensis (OxG) hybrid population for FAC is used to identify quantitative trait loci (QTLs) linked to IV and various FAs. QTL analysis revealed 10 major and two putative QTLs for IV and six FAs, C14:0, C16:0, C16:1, C18:0, C18:1 and C18:2 distributed across six linkage groups (LGs), OT1, T2, T3, OT4, OT6 and T9. The major QTLs for IV and C16:0 on LGOT1 explained 60.0 - 69.0 % of the phenotypic trait variation and were validated in two independent BC2 populations. The genomic interval contains several key structural genes in the FA and oil biosynthesis pathways such as PATE/FATB, HIBCH, BASS2, LACS4 and DGAT1 and also a relevant transcription factor (TF), WRI1. The literature suggests that some of these genes can exhibit pleiotropic effects in the regulatory networks of these traits. Using the whole genome sequence data, markers tightly linked to the candidate genes were also developed. Clustering trait values according to the allelic forms of these candidate markers revealed significant differences in the IV and FAs of the palms in the mapping and validation crosses. The candidate gene approach described and exploited here is useful to identify the potential causal genes linked to FAC and can be adopted for marker-assisted selection (MAS) in oil palm.

  19. Refinement and cross-validation of nickel bioavailability in PNEC-Pro, a regulatory tool for site-specific risk assessment of metals in surface water.

    Science.gov (United States)

    Verschoor, Anja J; Vijver, Martina G; Vink, Jos P M

    2017-09-01

    The European Water Framework Directive prescribes that the environmental quality standards for nickel in surface waters should be based on bioavailable concentrations. Biotic ligand models (BLMs) are powerful tools to account for site-specific bioavailability within risk assessments. Several BLMs and simplified tools are available. For nickel, most of them are based on the same toxicity dataset and chemical speciation methodology as laid down in the 2008 European Union Environmental Risk Assessment Report (RAR). Since then, further insights into the toxic effects of nickel on aquatic species have been gained, and new data and methodologies have been generated and implemented using the predicted-no-effect-concentration (PNEC)-pro tool. The aim of the present study is to provide maximum transparency on data revisions and how this affects the derived environmental quality standards. A case study with 7 different ecoregions was used to determine differences in species sensitivity distributions and in hazardous concentrations for 5% of the species (HC5) values between the original Ni-RAR BLMs and the PNEC-pro BLMs. The BLM parameters used were pH dependent, which extended the applicability domain of PNEC-pro up to a pH of 8.7 for surface waters. After inclusion of additional species and adjustment for cross-species extrapolation, the HC5s were well within the prediction range of the RAR. Based on the latest data and scientific insights, transfer functions in the user-friendly PNEC-pro tool have been updated accordingly without compromising the original considerations of the Ni-RAR. Environ Toxicol Chem 2017;36:2367-2376. © 2017 SETAC.
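
    As background to the HC5 values discussed above, the sketch below derives an HC5 as the 5th percentile of a log-normal species sensitivity distribution. The NOEC values are invented placeholders, not the nickel dataset behind PNEC-pro, and no bioavailability (BLM) normalisation is applied.

      # Sketch: HC5 (hazardous concentration for 5% of species) as the 5th
      # percentile of a log-normal species sensitivity distribution (SSD).
      import numpy as np
      from scipy import stats

      noec_ug_per_l = np.array([12.0, 18.5, 27.0, 40.0, 55.0, 81.0, 120.0, 150.0,
                                210.0, 300.0, 450.0, 610.0])   # hypothetical per-species NOECs

      mu, sigma = stats.norm.fit(np.log10(noec_ug_per_l))      # fit the SSD in log10 space
      hc5 = 10 ** stats.norm.ppf(0.05, loc=mu, scale=sigma)    # 5th percentile back-transformed
      print(f"HC5 ~ {hc5:.1f} ug/L")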

  20. Sentimentality and Nostalgia in Elderly People in Bulgaria and Greece – Cross-Validity of the Questionnaire SNEP and Cross-Cultural Comparison

    Science.gov (United States)

    Stoyanova, Stanislava Yordanova; Giannouli, Vaitsa; Gergov, Teodor Krasimirov

    2017-01-01

    Sentimentality and nostalgia are two similar psychological constructs, which play an important role in the emotional lives of elderly people who are usually focused on the past. There are two objectives of this study - making cross-cultural comparison of sentimentality and nostalgia among Bulgarian and Greek elderly people using a questionnaire, and establishing the psychometric properties of this questionnaire among Greek elderly people. Sentimentality and nostalgia in elderly people in Bulgaria and Greece were studied by means of Sentimentality and Nostalgia in Elderly People questionnaire (SNEP), created by Gergov and Stoyanova (2013). For the Greek version, one factor structure without sub-scales is proposed, while for the Bulgarian version of SNEP the factor structure had four sub-scales, besides the total score. Together with some similarities (medium level of nostalgia and sentimentality being widespread), the elderly people in Bulgaria and Greece differed cross-culturally in their sentimentality and nostalgia related to the past in direction of more increased sentimentality and nostalgia in the Bulgarian sample. Some gender and age differences revealed that the oldest male Bulgarians were the most sentimental. The psychometric properties of this questionnaire were examined for the first time in a Greek sample of elders and a trend was found for stability of sentimentality and nostalgia in elderly people that could be studied further in longitudinal studies. PMID:28344678

  1. Cross-validation, predictive validity, and time course of the Benzodiazepine Dependence Self-Report Questionnaire in a benzodiazepine discontinuation trial.

    NARCIS (Netherlands)

    Oude Voshaar, R.C.; Mol, A.J.J.; Gorgels, W.J.M.J.; Breteler, M.H.M.; Balkom, A.J.L.M. van; Lisdonk, E.H. van de; Kan, C.C.; Zitman, F.G.

    2003-01-01

    The Benzodiazepine Dependence Self-Report Questionnaire (Bendep-SRQ) measures the severity of benzodiazepine (BZ) dependence on four domains: awareness of problematic use, preoccupation with the availability of BZ, lack of compliance with the therapeutic regimen, and withdrawal. Although promising

  2. Cross-validation, predictive validity and time course of the Benzodiazepine Dependence Self-Report Questionnaire (Bendep-SRQ) in a benzodiazepine discontinuation trial

    NARCIS (Netherlands)

    Oude Voshaar, R.C.; Mol, A.J.J.; Gorgels, W.J.M.J.; Breteler, M.H.M.; Balkom, A.J.L.M. van; Lisdonk, E.H. van de; Zitman, F.G.

    2003-01-01

    The Benzodiazepine Dependence Self-Report Questionnaire (Bendep-SRQ) measures the severity of benzodiazepine (BZ) dependence on four domains: awareness of problematic use, preoccupation with the availability of BZ, lack of compliance with the therapeutic regimen, and withdrawal. Although promising

  3. Statistical parametric maps of 18F-FDG PET and 3-D autoradiography in the rat brain: a cross-validation study

    International Nuclear Information System (INIS)

    Prieto, Elena; Marti-Climent, Josep M.; Collantes, Maria; Molinet, Francisco; Delgado, Mercedes; Garcia-Garcia, Luis; Pozo, Miguel A.; Juri, Carlos; Fernandez-Valle, Maria E.; Gago, Belen; Obeso, Jose A.; Penuelas, Ivan

    2011-01-01

    Although specific positron emission tomography (PET) scanners have been developed for small animals, spatial resolution remains one of the most critical technical limitations, particularly in the evaluation of the rodent brain. The purpose of the present study was to examine the reliability of voxel-based statistical analysis (Statistical Parametric Mapping, SPM) applied to 18F-fluorodeoxyglucose (FDG) PET images of the rat brain, acquired on a small animal PET not specifically designed for rodents. The gold standard for the validation of the PET results was the autoradiography of the same animals acquired under the same physiological conditions, reconstructed as a 3-D volume and analysed using SPM. Eleven rats were studied under two different conditions: conscious or under inhalatory anaesthesia during 18F-FDG uptake. All animals were studied in vivo under both conditions in a dedicated small animal Philips MOSAIC PET scanner and magnetic resonance images were obtained for subsequent spatial processing. Then, rats were randomly assigned to a conscious or anaesthetized group for postmortem autoradiography, and slices from each animal were aligned and stacked to create a 3-D autoradiographic volume. Finally, differences in 18F-FDG uptake between conscious and anaesthetized states were assessed from PET and autoradiography data by SPM analysis and results were compared. SPM results of PET and 3-D autoradiography are in good agreement and led to the detection of consistent cortical differences between the conscious and anaesthetized groups, particularly in the bilateral somatosensory cortices. However, SPM analysis of 3-D autoradiography also highlighted differences in the thalamus that were not detected with PET. This study demonstrates that any difference detected with SPM analysis of MOSAIC PET images of rat brain is detected also by the gold standard autoradiographic technique, confirming that this methodology provides reliable results, although partial volume effects might make it difficult to detect slight differences in small regions. (orig.)

  4. Direct spectral analysis of tea samples using 266 nm UV pulsed laser-induced breakdown spectroscopy and cross validation of LIBS results with ICP-MS.

    Science.gov (United States)

    Gondal, M A; Habibullah, Y B; Baig, Umair; Oloore, L E

    2016-05-15

    Tea is one of the most common and popular beverages, spanning a vast array of cultures all over the world. The main nutritional benefits of drinking tea are its anti-oxidant properties, presumed protection against certain cancers, inhibition of inflammation and possible protective effects against diabetes. A laser-induced breakdown spectrometer (LIBS) was assembled as a powerful tool for qualitative and quantitative analysis of various brands of tea samples using a 266 nm pulsed UV laser. LIBS spectra for six brands of tea samples in the wavelength range of 200-900 nm were recorded and all elements present in our tea samples were identified. The major toxic elements detected in several brands of tea samples were bromine, chromium and minerals like iron, calcium, potassium and silicon. The spectral assignment was conducted prior to determining the concentration of each element. For quantitative analysis, calibration curves were drawn for each element using standard samples prepared at known concentrations in the tea matrix. The plasma parameters (electron temperature and electron density) were also determined prior to the spectroscopic analysis of the tea samples. The concentrations of iron, chromium, potassium, bromine, copper, silicon and calcium detected in all tea samples ranged over 378-656, 96-124, 1421-6785, 99-1476, 17-36, 2-11 and 92-130 mg L⁻¹, respectively. The limits of detection estimated for Fe, Cr, K, Br, Cu, Si, Ca in tea samples were 22, 12, 14, 11, 6, 1 and 12 mg L⁻¹, respectively. To further confirm the accuracy of our LIBS results, we determined the concentration of each element present in the tea samples using a standard analytical technique, ICP-MS. The concentrations detected with our LIBS system are in excellent agreement with the ICP-MS results. The system assembled for spectral analysis in this work could be highly applicable for testing the quality and purity of food and also pharmaceutical products. Copyright © 2016 Elsevier B.V. All rights reserved.
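
    The calibration-curve step described above can be sketched as follows; all intensities and standard concentrations are invented for illustration, and the limit of detection is computed with the common 3-sigma-of-blank convention, which the abstract does not explicitly state.

      # Sketch: a LIBS-style calibration curve and limit of detection for one element.
      # LOD is taken as 3 x (blank standard deviation) / slope (one common convention).
      import numpy as np

      conc = np.array([0.0, 50.0, 100.0, 200.0, 400.0, 800.0])              # mg/L standards
      intensity = np.array([120.0, 410.0, 690.0, 1310.0, 2480.0, 4950.0])   # line intensity, a.u.
      blank_replicates = np.array([118.0, 123.0, 117.0, 121.0, 125.0])      # blank readings

      slope, intercept = np.polyfit(conc, intensity, 1)                     # linear calibration
      lod = 3 * blank_replicates.std(ddof=1) / slope
      unknown_intensity = 905.0                                             # a hypothetical sample
      print(f"slope = {slope:.2f} a.u. per mg/L, LOD ~ {lod:.1f} mg/L")
      print(f"unknown sample ~ {(unknown_intensity - intercept) / slope:.0f} mg/L")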

  5. Cross-validation Methodology between Ground and GPM Satellite-based Radar Rainfall Product over Dallas-Fort Worth (DFW) Metroplex

    Science.gov (United States)

    Chen, H.; Chandrasekar, V.; Biswas, S.

    2015-12-01

    Over the past two decades, a large number of rainfall products have been developed based on satellite, radar, and/or rain gauge observations. However, producing optimal rainfall estimates for a given region is still challenging due to the space-time variability of rainfall at many scales and the spatial and temporal sampling differences of different rainfall instruments. In order to produce high-resolution rainfall products for urban flash flood applications and improve the weather sensing capability in the urban environment, the Center for Collaborative Adaptive Sensing of the Atmosphere (CASA), in collaboration with the National Weather Service (NWS) and the North Central Texas Council of Governments (NCTCOG), has developed an urban radar remote sensing network in the DFW Metroplex. DFW is the largest inland metropolitan area in the U.S. and experiences a wide range of natural weather hazards such as flash floods and hailstorms. The DFW urban remote sensing network, centered on the deployment of eight dual-polarization X-band radars and a NWS WSR-88DP radar, is expected to provide impacts-based warnings and forecasts for the benefit of public safety and the economy. High-resolution quantitative precipitation estimation (QPE) is one of the major goals of the development of this urban test bed. In addition to ground radar-based rainfall estimation, satellite-based rainfall products for this area are also of interest for this study. A typical example is the rainfall rate product produced by the Dual-frequency Precipitation Radar (DPR) onboard the Global Precipitation Measurement (GPM) Core Observatory satellite. Therefore, cross-comparison between ground and space-based rainfall estimation is critical to building an optimal regional rainfall system, which can take advantage of the sampling differences of different sensors. This paper presents the real-time high-resolution QPE system developed for the DFW urban radar network, which is based upon the combination of S-band WSR-88DP and X-band CASA radars. In addition, we focus on the cross-comparison between rainfall estimation from this ground-based QPE system and GPM rainfall products. The observations collected during the GPM satellite overpasses over the DFW area will be used extensively in this study. Data alignment for better comparison will also be presented.

  6. Characteristics of genomic signatures derived using univariate methods and mechanistically anchored functional descriptors for predicting drug- and xenobiotic-induced nephrotoxicity.

    Science.gov (United States)

    Shi, Weiwei; Bugrim, Andrej; Nikolsky, Yuri; Nikolskya, Tatiana; Brennan, Richard J

    2008-01-01

    The ideal toxicity biomarker combines prediction (it is detected prior to traditional pathological signs of injury), accuracy (high sensitivity and specificity), and a mechanistic relationship to the endpoint measured (biological relevance). Gene expression-based toxicity biomarkers ("signatures") have shown good predictive power and accuracy, but are difficult to interpret biologically. We have compared different statistical methods of feature selection with knowledge-based approaches, using GeneGo's database of canonical pathway maps, to generate gene sets for the classification of renal tubule toxicity. The gene set selection algorithms include four univariate analyses: t-statistics, fold-change, B-statistics, and RankProd, and their combination and overlap for the identification of differentially expressed probes. Enrichment analysis following the results of the four univariate analyses, the Hotelling T-square test, and, finally, out-of-bag selection, a variant of cross-validation, were used to identify canonical pathway maps (sets of genes coordinately involved in key biological processes) with classification power. Differentially expressed genes identified by the different statistical univariate analyses all generated reasonably performing classifiers of tubule toxicity. Maps identified by enrichment analysis or Hotelling T-square had lower classification power, but highlighted perturbed lipid homeostasis as a common discriminator of nephrotoxic treatments. The out-of-bag method yielded the best functionally integrated classifier. The map "ephrins signaling" performed comparably to a classifier derived using sparse linear programming, a machine learning algorithm, and represents a signaling network specifically involved in renal tubule development and integrity. Such functional descriptors of toxicity promise to better integrate predictive toxicogenomics with mechanistic analysis, facilitating the interpretation and risk assessment of

  7. Body composition estimation from selected slices: equations computed from a new semi-automatic thresholding method developed on whole-body CT scans.

    Science.gov (United States)

    Lacoste Jeanson, Alizé; Dupej, Ján; Villa, Chiara; Brůžek, Jaroslav

    2017-01-01

    Estimating volumes and masses of total body components is important for the study and treatment monitoring of nutrition and nutrition-related disorders, cancer, joint replacement, energy-expenditure and exercise physiology. While several equations have been offered for estimating total body components from MRI slices, no reliable and tested method exists for CT scans. For the first time, body composition data was derived from 41 high-resolution whole-body CT scans. From these data, we defined equations for estimating volumes and masses of total body AT and LT from corresponding tissue areas measured in selected CT scan slices. We present a new semi-automatic approach to defining the density cutoff between adipose tissue (AT) and lean tissue (LT) in such material. An intra-class correlation coefficient (ICC) was used to validate the method. The equations for estimating the whole-body composition volume and mass from areas measured in selected slices were modeled with ordinary least squares (OLS) linear regressions and support vector machine regression (SVMR). The best predictive equation for total body AT volume was based on the AT area of a single slice located between the 4th and 5th lumbar vertebrae (L4-L5) and produced lower prediction errors (|PE| = 1.86 liters, %PE = 8.77) than previous equations also based on CT scans. The LT area of the mid-thigh provided the lowest prediction errors (|PE| = 2.52 liters, %PE = 7.08) for estimating whole-body LT volume. We also present equations to predict total body AT and LT masses from a slice located at L4-L5 that resulted in reduced error compared with the previously published equations based on CT scans. The multislice SVMR predictor gave the theoretical upper limit for prediction precision of volumes and cross-validated the results.
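
    A minimal sketch of the modelling step described above: predicting whole-body adipose tissue volume from a single-slice area with OLS and support vector regression, scored by leave-one-out prediction errors. The slice areas, volumes and SVR settings are assumptions for illustration, not the study's data or tuning.

      # Sketch: OLS vs. support vector regression for predicting whole-body adipose
      # tissue (AT) volume from a single-slice AT area, scored by leave-one-out |PE|.
      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.svm import SVR
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.model_selection import cross_val_predict, LeaveOneOut

      rng = np.random.default_rng(3)
      l4l5_area_cm2 = rng.uniform(100, 500, size=41)                  # AT area at L4-L5 (synthetic)
      at_volume_l = 0.055 * l4l5_area_cm2 + rng.normal(scale=1.5, size=41)

      X = l4l5_area_cm2.reshape(-1, 1)
      for name, model in [("OLS", LinearRegression()),
                          ("SVR", make_pipeline(StandardScaler(), SVR(kernel="linear", C=10.0)))]:
          pred = cross_val_predict(model, X, at_volume_l, cv=LeaveOneOut())
          pe = np.abs(pred - at_volume_l)
          print(f"{name}: mean |PE| = {pe.mean():.2f} L, %PE = {100 * pe.mean() / at_volume_l.mean():.1f}")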

  8. Characterization of masses in digital breast tomosynthesis: Comparison of machine learning in projection views and reconstructed slices

    International Nuclear Information System (INIS)

    Chan, Heang-Ping; Wu Yita; Sahiner, Berkman; Wei, Jun; Helvie, Mark A.; Zhang Yiheng; Moore, Richard H.; Kopans, Daniel B.; Hadjiiski, Lubomir; Way, Ted

    2010-01-01

    slices centered at the central slice were compared. For classification of masses using the PV approach, a feature extraction process similar to that described above for the DBT approach was performed on the ROIs from the individual PVs. Six feature spaces obtained from the central PV alone and by averaging the corresponding features from three to 11 PVs were formed. In each feature space for either the DBT-slice or the PV approach, a linear discriminant analysis classifier with stepwise feature selection was trained and tested using a two-loop leave-one-case-out resampling procedure. Simplex optimization was used to guide feature selection automatically within the training set in each leave-one-case-out cycle. The performance of the classifiers was evaluated by the area (Az) under the receiver operating characteristic curve. Results: The test Az values from the DBT-slice approach ranged from 0.87±0.03 to 0.93±0.02, while those from the PV approach ranged from 0.78±0.04 to 0.84±0.04. The highest test Az of 0.93±0.02 from the nine-DBT-slice feature space was significantly (p=0.006) better than the highest test Az of 0.84±0.04 from the nine-PV feature space. Conclusion: The features of breast lesions extracted from the DBT slices consistently provided higher classification accuracy than those extracted from the PV images.
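
    The two-loop leave-one-case-out idea described above can be sketched as follows: feature selection and classifier training are confined to the training cases of each outer cycle, so the held-out case never influences the selected features. A simple univariate filter stands in for the stepwise LDA selection guided by simplex optimization, and the data are synthetic.

      # Sketch: leave-one-case-out evaluation in which feature selection is refit
      # inside every training fold (selection never sees the held-out case).
      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.feature_selection import SelectKBest, f_classif
      from sklearn.pipeline import make_pipeline
      from sklearn.model_selection import LeaveOneOut
      from sklearn.metrics import roc_auc_score

      rng = np.random.default_rng(4)
      X = rng.normal(size=(60, 30))            # 60 masses x 30 candidate features (synthetic)
      y = rng.integers(0, 2, size=60)          # 1 = malignant, 0 = benign (synthetic labels)
      X[y == 1, :5] += 0.7                     # only the first 5 features are informative

      scores = np.empty(len(y))
      for train, test in LeaveOneOut().split(X):
          model = make_pipeline(SelectKBest(f_classif, k=5), LinearDiscriminantAnalysis())
          model.fit(X[train], y[train])
          scores[test] = model.decision_function(X[test])

      print("test Az:", round(roc_auc_score(y, scores), 3))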

  9. MR-based synthetic CT generation using a deep convolutional neural network method.

    Science.gov (United States)

    Han, Xiao

    2017-04-01

    Interest has been growing rapidly in the field of radiotherapy in replacing CT with magnetic resonance imaging (MRI), due to the superior soft tissue contrast offered by MRI and the desire to reduce unnecessary radiation dose. MR-only radiotherapy also simplifies clinical workflow and avoids uncertainties in aligning MR with CT. Methods, however, are needed to derive CT-equivalent representations, often known as synthetic CT (sCT), from patient MR images for dose calculation and DRR-based patient positioning. Synthetic CT estimation is also important for PET attenuation correction in hybrid PET-MR systems. We propose in this work a novel deep convolutional neural network (DCNN) method for sCT generation and evaluate its performance on a set of brain tumor patient images. The proposed method builds upon recent developments of deep learning and convolutional neural networks in the computer vision literature. The proposed DCNN model has 27 convolutional layers interleaved with pooling and unpooling layers and 35 million free parameters, which can be trained to learn a direct end-to-end mapping from MR images to their corresponding CTs. Training such a large model on our limited data is made possible through the principle of transfer learning and by initializing model weights from a pretrained model. Eighteen brain tumor patients with both CT and T1-weighted MR images are used as experimental data and a sixfold cross-validation study is performed. Each sCT generated is compared against the real CT image of the same patient on a voxel-by-voxel basis. Comparison is also made with respect to an atlas-based approach that involves deformable atlas registration and patch-based atlas fusion. The proposed DCNN method produced a mean absolute error (MAE) below 85 HU for 13 of the 18 test subjects. The overall average MAE was 84.8 ± 17.3 HU for all subjects, which was found to be significantly better than the average MAE of 94.5 ± 17.8 HU for the atlas-based method. The DCNN
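
    A minimal sketch of the evaluation step only: per-subject voxelwise MAE (in HU) between a synthetic CT and the real CT, followed by a paired test across subjects comparing two sCT methods. The volumes are random stand-ins and the choice of a Wilcoxon signed-rank test is an assumption, not necessarily the statistic used in the paper.

      # Sketch: per-subject voxelwise MAE between sCT and CT, and a paired test of
      # whether one sCT method beats another across subjects. Volumes are synthetic.
      import numpy as np
      from scipy.stats import wilcoxon

      rng = np.random.default_rng(5)
      n_subjects = 18

      def mae_hu(sct: np.ndarray, ct: np.ndarray) -> float:
          return float(np.mean(np.abs(sct - ct)))

      mae_dcnn, mae_atlas = [], []
      for _ in range(n_subjects):
          ct = rng.normal(0, 300, size=(32, 32, 32))            # stand-in CT volume (HU)
          mae_dcnn.append(mae_hu(ct + rng.normal(0, 105, ct.shape), ct))
          mae_atlas.append(mae_hu(ct + rng.normal(0, 118, ct.shape), ct))

      print(f"method A MAE: {np.mean(mae_dcnn):.1f} +/- {np.std(mae_dcnn):.1f} HU")
      print(f"method B MAE: {np.mean(mae_atlas):.1f} +/- {np.std(mae_atlas):.1f} HU")
      print("paired Wilcoxon p =", wilcoxon(mae_dcnn, mae_atlas).pvalue)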

  10. Chemometric methods and near-infrared spectroscopy applied to bioenergy production

    International Nuclear Information System (INIS)

    Liebmann, B.

    2010-01-01

    data analysis (i) successfully determine the concentrations of moisture, protein, and starch in the feedstock material as well as glucose, ethanol, glycerol, lactic acid, acetic acid in the processed bioethanol broths; (ii) and allow quantifying a complex biofuel's property such as the heating value. At the third stage, this thesis focuses on new chemometric methods that improve mathematical analysis of multivariate data such as NIR spectra. The newly developed method 'repeated double cross validation' (rdCV) separates optimization of regression models from tests of model performance; furthermore, rdCV estimates the variability of the model performance based on a large number of prediction errors from test samples. The rdCV procedure has been applied to both the classical PLS regression and the robust 'partial robust M' regression method, which can handle erroneous data. The peculiar and relatively unknown 'random projection' method is tested for its potential of dimensionality reduction of data from chemometrics and chemoinformatics. The main findings are: (i) rdCV fosters a realistic assessment of model performance, (ii) robust regression has outstanding performance for data containing outliers and thus is strongly recommendable, and (iii) random projection is a useful niche application for high-dimensional data combined with possible restrictions in data storage and computing time. The three chemometric methods described are available as functions for the free software R. (author) [de
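
    A rough sketch of the repeated double cross-validation (rdCV) idea mentioned above, using PLS regression: an inner cross-validation chooses the number of PLS components on each calibration set, an outer cross-validation supplies honest test-set errors, and the whole split is repeated to yield a distribution of prediction errors. Spectra and reference values are synthetic, not NIR data, and the fold counts are arbitrary choices.

      # Sketch of repeated double cross-validation: inner CV tunes the number of PLS
      # components, outer CV yields test-set errors, repetitions give an RMSEP spread.
      import numpy as np
      from sklearn.cross_decomposition import PLSRegression
      from sklearn.model_selection import KFold, GridSearchCV

      rng = np.random.default_rng(6)
      X = rng.normal(size=(80, 150))                              # 80 samples x 150 "wavelengths"
      y = X[:, :10].sum(axis=1) + rng.normal(scale=0.5, size=80)  # reference property (synthetic)

      rmsep_per_repeat = []
      for rep in range(20):                                       # repetition loop
          outer = KFold(n_splits=4, shuffle=True, random_state=rep)
          sq_err = []
          for cal, test in outer.split(X):                        # outer CV: honest test sets
              inner = GridSearchCV(PLSRegression(),               # inner CV: pick n_components
                                   {"n_components": range(1, 16)},
                                   cv=KFold(n_splits=7, shuffle=True, random_state=rep))
              inner.fit(X[cal], y[cal])
              pred = inner.predict(X[test]).ravel()
              sq_err.extend((pred - y[test]) ** 2)
          rmsep_per_repeat.append(np.sqrt(np.mean(sq_err)))

      print(f"RMSEP: median {np.median(rmsep_per_repeat):.3f}, "
            f"spread {np.min(rmsep_per_repeat):.3f}-{np.max(rmsep_per_repeat):.3f}")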

  11. Methods of Software Verification

    Directory of Open Access Journals (Sweden)

    R. E. Gurin

    2015-01-01

    This article is devoted to the problem of software (SW) verification. Software verification methods are designed to check software for compliance with stated requirements such as correctness, system security, adaptability to small changes in the environment, portability and compatibility, etc. The methods vary both in how they operate and in how they achieve their result. The article describes static and dynamic methods of software verification and pays particular attention to symbolic execution. The review of static analysis discusses the deductive method and model-checking methods; the pros and cons of each method are emphasized, together with a classification of the test techniques associated with it. The paper also presents and analyzes the characteristics and mechanisms of static dependency analysis, and the kinds of dependencies that can reduce the number of false positives in situations where the current state of the program combines two or more states obtained either along different execution paths or when working with multiple object values. Dependencies connect various types of software objects: single variables, the elements of composite variables (structure fields, array elements), the size of heap areas, the length of strings, and the number of initialized array elements in the code being verified by static methods. The article also covers the identification of dependencies within the framework of abstract interpretation, and gives an overview and analysis of inference tools. Methods of dynamic analysis such as testing, monitoring and profiling are presented and analyzed, along with some kinds of tools that can be applied to software when using dynamic analysis. Based on this work a conclusion is drawn describing the most relevant problems of the analysis techniques, methods of their solution and

  12. Radiometric dating methods

    International Nuclear Information System (INIS)

    Bourdon, B.

    2003-01-01

    The general principle of isotope dating methods is based on the presence of radioactive isotopes in the geologic or archaeological object to be dated. The decay with time of these isotopes is used to determine the 'zero' time corresponding to the event to be dated. This paper recalls the general principle of isotope dating methods (bases, analytical methods, validation of results and uncertainties) and presents the methods based on natural radioactivity (Rb-Sr, Sm-Nd, U-Pb, Re-Os, K-Ar (Ar-Ar), U-Th-Ra-210Pb, U-Pa, 14C, 36Cl, 10Be) and the methods based on artificial radioactivity with their applications. Finally, the methods based on irradiation damages (thermoluminescence, fission tracks, electron spin resonance) are briefly evoked. (J.S.)
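
    The 'zero time' idea above reduces to the standard decay equation; the sketch below works one hypothetical example with the 14C half-life (5730 years). The remaining fraction and daughter/parent ratio are invented numbers, not data from the text.

      # Sketch of the general dating principle: a radioactive parent decays as
      # N(t) = N0 * exp(-lambda * t), so the age follows from the fraction of parent
      # remaining, or from the daughter/parent ratio via t = ln(1 + D/P) / lambda
      # (assuming no daughter isotope was present initially and the system stayed closed).
      import math

      half_life_yr = 5730.0                     # 14C half-life
      decay_const = math.log(2) / half_life_yr  # lambda

      remaining_fraction = 0.25                 # measured parent remaining (hypothetical)
      age = math.log(1.0 / remaining_fraction) / decay_const
      print(f"age ~ {age:,.0f} years")          # two half-lives, about 11,460 years

      daughter_over_parent = 3.0                # equivalent D/P ratio if all decays were retained
      print(f"age from D/P ~ {math.log(1 + daughter_over_parent) / decay_const:,.0f} years")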

  13. Performative Schizoid Method

    DEFF Research Database (Denmark)

    Svabo, Connie

    2016-01-01

    is presented and an example is provided of a first exploratory engagement with it. The method is used in a specific project Becoming Iris, making inquiry into arts-based knowledge creation during a three month visiting scholarship at a small, independent visual ar