WorldWideScience

Sample records for reliability scores applied

  1. A generic method for assignment of reliability scores applied to solvent accessibility predictions

    Directory of Open Access Journals (Sweden)

    Nielsen Morten

    2009-07-01

    Background: Estimation of the reliability of specific real-value predictions is nontrivial, and the efficacy of such estimates is often questionable. It is important to know whether a given prediction can be trusted, and therefore the best methods associate each prediction with a reliability score or index. For discrete qualitative predictions, reliability is conventionally estimated as the difference between the output scores of selected classes. Such an approach is not feasible for methods that predict a biological feature as a single real value rather than a classification. As a solution to this challenge, we have implemented a method that predicts the relative surface accessibility of an amino acid and simultaneously predicts the reliability of each prediction, in the form of a Z-score. Results: An ensemble of artificial neural networks has been trained on a set of experimentally solved protein structures to predict the relative exposure of the amino acids. The method assigns a reliability score to each surface accessibility prediction as an inherent part of the training process. This is in contrast to the most commonly used procedures, where reliabilities are obtained by post-processing the output. Conclusion: The performance of the neural networks was evaluated on a commonly used set of sequences known as the CB513 set. An overall Pearson's correlation coefficient of 0.72 was obtained, which is comparable to the performance of the best currently publicly available method, Real-SPINE. Both methods associate a reliability score with the individual predictions. However, our implementation of reliability scores in the form of a Z-score is shown to be the more informative measure for discriminating good predictions from bad ones in the entire range from completely buried to fully exposed amino acids. This is evident when comparing the Pearson's correlation coefficient for the upper 20% of predictions sorted according to reliability. For this subset, values of 0

  2. Examining the reliability of ADAS-Cog change scores.

    Science.gov (United States)

    Grochowalski, Joseph H; Liu, Ying; Siedlecki, Karen L

    2016-09-01

    The purpose of this study was to estimate and examine ways to improve the reliability of change scores on the Alzheimer's Disease Assessment Scale, Cognitive Subtest (ADAS-Cog). The sample, provided by the Alzheimer's Disease Neuroimaging Initiative, included individuals with Alzheimer's disease (AD) (n = 153) and individuals with mild cognitive impairment (MCI) (n = 352). All participants were administered the ADAS-Cog at baseline and 1 year, and change scores were calculated as the difference in scores over the 1-year period. Three types of change score reliabilities were estimated using multivariate generalizability. Two methods to increase change score reliability were evaluated: reweighting the subtests of the scale and adding more subtests. Reliability of ADAS-Cog change scores over 1 year was low for both the AD sample (ranging from .53 to .64) and the MCI sample (.39 to .61). Reweighting the change scores from the AD sample improved reliability (.68 to .76), but lengthening provided no useful improvement for either sample. The MCI change scores had low reliability, even with reweighting and adding additional subtests. The ADAS-Cog scores had low reliability for measuring change. Researchers using the ADAS-Cog should estimate and report reliability for their use of the change scores. The ADAS-Cog change scores are not recommended for assessment of meaningful clinical change.
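    This study estimated change-score reliability with multivariate generalizability theory. For comparison, here is the simpler classical test theory formula for the reliability of a difference score D = X2 - X1. It is not the authors' method. It assumes measurement errors are uncorrelated across occasions, and the function name is hypothetical.

```python
# Hypothetical helper: classical test theory sketch, not the study's
# generalizability-theory analysis.


def difference_score_reliability(sd1, sd2, rel1, rel2, r12):
    """Classical test theory reliability of the change score D = X2 - X1.

    sd1, sd2   -- observed-score standard deviations at time 1 and time 2
    rel1, rel2 -- reliabilities of the scores at each time
    r12        -- observed correlation between time-1 and time-2 scores
    Assumes measurement errors are uncorrelated across occasions, so the
    true-score covariance equals the observed covariance.
    """
    cov12 = r12 * sd1 * sd2
    # True-score variance of D: each occasion's true variance minus twice the covariance.
    true_var = rel1 * sd1**2 + rel2 * sd2**2 - 2 * cov12
    # Observed-score variance of D.
    obs_var = sd1**2 + sd2**2 - 2 * cov12
    if obs_var <= 0:
        raise ValueError("observed change-score variance must be positive")
    return true_var / obs_var
```

    Consider two occasions that are each highly reliable (0.90) but correlate at 0.70. The change score's reliability then drops to about 0.67. This shows why change scores can be less reliable than the scores they are computed from.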

  3. Lower bounds to the reliabilities of factor score estimators

    NARCIS (Netherlands)

    Hessen, D.J.

    2017-01-01

    Under the general common factor model, the reliabilities of factor score estimators might be of more interest than the reliability of the total score (the unweighted sum of item scores). In this paper, lower bounds to the reliabilities of Thurstone’s factor score estimators, Bartlett’s factor score

  4. Lower Bounds to the Reliabilities of Factor Score Estimators.

    Science.gov (United States)

    Hessen, David J

    2016-10-06

    Under the general common factor model, the reliabilities of factor score estimators might be of more interest than the reliability of the total score (the unweighted sum of item scores). In this paper, lower bounds to the reliabilities of Thurstone's factor score estimators, Bartlett's factor score estimators, and McDonald's factor score estimators are derived and conditions are given under which these lower bounds are equal. The relative performance of the derived lower bounds is studied using classic example data sets. The results show that estimates of the lower bounds to the reliabilities of Thurstone's factor score estimators are greater than or equal to the estimates of the lower bounds to the reliabilities of Bartlett's and McDonald's factor score estimators.

  5. Reliability Generalization: Exploring Variation of Reliability Coefficients of MMPI Clinical Scales Scores.

    Science.gov (United States)

    Vacha-Haase, Tammi; Kogan, Lori R.; Tani, Crystal R.; Woodall, Renee A.

    2001-01-01

    Used reliability generalization to explore the variance of scores on 10 Minnesota Multiphasic Personality Inventory (MMPI) clinical scales drawing on 1,972 articles in the literature on the MMPI. Results highlight the premise that scores, not tests, are reliable or unreliable, and they show that study characteristics do influence scores on the…

  6. How reliable are Functional Movement Screening scores? A systematic review of rater reliability.

    Science.gov (United States)

    Moran, Robert W; Schneiders, Anthony G; Major, Katherine M; Sullivan, S John

    2016-05-01

    Several physical assessment protocols to identify intrinsic risk factors for injury aetiology related to movement quality have been described. The Functional Movement Screen (FMS) is a standardised, field-expedient test battery intended to assess movement quality and has been used clinically in preparticipation screening and in sports injury research. To critically appraise and summarise research investigating the reliability of scores obtained using the FMS battery. Systematic literature review. Systematic search of Google Scholar, Scopus (including ScienceDirect and PubMed), EBSCO (including Academic Search Complete, AMED, CINAHL, Health Source: Nursing/Academic Edition), MEDLINE and SPORTDiscus. Studies meeting eligibility criteria were assessed by 2 reviewers for risk of bias using the Quality Appraisal of Reliability Studies checklist. Overall quality of evidence was determined using van Tulder's levels of evidence approach. 12 studies were appraised. Overall, there was a 'moderate' level of evidence in favour of 'acceptable' (intraclass correlation coefficient ≥0.6) inter-rater and intra-rater reliability for composite scores derived from live scoring. For inter-rater reliability of composite scores derived from video recordings there was 'conflicting' evidence, and 'limited' evidence for intra-rater reliability. For inter-rater reliability based on live scoring of individual subtests there was 'moderate' evidence of 'acceptable' reliability (κ≥0.4) for 4 subtests (Deep Squat, Shoulder Mobility, Active Straight-leg Raise, Trunk Stability Push-up) and 'conflicting' evidence for the remaining 3 (Hurdle Step, In-line Lunge, Rotary Stability). This review found 'moderate' evidence that raters can achieve acceptable levels of inter-rater and intra-rater reliability of composite FMS scores when using live ratings. Overall, there were few high-quality studies, and the quality of several studies was impacted by poor study reporting particularly in relation to
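    The review treats κ ≥ 0.4 as acceptable reliability for individual FMS subtests. Below is a minimal sketch of unweighted Cohen's kappa for two raters scoring the same participants. The function name is hypothetical. The reviewed studies may have used weighted kappa or other variants, which also exist for ordinal scores.

```python
from collections import Counter

# Hypothetical helper: unweighted Cohen's kappa, an illustrative sketch.


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same participants."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed proportion of exact agreements.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)
```

    Example: two raters agree on 6 of 8 subtest scores and have identical marginal distributions, giving chance agreement of 0.375. Kappa is then (0.75 - 0.375) / (1 - 0.375) = 0.6, which clears the 0.4 threshold.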

  7. Inter-expert and intra-expert reliability in sleep spindle scoring

    DEFF Research Database (Denmark)

    Wendt, Sabrina Lyngbye; Welinder, Peter; Sørensen, Helge Bjarup Dissing

    2015-01-01

    Objectives: To measure the inter-expert and intra-expert agreement in sleep spindle scoring, and to quantify how many experts are needed to build a reliable dataset of sleep spindle scorings. Methods: The EEG dataset comprised 400 randomly selected 115 s segments of stage 2 sleep from 110 … with higher reliability than the estimation of spindle duration. Reliability of sleep spindle scoring can be improved by using qualitative confidence scores, rather than a dichotomous yes/no scoring system. Conclusions: We estimate that 2–3 experts are needed to build a spindle scoring dataset … with ‘substantial’ reliability (κ: 0.61–0.8), and 4 or more experts are needed to build a dataset with ‘almost perfect’ reliability (κ: 0.81–1). Significance: Spindle scoring is a critical part of sleep staging, and spindles are believed to play an important role in development, aging, and diseases of the nervous …
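    The κ bands quoted in the Conclusions ('substantial' 0.61–0.8, 'almost perfect' 0.81–1) match the Landis and Koch benchmarks. The snippet does not say how the number of experts was estimated. One common approach, which is an assumption here, is the Spearman-Brown prophecy formula. It predicts the reliability of the average of n parallel raters from single-rater reliability. Both helpers below are hypothetical illustrations, not the authors' procedure.

```python
import math

# Hypothetical helpers: illustrative sketches, not the study's procedure.


def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch (1977) agreement benchmark."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


def raters_needed(single_rater_reliability, target):
    """Spearman-Brown prophecy: smallest number of parallel raters whose averaged
    scores reach `target` reliability, given one rater's reliability."""
    r1, rt = single_rater_reliability, target
    if not (0 < r1 < 1 and 0 < rt < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    n = rt * (1 - r1) / (r1 * (1 - rt))
    # Subtract a tiny tolerance so floating-point noise cannot push an exact
    # integer (e.g. 4.000000000000001) up to the next whole rater.
    return max(1, math.ceil(n - 1e-9))
```

    Under this assumption, a single rater with reliability 0.5 would need 4 raters to reach 0.8.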

  8. Validity and reliability of Nintendo Wii Fit balance scores.

    Science.gov (United States)

    Wikstrom, Erik A

    2012-01-01

    Interactive gaming systems have the potential to help rehabilitate patients with musculoskeletal conditions. The Nintendo Wii Balance Board, which is part of the Wii Fit game, could be an effective tool to monitor progress during rehabilitation because the board and game can provide objective measures of balance. However, the validity and reliability of Wii Fit balance scores remain unknown. To determine the concurrent validity of balance scores produced by the Wii Fit game and the intrasession and intersession reliability of Wii Fit balance scores. Descriptive laboratory study. Sports medicine research laboratory. Forty-five recreationally active participants (age = 27.0 ± 9.8 years, height = 170.9 ± 9.2 cm, mass = 72.4 ± 11.8 kg) with a heterogeneous history of lower extremity injury. Participants completed a single-limb-stance task on a force plate and the Star Excursion Balance Test (SEBT) during the first test session. Twelve Wii Fit balance activities were completed during 2 test sessions separated by 1 week. Postural sway in the anteroposterior (AP) and mediolateral (ML) directions and the AP, ML, and resultant center-of-pressure (COP) excursions were calculated from the single-limb stance. The normalized reach distance was recorded for the anterior, posteromedial, and posterolateral directions of the SEBT. Wii Fit balance scores that the game software generated also were recorded. All 96 of the calculated correlation coefficients among Wii Fit activity outcomes and established balance outcomes were interpreted as poor (r …). Intrasession reliability of Wii Fit balance activity scores ranged from good (intraclass correlation coefficient [ICC] = 0.80) to poor (ICC = 0.39), with 8 activities having poor intrasession reliability. Similarly, 11 of the 12 Wii Fit balance activity scores demonstrated poor intersession reliability, with scores ranging from fair (ICC = 0.74) to poor (ICC = 0.29). Wii Fit balance activity scores had poor concurrent validity relative to COP outcomes and SEBT

  9. Validity and Reliability of Nintendo Wii Fit Balance Scores

    Science.gov (United States)

    Wikstrom, Erik A.

    2012-01-01

    Context: Interactive gaming systems have the potential to help rehabilitate patients with musculoskeletal conditions. The Nintendo Wii Balance Board, which is part of the Wii Fit game, could be an effective tool to monitor progress during rehabilitation because the board and game can provide objective measures of balance. However, the validity and reliability of Wii Fit balance scores remain unknown. Objective: To determine the concurrent validity of balance scores produced by the Wii Fit game and the intrasession and intersession reliability of Wii Fit balance scores. Design: Descriptive laboratory study. Setting: Sports medicine research laboratory. Patients or Other Participants: Forty-five recreationally active participants (age = 27.0 ± 9.8 years, height = 170.9 ± 9.2 cm, mass = 72.4 ± 11.8 kg) with a heterogeneous history of lower extremity injury. Intervention(s): Participants completed a single-limb–stance task on a force plate and the Star Excursion Balance Test (SEBT) during the first test session. Twelve Wii Fit balance activities were completed during 2 test sessions separated by 1 week. Main Outcome Measure(s): Postural sway in the anteroposterior (AP) and mediolateral (ML) directions and the AP, ML, and resultant center-of-pressure (COP) excursions were calculated from the single-limb stance. The normalized reach distance was recorded for the anterior, posteromedial, and posterolateral directions of the SEBT. Wii Fit balance scores that the game software generated also were recorded. Results: All 96 of the calculated correlation coefficients among Wii Fit activity outcomes and established balance outcomes were interpreted as poor (r …). Intrasession reliability of Wii Fit balance activity scores ranged from good (intraclass correlation coefficient [ICC] = 0.80) to poor (ICC = 0.39), with 8 activities having poor intrasession reliability. Similarly, 11 of the 12 Wii Fit balance activity scores demonstrated poor intersession reliability, with

  10. Reliable scar scoring system to assess photographs of burn patients.

    Science.gov (United States)

    Mecott, Gabriel A; Finnerty, Celeste C; Herndon, David N; Al-Mousawi, Ahmed M; Branski, Ludwik K; Hegde, Sachin; Kraft, Robert; Williams, Felicia N; Maldonado, Susana A; Rivero, Haidy G; Rodriguez-Escobar, Noe; Jeschke, Marc G

    2015-12-01

    Several scar-scoring scales exist to clinically monitor burn scar development and maturation. Although scoring scars through direct clinical examination is ideal, scars must sometimes be scored from photographs. No scar scale currently exists for the latter purpose. We modified a previously described scar scale (Yeong et al., J Burn Care Rehabil 1997) and tested the reliability of this new scale in assessing burn scars from photographs. The new scale consisted of three parameters: scar height, surface appearance, and color mismatch. Each parameter was assigned a score of 1 (best) to 4 (worst), generating a total score of 3-12. Five physicians with burns training scored 120 representative photographs using the original and modified scales. Reliability was analyzed using coefficient of agreement, Cronbach alpha, intraclass correlation coefficient, variance, and coefficient of variation. Analysis of variance was performed using the Kruskal-Wallis test. Color mismatch and scar height scores were validated by analyzing actual height and color differences. The intraclass correlation coefficient, the coefficient of agreement, and Cronbach alpha were higher for the modified scale than for the original scale. The original scale produced more variance than the modified scale. Subanalysis demonstrated that, for all categories, the modified scale had greater correlation and reliability than the original scale. The correlation between color mismatch scores and actual color differences was 0.84 and between scar height scores and actual height was 0.81. The modified scar scale is a simple, reliable, and useful scale for evaluating photographs of burn patients. Copyright © 2015 Elsevier Inc. All rights reserved.
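    Cronbach's alpha was one of the statistics used to analyze the modified scar scale. Below is a minimal sketch of alpha for a k-item scale, where the items could be the three parameters: scar height, surface appearance and color mismatch. The function name and input layout are assumptions. Population variances are used throughout. Because they appear in both numerator and denominator, sample variances would give the same alpha.

```python
from statistics import pvariance

# Hypothetical helper: Cronbach's alpha, an illustrative sketch.


def cronbach_alpha(items):
    """Cronbach's alpha.

    items -- k lists, one per scale item (e.g. one per scar-scale parameter),
             each holding that item's score for every subject in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("every item needs a score for the same n >= 2 subjects")
    # Each subject's total score across items.
    totals = [sum(col[s] for col in items) for s in range(n)]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / total_var)
```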

  11. Reliability of a consensus-based ultrasound score for tenosynovitis in rheumatoid arthritis

    DEFF Research Database (Denmark)

    Naredo, Esperanza; D'Agostino, Maria Antonietta; Wakefield, Richard J

    2013-01-01

    OBJECTIVE: To produce consensus-based scoring systems for ultrasound (US) tenosynovitis and to assess the intraobserver and interobserver reliability of these scoring systems in rheumatoid arthritis (RA). METHODS: We undertook a Delphi process on US-defined tenosynovitis and US scoring system … recruited. Ten rheumatologists expert in MSUS blindly, independently and consecutively scored for tenosynovitis in B-mode and PD mode three wrist extensor compartments, two finger flexor tendons and two ankle tendons of each patient in two rounds in a blinded fashion. Intraobserver reliability was assessed … Doppler signal within the synovial sheath. The intraobserver reliability for tenosynovitis scoring on B-mode and PD mode was good (κ value 0.72 for B-mode; κ value 0.78 for PD mode). Interobserver reliability assessment showed good κ values for PD tenosynovitis scoring (first round, 0.64; second round, 0…

  12. Specific algorithm method of scoring the Clock Drawing Test applied in cognitively normal elderly

    Directory of Open Access Journals (Sweden)

    Liana Chaves Mendes-Santos

    The Clock Drawing Test (CDT) is an inexpensive, fast and easily administered measure of cognitive function, especially in the elderly. This instrument is a popular clinical tool widely used in screening for cognitive disorders and dementia. The CDT can be applied in different ways, and scoring procedures also vary. OBJECTIVE: The aims of this study were to analyze the performance of elderly people on the CDT and to evaluate the inter-rater reliability of the CDT scored using a specific algorithm method adapted from Sunderland et al. (1989). METHODS: We analyzed the CDT of 100 cognitively normal elderly people aged 60 years or older. The CDT ("free-drawn") and the Mini-Mental State Examination (MMSE) were administered to all participants. Six independent examiners scored the CDT of 30 participants to evaluate inter-rater reliability. RESULTS AND CONCLUSION: A score of 5 on the proposed algorithm ("Numbers in reverse order or concentrated"), equivalent to 5 points on the original Sunderland scale, was the most frequent (53.5%). The specific CDT algorithm method had high inter-rater reliability (p<0.01), and mean scores ranged from 5.06 to 5.96. The high frequency of an overall score of 5 points may suggest the need to create more nuanced evaluation criteria, which are sensitive to differences in levels of impairment in visuoconstructive and executive abilities during aging.

  13. Cross-cultural adaptation, reliability and validity of the Turkish version of the Hospital for Special Surgery (HSS) Knee Score.

    Science.gov (United States)

    Narin, Selnur; Unver, Bayram; Bakırhan, Serkan; Bozan, Ozgür; Karatosun, Vasfi

    2014-01-01

    The purpose of this study was to adapt the English version of the Hospital for Special Surgery (HSS) knee score for use in a Turkish population and to evaluate its validity, reliability and cultural adaptation. Standard forward-back translation of the HSS knee score was performed and the Turkish version was applied in 73 patients. The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), Mini-Mental State Examination and sit-to-stand test were also performed and analyzed. Internal consistency reliability was tested using Cronbach's alpha. The intraclass correlation coefficient (ICC) was used to calculate the test-retest reliability at one-week intervals. Validity was assessed by calculating the Pearson correlation between the HSS, WOMAC and sit-to-stand test scores. The ICC ranged from 0.98 to 0.99 with high internal consistency (Cronbach's alpha: 0.87). The WOMAC score correlated with total HSS score (r: -0.80, p<0.001) and sit-to-stand score (r: 0.12, p: 0.312). The Turkish version of the HSS knee score is reliable and valid in evaluating total knee arthroplasty in Turkish patients.

  14. Scoring haemophilic arthropathy on X-rays: improving inter- and intra-observer reliability and agreement using a consensus atlas

    Energy Technology Data Exchange (ETDEWEB)

    Foppen, Wouter; Schaaf, Irene C. van der; Beek, Frederik J.A. [University Medical Center Utrecht, Department of Radiology (Netherlands); Verkooijen, Helena M. [University Medical Center Utrecht, Department of Radiology (Netherlands); University Medical Center Utrecht, Julius Center for Health Sciences and Primary Care, Utrecht (Netherlands); Fischer, Kathelijn [University Medical Center Utrecht, Julius Center for Health Sciences and Primary Care, Utrecht (Netherlands); University Medical Center Utrecht, Van Creveldkliniek, Department of Hematology, Utrecht (Netherlands)

    2016-06-15

    The radiological Pettersson score (PS) is widely applied for classification of arthropathy to evaluate costly haemophilia treatment. This study aims to assess and improve inter- and intra-observer reliability and agreement of the PS. Two series of X-rays (bilateral elbows, knees, and ankles) of 10 haemophilia patients (120 joints) with haemophilic arthropathy were scored by three observers according to the PS (maximum score 13/joint). Subsequently, (dis-)agreement in scoring was discussed until consensus. Example images were collected in an atlas. Thereafter, the second series of 120 joints was scored using the atlas. One observer rescored the second series after three months. Reliability was assessed by intraclass correlation coefficients (ICC), agreement by limits of agreement (LoA). Median Pettersson score at joint level (PS_joint) of affected joints was 6 (interquartile range 3-9). Using the consensus atlas, inter-observer reliability of the PS_joint improved significantly from 0.94 (95% confidence interval (CI) 0.91-0.96) to 0.97 (CI 0.96-0.98). LoA improved from ±1.7 to ±1.1 for the PS_joint. Therefore, true differences in arthropathy were differences in the PS_joint of >2 points. Intra-observer reliability of the PS_joint was 0.98 (CI 0.97-0.98), and intra-observer LoA were ±0.9 points. Reliability and agreement of the PS improved by using a consensus atlas.
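    This study reports agreement as limits of agreement (LoA), for example ±1.7 points before the atlas and ±1.1 points after it. LoA are conventionally computed Bland-Altman style: the mean paired difference ± 1.96 SD of the differences. The sketch below assumes the study followed that convention, and the function name is hypothetical.

```python
from statistics import mean, stdev

# Hypothetical helper: Bland-Altman limits of agreement, an illustrative sketch.


def limits_of_agreement(scores_a, scores_b, z=1.96):
    """Bland-Altman limits of agreement for paired scores from two observers
    (or two sessions by one observer). Returns (bias, lower, upper)."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)                # systematic difference between observers
    half_width = z * stdev(diffs)     # z times the sample SD of the differences
    return bias, bias - half_width, bias + half_width
```

    A reported "±1.1" corresponds to `half_width` here. A change smaller than that half-width cannot be distinguished from measurement disagreement.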

  15. Processes and Procedures for Estimating Score Reliability and Precision

    Science.gov (United States)

    Bardhoshi, Gerta; Erford, Bradley T.

    2017-01-01

    Precision is a key facet of test development, with score reliability determined primarily according to the types of error one wants to approximate and demonstrate. This article identifies and discusses several primary forms of reliability estimation: internal consistency (e.g., split-half, KR-20, α), test-retest, alternate forms, interscorer, and…
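    KR-20, one of the internal consistency estimates listed above, is the special case of coefficient α for items scored 0/1. Below is a minimal sketch, assuming a complete examinee-by-item matrix of 0/1 scores. The function name is hypothetical.

```python
from statistics import pvariance

# Hypothetical helper: KR-20, an illustrative sketch.


def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomously scored items.

    responses -- one list per examinee of 0/1 item scores (1 = keyed response).
    """
    n = len(responses)
    if n < 2:
        raise ValueError("need at least two examinees")
    k = len(responses[0])
    if k < 2 or any(len(r) != k for r in responses):
        raise ValueError("every examinee needs the same k >= 2 items")
    # Proportion of examinees giving the keyed response to each item.
    p = [sum(r[i] for r in responses) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    # Population variance, matching the p*q item variances above.
    total_var = pvariance([sum(r) for r in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum_pq / total_var)
```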

  16. A Latent Class Approach to Estimating Test-Score Reliability

    Science.gov (United States)

    van der Ark, L. Andries; van der Palm, Daniel W.; Sijtsma, Klaas

    2011-01-01

    This study presents a general framework for single-administration reliability methods, such as Cronbach's alpha, Guttman's lambda-2, and method MS. This general framework was used to derive a new approach to estimating test-score reliability by means of the unrestricted latent class model. This new approach is the latent class reliability…
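    The framework above covers single-administration coefficients such as Guttman's lambda-2. The latent class reliability coefficient itself is not shown here. As a point of reference, below is a minimal sketch of lambda-2 computed from the inter-item covariances, using population covariances throughout. The helper names are hypothetical.

```python
import math

# Hypothetical helpers: Guttman's lambda-2, an illustrative sketch.


def _pop_cov(x, y):
    """Population covariance of two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def guttman_lambda2(items):
    """Guttman's lambda-2 lower bound to test-score reliability.

    items -- k lists (one per item), each holding scores for the same n subjects.
    """
    k = len(items)
    if k < 2:
        raise ValueError("lambda-2 needs at least two items")
    n = len(items[0])
    totals = [sum(col[s] for col in items) for s in range(n)]
    total_var = _pop_cov(totals, totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    # lambda-1: one minus the share of total variance due to item variances.
    lambda1 = 1 - sum(_pop_cov(c, c) for c in items) / total_var
    # Sum of squared inter-item covariances over all ordered pairs i != j.
    off_diag_sq = sum(_pop_cov(items[i], items[j]) ** 2
                      for i in range(k) for j in range(k) if i != j)
    return lambda1 + math.sqrt(k / (k - 1) * off_diag_sq) / total_var
```

    On the same data, lambda-2 is never smaller than Cronbach's alpha. This is one reason it is often preferred as a lower bound.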

  17. Interobserver Reliability of the Total Body Score System for Quantifying Human Decomposition.

    Science.gov (United States)

    Dabbs, Gretchen R; Connor, Melissa; Bytheway, Joan A

    2016-03-01

    Several authors have tested the accuracy of the Total Body Score (TBS) method for quantifying decomposition, but none have examined the reliability of the method as a scoring system by testing interobserver error rates. Sixteen participants used the TBS system to score 59 observation packets including photographs and written descriptions of 13 human cadavers in different stages of decomposition (postmortem interval: 2-186 days). Data analysis used a two-way random model intraclass correlation in SPSS (v. 17.0). The TBS method showed "almost perfect" agreement between observers, with average absolute correlation coefficients of 0.990 and average consistency correlation coefficients of 0.991. While the TBS method may have sources of error, scoring reliability is not one of them. Individual component scores were examined, and the influences of education and experience levels were investigated. Overall, the trunk component scores were the least concordant. Suggestions are made to improve the reliability of the TBS method. © 2016 American Academy of Forensic Sciences.
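    The TBS study used a two-way random-model intraclass correlation and reports both absolute-agreement and consistency coefficients. Below is a sketch of the McGraw and Wong two-way ICCs computed from ANOVA mean squares, assuming a complete subjects-by-raters table with no missing scores. The function name is hypothetical. This snippet does not say whether the reported 0.990 and 0.991 are single-measure or average-measure values.

```python
# Hypothetical helper: two-way ICCs from ANOVA mean squares, an illustrative sketch.


def two_way_icc(ratings):
    """Two-way intraclass correlations (McGraw and Wong, 1996) from ANOVA.

    ratings[i][j] is the score subject i received from rater j; the table must
    be complete (no missing cells). Returns single-measure ("A,1", "C,1") and
    average-measure ("A,k", "C,k") ICCs for absolute agreement (A) and
    consistency (C).
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n, k >= 2")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual
    return {
        "A,1": (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n),
        "C,1": (ms_r - ms_e) / (ms_r + (k - 1) * ms_e),
        "A,k": (ms_r - ms_e) / (ms_r + (ms_c - ms_e) / n),
        "C,k": (ms_r - ms_e) / ms_r,
    }
```

    Suppose one rater scores every subject exactly one point higher than the other. Consistency is then perfect (1.0) but absolute agreement is not (10/13 for a four-subject example). The two forms can diverge like this, which is why both are worth reporting.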

  18. Reliable change indices and standardized regression-based change score norms for evaluating neuropsychological change in children with epilepsy.

    Science.gov (United States)

    Busch, Robyn M; Lineweaver, Tara T; Ferguson, Lisa; Haut, Jennifer S

    2015-06-01

    Reliable change indices (RCIs) and standardized regression-based (SRB) change score norms permit evaluation of meaningful changes in test scores following treatment interventions, like epilepsy surgery, while accounting for test-retest reliability, practice effects, score fluctuations due to error, and relevant clinical and demographic factors. Although these methods are frequently used to assess cognitive change after epilepsy surgery in adults, they have not been widely applied to examine cognitive change in children with epilepsy. The goal of the current study was to develop RCIs and SRB change score norms for use in children with epilepsy. Sixty-three children with epilepsy (age range: 6-16; M=10.19, SD=2.58) underwent comprehensive neuropsychological evaluations at two time points an average of 12 months apart. Practice effect-adjusted RCIs and SRB change score norms were calculated for all cognitive measures in the battery. Practice effects were quite variable across the neuropsychological measures, with the greatest differences observed among older children, particularly on the Children's Memory Scale and Wisconsin Card Sorting Test. There was also notable variability in test-retest reliabilities across measures in the battery, with coefficients ranging from 0.14 to 0.92. Reliable change indices and SRB change score norms for use in assessing meaningful cognitive change in children following epilepsy surgery are provided for measures with reliability coefficients above 0.50. This is the first study to provide RCIs and SRB change score norms for a comprehensive neuropsychological battery based on a large sample of children with epilepsy. Tables to aid in evaluating cognitive changes in children who have undergone epilepsy surgery are provided for clinical use. An Excel sheet to perform all relevant calculations is also available to interested clinicians or researchers. Copyright © 2015 Elsevier Inc. All rights reserved.
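    The abstract does not spell out how the practice-adjusted RCIs were calculated. Below is a minimal sketch of one common variant:

    1. Subtract the reference group's mean practice effect from the person's observed change.
    2. Divide by the standard error of the difference, built from the baseline and retest standard errors of measurement.

    The authors' exact formula may differ, and the function name is hypothetical.

```python
import math

# Hypothetical helper: one common practice-adjusted RCI variant, an
# illustrative sketch rather than the study's formula.


def practice_adjusted_rci(x1, x2, mean1, mean2, sd1, sd2, r12):
    """Practice-adjusted reliable change index for one person's retest score.

    x1, x2       -- the person's scores at baseline and retest
    mean1, mean2 -- reference-group means at baseline and retest
    sd1, sd2     -- reference-group SDs at baseline and retest
    r12          -- test-retest reliability in the reference group
    """
    if not 0 <= r12 < 1:
        raise ValueError("test-retest reliability must be in [0, 1)")
    sem1 = sd1 * math.sqrt(1 - r12)  # standard error of measurement at baseline
    sem2 = sd2 * math.sqrt(1 - r12)  # standard error of measurement at retest
    se_diff = math.sqrt(sem1**2 + sem2**2)
    # Observed change minus the average practice effect, in SE units.
    return ((x2 - x1) - (mean2 - mean1)) / se_diff
```

    An |RCI| above 1.96 is conventionally read as reliable change at the 95% level (two-sided). As r12 falls, the standard error grows and larger changes are needed, which fits the study's choice to provide RCIs only for measures with reliability above 0.50.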

  19. The scoring of arousal in sleep: reliability, validity, and alternatives.

    Science.gov (United States)

    Bonnet, Michael H; Doghramji, Karl; Roehrs, Timothy; Stepanski, Edward J; Sheldon, Stephen H; Walters, Arthur S; Wise, Merrill; Chesson, Andrew L

    2007-03-15

    The reliability and validity of EEG arousals and other types of arousal are reviewed. Brief arousals during sleep had been observed for many years, but the evolution of sleep medicine in the 1980s directed new attention to these events. Early studies at that time in animals and humans linked brief EEG arousals and associated fragmentation of sleep to daytime sleepiness and degraded performance. Increasing interest in scoring of EEG arousals led the ASDA to publish a scoring manual in 1992. The current review summarizes numerous studies that have examined scoring reliability for these EEG arousals. Validity of EEG arousals was explored by review of studies that empirically varied arousals and found deficits similar to those found after total sleep deprivation depending upon the rate and extent of sleep fragmentation. Additional data from patients with clinical sleep disorders prior to and after effective treatment have also shown a continuing relationship between reduction in pathology-related arousals and improved sleep and daytime function. Finally, many suggestions have been made to refine arousal scoring to include additional elements (e.g., CAP), change the time frame, or focus on other physiological responses such as heart rate or blood pressure changes. Evidence to support the reliability and validity of these measures is presented. It was concluded that the scoring of EEG arousals has added much to our understanding of the sleep process but that significant work on the neurophysiology of arousal needs to be done. Additional refinement of arousal scoring will provide improved insight into sleep pathology and recovery.

  20. Aerospace reliability applied to biomedicine.

    Science.gov (United States)

    Lalli, V. R.; Vargo, D. J.

    1972-01-01

    An analysis is presented that indicates that the reliability and quality assurance methodology selected by NASA to minimize failures in aerospace equipment can be applied directly to biomedical devices to improve hospital equipment reliability. The Space Electric Rocket Test project is used as an example of NASA application of reliability and quality assurance (R&QA) methods. By analogy a comparison is made to show how these same methods can be used in the development of transducers, instrumentation, and complex systems for use in medicine.

  21. Reliability of Lactation Assessment Tools Applied to Overweight and Obese Women.

    Science.gov (United States)

    Chapman, Donna J; Doughty, Katherine; Mullin, Elizabeth M; Pérez-Escamilla, Rafael

    2016-05-01

    The interrater reliability of lactation assessment tools has not been evaluated in overweight/obese women. This study aimed to compare the interrater reliability of 4 lactation assessment tools in this population. A convenience sample of 45 women (body mass index > 27.0) was videotaped while breastfeeding (twice daily on days 2, 4, and 7 postpartum). Three International Board Certified Lactation Consultants independently rated each videotaped session using 4 tools (Infant Breastfeeding Assessment Tool [IBFAT], modified LATCH [mLATCH], modified Via Christi [mVC], and Riordan's Tool [RT]). For each day and tool, we evaluated interrater reliability with 1-way repeated-measures analyses of variance, intraclass correlation coefficients (ICCs), and percentage absolute agreement between raters. Analyses of variance showed significant differences between raters' scores on day 2 (all scales) and day 7 (RT). Intraclass correlation coefficient values reflected good (mLATCH) to excellent reliability (IBFAT, mVC, and RT) on days 2 and 7. All day 4 ICCs reflected good reliability. The ICC for mLATCH was significantly lower than all others on day 2 and was significantly lower than IBFAT (day 7). Percentage absolute interrater agreement for scale components ranged from 31% (day 2: observable swallowing, RT) to 92% (day 7: IBFAT, fixing; and mVC, latch time). Swallowing scores on all scales had the lowest levels of interrater agreement (31%-64%). We demonstrated differences in the interrater reliability of 4 lactation assessment tools when applied to overweight/obese women, with the lowest values observed on day 4. Swallowing assessment was particularly unreliable. Researchers and clinicians using these scales should be aware of the differences in their psychometric behavior. © The Author(s) 2015.

  22. Reliability of scored patient generated subjective global assessment ...

    African Journals Online (AJOL)

    Objective: Establish the reliability of the scored Patient Generated-Subjective Global Assessment (PG-SGA) in determining nutritional status among Antiretroviral Therapy (ART) naive HIV-infected adults. Methods: A descriptive, cross sectional study among outpatient medical clinics, in The AIDS Support Organization ...

  23. Interrater reliability of Violence Risk Appraisal Guide scores provided in Canadian criminal proceedings.

    Science.gov (United States)

    Edens, John F; Penson, Brittany N; Ruchensky, Jared R; Cox, Jennifer; Smith, Shannon Toney

    2016-12-01

    Published research suggests that most violence risk assessment tools have relatively high levels of interrater reliability, but recent evidence of inconsistent scores among forensic examiners in adversarial settings raises concerns about the "field reliability" of such measures. This study specifically examined the reliability of Violence Risk Appraisal Guide (VRAG) scores in Canadian criminal cases identified in the legal database, LexisNexis. Over 250 reported cases were located that made mention of the VRAG, with 42 of these cases containing 2 or more scores that could be submitted to interrater reliability analyses. Overall, scores were skewed toward higher risk categories. The intraclass correlation (ICC(A,1)) was .66, with pairs of forensic examiners placing defendants into the same VRAG risk "bin" in 68% of the cases. For categorical risk statements (i.e., low, moderate, high), examiners provided converging assessment results in most instances (86%). In terms of potential predictors of rater disagreement, there was no evidence for adversarial allegiance in our sample. Rater disagreement in the scoring of 1 VRAG item (Psychopathy Checklist-Revised; Hare, 2003), however, strongly predicted rater disagreement in the scoring of the VRAG (r = .58). (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  24. Development and Reliability of a Preliminary Foot Osteoarthritis Magnetic Resonance Imaging Score.

    Science.gov (United States)

    Halstead, Jill; Martín-Hervás, Carmen; Hensor, Elizabeth M A; McGonagle, Dennis; Keenan, Anne-Maree; Redmond, Anthony C; Conaghan, Philip G

    2017-08-01

    Foot osteoarthritis (OA) is a very common but underinvestigated musculoskeletal condition, and there is little consensus on its common magnetic resonance imaging (MRI) features. The aim of this study was to develop a preliminary foot OA MRI score (FOAMRIS) and evaluate its reliability. This preliminary semiquantitative score included the hindfoot, midfoot, and metatarsophalangeal joints. Joints were scored for joint space narrowing (JSN; 0-3), osteophytes (0-3), joint effusion/synovitis, and bone cysts (present/absent). Erosions and bone marrow lesions (BML) were scored (0-3) and BML were evaluated adjacent to entheses and at sub-tendon sites (present/absent). Additionally, tenosynovitis (0-3) and midfoot ligament pathology (present/absent) were scored. Reliability was evaluated in 15 people with foot pain and MRI-detected OA using 3.0T MRI multi-sequence protocols, and assessed using ICC as an overall score and per anatomical site. Intrareader agreement (ICC) was generally good to excellent across the foot in joint features (JSN 0.90, osteophytes 0.90, effusion/synovitis 0.46, cysts 0.87), bone features (BML 0.83, erosion 0.66, BML entheses 0.66, BML sub-tendon 0.60) and soft tissue features (tenosynovitis 0.83, ligaments 0.77). Interreader agreement was lower for joint features (JSN 0.43, osteophytes 0.27, effusion/synovitis 0.02, cysts 0.48), bone features (BML 0.68, erosion 0.00, BML entheses 0.34, BML sub-tendon 0.13), and soft tissue features (tenosynovitis 0.35, ligaments 0.33). This preliminary FOAMRIS demonstrated good intrareader reliability and fair interreader reliability when assessing the total feature scores. Further development is required in cohorts with a range of pathologies and to assess the psychometric measurement properties.

  5. Clinical use of the ABO-Scoring Index: reliability and subtraction frequency.

    Science.gov (United States)

    Lieber, William S; Carlson, Sean K; Baumrind, Sheldon; Poulton, Donald R

    2003-10-01

    This study tested the reliability and subtraction frequency of the study model-scoring system of the American Board of Orthodontists (ABO). We used a sample of 36 posttreatment study models that were selected randomly from six different orthodontic offices. Intrajudge and interjudge reliability were calculated using nonparametric statistics (Spearman rank coefficient, Wilcoxon, Kruskal-Wallis, and Mann-Whitney tests). We found differences ranging from 3 to 6 subtraction points (total score) for intrajudge scoring between two sessions. For overall total ABO score, the average correlation was .77. Intrajudge correlation was greatest for occlusal relationships and least for interproximal contacts. Interjudge correlation for ABO score averaged r = .85. Correlation was greatest for buccolingual inclination and least for overjet. The data show that some judges, on average, were much more lenient than others and that this resulted in a range of total scores between 19.7 and 27.5. Most of the deductions were found in the buccal segments and most were related to the second molars. We present these findings in the context of clinicians preparing for the ABO phase III examination and for orthodontists in their ongoing evaluation of clinical results.

  6. Increasing the reliability of the fluid/crystallized difference score from the Kaufman Adolescent and Adult Intelligence Test with reliable component analysis.

    Science.gov (United States)

    Caruso, J C

    2001-06-01

    The unreliability of difference scores is a well documented phenomenon in the social sciences and has led researchers and practitioners to interpret differences cautiously, if at all. In the case of the Kaufman Adolescent and Adult Intelligence Test (KAIT), the unreliability of the difference between the Fluid IQ and the Crystallized IQ is due to the high correlation between the two scales. The consequences of the lack of precision with which differences are identified are wide confidence intervals and low-powered significance tests (i.e., large differences are required to be declared statistically significant). Reliable component analysis (RCA) was performed on the subtests of the KAIT in order to address these problems. RCA is a new data reduction technique that results in uncorrelated component scores with maximum proportions of reliable variance. Results indicate that the scores defined by RCA have discriminant and convergent validity (with respect to the equally weighted scores) and that differences between the scores, derived from a single testing session, were more reliable than differences derived from equal weighting for each age group (11-14 years, 15-34 years, 35-85+ years). This reliability advantage results in narrower confidence intervals around difference scores and smaller differences required for statistical significance.
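
    The classical formula for the reliability of a difference score shows why a high correlation between two scales is the problem. The sketch below assumes the two scores have equal variances; the function name and the reliability values are hypothetical rather than KAIT figures, and it illustrates the problem only, not reliable component analysis.

```python
def difference_score_reliability(r_xx, r_yy, r_xy):
    """Classical reliability of D = X - Y, assuming X and Y have equal variances.

    r_xx, r_yy: reliabilities of the two scores; r_xy: their correlation.
    """
    if not -1.0 <= r_xy < 1.0:
        raise ValueError("r_xy must be in [-1, 1)")
    # Mean reliability minus the shared (correlated) part, rescaled.
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


# Hypothetical values: two highly reliable scales with increasing intercorrelation.
for r_xy in (0.0, 0.5, 0.8):
    print(r_xy, round(difference_score_reliability(0.95, 0.95, r_xy), 2))
```

    With both reliabilities at 0.95, the difference-score reliability falls from 0.95 when the scales are uncorrelated to 0.75 when they correlate at 0.8.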

  7. Revised scoring and improved reliability for the Communication Patterns Questionnaire.

    Science.gov (United States)

    Crenshaw, Alexander O; Christensen, Andrew; Baucom, Donald H; Epstein, Norman B; Baucom, Brian R W

    2017-07-01

    The Communication Patterns Questionnaire (CPQ; Christensen, 1987) is a widely used self-report measure of couple communication behavior and is well validated for assessing the demand/withdraw interaction pattern, which is a robust predictor of poor relationship and individual outcomes (Schrodt, Witt, & Shimkowski, 2014). However, no studies have examined the CPQ's factor structure using analytic techniques sufficient by modern standards, nor have any studies replicated the factor structure using additional samples. Further, the current scoring system uses fewer than half of the total items for its 4 subscales, despite the existence of unused items that have content conceptually consistent with those subscales. These characteristics of the CPQ have likely contributed to findings that subscale scores are often troubled by suboptimal psychometric properties such as low internal reliability (e.g., Christensen, Eldridge, Catta-Preta, Lim, & Santagata, 2006). The present study uses exploratory and confirmatory factor analyses on 4 samples to reexamine the factor structure of the CPQ to improve scale score reliability and to determine if including more items in the subscales is warranted. Results indicate that a 3-factor solution (constructive communication and 2 demand/withdraw scales) provides the best fit for the data. That factor structure was confirmed in the replication samples. Compared with the original scales, the revised scales include additional items that expand the conceptual range of the constructs, substantially improve reliability of scale scores, and demonstrate stronger associations with relationship satisfaction and sensitivity to change in therapy. Implications for research and treatment are discussed. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  8. Scoring sacroiliac joints by magnetic resonance imaging. A Multiple-reader reliability experiment

    DEFF Research Database (Denmark)

    Landewé, RB; Hermann, KG; van der Heijde, DM

    2005-01-01

    Magnetic resonance imaging (MRI) of the sacroiliac (SI) joints and the spine is increasingly important in the assessment of inflammatory activity and structural damage in clinical trials with patients with ankylosing spondylitis (AS). We investigated inter-reader reliability and sensitivity...... for 'depth' and 'intensity,' and the fifth method included the SPARCC slice with the maximum score. Inter-reader reliability was investigated by calculating intraclass correlation coefficients (ICC) for all readers together and for all possible reader pairs. Sensitivity to change was investigated...... values close to zero (no agreement) and highest observed values over 0.80 (excellent agreement). In general, agreement of status scores was somewhat better than agreement of change scores, and agreement of the comprehensive SPARCC scoring system was somewhat better than agreement of the more condensed...

  9. Feasibility and reliability of a newly developed antenatal risk score card in routine care

    NARCIS (Netherlands)

    E. Birnie; E.A.P. Steegers; Drs. H.W. Torij; M.J. Veen; J. Poeran; G.J. Bonsel

    2015-01-01

    A population-based cross-sectional study (feasibility) and a cohort study (inter-rater reliability) to study in routine care the feasibility and inter-rater reliability of the Rotterdam Reproductive Risk Reduction risk score card (R4U), a new semi-quantitative score card for use during the antenatal

  10. The Reliability and Validity of Zimbardo Time Perspective Inventory Scores in Academically Talented Adolescents

    Science.gov (United States)

    Worrell, Frank C.; Mello, Zena R.

    2007-01-01

    In this study, the authors examined the reliability, structural validity, and concurrent validity of Zimbardo Time Perspective Inventory (ZTPI) scores in a group of 815 academically talented adolescents. Reliability estimates of the purported factors' scores were in the low to moderate range. Exploratory factor analysis supported a five-factor…

  11. Reliability of the CMT neuropathy score (second version) in Charcot-Marie-Tooth disease.

    LENUS (Irish Health Repository)

    Murphy, Sinéad M

    2011-09-01

    The Charcot-Marie-Tooth neuropathy score (CMTNS) is a reliable and valid composite score comprising symptoms, signs, and neurophysiological tests, which has been used in natural history studies of CMT1A and CMT1X and as an outcome measure in treatment trials of CMT1A. Following an international workshop on outcome measures in Charcot-Marie-Tooth disease (CMT), the CMTNS was modified to attempt to reduce floor and ceiling effects and to standardize patient assessment, aiming to improve its sensitivity for detecting change over time and the effect of an intervention. After agreeing on the modifications made to the CMTNS (CMTNS2), three examiners evaluated 16 patients to determine inter-rater reliability; one examiner evaluated 18 patients twice within 8 weeks to determine intra-rater reliability. Three examiners evaluated 63 patients using the CMTNS and the CMTNS2 to determine how the modifications altered scoring. For inter- and intra-rater reliability, intra-class correlation coefficients (ICCs) were ≥0.96 for the CMT symptom score and the CMT examination score. There were small but significant differences in some of the individual components of the CMTNS compared with the CMTNS2, mainly in the components that had been modified the most. A longitudinal study is in progress to determine whether the CMTNS2 is more sensitive than the CMTNS for detecting change over time.

  12. Effects of Analytical and Holistic Scoring Patterns on Scorer Reliability in Biology Essay Tests

    Science.gov (United States)

    Ebuoh, Casmir N.

    2018-01-01

    The literature shows that methods of scoring essay tests have been criticized as unreliable, and that this unreliability is likely to be greater in internal examinations than in external examinations. The purpose of this study is to determine the effects of analytical and holistic scoring patterns on scorer reliability in…

  13. Validity and reliability of grade scoring in the diagnosis of exercise-induced laryngeal obstruction

    DEFF Research Database (Denmark)

    Walsted, Emil Schwarz; Hull, James H; Hvedstrup, Jeppe

    2017-01-01

    The current gold-standard method for diagnosing exercise-induced laryngeal obstruction (EILO) is continuous laryngoscopy during exercise (CLE), with severity classified by a visual grade scoring system. We evaluated the precision of this approach, by evaluating test-retest reliability of CLE...... grade scoring system does not appear to be a robust means for reliably classifying severity of EILO....

  14. Examiner Reliability of Fluorosis Scoring: A Comparison of Photographic and Clinical Examination Findings

    Science.gov (United States)

    Cruz-Orcutt, Noemi; Warren, John J.; Broffitt, Barbara; Levy, Steven M.; Weber-Gasparoni, Karin

    2012-01-01

    Objective To assess and compare examiner reliability of clinical and photographic fluorosis examinations using the Fluorosis Risk Index (FRI) among children in the Iowa Fluoride Study (IFS). Methods The IFS examined 538 children for fluorosis and dental caries at age 13 and obtained intra-oral photographs from nearly all of them. To assess examiner reliability, duplicate clinical examinations were conducted for 40 of the subjects. In addition, 200 of the photographs were scored independently for fluorosis by two examiners in a standardized manner. Fluorosis data were compared between examiners for the clinical exams and separately for the photographic exams, and a comparison was made between clinical and photographic exams. For all 3 comparisons, examiner reliability was assessed using kappa statistics at the tooth level. Results Inter-examiner reliability for the duplicate clinical exams on the sample of 40 subjects as measured by kappa was 0.59, while the repeat exams of the 200 photographs yielded a kappa of 0.64. For the comparison of photographic and clinical exams, inter-examiner reliability, as measured by weighted kappa, was 0.46. FRI scores obtained using the photographs were higher on average than those obtained from the clinical exams. Fluorosis prevalence was higher for photographs (33%) than found for clinical exam (18%). Conclusion Results suggest inter-examiner reliability is greater and fluorosis scores higher when using photographic compared to clinical examinations. PMID:22316120
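
    Tooth-level agreement of the kind reported above is usually summarized with Cohen's kappa, and with weighted kappa for ordered categories. A minimal sketch, assuming two raters score the same teeth on an ordered scale; the linear weighting scheme, the function name, and the example scores are assumptions, since the abstract does not say which weights were used.

```python
def weighted_kappa(rater_a, rater_b, categories, scheme="unweighted"):
    """Cohen's kappa for two raters; scheme="linear" gives linearly weighted kappa.

    Uses disagreement weights: kappa = 1 - sum(w * observed) / sum(w * expected).
    With 0/1 weights this reduces to ordinary (unweighted) Cohen's kappa.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n, c = len(rater_a), len(categories)
    if c < 2:
        raise ValueError("need at least two categories")
    index = {cat: i for i, cat in enumerate(categories)}

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * c for _ in range(c)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(c)]                       # rater A marginals
    col = [sum(observed[i][j] for i in range(c)) for j in range(c)]  # rater B marginals

    def weight(i, j):
        if scheme == "linear":
            return abs(i - j) / (c - 1)
        return 0.0 if i == j else 1.0

    obs_dis = sum(weight(i, j) * observed[i][j] for i in range(c) for j in range(c))
    exp_dis = sum(weight(i, j) * row[i] * col[j] for i in range(c) for j in range(c))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1 - obs_dis / exp_dis


# Hypothetical tooth-level scores on a 0-3 ordinal scale from two examiners.
clinical = [0, 0, 1, 2, 3, 1, 0, 2]
photo = [0, 1, 1, 2, 2, 1, 0, 3]
print(weighted_kappa(clinical, photo, [0, 1, 2, 3]))
print(weighted_kappa(clinical, photo, [0, 1, 2, 3], scheme="linear"))
```

    Weighted kappa gives partial credit to near-misses (a 2 scored as 3), so it is typically higher than unweighted kappa on ordinal scales where most disagreements are off by one category.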

  15. Reliable Change Indices and Standardized Regression-Based Change Score Norms for Evaluating Neuropsychological Change in Children with Epilepsy

    Science.gov (United States)

    Busch, Robyn M.; Lineweaver, Tara T.; Ferguson, Lisa; Haut, Jennifer S.

    2015-01-01

    Reliable change index scores (RCIs) and standardized regression-based change score norms (SRBs) permit evaluation of meaningful changes in test scores following treatment interventions, like epilepsy surgery, while accounting for test-retest reliability, practice effects, score fluctuations due to error, and relevant clinical and demographic factors. Although these methods are frequently used to assess cognitive change after epilepsy surgery in adults, they have not been widely applied to examine cognitive change in children with epilepsy. The goal of the current study was to develop RCIs and SRBs for use in children with epilepsy. Sixty-three children with epilepsy (age range 6–16; M=10.19, SD=2.58) underwent comprehensive neuropsychological evaluations at two time points an average of 12 months apart. Practice adjusted RCIs and SRBs were calculated for all cognitive measures in the battery. Practice effects were quite variable across the neuropsychological measures, with the greatest differences observed among older children, particularly on the Children’s Memory Scale and Wisconsin Card Sorting Test. There was also notable variability in test-retest reliabilities across measures in the battery, with coefficients ranging from 0.14 to 0.92. RCIs and SRBs for use in assessing meaningful cognitive change in children following epilepsy surgery are provided for measures with reliability coefficients above 0.50. This is the first study to provide RCIs and SRBs for a comprehensive neuropsychological battery based on a large sample of children with epilepsy. Tables to aid in evaluating cognitive changes in children who have undergone epilepsy surgery are provided for clinical use. An Excel spreadsheet to perform all relevant calculations is also available to interested clinicians or researchers. PMID:26043163
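
    A practice-adjusted reliable change index can be sketched as the observed change minus the mean practice gain, divided by the standard error of the difference. This minimal version assumes the Jacobson-Truax standard error built from the baseline SD and the test-retest reliability; other variants exist, the abstract does not say which one the authors used, and the numbers in the example are hypothetical. Standardized regression-based norms are not shown.

```python
from math import sqrt


def practice_adjusted_rci(x1, x2, sd1, r12, mean_practice_gain=0.0):
    """Reliable change index with a mean practice-effect adjustment.

    x1, x2: a child's baseline and retest scores.
    sd1: baseline standard deviation in the reference sample.
    r12: test-retest reliability of the measure.
    mean_practice_gain: mean retest-minus-baseline change in the reference sample.
    """
    sem = sd1 * sqrt(1 - r12)     # standard error of measurement
    se_diff = sqrt(2 * sem ** 2)  # standard error of the difference
    return ((x2 - x1) - mean_practice_gain) / se_diff


# Hypothetical numbers (not from the study): SD 10, retest r = 0.75, 2-point practice gain.
rci = practice_adjusted_rci(100, 112, sd1=10, r12=0.75, mean_practice_gain=2)
print(round(rci, 2), "reliable change" if abs(rci) > 1.96 else "within error")  # 1.41 within error
```

    A 12-point gain looks large, but after removing the expected practice effect and scaling by measurement error it does not cross the conventional ±1.96 threshold; with low-reliability measures (the study excluded those at or below 0.50) the threshold becomes wider still.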

  16. How reliable are Psychopathy Checklist-Revised scores in Canadian criminal trials? A case law review.

    Science.gov (United States)

    Edens, John F; Cox, Jennifer; Smith, Shannon Toney; DeMatteo, David; Sörman, Karolina

    2015-06-01

    The Psychopathy Checklist-Revised (PCL-R; Hare, 2003) is a professional rating scale that enjoys widespread use in forensic and correctional settings, primarily as a tool to inform risk assessments in a variety of types of cases (e.g., parole determinations, sexually violent predator [SVP] civil commitment). Although widely described as "reliable and valid" in research reports, several recent field studies have suggested that PCL-R scores provided by examiners in forensic cases are significantly less reliable than the interrater reliability values reported in research studies. Most of these field studies, however, have had small samples and only examined SVP civil commitment cases. This study builds on existing research by examining the reliability of PCL-R scores provided by forensic examiners in a much more extensive sample of Canadian criminal cases. Using the LexisNexis database, we identified 102 cases in which at least 2 scores were reported (of 257 total PCL-R scores). The single-rater intraclass correlation coefficient (ICC(A1)) was .59, indicating that a large percentage of the variance in individual scores was attributable to some form of error. ICC values were somewhat higher for sexual offending cases (.66) than they were for nonsexual offending cases (.46), indicating that poor interrater reliability was not restricted specifically to the assessment of sexual offenders. These and earlier findings concerning field reliability in legal cases suggest that the standard error of measurement for PCL-R scores that are provided to the courts is likely to be much larger than the value of 2.90 reported in the instrument's manual. (c) 2015 APA, all rights reserved.
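
    The link between reliability and the standard error of measurement mentioned above follows the classical formula SEM = SD × √(1 − r). A minimal sketch, assuming classical test theory; the function names, the standard deviation, and the comparison reliability in the usage example are hypothetical, and only the .59 figure comes from the abstract.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Classical SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be in [0, 1]")
    return sd * sqrt(1 - reliability)


def score_band(observed, sd, reliability, z=1.96):
    """Approximate confidence band around an observed score (z * SEM either side)."""
    half_width = z * standard_error_of_measurement(sd, reliability)
    return observed - half_width, observed + half_width


# Hypothetical SD of 8 points; .59 is the field ICC reported above, .90 is illustrative.
for r in (0.90, 0.59):
    low, high = score_band(25, sd=8, reliability=r)
    print(r, round(standard_error_of_measurement(8, r), 2), (round(low, 1), round(high, 1)))
```

    Holding the SD fixed, dropping reliability from .90 to .59 roughly doubles the SEM, so the band around any single reported score widens accordingly.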

  17. Estimating the Reliability of Aggregated and Within-Person Centered Scores in Ecological Momentary Assessment

    Science.gov (United States)

    Huang, Po-Hsien; Weng, Li-Jen

    2012-01-01

    A procedure for estimating the reliability of test scores in the context of ecological momentary assessment (EMA) was proposed to take into account the characteristics of EMA measures. Two commonly used test scores in EMA were considered: the aggregated score (AGGS) and the within-person centered score (WPCS). Conceptually, AGGS and WPCS represent…

  18. A generic method for assignment of reliability scores applied to solvent accessibility predictions

    DEFF Research Database (Denmark)

    Petersen, Bent; Petersen, Thomas Nordahl; Andersen, Pernille

    2009-01-01

    Conclusion: The performance of the neural networks was evaluated on a commonly used set of sequences known as the CB513 set. An overall Pearson's correlation coefficient of 0.72 was obtained, which is comparable to the performance of the currently best public available method, Real-SPINE. Both methods associate a reliability...... comparing the Pearson's correlation coefficient for the upper 20% of predictions sorted according to reliability. For this subset, values of 0.79 and 0.74 are obtained using our and the compared method, respectively. This tendency is true for any selected subset....
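
    The subset evaluation described above (Pearson's r on the 20% of predictions with the highest reliability) can be sketched as follows. This reproduces only the evaluation step, assuming per-residue predicted exposure, observed exposure, and a reliability score are already available; it does not model the neural-network ensemble or how its Z-scores are produced, and the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def top_fraction_correlation(predicted, observed, reliability, fraction=0.2):
    """Pearson r on the predictions with the highest reliability scores.

    Ranks residues by reliability (highest first), keeps the top `fraction`
    (at least two points), and correlates predicted vs observed exposure.
    """
    ranked = sorted(zip(reliability, predicted, observed),
                    key=lambda t: t[0], reverse=True)
    m = max(2, round(len(ranked) * fraction))
    top = ranked[:m]
    return pearson([p for _, p, _ in top], [o for _, _, o in top])
```

    Passing `fraction=1.0` gives the overall correlation, so the same function yields both the whole-set and the high-reliability figures being compared; a useful reliability score should make the subset correlation clearly exceed the overall one.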

  19. High inter-tester reliability of the new mobility score in patients with hip fracture

    DEFF Research Database (Denmark)

    Kristensen, M.T.; Bandholm, T.; Foss, N.B.

    2008-01-01

    OBJECTIVE: To assess the inter-tester reliability of the New Mobility Score in patients with acute hip fracture. DESIGN: An inter-tester reliability study. SUBJECTS: Forty-eight consecutive patients with acute hip fracture at a median age of 84 (interquartile range, 76-89) years; 40 admitted from...... their own home and 8 from nursing homes to an acute orthopaedic hip fracture unit at a university hospital. METHODS: The New Mobility Score, which evaluates the prefracture functional level with a score from 0 (not able to walk at all) to 9 (fully independent), was assessed by 2 independent physiotherapists...... the prefracture functional level in patients with acute hip fracture. Publication date: 2008/7...

  20. Reliability, validity and sensitivity to change of neurogenic bowel dysfunction score in patients with spinal cord injury

    DEFF Research Database (Denmark)

    Erdem, D.; Hava, D.; Keskinoglu, P.

    2017-01-01

    cord injury (SCI). The reliability of NBD score was assessed by test-retest reliability and internal consistency. Cronbach's alpha coefficient was calculated to determine internal consistency. The construct validity was evaluated by exploring correlations between the NBD score and SF-36 scales, patient...... assessment of impact of NBD on quality of life (QoL) and the physician global assessment (PGA). The Global Rating of Change (GRC) scale was used to assess the change of NBD to investigate the sensitivity of the score to change. Results: Cronbach's alpha coefficient was 0.547. In test-retest reliability...
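
    Cronbach's alpha, used above for internal consistency, compares the sum of item variances with the variance of the total score. A minimal sketch, assuming complete responses and one list of scores per item; the function name and any data passed to it are hypothetical, since the abstract gives only the resulting coefficient (0.547).

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha from item scores.

    `items` holds one list per questionnaire item, each with one score per
    respondent (same respondent order in every list).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))
```

    Alpha approaches 1 when items covary strongly (the total's variance far exceeds the summed item variances) and falls toward 0 when items are unrelated, so a value near 0.55 suggests the score's items measure only partly overlapping aspects of bowel dysfunction.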

  1. The Pooling-score (P-score): inter- and intra-rater reliability in endoscopic assessment of the severity of dysphagia.

    Science.gov (United States)

    Farneti, D; Fattori, B; Nacci, A; Mancini, V; Simonelli, M; Ruoppolo, G; Genovese, E

    2014-04-01

    This study evaluated the intra- and inter-rater reliability of the Pooling score (P-score) in clinical endoscopic evaluation of severity of swallowing disorder, considering excess residue in the pharynx and larynx. The score (minimum 4 - maximum 11) is the sum of the scores given to the site of the bolus, the amount and ability to control residue/bolus pooling, the latter assessed on the basis of cough, raclage, number of dry voluntary or reflex swallowing acts ( 5). Four judges evaluated 30 short films of pharyngeal transit of 10 solid (1/4 of a cracker), 11 creamy (1 tablespoon of jam) and 9 liquid (1 tablespoon of 5 cc of water coloured with methylene blue, 1 ml in 100 ml) boluses in 23 subjects (10 M/13 F, age from 31 to 76 yrs, mean age 58.56±11.76 years) with different pathologies. The films were randomly distributed on two CDs, which differed in terms of the sequence of the films, and were given to judges (after an explanatory session) at time 0, 24 hours later (time 1) and after 7 days (time 2). The inter- and intra-rater reliability of the P-score was calculated using the intra-class correlation coefficient (ICC(3,k)). The possibility that bolus consistency could affect the scoring of the films was considered. The ICC for site, amount, management and the P-score total was found to be, respectively, 0.999, 0.997, 1.00 and 0.999. Clinical evaluation of a criterion of severity of a swallowing disorder remains a crucial point in the management of patients with pathologies that predispose to complications. The P-score, derived from static and dynamic parameters, yielded a very high correlation among the scores attributed by the four judges during observations carried out at different times. Bolus consistencies did not affect the outcome of the test: the analysis of variance, performed to verify whether the scores attributed by the four judges to the selected parameters might be influenced by the different consistencies of the boluses, was not

  2. Chest computed tomography-based scoring of thoracic sarcoidosis: Inter-rater reliability of CT abnormalities

    Energy Technology Data Exchange (ETDEWEB)

    Heuvel, D.A.V. den; Es, H.W. van; Heesewijk, J.P. van; Spee, M. [St. Antonius Hospital Nieuwegein, Department of Radiology, Nieuwegein (Netherlands); Jong, P.A. de [University Medical Center Utrecht, Department of Radiology, Utrecht (Netherlands); Zanen, P.; Grutters, J.C. [University Medical Center Utrecht, Division Heart and Lungs, Utrecht (Netherlands); St. Antonius Hospital Nieuwegein, Center of Interstitial Lung Diseases, Department of Pulmonology, Nieuwegein (Netherlands)

    2015-09-15

    To determine inter-rater reliability of sarcoidosis-related computed tomography (CT) findings that can be used for scoring of thoracic sarcoidosis. CT images of 51 patients with sarcoidosis were scored by five chest radiologists for various abnormal CT findings (22 in total) encountered in thoracic sarcoidosis. Using intra-class correlation coefficient (ICC) analysis, inter-rater reliability was analysed and reported according to the Guidelines for Reporting Reliability and Agreement Studies (GRRAS) criteria. A pre-specified sub-analysis was performed to investigate the effect of training. Scoring was trained in a distinct set of 15 scans in which all abnormal CT findings were represented. Median age of the 51 patients (36 men, 70 %) was 43 years (range 26 - 64 years). All radiographic stages were present in this group. ICC ranged from 0.91 for honeycombing to 0.11 for nodular margin (sharp versus ill-defined). The ICC was above 0.60 in 13 of the 22 abnormal findings. Sub-analysis for the best-trained observers demonstrated an ICC improvement for all abnormal findings and values above 0.60 for 16 of the 22 abnormalities. In our cohort, reliability between raters was acceptable for 16 thoracic sarcoidosis-related abnormal CT findings. (orig.)

  3. Chest computed tomography-based scoring of thoracic sarcoidosis: Inter-rater reliability of CT abnormalities

    International Nuclear Information System (INIS)

    Heuvel, D.A.V. den; Es, H.W. van; Heesewijk, J.P. van; Spee, M.; Jong, P.A. de; Zanen, P.; Grutters, J.C.

    2015-01-01

    To determine inter-rater reliability of sarcoidosis-related computed tomography (CT) findings that can be used for scoring of thoracic sarcoidosis. CT images of 51 patients with sarcoidosis were scored by five chest radiologists for various abnormal CT findings (22 in total) encountered in thoracic sarcoidosis. Using intra-class correlation coefficient (ICC) analysis, inter-rater reliability was analysed and reported according to the Guidelines for Reporting Reliability and Agreement Studies (GRRAS) criteria. A pre-specified sub-analysis was performed to investigate the effect of training. Scoring was trained in a distinct set of 15 scans in which all abnormal CT findings were represented. Median age of the 51 patients (36 men, 70 %) was 43 years (range 26 - 64 years). All radiographic stages were present in this group. ICC ranged from 0.91 for honeycombing to 0.11 for nodular margin (sharp versus ill-defined). The ICC was above 0.60 in 13 of the 22 abnormal findings. Sub-analysis for the best-trained observers demonstrated an ICC improvement for all abnormal findings and values above 0.60 for 16 of the 22 abnormalities. In our cohort, reliability between raters was acceptable for 16 thoracic sarcoidosis-related abnormal CT findings. (orig.)

  5. Reliability and validation of the Dutch Achilles tendon Total Rupture Score.

    Science.gov (United States)

    Opdam, K T M; Zwiers, R; Wiegerinck, J I; Kleipool, A E B; Haverlag, R; Goslings, J C; van Dijk, C N

    2018-03-01

    Patient-reported outcome measures (PROMs) have become a cornerstone for the evaluation of the effectiveness of treatment. The Achilles tendon Total Rupture Score (ATRS) is a PROM for outcome and assessment of an Achilles tendon rupture. The aim of this study was to translate the ATRS into Dutch and evaluate its reliability and validity in the Dutch population. A forward-backward translation procedure was performed according to guidelines for the cross-cultural adaptation process. The Dutch ATRS was evaluated for reliability and validity in patients treated for a total Achilles tendon rupture from 1 January 2012 to 31 December 2014 in one teaching hospital and one academic hospital. Reliability was assessed by the intraclass correlation coefficients (ICC), Cronbach's alpha and minimal detectable change (MDC). We assessed construct validity by calculation of Spearman's rho correlation coefficient with domains of the Foot and Ankle Outcome Score (FAOS), Victorian Institute of Sports Assessment-Achilles questionnaire (VISA-A) and Numeric Rating Scale (NRS) for pain in rest and during running. The Dutch ATRS had good test-retest reliability (ICC = 0.852) and high internal consistency (Cronbach's alpha = 0.96). MDC was 30.2 at individual level and 3.5 at group level. Construct validity was supported by 75 % of the hypothesized correlations. The Dutch ATRS had a strong correlation with NRS for pain during running (r = -0.746) and all the five subscales of the Dutch FAOS (r = 0.724-0.867). There was a moderate correlation with the VISA-A-NL (r = 0.691) and NRS for pain in rest (r = -0.580). The Dutch ATRS shows adequate reliability and validity and can be used in the Dutch population for measuring the outcome of treatment of a total Achilles tendon rupture and for research purposes. Diagnostic study, Level I.
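
    The abstract reports MDC values at the individual and group level without giving the formula. One widely used convention derives MDC95 from the SEM (computed from the SD and the ICC) and divides by √n for a group; the sketch below assumes that convention, and its function names and inputs are hypothetical.

```python
from math import sqrt


def mdc95_individual(sd, icc):
    """Minimal detectable change at 95% confidence for one person.

    SEM = SD * sqrt(1 - ICC); MDC95 = 1.96 * sqrt(2) * SEM.
    """
    sem = sd * sqrt(1 - icc)
    return 1.96 * sqrt(2) * sem


def mdc95_group(mdc_individual, n):
    """Group-level MDC: the individual MDC divided by sqrt(group size)."""
    return mdc_individual / sqrt(n)
```

    The √2 reflects that a change score carries measurement error from both occasions, and the division by √n is why a group-level MDC can be an order of magnitude smaller than the individual one.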

  6. Using Generalizability Theory to Assess the Score Reliability of Communication Skills of Dentistry Students

    Science.gov (United States)

    Uzun, N. Bilge; Aktas, Mehtap; Asiret, Semih; Yormaz, Seha

    2018-01-01

    The goal of this study is to determine the reliability of the performance points of dentistry students regarding communication skills and to examine the scoring reliability by generalizability theory in balanced random and fixed facet (mixed design) data, considering also the interactions of student, rater and duty. The study group of the research…

  7. The Danish Prostatic Symptom Score (DAN-PSS-1) questionnaire is reliable in stroke patients

    DEFF Research Database (Denmark)

    Tibaek, Sigrid; Jensen, Rigmor; Klarskov, Peter

    2006-01-01

    The questionnaire consists of 12 questions related to lower urinary tract symptoms (LUTS). The participants were asked to state the frequency and severity of their symptoms (symptom score) and its impact on their daily life (bother score). Seventy-one stroke patients were included and 59 (83%) answered...... the questionnaire twice. The reliability test was done in two aspects: (a) detecting the frequency of each symptom and its bother factor, the scores were reduced to a two-category scale (=0, >0) and simple kappa statistics was used; (b) detecting the severity of each symptom and its bother factor, the total scale...... (kappa(w) = 0.48) to good (kappa(w) = 0.68). CONCLUSIONS: The DAN-PSS-1 questionnaire had acceptable test-retest reliability and may be suitable for measuring the frequency and severity of LUTS and its bother factor in stroke patients....

  8. The longitudinal reliability and responsiveness of the OMERACT Hand Osteoarthritis Magnetic Resonance Imaging Scoring System (HOAMRIS)

    DEFF Research Database (Denmark)

    Haugen, Ida K.; Eshed, Iris; Gandjbakhch, Frederique

    2015-01-01

    Objective. To evaluate the interreader reliability of change scores and the responsiveness of the OMERACT Hand Osteoarthritis (OA) Magnetic Resonance Image (MRI) Scoring System (HOAMRIS). Methods. Paired MRI (baseline and 5-yr followup) from 20 patients with hand OA were scored with known time se...

  9. HitPredict version 4: comprehensive reliability scoring of physical protein-protein interactions from more than 100 species.

    Science.gov (United States)

    López, Yosvany; Nakai, Kenta; Patil, Ashwini

    2015-01-01

    HitPredict is a consolidated resource of experimentally identified, physical protein-protein interactions with confidence scores to indicate their reliability. The study of genes and their inter-relationships using methods such as network and pathway analysis requires high quality protein-protein interaction information. Extracting reliable interactions from most of the existing databases is challenging because they either contain only a subset of the available interactions, or a mixture of physical, genetic and predicted interactions. Automated integration of interactions is further complicated by varying levels of accuracy of database content and lack of adherence to standard formats. To address these issues, the latest version of HitPredict provides a manually curated dataset of 398 696 physical associations between 70 808 proteins from 105 species. Manual confirmation was used to resolve all issues encountered during data integration. For improved reliability assessment, this version combines a new score derived from the experimental information of the interactions with the original score based on the features of the interacting proteins. The combined interaction score performs better than either of the individual scores in HitPredict as well as the reliability score of another similar database. HitPredict provides a web interface to search proteins and visualize their interactions, and the data can be downloaded for offline analysis. Data usability has been enhanced by mapping protein identifiers across multiple reference databases. Thus, the latest version of HitPredict provides a significantly larger, more reliable and usable dataset of protein-protein interactions from several species for the study of gene groups. Database URL: http://hintdb.hgc.jp/htp. © The Author(s) 2015. Published by Oxford University Press.

  10. The Reliability and Structure of the Classroom Assessment Scoring System in German Pre-Schools

    Science.gov (United States)

    Stuck, Andrea; Kammermeyer, Gisela; Roux, Susanna

    2016-01-01

    This study examined the reliability and structure of the Classroom Assessment Scoring System (CLASS; Pianta, R. C., K. M. La Paro, and B. K. Hamre. 2008. "Classroom Assessment Scoring System. Manual Pre-K." Baltimore, MD: Brookes) and the quality of interactional processes in a German pre-school setting, drawing on a sample of 390…

  11. The Veterans Affairs Cardiac Risk Score: Recalibrating the Atherosclerotic Cardiovascular Disease Score for Applied Use.

    Science.gov (United States)

    Sussman, Jeremy B; Wiitala, Wyndy L; Zawistowski, Matthew; Hofer, Timothy P; Bentley, Douglas; Hayward, Rodney A

    2017-09-01

    Accurately estimating cardiovascular risk is fundamental to good decision-making in cardiovascular disease (CVD) prevention, but risk scores developed in one population often perform poorly in dissimilar populations. We sought to examine whether a large integrated health system can use its electronic health data to better predict individual patients' risk of developing CVD. We created a cohort using all patients ages 45-80 who used Department of Veterans Affairs (VA) ambulatory care services in 2006 with no history of CVD, heart failure, or loop diuretic use. Our outcome variable was new-onset CVD in 2007-2011. We then developed a series of recalibrated scores, including a fully refit "VA Risk Score-CVD (VARS-CVD)." We tested the different scores using standard measures of prediction quality. For the 1,512,092 patients in the study, the Atherosclerotic cardiovascular disease risk score had similar discrimination as the VARS-CVD (c-statistic of 0.66 in men and 0.73 in women), but the Atherosclerotic cardiovascular disease model had poor calibration, predicting 63% more events than observed. Calibration was excellent in the fully recalibrated VARS-CVD tool, but simpler techniques tested proved less reliable. We found that local electronic health record data can be used to estimate CVD better than an established risk score based on research populations. Recalibration improved estimates dramatically, and the type of recalibration was important. Such tools can also easily be integrated into a health system's electronic health record and can be more readily updated.
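    The abstract separates discrimination (similar c-statistics) from calibration (the established score predicting 63% more events than observed). A minimal sketch of both measures follows, assuming a binary outcome and one predicted risk per patient; the data are hypothetical, and `rescale_risks` is a deliberately naive illustration of recalibration, not the fully refit VARS-CVD model and not necessarily one of the simpler techniques the authors tested.

```python
def c_statistic(risks, outcomes):
    """Concordance (c) statistic for a binary outcome.

    Share of (event, non-event) pairs in which the patient who had the event
    was assigned the higher predicted risk; tied predictions count as 0.5.
    Quadratic in the number of patients: fine for illustration only.
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e in events:
        for ne in non_events:
            if e > ne:
                score += 1.0
            elif e == ne:
                score += 0.5
    return score / (len(events) * len(non_events))


def expected_observed_ratio(risks, outcomes):
    """Calibration-in-the-large: predicted event count / observed event count.

    A ratio of 1.63 corresponds to predicting 63% more events than observed.
    """
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("no observed events")
    return sum(risks) / observed


def rescale_risks(risks, ratio):
    """Hypothetical, simplest possible recalibration: divide every risk by E/O.

    Dividing by a positive constant preserves the ranking (and hence the
    c-statistic) as long as the cap at 1.0 is never hit.
    """
    return [min(r / ratio, 1.0) for r in risks]


# Hypothetical predicted risks and observed outcomes for four patients
risks = [0.1, 0.4, 0.35, 0.8]
outcomes = [0, 0, 1, 1]
ratio = expected_observed_ratio(risks, outcomes)
print(c_statistic(risks, outcomes), ratio)
print(c_statistic(rescale_risks(risks, ratio), outcomes))
```

    Rescaling fixes calibration-in-the-large without touching discrimination, which is consistent with the abstract's finding that the two scores discriminated similarly while differing sharply in calibration.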

  12. Method of administration of PROMIS scales did not significantly impact score level, reliability, or validity

    DEFF Research Database (Denmark)

    Bjorner, Jakob B; Rose, Matthias; Gandek, Barbara

    2014-01-01

    OBJECTIVES: To test the impact of the method of administration (MOA) on score level, reliability, and validity of scales developed in the Patient Reported Outcomes Measurement Information System (PROMIS). STUDY DESIGN AND SETTING: Two nonoverlapping parallel forms each containing eight items from … questionnaire (PQ), personal digital assistant (PDA), or personal computer (PC) and a second form by PC, in the same administration. Method equivalence was evaluated through analyses of difference scores, intraclass correlations (ICCs), and convergent/discriminant validity. RESULTS: In difference score analyses … no significant mode differences were found and all confidence intervals were within the prespecified minimal important difference of 0.2 standard deviation. Parallel-forms reliabilities were very high (ICC = 0.85-0.93). Only one across-mode ICC was significantly lower than the same-mode ICC. Tests of validity …

  13. Reliability, Validity, and Responsiveness of InFLUenza Patient-Reported Outcome (FLU-PRO©) Scores in Influenza-Positive Patients.

    Science.gov (United States)

    Powers, John H; Bacci, Elizabeth D; Guerrero, M Lourdes; Leidy, Nancy Kline; Stringer, Sonja; Kim, Katherine; Memoli, Matthew J; Han, Alison; Fairchok, Mary P; Chen, Wei-Ju; Arnold, John C; Danaher, Patrick J; Lalani, Tahaniyat; Ridoré, Michelande; Burgess, Timothy H; Millar, Eugene V; Hernández, Andrés; Rodríguez-Zulueta, Patricia; Smolskis, Mary C; Ortega-Gallegos, Hilda; Pett, Sarah; Fischer, William; Gillor, Daniel; Macias, Laura Moreno; DuVal, Anna; Rothman, Richard; Dugas, Andrea; Ruiz-Palacios, Guillermo M

    2018-02-01

    To assess the reliability, validity, and responsiveness of InFLUenza Patient-Reported Outcome (FLU-PRO©) scores for quantifying the presence and severity of influenza symptoms. An observational prospective cohort study of adults (≥18 years) with influenza-like illness in the United States, the United Kingdom, Mexico, and South America was conducted. Participants completed the 37-item draft FLU-PRO daily for up to 14 days. Item-level and factor analyses were used to remove items and determine factor structure. Reliability of the final tool was estimated using Cronbach α and intraclass correlation coefficients (2-day reliability). Convergent and known-groups validity and responsiveness were assessed using global assessments of influenza severity and return to usual health. Of the 536 patients enrolled, 221 influenza-positive subjects comprised the analytical sample. The mean age of the patients was 40.7 years, 60.2% were women, and 59.7% were white. The final 32-item measure has six factors/domains (nose, throat, eyes, chest/respiratory, gastrointestinal, and body/systemic), with a higher order factor representing symptom severity overall (comparative fit index = 0.92; root mean square error of approximation = 0.06). Cronbach α was high (total = 0.92; domain range = 0.71-0.87); test-retest reliability (intraclass correlation coefficient, day 1-day 2) was 0.83 for total scores and 0.57 to 0.79 for domains. Day 1 FLU-PRO domain and total scores were moderately to highly correlated (≥0.30) with Patient Global Rating of Flu Severity (except nose and throat). Consistent with known-groups validity, scores differentiated severity groups on the basis of global rating (total: F = 57.2, P … FLU-PRO score improvement by day 7 than did those who did not, suggesting score responsiveness. Results suggest that FLU-PRO scores are reliable, valid, and responsive to change in influenza-positive adults. Copyright © 2018 International Society for Pharmacoeconomics and Outcomes
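    Internal consistency here is reported as Cronbach α (total = 0.92). As a reminder of what that number summarizes, a minimal sketch of the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the total score) is given below, assuming complete item-level responses; the example data are hypothetical and this is not the study's analysis code.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha from item-level data.

    rows: one list of item scores per respondent, all of the same length k.
    Population variances are used throughout; the n / (n - 1) factor of
    sample variances would cancel in the ratio anyway.
    """
    if not rows:
        raise ValueError("no respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent must have a score for every item")
    item_vars = [pvariance([r[i] for r in rows]) for i in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 3 respondents x 2 items
print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))
```

    Alpha rises as items covary more strongly relative to their individual variances; two items that always agree exactly give α = 1.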

  14. Reliability of the Dutch translation of the Kujala Patellofemoral Score Questionnaire.

    Science.gov (United States)

    Ummels, P E J; Lenssen, A F; Barendrecht, M; Beurskens, A J H M

    2017-01-01

    There are no Dutch-language disease-specific questionnaires available for patients with patellofemoral pain syndrome that could help Dutch physiotherapists to assess and monitor these symptoms and functional limitations. The aim of this study was to translate the original disease-specific Kujala Patellofemoral Score into Dutch and evaluate its reliability. The questionnaire was translated from English into Dutch in accordance with internationally recommended guidelines. Reliability was determined in 50 stable subjects with an interval of 1 week. The patient inclusion criteria were age between 14 and 60 years; knowledge of the Dutch language; and the presence of at least three of the following symptoms: pain while taking the stairs, pain when squatting, pain when running, pain when cycling, pain when sitting with knees flexed for a prolonged period, grinding of the patella and a positive clinical patella test. The internal consistency, test-retest reliability, measurement error and limits of agreement were calculated. Internal consistency was 0.78 for the first assessment and 0.80 for the second assessment. The intraclass correlation coefficient (ICC agreement) between the first and second assessments was 0.98. The mean difference between the first and second measurements was 0.64, and the standard deviation was 5.51. The standard error of measurement was 3.9, and the smallest detectable change was 11. The Bland-Altman plot shows that the limits of agreement are -10.37 and 11.65. The results of the present study indicated that the test-retest reliability of the translated Dutch version of the Kujala Patellofemoral Score questionnaire is equivalent to that of the original English-language version and that the Dutch version has good internal consistency. Trial registration NTR (TC = 3258). Copyright © 2015 John Wiley & Sons, Ltd.
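    The abstract reports the SEM (3.9) and smallest detectable change (11) without giving formulas. Both values are consistent with the common agreement-based definitions SEM = SD_diff / √2 and SDC = 1.96 · √2 · SEM, assuming the reported standard deviation of 5.51 is the SD of the test-retest differences. The sketch below only checks that arithmetic; it is not taken from the paper.

```python
import math


def sem_from_differences(sd_diff):
    """Standard error of measurement from the SD of test-retest differences."""
    return sd_diff / math.sqrt(2)


def smallest_detectable_change(sem, z=1.96):
    """Smallest detectable change at the individual level (95% by default)."""
    return z * math.sqrt(2) * sem


sem = sem_from_differences(5.51)      # assumes 5.51 is the SD of the differences
sdc = smallest_detectable_change(sem)
print(round(sem, 1), round(sdc))      # 3.9 11
```

    Because the √2 factors cancel, the SDC under these definitions is simply 1.96 × SD_diff, so an individual patient's change of less than about 11 points could plausibly be measurement noise.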

  15. Possibilities and limitations of applying software reliability growth models to safety-critical software

    International Nuclear Information System (INIS)

    Kim, Man Cheol; Jang, Seung Cheol; Ha, Jae Joo

    2007-01-01

    It is generally known that software reliability growth models such as the Jelinski-Moranda model and the Goel-Okumoto Non-Homogeneous Poisson Process (NHPP) model cannot be applied to safety-critical software because of a lack of software failure data. In this paper, by applying two of the most widely known software reliability growth models to sample software failure data, we demonstrate the possibility of using software reliability growth models to prove the high reliability of safety-critical software. The high sensitivity of a piece of software's reliability to the software failure data, as well as the lack of sufficient software failure data, are also identified as possible limitations when applying software reliability growth models to safety-critical software.
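    To make the Goel-Okumoto NHPP model concrete, a minimal sketch of its standard maximum-likelihood fit is shown below, with mean value function m(t) = a(1 − e^(−bt)). The failure times are hypothetical, bisection is simply one convenient way to solve the likelihood equation, and nothing here reproduces the authors' data or procedure. Refitting after adding or dropping a single failure time is a quick way to see the data sensitivity the abstract describes.

```python
import math


def _h(x):
    """h(x) = 1/x - 1/(e^x - 1), which decreases from 1/2 (x -> 0) towards 0."""
    tail = 1.0 / math.expm1(x) if x < 700.0 else 0.0  # e^x would overflow past ~709
    return 1.0 / x - tail


def fit_goel_okumoto(times, T, iters=200):
    """Maximum-likelihood (a, b) for the Goel-Okumoto NHPP, m(t) = a * (1 - exp(-b t)).

    times: failure times observed while testing on [0, T].
    Setting the partial derivatives of the log-likelihood to zero gives
        a = n / (1 - exp(-b T))
        n / b - sum(times) - n T / (exp(b T) - 1) = 0.
    With x = b T the second equation becomes h(x) = mean(times) / T, which has a
    unique root only when mean(times) < T / 2 (the data show reliability growth).
    """
    n = len(times)
    if n == 0:
        raise ValueError("need at least one failure time")
    target = sum(times) / n / T
    if not 0.0 < target < 0.5:
        raise ValueError("mean failure time >= T/2: no finite MLE (no reliability growth)")
    lo, hi = 1e-9, 1.0 / target   # h(hi) < 1/hi = target, so the root lies below hi
    if _h(lo) <= target:
        raise ValueError("mean failure time too close to T/2 for this simple solver")
    for _ in range(iters):        # bisection on the decreasing function h
        mid = 0.5 * (lo + hi)
        if _h(mid) > target:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi) / T
    a = n / -math.expm1(-b * T)   # a = n / (1 - exp(-b T))
    return a, b


def go_reliability(x, t, a, b):
    """Probability of no failure in (t, t + x]: exp(-(m(t + x) - m(t)))."""
    def m(s):
        return a * (1.0 - math.exp(-b * s))
    return math.exp(-(m(t + x) - m(t)))


# Hypothetical failure times (hours) recorded during 200 hours of testing
a_hat, b_hat = fit_goel_okumoto([10.0, 20.0, 35.0, 60.0, 100.0], T=200.0)
print(a_hat, b_hat, go_reliability(100.0, 200.0, a_hat, b_hat))
```

    The estimate of a is the expected total number of faults, so a − n approximates the faults still latent after testing. The existence condition (mean failure time below T/2) is one concrete way that sparse failure data can leave the model unusable.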

  16. A Note on the Score Reliability for the Satisfaction with Life Scale: An RG Study

    Science.gov (United States)

    Vassar, Matt

    2008-01-01

    The purpose of the present study was to meta-analytically investigate the score reliability for the Satisfaction With Life Scale. Four-hundred and sixteen articles using the measure were located through electronic database searches and then separated to identify studies which had calculated reliability estimates from their own data. Sixty-two…

  17. A Reliability Generalization Study of Scores on Rotter's and Nowicki-Strickland's Locus of Control Scales

    Science.gov (United States)

    Beretvas, S. Natasha; Suizzo, Marie-Anne; Durham, Jennifer A.; Yarnell, Lisa M.

    2008-01-01

    The most commonly used measures of locus of control are Rotter's Internality-Externality Scale (I-E) and Nowicki and Strickland's Internality-Externality Scale (NSIE). A reliability generalization study is conducted to explore variability in I-E and NSIE score reliability. Studies are coded for aspects of the scales used (number of response…

  18. Does Changing Examiner Stations During UK Postgraduate Surgery Objective Structured Clinical Examinations Influence Examination Reliability and Candidates' Scores?

    Science.gov (United States)

    Brennan, Peter A; Croke, David T; Reed, Malcolm; Smith, Lee; Munro, Euan; Foulkes, John; Arnett, Richard

    2016-01-01

    Objective structured clinical examinations (OSCE) are widely used for summative assessment in surgery. Although these examinations are standardized as much as possible, variation, including in examiner scoring, can occur and may affect reliability. In a study of a high-stakes UK postgraduate surgical OSCE, we investigated whether examiners changing stations once during a long examining day affected marking, reliability, and overall candidates' scores compared with examiners who examined the same scenario all day. An observational study of 18,262 examiner-candidate interactions from the UK Membership of the Royal College of Surgeons examination was carried out at 3 Surgical Colleges across the United Kingdom. Scores between examiners were compared using analysis of variance. Examination reliability was assessed with Cronbach's alpha, and the comparative distribution of total candidates' scores for each day was evaluated using t-tests of unit-weighted z scores. A significant difference was found in absolute score differences awarded in the morning and afternoon sessions between examiners who changed stations at lunchtime and those who did not (p … design and examiner experience in surgical OSCEs and beyond. Copyright © 2016 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.

  19. Reliability and Validity Evidence of Scores on the French Version of the Questionnaire about Interpersonal Difficulties for Adolescents

    Directory of Open Access Journals (Sweden)

    Beatriz Delgado

    2015-10-01

    This study examined the reliability and validity evidence drawn from the scores of the French version of the Questionnaire about Interpersonal Difficulties for Adolescents (QIDA) in a sample of 957 adolescents (48.5% boys) ranging in age from 11 to 18 years (M = 14.48, SD = 1.85). Principal axis factoring (PAF) and confirmatory factor analyses (CFA) were performed to determine the fit of the factor structure of scores on the QIDA. PAF and CFA replicated the previously identified correlated five-factor structure of the QIDA: Assertiveness, Heterosexual Relationships, Public Speaking, Family Relationships, and Close Friendships. The QIDA yielded acceptable reliability scores for French adolescents. Validity evidence of QIDA was also established through correlations with scores on the School Anxiety Inventory and the Social Anxiety Scale for Adolescents. Most of the correlations were positive and exceeded the established criteria of statistical significance, but the magnitude of these varied according to the scales of the QIDA. Results supported the reliability and validity evidence drawn from the scores of the French version of the QIDA.

  20. Good validity and reliability of the forgotten joint score in evaluating the outcome of total knee arthroplasty

    DEFF Research Database (Denmark)

    Thomsen, Morten G; Latifi, Roshan; Kallemose, Thomas

    2016-01-01

    … We investigated the validity and reliability of the FJS. Patients and methods - A Danish version of the FJS questionnaire was created according to internationally accepted standards. 360 participants who underwent primary TKA were invited to participate in the study. Of these, 315 were included … in a validity study and 150 in a reliability study. Correlation between the Oxford knee score (OKS) and the FJS was examined and test-retest evaluation was performed. A ceiling effect was defined as participants reaching a score within 15% of the maximum achievable score. Results - The validity study revealed … of the FJS (ICC? 0.79). We found a high level of internal consistency (Cronbach's α = 0.96). The ceiling effect for the FJS was 16%, as compared to 37% for the OKS. Interpretation - The FJS showed good construct validity and test-retest reliability. It had a lower ceiling effect than the OKS. The FJS appears …
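    The ceiling-effect definition above (a score within 15% of the maximum achievable score) translates directly into a proportion. A minimal sketch, assuming a scale whose minimum is 0 so that "within 15%" means a score of at least 0.85 × maximum; the scores are hypothetical.

```python
def ceiling_effect(scores, max_score, margin=0.15):
    """Share of participants whose score lies within `margin` of the maximum.

    Assumes the scale starts at 0, so 'within 15% of the maximum' means
    score >= 0.85 * max_score.
    """
    if not scores:
        raise ValueError("no scores")
    threshold = max_score * (1.0 - margin)
    return sum(1 for s in scores if s >= threshold) / len(scores)


# Hypothetical scores on a 0-100 scale
print(ceiling_effect([100, 90, 84, 50], max_score=100))
```

    Under this definition, the reported 16% for the FJS versus 37% for the OKS means fewer FJS respondents were bunched at the top of the scale, leaving more room to detect differences among well-functioning patients.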

  1. Validity and reliability of a novel immunosuppressive adverse effects scoring system in renal transplant recipients.

    Science.gov (United States)

    Meaney, Calvin J; Arabi, Ziad; Venuto, Rocco C; Consiglio, Joseph D; Wilding, Gregory E; Tornatore, Kathleen M

    2014-06-12

    After renal transplantation, many patients experience adverse effects from maintenance immunosuppressive drugs. When these adverse effects occur, patient adherence to immunosuppression may decline, which can in turn affect allograft survival. If these adverse effects could be prospectively monitored in an objective manner and possibly prevented, adherence to immunosuppressive regimens could be optimized and allograft survival improved. Prospective, standardized clinical approaches for health care providers to assess immunosuppressive adverse effects are limited. Therefore, we developed and evaluated the application, reliability and validity of a novel adverse effects scoring system in renal transplant recipients receiving calcineurin inhibitor (cyclosporine or tacrolimus) and mycophenolic acid-based immunosuppressive therapy. The scoring system included 18 non-renal adverse effects organized into gastrointestinal, central nervous system and aesthetic domains developed by a multidisciplinary physician group. Nephrologists employed this standardized adverse effect evaluation in stable renal transplant patients using physical exam, review of systems, recent laboratory results, and medication adherence assessment during a clinic visit. Stable renal transplant recipients in two clinical studies were evaluated and received immunosuppressive regimens consisting of either cyclosporine or tacrolimus with mycophenolic acid. Face, content, and construct validity were assessed to document these adverse effect evaluations. Inter-rater reliability was determined using the Kappa statistic and intra-class correlation. A total of 58 renal transplant recipients were assessed using the adverse effects scoring system, confirming face validity. Nephrologists (subject matter experts) rated the clinical importance of the 18 adverse effects at 3.1 ± 0.75 out of a maximum of 4, supporting content validity. The adverse effects scoring system distinguished 1.75-fold increased gastrointestinal adverse …

  2. Do in-training evaluation reports deserve their bad reputations? A study of the reliability and predictive ability of ITER scores and narrative comments.

    Science.gov (United States)

    Ginsburg, Shiphra; Eva, Kevin; Regehr, Glenn

    2013-10-01

    Although scores on in-training evaluation reports (ITERs) are often criticized for poor reliability and validity, ITER comments may yield valuable information. The authors assessed across-rotation reliability of ITER scores in one internal medicine program, ability of ITER scores and comments to predict postgraduate year three (PGY3) performance, and reliability and incremental predictive validity of attendings' analysis of written comments. Numeric and narrative data from the first two years of ITERs for one cohort of residents at the University of Toronto Faculty of Medicine (2009-2011) were assessed for reliability and predictive validity of third-year performance. Twenty-four faculty attendings rank-ordered comments (without scores) such that each resident was ranked by three faculty. Mean ITER scores and comment rankings were submitted to regression analyses; dependent variables were PGY3 ITER scores and program directors' rankings. Reliabilities of ITER scores across nine rotations for 63 residents were 0.53 for both postgraduate year one (PGY1) and postgraduate year two (PGY2). Interrater reliabilities across three attendings' rankings were 0.83 for PGY1 and 0.79 for PGY2. There were strong correlations between ITER scores and comments within each year (0.72 and 0.70). Regressions revealed that PGY1 and PGY2 ITER scores collectively explained 25% of variance in PGY3 scores and 46% of variance in PGY3 rankings. Comment rankings did not improve predictions. ITER scores across multiple rotations showed decent reliability and predictive validity. Comment ranks did not add to the predictive ability, but correlation analyses suggest that trainee performance can be measured through these comments.

  3. Reliability and validity analysis of the open-source Chinese Foot and Ankle Outcome Score (FAOS).

    Science.gov (United States)

    Ling, Samuel K K; Chan, Vincent; Ho, Karen; Ling, Fona; Lui, T H

    2017-12-21

    To develop the first reliable and validated open-source outcome scoring system in the Chinese language for foot and ankle problems. The English FAOS was translated into Chinese following standard protocols. First, two forward translations were created separately; these were then combined into a preliminary version by an expert committee, which was subsequently back-translated into English. The process was repeated until the original and back-translations were congruent. This version was then field-tested on actual patients, who provided feedback for modification. The final Chinese FAOS version was then tested for reliability and validity. Reliability analysis was performed on 20 subjects, while validity analysis was performed on 50 subjects. Tools used to validate the Chinese FAOS were the SF36 and the Pain Numeric Rating Scale (NRS). Internal consistency between the FAOS subgroups was measured using Cronbach's alpha. Spearman's correlation was calculated between each subgroup in the FAOS, SF36 and NRS. The Chinese FAOS passed both reliability and validity testing, meaning it is reliable, internally consistent and correlates positively with the SF36 and the NRS. The Chinese FAOS is a free, open-source scoring system that can be used to provide a relatively standardised outcome measure for foot and ankle studies. Copyright © 2017 Elsevier Ltd. All rights reserved.

  4. Reliability of a visual scoring system with fluorescent tracers to assess dermal pesticide exposure.

    Science.gov (United States)

    Aragon, Aurora; Blanco, Luis; Lopez, Lylliam; Liden, Carola; Nise, Gun; Wesseling, Catharina

    2004-10-01

    We modified Fenske's semi-quantitative 'visual scoring system' of fluorescent tracer deposited on the skin of pesticide applicators and evaluated its reproducibility in the Nicaraguan setting. The body surface of 33 farmers, divided into 31 segments, was videotaped in the field after spraying with a pesticide solution containing a fluorescent tracer. A portable UV lamp was used for illumination in a foldaway dark room. The videos of five farmers were randomly selected. The scoring was based on a matrix with extension of fluorescent patterns (scale 0-5) on the ordinate and intensity (scale 0-5) on the abscissa, with the product of these two ranks as the final score for each body segment (0-25). After 4 h of training, five medical students rated the 155 video images and evaluated their quality. Cronbach alpha coefficients and two-way random effects intraclass correlation coefficients (ICC) with absolute agreement were computed to assess inter-rater reliability. Consistency was high (Cronbach alpha = 0.96), but the scores differed substantially between raters. The overall ICC was satisfactory [0.75; 95% confidence interval (CI) = 0.62-0.83], but it was lower for intensity (0.54; 95% CI = 0.40-0.66) and higher for extension (0.80; 95% CI = 0.71-0.86). ICCs were lowest for images with low scores and evaluated as low quality, and highest for images with high scores and high quality. Inter-rater reliability coefficients indicate repeatability of the scoring system. However, field conditions for recording fluorescence should be improved to achieve higher quality images, and training should emphasize a better mechanism for the reading of body areas with low contamination.
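    The contrast in this abstract, high consistency (Cronbach alpha = 0.96) yet substantially different scores between raters, is exactly the difference between a consistency coefficient and an absolute-agreement ICC. A minimal sketch using two-way ANOVA mean squares follows. It assumes the single-rater Shrout-Fleiss ICC(2,1) (the abstract does not say whether single or average measures were reported) and computes alpha across raters as (MSR − MSE)/MSR. The ratings are hypothetical.

```python
def icc_2_1_and_alpha(ratings):
    """ratings[i][j]: score of subject i by rater j (complete data, n >= 2, k >= 2).

    Returns (ICC(2,1) two-way random effects, absolute agreement, single rater;
             Cronbach's alpha treating raters as items).
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n, k >= 2")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Rater (column) variance enters the denominator, so systematic offsets lower the ICC
    icc = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)
    alpha = (ms_rows - ms_error) / ms_rows                   # ignores rater offsets
    return icc, alpha


# Hypothetical: three images scored by two raters; rater 2 is always 2 points higher
icc, alpha = icc_2_1_and_alpha([[1, 3], [2, 4], [3, 5]])
print(round(icc, 3), round(alpha, 3))  # 0.333 1.0
```

    Two raters who rank every image identically but differ by a constant offset reach perfect consistency while their absolute agreement is poor, which mirrors the pattern the authors report.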

  5. Pulmonary Exacerbation Score in Cystic Fibrosis Patients: Reliability and Validity Testing

    OpenAIRE

    Keller, F.

    2016-01-01

    Background: Lung disease in cystic fibrosis (CF) is characterized by recurrent pulmonary exacerbations (PEs), but consensus on diagnostic criteria for PE is lacking. The use of a consistent definition of PE as an outcome measure in CF clinical trials would allow meaningful comparison across centers. The aim of this study was to assess the reliability and validity of a simplified version of the Seattle Pulmonary Exacerbation Score (SPEX). Materials and Methods: A cross-sectional observational ...

  6. A pediatric FOUR score coma scale: interrater reliability and predictive validity.

    Science.gov (United States)

    Czaikowski, Brianna L; Liang, Hong; Stewart, C Todd

    2014-04-01

    The Full Outline of UnResponsiveness (FOUR) Score is a coma scale that consists of four components (eye and motor response, brainstem reflexes, and respiration). It was originally validated among the adult population and recently in a pediatric population. To enhance clinical assessment of pediatric intensive care unit patients, including those intubated and/or sedated, at our children's hospital, we modified the FOUR Score Scale for this population. This modified scale would provide many of the same advantages as the original, such as interrater reliability, simplicity, and elimination of the verbal component that is not compatible with the Glasgow Coma Scale (GCS), creating a more valuable neurological assessment tool for the nursing community. Our goal was to potentially provide greater information than the formally used GCS when assessing critically ill, neurologically impaired patients, including those sedated and/or intubated. Experienced pediatric intensive care unit nurses were trained as "expert raters." Two different nurses assessed each subject using the Pediatric FOUR Score Scale (PFSS), GCS, and Richmond Agitation Sedation Scale at three different time points. Data were compared with the Pediatric Cerebral Performance Category (PCPC) assessed by another nurse. Our hypothesis was that the PFSS and PCPC should highly correlate and the GCS and PCPC should correlate lower. Study results show that the PFSS performs excellently in interrater reliability for trained nurse-rater pairs and in predicting poor outcome and in-hospital mortality under various situations, but there were no statistically significant differences between the PFSS and the GCS. However, the PFSS does have the potential to provide greater neurological assessment in the intubated and/or sedated patient based on the outcomes of our study.

  7. Validity and reliability of the Achilles tendon total rupture score.

    Science.gov (United States)

    Ganestam, Ann; Barfod, Kristoffer; Klit, Jakob; Troelsen, Anders

    2013-01-01

    The best treatment of acute Achilles tendon rupture remains debated. Patient-reported outcome measures have become cornerstones in treatment evaluations. The Achilles tendon total rupture score (ATRS) has been developed for this purpose but requires additional validation. The purpose of the present study was to validate a Danish translation of the ATRS. The ATRS was translated into Danish according to internationally adopted standards. Of 142 patients, 90 with previous rupture of the Achilles tendon participated in the validity study and 52 in the reliability study. The ATRS showed moderately strong correlations with the physical subscores of the Medical Outcomes Study 36-item Short-Form Health Survey (r = .70 to .75; p … questionnaire (r = .71; p … validity. For study and follow-up purposes, the ATRS seems reliable for comparisons of groups of patients. Its usability is limited for repeated assessment of individual patients. The development of analysis guidelines would be desirable. Copyright © 2013 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.

  8. Spinal appearance questionnaire: factor analysis, scoring, reliability, and validity testing.

    Science.gov (United States)

    Carreon, Leah Y; Sanders, James O; Polly, David W; Sucato, Daniel J; Parent, Stefan; Roy-Beaudry, Marjolaine; Hopkins, Jeffrey; McClung, Anna; Bratcher, Kelly R; Diamond, Beverly E

    2011-08-15

    Cross-sectional. This study presents the factor analysis of the Spinal Appearance Questionnaire (SAQ) and its psychometric properties. Although the SAQ has been administered to a large sample of patients with adolescent idiopathic scoliosis (AIS) treated surgically, its psychometric properties have not been fully evaluated. This study presents the factor analysis and scoring of the SAQ and evaluates its psychometric properties. The SAQ and the Scoliosis Research Society-22 (SRS-22) were administered to AIS patients who were being observed, braced or scheduled for surgery. Standard demographic data and radiographic measures including Lenke type and curve magnitude were also collected. Of the 1802 patients, 83% were female, with a mean age of 14.8 years and mean initial Cobb angle of 55.8° (range, 0°-123°). From the 32 items of the SAQ, 15 loaded on two factors with consistent and significant correlations across all Lenke types. There is an Appearance (items 1-10) and an Expectations factor (items 12-15). Responses are summed giving a range of 5 to 50 for the Appearance domain and 5 to 20 for the Expectations domain. The Cronbach's α was 0.88 for both domains and Total score with a test-retest reliability of 0.81 for Appearance and 0.91 for Expectations. Correlations with major curve magnitude were higher for the SAQ Appearance and SAQ Total scores compared to correlations between the SRS Appearance and SRS Total scores. The SAQ and SRS-22 Scores were statistically significantly different in patients who were scheduled for surgery compared to those who were observed or braced. The SAQ is a valid measure of self-image in patients with AIS with greater correlation to curve magnitude than SRS Appearance and Total score. It also discriminates patients who require surgery from those who do not.

  9. The reliability of the McCabe score as a marker of co-morbidity in healthcare-associated infection point prevalence studies.

    Science.gov (United States)

    Reilly, J S; Coignard, B; Price, L; Godwin, J; Cairns, S; Hopkins, S; Lyytikäinen, O; Hansen, S; Malcolm, W; Hughes, G J

    2016-05-01

    This study aimed to ascertain the reliability of the McCabe score in a healthcare-associated infection point prevalence survey. A survey of 20 hospitals in 10 European Union Member States (n = 1912) indicated a moderate level of agreement (κ = 0.57) with the score. The reliability of the application of the score could be increased by training data collectors, particularly with reference to the ultimately fatal criteria. This is important if the score is to be used to risk-adjust data to drive infection prevention and control interventions.
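    The agreement figure κ = 0.57 is a chance-corrected statistic; the abstract does not say which kappa variant was used. A minimal sketch of unweighted Cohen's kappa for two raters classifying the same patients is shown below, with hypothetical ratings in the three McCabe categories. Because those categories are ordered, a weighted kappa would be a reasonable alternative.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters classifying the same cases.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of
    agreement and p_e the agreement expected by chance from each rater's
    marginal category frequencies.
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("raters must classify the same, non-empty set of cases")
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical McCabe categories assigned by a data collector and a validator
collector = ["non-fatal", "non-fatal", "ultimately fatal", "rapidly fatal", "non-fatal", "ultimately fatal"]
validator = ["non-fatal", "ultimately fatal", "ultimately fatal", "rapidly fatal", "non-fatal", "non-fatal"]
print(round(cohens_kappa(collector, validator), 2))  # 0.45
```

    In this toy example the raters agree on 4 of 6 patients (67%), but kappa is only 0.45 because much of that agreement would be expected by chance given how often each category is used.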

  10. Attenuation of the Squared Canonical Correlation Coefficient under Varying Estimates of Score Reliability

    Science.gov (United States)

    Wilson, Celia M.

    2010-01-01

    Research pertaining to the distortion of the squared canonical correlation coefficient has traditionally been limited to the effects of sampling error and associated correction formulas. The purpose of this study was to compare the degree of attenuation of the squared canonical correlation coefficient under varying conditions of score reliability.…

  11. Reliability and Validity of Composite Scores from the NIH Toolbox Cognition Battery in Adults

    Science.gov (United States)

    Heaton, Robert K.; Akshoomoff, Natacha; Tulsky, David; Mungas, Dan; Weintraub, Sandra; Dikmen, Sureyya; Beaumont, Jennifer; Casaletto, Kaitlin B.; Conway, Kevin; Slotkin, Jerry; Gershon, Richard

    2014-01-01

    This study describes psychometric properties of the NIH Toolbox Cognition Battery (NIHTB-CB) Composite Scores in an adult sample. The NIHTB-CB was designed for use in epidemiologic studies and clinical trials for ages 3 to 85. A total of 268 self-described healthy adults were recruited at four university-based sites, using stratified sampling guidelines to target demographic variability for age (20–85 years), gender, education, and ethnicity. The NIHTB-CB contains seven computer-based instruments assessing five cognitive sub-domains: Language, Executive Function, Episodic Memory, Processing Speed, and Working Memory. Participants completed the NIHTB-CB, corresponding gold standard validation measures selected to tap the same cognitive abilities, and sociodemographic questionnaires. Three Composite Scores were derived for both the NIHTB-CB and gold standard batteries: “Crystallized Cognition Composite,” “Fluid Cognition Composite,” and “Total Cognition Composite” scores. NIHTB Composite Scores showed acceptable internal consistency (Cronbach’s alphas = 0.84 Crystallized, 0.83 Fluid, 0.77 Total), excellent test–retest reliability (r: 0.86–0.92), strong convergent (r: 0.78–0.90) and discriminant (r: 0.19–0.39) validities versus gold standard composites, and expected age effects (r = 0.18 crystallized, r = −0.68 fluid, r = −0.26 total). Significant relationships with self-reported prior school difficulties and current health status, employment, and presence of a disability provided evidence of external validity. The NIH Toolbox Cognition Battery Composite Scores have excellent reliability and validity, suggesting they can be used effectively in epidemiologic and clinical studies. PMID:24960398

  12. Development and Reliability of the OMERACT Thumb Base Osteoarthritis Magnetic Resonance Imaging Scoring System

    DEFF Research Database (Denmark)

    Kroon, Féline P B; Conaghan, Philip G; Foltz, Violaine

    2017-01-01

    The TOMS assessed the first carpometacarpal (CMC-1) and scaphotrapeziotrapezoid (STT) joints for synovitis, subchondral bone defects (including erosions, cysts, and bone attrition), osteophytes, cartilage, and bone marrow lesions on a 0-3 scale (normal to severe). Subluxation was evaluated only in the CMC ... with better performance for subchondral bone defects, subluxation, and bone marrow lesions. CONCLUSION: A thumb base OA MRI scoring system has been developed. The OMERACT TOMS demonstrated good intrareader and interreader reliability. Longitudinal studies are warranted to investigate reliability of change...

  13. The Reliability of Disease Activity Score in 28 Joints-C-Reactive Protein Might Be Overestimated in a Subgroup of Rheumatoid Arthritis Patients, When the Score Is Solely Based on Subjective Parameters

    DEFF Research Database (Denmark)

    Jensen Hansen, Inger Marie; Asmussen Andreasen, Rikke; Van Bui Hansen, Mark Nam

    2017-01-01

    BACKGROUND: Disease Activity Score in 28 Joints (DAS28) is a scoring system to evaluate disease activity and treatment response in rheumatoid arthritis (RA). A DAS28 score of greater than 3.2 is a well-described limit for treatment intensification; however, the reliability of DAS28 might be overestimated. OBJECTIVE: The aim of this study was to evaluate the reliability of DAS28 in RA, especially focusing on a subgroup of patients with a DAS28 score of greater than 3.2. METHODS: Data from RA patients registered in the local part of the Danish DANBIO Registry were collected in May 2015. Patients were ... Patients with central sensitization and psychological problems and those with a false-positive diagnosis of RA are at high risk of overtreatment.

  14. A Comparison of the Approaches of Generalizability Theory and Item Response Theory in Estimating the Reliability of Test Scores for Testlet-Composed Tests

    Science.gov (United States)

    Lee, Guemin; Park, In-Yong

    2012-01-01

    Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several…

  15. [Reliability and validity of the Chinese version on Comprehensive Scores for Financial Toxicity based on the patient-reported outcome measures].

    Science.gov (United States)

    Yu, H H; Bi, X; Liu, Y Y

    2017-08-10

    Objective: To evaluate the reliability and validity of the Chinese version of the Comprehensive Score for Financial Toxicity (COST), a patient-reported outcome measure. Methods: A total of 118 cancer patients were interviewed face-to-face by well-trained investigators. Cronbach's α and Pearson correlation coefficients were used to evaluate reliability. The content validity index (CVI) and exploratory factor analysis (EFA) were used to evaluate content validity and construct validity, respectively. Results: Cronbach's α was 0.889 for the whole questionnaire, and test-retest coefficients ranged from 0.77 to 0.98. The scale-level content validity index (S-CVI) was 0.82, with item-level content validity indices (I-CVI) between 0.83 and 1.00. Exploratory factor analysis extracted two components, which cumulatively explained 68.04% of the variance, with every item loading > 0.60. Conclusion: The Chinese version of the COST scale showed high reliability and good validity and can therefore be applied to assess the financial situation of cancer patients.
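
    The item-level and scale-level content validity indices above can be illustrated with a short sketch. It assumes the common convention that experts rate each item's relevance on a 4-point scale (3 or 4 counts as relevant) and that S-CVI is the average of the I-CVIs (S-CVI/Ave). The abstract does not state which S-CVI variant was used, and the ratings below are hypothetical.

```python
def item_cvi(ratings, relevant=(3, 4)):
    """I-CVI: share of experts rating the item 3 or 4 on an assumed 4-point relevance scale."""
    return sum(1 for r in ratings if r in relevant) / len(ratings)


def scale_cvi_ave(items):
    """S-CVI/Ave: mean of the item-level CVIs (items[i] = all expert ratings of item i)."""
    icvis = [item_cvi(ratings) for ratings in items]
    return sum(icvis) / len(icvis)


# Hypothetical ratings: 3 items, each rated by 6 experts
items = [
    [4, 4, 3, 4, 3, 4],  # 6 of 6 relevant
    [4, 3, 2, 4, 4, 3],  # 5 of 6 relevant
    [3, 2, 4, 2, 3, 4],  # 4 of 6 relevant
]
for i, ratings in enumerate(items, start=1):
    print(f"item {i}: I-CVI = {item_cvi(ratings):.2f}")
print(f"S-CVI/Ave = {scale_cvi_ave(items):.2f}")
```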

  16. Reliability of ultrasound grading traditional score and new global OMERACT-EULAR score system (GLOESS): results from an inter- and intra-reading exercise by rheumatologists.

    Science.gov (United States)

    Ventura-Ríos, Lucio; Hernández-Díaz, Cristina; Ferrusquia-Toríz, Diana; Cruz-Arenas, Esteban; Rodríguez-Henríquez, Pedro; Alvarez Del Castillo, Ana Laura; Campaña-Parra, Alfredo; Canul, Efrén; Guerrero Yeo, Gerardo; Mendoza-Ruiz, Juan Jorge; Pérez Cristóbal, Mario; Sicsik, Sandra; Silva Luna, Karina

    2017-12-01

    This study aims to test the reliability of ultrasound for grading synovitis in static and video images, evaluating grayscale and power Doppler (PD) separately and combined. Thirteen trained rheumatologist ultrasonographers participated in two separate rounds, reading 42 images (15 static and 27 videos) of the 7-joint count [wrist, 2nd and 3rd metacarpophalangeal (MCP), 2nd and 3rd interphalangeal (IPP), 2nd and 5th metatarsophalangeal (MTP) joints]. The images were obtained from six patients with rheumatoid arthritis by a single ultrasonographer. Synovitis was defined according to OMERACT. The scoring systems for grayscale, PD, and the combined score (GLOESS, Global OMERACT-EULAR Score System) were reviewed before the exercise. Intra- and inter-reading reliability was calculated with weighted Cohen's kappa and interpreted according to Landis and Koch. Kappa values for inter-reading were good to excellent. The lowest kappa was for GLOESS in static images and the highest was for the same score in videos (κ 0.59 and 0.85, respectively). Excellent values were obtained for static PD in the 5th MTP joint and for PD video in the 2nd MTP joint. Results for GLOESS were generally good to moderate. Poor agreement was observed in the 3rd MCP and 3rd IPP joints in all kinds of images. Intra-reading agreement was greater in static images than in videos for grayscale and GLOESS (κ 0.86 vs. 0.77 and κ 0.86 vs. 0.71, respectively), whereas PD agreement was greater in videos than in static images (κ 1.0 vs. 0.79). The reliability of synovitis scoring with static images and videos is generally good to moderate when grayscale and PD are used separately or combined.
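
    Weighted kappa gives partial credit when two readings of an ordinal grade differ by only one step. The sketch below is an illustrative implementation for two readings on a 0-3 synovitis scale, mapped onto the Landis and Koch benchmark bands. It assumes linear disagreement weights (the abstract does not say whether linear or quadratic weights were used), and the grades are hypothetical.

```python
def weighted_kappa(r1, r2, n_cat, weights="linear"):
    """Cohen's weighted kappa for two raters scoring the same cases.

    Ratings are integer categories 0..n_cat-1 (n_cat >= 2). Disagreement weights are
    |i - j| / (n_cat - 1) ("linear") or its square ("quadratic").
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(r1)
    # Observed joint proportions
    obs = [[0.0] * n_cat for _ in range(n_cat)]
    for a, b in zip(r1, r2):
        obs[a][b] += 1 / n
    # Marginal proportions for each rater
    row = [sum(obs[i]) for i in range(n_cat)]
    col = [sum(obs[i][j] for i in range(n_cat)) for j in range(n_cat)]

    def w(i, j):
        d = abs(i - j) / (n_cat - 1)
        return d if weights == "linear" else d ** 2

    cells = [(i, j) for i in range(n_cat) for j in range(n_cat)]
    observed = sum(w(i, j) * obs[i][j] for i, j in cells)
    expected = sum(w(i, j) * row[i] * col[j] for i, j in cells)
    return 1 - observed / expected


def landis_koch(kappa):
    """Verbal label from the Landis and Koch (1977) benchmark bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical synovitis grades (0-3) from two readings of the same 8 images
reading1 = [0, 1, 2, 3, 1, 2, 0, 3]
reading2 = [0, 1, 1, 3, 1, 2, 0, 3]
k = weighted_kappa(reading1, reading2, n_cat=4)
print(f"linear weighted kappa = {k:.2f} ({landis_koch(k)})")
```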

  17. Effect of rater training on reliability and accuracy of mini-CEX scores: a randomized, controlled trial.

    Science.gov (United States)

    Cook, David A; Dupras, Denise M; Beckman, Thomas J; Thomas, Kris G; Pankratz, V Shane

    2009-01-01

    Mini-CEX scores assess resident competence. Rater training might improve mini-CEX score interrater reliability, but evidence is lacking. The objective was to evaluate a rater training workshop using interrater reliability and accuracy. The design was a randomized trial (immediate versus delayed workshop) and a single-group pre/post study (randomized groups combined), set in an academic medical center. Participants were 52 internal medicine clinic preceptors (31 randomized and 21 additional workshop attendees). The workshop included rater error training, performance dimension training, behavioral observation training, and frame of reference training using lecture, video, and facilitated discussion. The delayed group received no intervention until after the posttest. Measurements were mini-CEX ratings of videotaped resident-patient encounters at baseline (just before the workshop for the workshop group) and four weeks later; mini-CEX ratings of live resident-patient encounters in the year preceding and the year following the workshop; and rater confidence using the mini-CEX. Among 31 randomized participants, interrater reliabilities in the delayed group (baseline intraclass correlation coefficient [ICC] 0.43, follow-up 0.53) and workshop group (baseline 0.40, follow-up 0.43) were not significantly different (p = 0.19). Mean ratings were similar at baseline (delayed 4.9 [95% confidence interval 4.6-5.2], workshop 4.8 [4.5-5.1]) and follow-up (delayed 5.4 [5.0-5.7], workshop 5.3 [5.0-5.6]; p = 0.88 for interaction). For the entire cohort, rater confidence (1 = not confident, 6 = very confident) improved from mean (SD) 3.8 (1.4) to 4.4 (1.0), p = 0.018. Interrater reliability for ratings of live encounters (entire cohort) was higher after the workshop (ICC 0.34) than before (ICC 0.18), but the standard error of measurement was similar for both periods. Rater training did not improve interrater reliability or accuracy of mini-CEX scores. clinicaltrials.gov identifier NCT00667940

  18. Using the Hemophilia Joint Health Score for assessment of children: Reliability of the Spanish version.

    Science.gov (United States)

    Cuesta-Barriuso, R; Torres-Ortuño, A; Pérez-Alenda, S; Carrasco Juan, J; Querol, F; Nieto-Munuera, J; López-Pina, Ja

    2018-02-27

    Numerous measuring instruments for the evaluation of hemophilic arthropathy have been developed. One of the most widely used is the Hemophilia Joint Health Score (HJHS), given its sensitivity to clinical changes in the joints caused by recurrent hemarthrosis. The objective was to assess the interrater reliability of the Spanish version of the HJHS (version 2.1) in children with hemophilia. In this reliability study, a sample of 36 children aged 7-13 years diagnosed with hemophilia A or B was used. Two physiotherapists performed physical assessments with the Spanish version of the HJHS. Descriptive statistics (range, mean, standard deviation) and the interrater reliability were calculated. The interrater reliability was heterogeneous since the kappa coefficient range (κ), although significant (p ... reliability of the Spanish population version of the HJHS is high. This scale should be used generically in the musculoskeletal evaluation of pediatric patients with hemophilia.

  19. Intra- and inter-rater reliability of the Knee Society Knee Score when used by two physiotherapists in patients post total knee arthroplasty

    Directory of Open Access Journals (Sweden)

    S. Gopal

    2010-01-01

    Background and Purpose: It has yet to be shown whether routine physiotherapy plays a role in the rehabilitation of patients after total knee arthroplasty (Rajan et al. 2004). Physiotherapists should be using valid outcome measures to provide evidence of the benefit of their intervention. The aim of this study was to establish the intra- and inter-rater reliability of the Knee Society Knee Score (KSKS), a scoring system developed by Insall et al. (1989). The Knee Society Knee Score can be used to assess the integrity of the knee joint of patients undergoing total knee arthroplasty. Since the score involves clinical testing, the intra-rater reliability of the clinician should be established before the scores are used as data in clinical research. Where multiple clinicians are involved, inter-rater reliability should also be established. Design: This was a correlation study. Subjects: A sample of thirty patients after total knee arthroplasty attending the arthroplasty clinic at Johannesburg Hospital between six weeks and twelve months postoperatively. Method: Recruited patients were evaluated twice, with a time interval of one hour between assessments. Statistical Analysis: The intra- and inter-rater reliability were estimated using the intraclass correlation coefficient (ICC). Results: Intra-rater reliability was excellent for Examiner A (ICC = 0.95) and good for Examiner B (ICC = 0.71). Inter-rater reliability was moderate (ICC = 0.67 during test one and 0.66 during test two). Conclusion: The KSKS has good intra-rater reliability when tested within a period of one hour. The KSKS demonstrated moderate agreement for inter-rater reliability.
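
    The abstract does not say which ICC form was used. As an assumption-laden sketch, the code below computes ICC(2,1) in the Shrout and Fleiss scheme (two-way random effects, absolute agreement, single rater) from the usual ANOVA mean squares. The patient scores are hypothetical.

```python
def icc_2_1(x):
    """ICC(2,1): two-way random effects, absolute agreement, single rater (Shrout & Fleiss).

    x is a list of subjects, each a list of k ratings (one per rater, same rater order).
    """
    n, k = len(x), len(x[0])
    grand = sum(map(sum, x)) / (n * k)
    row_means = [sum(r) / k for r in x]
    col_means = [sum(r[j] for r in x) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((v - grand) ** 2 for r in x for v in r)
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores from two examiners on the same five patients
scores = [[85, 82], [70, 74], [92, 90], [60, 65], [78, 77]]
print(f"ICC(2,1) = {icc_2_1(scores):.2f}")
```

    For the intra-rater case, the columns would be one examiner's two sessions rather than two examiners. Because this form measures absolute agreement, a constant offset between raters lowers the ICC, whereas a consistency-type ICC would ignore it.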

  20. Validity and Reliability of Scores Obtained on Multiple-Choice Questions: Why Functioning Distractors Matter

    Science.gov (United States)

    Ali, Syed Haris; Carr, Patrick A.; Ruit, Kenneth G.

    2016-01-01

    Plausible distractors are important for accurate measurement of knowledge via multiple-choice questions (MCQs). This study demonstrates the impact of higher distractor functioning on validity and reliability of scores obtained on MCQs. Free-response (FR) and MCQ versions of a neurohistology practice exam were given to four cohorts of Year 1 medical…

  1. Reliability and Validity of SERVQUAL Scores Used To Evaluate Perceptions of Library Service Quality.

    Science.gov (United States)

    Thompson, Bruce; Cook, Colleen

    Research libraries are increasingly supplementing collection counts with perceptions of service quality as indices of status and productivity. The present study was undertaken to explore the reliability and validity of scores from the SERVQUAL measurement protocol (A. Parasuraman and others, 1991), which has previously been used in this type of…

  2. Translation and validation of the new version of the Knee Society Score - The 2011 KS Score - into Brazilian Portuguese.

    Science.gov (United States)

    Silva, Adriana Lucia Pastore E; Croci, Alberto Tesconi; Gobbi, Riccardo Gomes; Hinckel, Betina Bremer; Pecora, José Ricardo; Demange, Marco Kawamura

    2017-01-01

    Translation, cultural adaptation, and validation of the new version of the Knee Society Score - The 2011 KS Score - into Brazilian Portuguese and verification of its measurement properties, reproducibility, and validity. In 2012, the new version of the Knee Society Score was developed and validated. This scale comprises four separate subscales: (a) objective knee score (seven items: 100 points); (b) patient satisfaction score (five items: 40 points); (c) patient expectations score (three items: 15 points); and (d) functional activity score (19 items: 100 points). A total of 90 patients aged 55-85 years were evaluated in a clinical cross-sectional study. The pre-operative translated version was applied to patients referred for total knee arthroplasty (TKA), and the post-operative translated version was applied to patients who had undergone TKA. Each patient answered the same questionnaire twice and was evaluated by two experts in orthopedic knee surgery. Evaluations were performed pre-operatively and three, six, or 12 months post-operatively. The reliability of the questionnaire was evaluated using the intraclass correlation coefficient (ICC) between the two applications. Internal consistency was evaluated using Cronbach's alpha. The ICC analysis found no difference between the means of the pre-operative, three-month, and six-month post-operative evaluations between subscale items. The Brazilian Portuguese version of The 2011 KS Score is a valid and reliable instrument for objective and subjective evaluation of the functionality of Brazilian patients who undergo TKA and revision TKA.

  3. Preliminary testing of the reliability and feasibility of SAGE: a system to measure and score engagement with and use of research in health policies and programs.

    Science.gov (United States)

    Makkar, Steve R; Williamson, Anna; D'Este, Catherine; Redman, Sally

    2017-12-19

    Few measures of research use in health policymaking are available, and the reliability of such measures has yet to be evaluated. A new measure called the Staff Assessment of Engagement with Evidence (SAGE) incorporates an interview that explores policymakers' research use within discrete policy documents and a scoring tool that quantifies the extent of policymakers' research use based on the interview transcript and analysis of the policy document itself. We aimed to conduct a preliminary investigation of the usability, sensitivity, and reliability of the scoring tool in measuring research use by policymakers. Nine experts in health policy research and two independent coders were recruited. Each expert used the scoring tool to rate a random selection of 20 interview transcripts, and each independent coder rated 60 transcripts. The distribution of scores among experts was examined, and interrater reliability was then tested within and between the experts and independent coders. Average- and single-measure reliability coefficients were computed for each SAGE subscale. Experts' scores ranged from the limited to the extensive scoring bracket for all subscales. Experts as a group also exhibited at least a fair level of interrater agreement across all subscales. Single-measure reliability was at least fair except for three subscales: Relevance Appraisal, Conceptual Use, and Instrumental Use. Average- and single-measure reliability among independent coders was good to excellent for all subscales. Finally, reliability between experts and independent coders was fair to excellent for all subscales. Among experts, the scoring tool was comprehensible, usable, and sensitive enough to discriminate between documents with varying degrees of research use. Secondly, the scoring tool yielded scores with good reliability among the independent coders. There was greater variability among experts, although, used by experts as a group, the tool was fairly reliable. The alignment between experts' and independent

  4. Possibilities and Limitations of Applying Software Reliability Growth Models to Safety-Critical Software

    International Nuclear Information System (INIS)

    Kim, Man Cheol; Jang, Seung Cheol; Ha, Jae Joo

    2006-01-01

    As digital systems are gradually introduced into nuclear power plants (NPPs), the need to quantitatively analyze the reliability of these digital systems is also increasing. Kang and Sung identified (1) software reliability, (2) common-cause failures (CCFs), and (3) fault coverage as the three most critical factors in the reliability analysis of digital systems. For estimating the reliability of safety-critical software (the software used in safety-critical digital systems), Bayesian Belief Networks (BBNs) appear to be the most widely used approach. Using BBNs to estimate the reliability of safety-critical software is basically a process of indirectly assigning a reliability based on various observed information and experts' opinions. When software testing results or software failure histories are available, the reliability of the software can instead be estimated directly using software reliability growth models such as the Jelinski-Moranda model and the Goel-Okumoto nonhomogeneous Poisson process (NHPP) model. Although it is generally held that software reliability growth models cannot be applied to safety-critical software, because testing such software is expected to yield very little failure data, we explore the possibilities and corresponding limitations of applying software reliability growth models to safety-critical software.
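
    For reference, the Goel-Okumoto NHPP model named above has mean value function m(t) = a(1 - e^(-bt)), where a is the expected total number of faults and b is the per-fault detection rate. The sketch below only evaluates the model for assumed parameter values. In practice a and b would be fitted to observed failure times (for example by maximum likelihood), which is the step that becomes unstable when testing of safety-critical software produces few failures.

```python
import math


def go_mean_failures(t, a, b):
    """Goel-Okumoto mean value function m(t) = a * (1 - exp(-b * t))."""
    return a * (1 - math.exp(-b * t))


def go_intensity(t, a, b):
    """Failure intensity lambda(t) = dm/dt = a * b * exp(-b * t)."""
    return a * b * math.exp(-b * t)


def go_reliability(x, t, a, b):
    """P(no failure in (t, t + x]) for an NHPP: exp(-(m(t + x) - m(t)))."""
    return math.exp(-(go_mean_failures(t + x, a, b) - go_mean_failures(t, a, b)))


# Hypothetical parameters: a = 30 expected total faults, b = 0.05 per test hour
a, b = 30.0, 0.05
t = 100.0  # hours of testing completed
print(f"expected failures so far: {go_mean_failures(t, a, b):.2f}")
print(f"expected remaining faults: {a - go_mean_failures(t, a, b):.2f}")
print(f"current intensity: {go_intensity(t, a, b):.4f} per hour")
print(f"R(10 h | t = 100 h) = {go_reliability(10.0, t, a, b):.3f}")
```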

  5. Examining Reliability and Validity of an Online Score (ALiEM AIR) for Rating Free Open Access Medical Education Resources.

    Science.gov (United States)

    Chan, Teresa Man-Yee; Grock, Andrew; Paddock, Michael; Kulasegaram, Kulamakan; Yarris, Lalena M; Lin, Michelle

    2016-12-01

    Since 2014, Academic Life in Emergency Medicine (ALiEM) has used the Approved Instructional Resources (AIR) score to critically appraise online content. The primary goals of this study are to determine the interrater reliability (IRR) of the ALiEM AIR rating score and determine its correlation with expert educator gestalt. We also determine the minimum number of educator-raters needed to achieve acceptable reliability. Eight educators each rated 83 online educational posts with the ALiEM AIR scale. Items include accuracy, usage of evidence-based medicine, referencing, utility, and the Best Evidence in Emergency Medicine rating score. A generalizability study was conducted to determine IRR and rating variance contributions of facets such as rater, blogs, posts, and topic. A randomized selection of 40 blog posts previously rated through ALiEM AIR was then rated again by a blinded group of expert medical educators according to their gestalt. Their gestalt impression was subsequently correlated with the ALiEM AIR score. The IRR for the ALiEM AIR rating scale was 0.81 during the 6-month pilot period. Decision studies showed that at least 9 raters were required to achieve this reliability. Spearman correlations between mean AIR score and the mean expert gestalt ratings were 0.40 for recommendation for learners and 0.35 for their colleagues. The ALiEM AIR scale is a moderately to highly reliable, 5-question tool when used by medical educators for rating online resources. The score displays a fair correlation with expert educator gestalt in regard to the quality of the resources. Copyright © 2016 American College of Emergency Physicians. Published by Elsevier Inc. All rights reserved.
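
    The decision-study result (a minimum number of raters for a target reliability) can be approximated, for a simple persons-by-raters design, with the Spearman-Brown formula. This is a simplification: the ALiEM AIR study modeled additional facets (blogs, posts, topic), and the single-rater reliability used below is hypothetical rather than a value from the study.

```python
import math


def spearman_brown(r_single, n_raters):
    """Reliability of the mean of n parallel ratings, given single-rater reliability."""
    return n_raters * r_single / (1 + (n_raters - 1) * r_single)


def raters_needed(r_single, r_target):
    """Smallest number of raters whose averaged score reaches r_target."""
    n = r_target * (1 - r_single) / (r_single * (1 - r_target))
    # Subtract a tiny tolerance so floating-point noise (e.g. 4.000000000000001)
    # does not push an exact answer up to the next integer.
    return max(1, math.ceil(n - 1e-9))


# Hypothetical single-rater reliability of 0.30 and a target of 0.80
n = raters_needed(0.30, 0.80)
print(n, round(spearman_brown(0.30, n), 3))
```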

  6. The OMERACT Psoriatic Arthritis Magnetic Resonance Imaging Score (PsAMRIS) is reliable and sensitive to change: results from an OMERACT workshop

    DEFF Research Database (Denmark)

    Bøyesen, Pernille; McQueen, Fiona M; Gandjbakhch, Frédérique

    2011-01-01

    The aim of this multireader exercise was to assess the reliability and sensitivity to change of the psoriatic arthritis magnetic resonance imaging score (PsAMRIS) in PsA patients followed for 1 year.

  7. Gait Deviation Index, Gait Profile Score and Gait Variable Score in children with spastic cerebral palsy: Intra-rater reliability and agreement across two repeated sessions.

    Science.gov (United States)

    Rasmussen, Helle Mätzke; Nielsen, Dennis Brandborg; Pedersen, Niels Wisbech; Overgaard, Søren; Holsgaard-Larsen, Anders

    2015-07-01

    The Gait Deviation Index (GDI) and Gait Profile Score (GPS) are the most widely used summary measures of gait in children with cerebral palsy (CP). However, the reliability and agreement of these indices have not been investigated, limiting their clinimetric quality for research and clinical practice. The aim of this study was to investigate the intra-rater reliability and agreement of summary measures of gait (GDI; GPS; and the Gait Variable Score (GVS) derived from the GPS). The intra-rater reliability and agreement were investigated across two repeated sessions in 18 children aged 5-12 years diagnosed with spastic CP. No systematic bias was observed between the sessions and no heteroscedasticity was observed in Bland-Altman plots. For the GDI and GPS, excellent reliability with intraclass correlation coefficient (ICC) values of 0.8-0.9 was found, while the GVS was found to have fair to good reliability with ICCs of 0.4-0.7. The agreement for the GDI and the logarithmically transformed GPS, in terms of the standard error of measurement as a percentage of the grand mean (SEM%), varied from 4.1 to 6.7%, whilst the smallest detectable change in percent (SDC%) ranged from 11.3 to 18.5%. For the logarithmically transformed GVS, we found a fair to large variation in SEM% from 7 to 29% and in SDC% from 18 to 81%. The GDI and GPS demonstrated excellent reliability and acceptable agreement, indicating that both can be used in research and clinical practice. However, the large variability observed for some of the GVSs requires careful consideration when selecting outcome measures. Copyright © 2015 Elsevier B.V. All rights reserved.
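
    The agreement statistics above follow widely used definitions: SEM = SD * sqrt(1 - ICC) and SDC = 1.96 * sqrt(2) * SEM, each optionally expressed as a percentage of the grand mean. The sketch assumes the ICC-based SEM formula (SEM can also be taken from the ANOVA error variance, and the abstract does not say which the authors used) and uses hypothetical values.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement: SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1 - icc)


def sdc(sem, z=1.96):
    """Smallest detectable change for an individual: z * sqrt(2) * SEM."""
    return z * math.sqrt(2) * sem


def percent_of_mean(value, grand_mean):
    """Express an agreement statistic as a percentage of the grand mean."""
    return 100 * value / grand_mean


# Hypothetical index: grand mean 80, between-subject SD 10, ICC 0.84
sem = sem_from_icc(10.0, 0.84)
print(f"SEM = {sem:.2f} ({percent_of_mean(sem, 80.0):.1f}% of mean)")
print(f"SDC = {sdc(sem):.2f} ({percent_of_mean(sdc(sem), 80.0):.1f}% of mean)")
```

    Because SDC/SEM = 1.96 * sqrt(2) ≈ 2.77, SDC% is about 2.77 times SEM%. The ranges reported for the GDI and GPS (4.1 to 6.7% and 11.3 to 18.5%) are roughly consistent with that ratio.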

  8. Effect of Clinically Discriminating, Evidence-Based Checklist Items on the Reliability of Scores from an Internal Medicine Residency OSCE

    Science.gov (United States)

    Daniels, Vijay J.; Bordage, Georges; Gierl, Mark J.; Yudkowsky, Rachel

    2014-01-01

    Objective structured clinical examinations (OSCEs) are used worldwide for summative examinations but often lack acceptable reliability. Research has shown that reliability of scores increases if OSCE checklists for medical students include only clinically relevant items. Also, checklists are often missing evidence-based items that high-achieving…

  9. Quality Evaluation Scores are no more Reliable than Gestalt in Evaluating the Quality of Emergency Medicine Blogs: A METRIQ Study.

    Science.gov (United States)

    Thoma, Brent; Sebok-Syer, Stefanie S; Colmers-Gray, Isabelle; Sherbino, Jonathan; Ankel, Felix; Trueger, N Seth; Grock, Andrew; Siemens, Marshall; Paddock, Michael; Purdy, Eve; Kenneth Milne, William; Chan, Teresa M

    2018-01-30

    Construct: We investigated the quality of emergency medicine (EM) blogs as educational resources. Online medical education resources such as blogs are increasingly used by EM trainees and clinicians. However, quality evaluations of these resources using gestalt are unreliable. We investigated the reliability of two previously derived quality evaluation instruments for blogs. Sixty English-language EM websites that published clinically oriented blog posts between January 1 and February 24, 2016, were identified. A random number generator selected 10 websites, and the 2 most recent clinically oriented blog posts from each site were evaluated using gestalt, the Academic Life in Emergency Medicine (ALiEM) Approved Instructional Resources (AIR) score, and the Medical Education Translational Resources: Impact and Quality (METRIQ-8) score, by a sample of medical students, EM residents, and EM attendings. Each rater evaluated all 20 blog posts with gestalt and 15 of the 20 blog posts with the ALiEM AIR and METRIQ-8 scores. Pearson's correlations were calculated between the average scores for each metric. Single-measure intraclass correlation coefficients (ICCs) evaluated the reliability of each instrument. Our study included 121 medical students, 88 EM residents, and 100 EM attendings who completed ratings. The average gestalt rating of each blog post correlated strongly with the average scores for ALiEM AIR (r = .94) and METRIQ-8 (r = .91). Single-measure ICCs were fair for gestalt (0.37, IQR 0.25-0.56), ALiEM AIR (0.41, IQR 0.29-0.60) and METRIQ-8 (0.40, IQR 0.28-0.59). The average scores of each blog post correlated strongly with gestalt ratings. However, neither ALiEM AIR nor METRIQ-8 showed higher reliability than gestalt. Improved reliability may be possible through rater training and instrument refinement.

  10. HitPredict version 4: comprehensive reliability scoring of physical protein–protein interactions from more than 100 species

    OpenAIRE

    López, Yosvany; Nakai, Kenta; Patil, Ashwini

    2015-01-01

    HitPredict is a consolidated resource of experimentally identified, physical protein–protein interactions with confidence scores to indicate their reliability. The study of genes and their inter-relationships using methods such as network and pathway analysis requires high quality protein–protein interaction information. Extracting reliable interactions from most of the existing databases is challenging because they either contain only a subset of the available interactions, or a mixture of p...

  11. Reliability Models Applied to a System of Power Converters in Particle Accelerators

    OpenAIRE

    Siemaszko, D; Speiser, M; Pittet, S

    2012-01-01

    Several reliability models are studied when applied to a power system containing a large number of power converters. A methodology is proposed and illustrated in the case study of a novel linear particle accelerator designed for reaching high energies. The proposed methods result in the prediction of both reliability and availability of the considered system for optimisation purposes.

  12. Reliability of scoring arousals in normal children and children with obstructive sleep apnea syndrome.

    Science.gov (United States)

    Wong, Tat Kong; Galster, Patricia; Lau, Tai Shing; Lutz, Janita M; Marcus, Carole L

    2004-09-15

    Scoring of arousals in children is based on an extension of adult criteria, as defined by the American Sleep Disorders Association (ASDA), under which an arousal must last at least 3 seconds. A few recent studies used modified criteria for children, with durations as short as 1 second. However, the validity and reliability of scoring these shorter arousals have never been verified. Based on studies in adults, we hypothesized that interscorer agreement for scoring arousals shorter than 3 seconds would be poor. In this retrospective review at an academic hospital, 2 experienced sleep practitioners independently scored arousals in polysomnograms according to the ASDA 3-second criteria and modified duration criteria of 1 and 2 seconds. The sample comprised 20 polysomnographic studies from children aged 3 to 8 years with mild to severe obstructive sleep apnea syndrome and 16 polysomnographic studies from normal children; there was no intervention. The intraclass correlation coefficient for scoring ASDA arousals was 0.90 (95% confidence interval: 0.81-0.95), indicating excellent interscorer agreement. The intraclass correlation coefficients for scoring modified 1-second and 2-second arousals were 0.35 (95% confidence interval: 0.02-0.61) and 0.42 (95% confidence interval: 0.12-0.65), respectively, indicating poor to fair interscorer agreement. Furthermore, modified 1-second and 2-second arousals accounted for less than 15% of all arousals scored. We conclude that interscorer agreement is much poorer for scoring arousals shorter than 3 seconds than for the standard ASDA criteria. We propose that scoring of arousals in children should follow the standard ASDA criteria.

  13. Prediction of true test scores from observed item scores and ancillary data.

    Science.gov (United States)

    Haberman, Shelby J; Yao, Lili; Sinharay, Sandip

    2015-05-01

    In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE(®) General Analytical Writing and until 2009 in the case of TOEFL(®) iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater(®). In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability. © 2015 The British Psychological Society.
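
    The idea of a best linear predictor of a true score can be illustrated in its simplest classical-test-theory form. The sketch assumes every measure is on the true-score scale (X_j = T + E_j), errors are uncorrelated, and all variances are known; the predictor is then a precision-weighted average of the population mean and the observed scores, and with one measure it reduces to Kelley's formula T_hat = rho * X + (1 - rho) * mu. This is a deliberate simplification: the paper combines human scores, e-rater scores, and related tests that are not on a common scale and must estimate error covariances from repeaters. All numbers are hypothetical.

```python
def blp_true_score(observed, error_vars, true_mean, true_var):
    """Best linear predictor of a true score T from measures X_j = T + E_j.

    Assumes every measure is on the true-score scale, errors are uncorrelated with
    each other and with T, and all variances are known. The predictor is then the
    precision-weighted average of the population mean and the observed scores.
    """
    precision = 1 / true_var + sum(1 / v for v in error_vars)
    weighted = true_mean / true_var + sum(x / v for x, v in zip(observed, error_vars))
    return weighted / precision


# Hypothetical scale: true-score mean 100, true variance 80, error variance 20 per rating
one = blp_true_score([110.0], [20.0], true_mean=100.0, true_var=80.0)
two = blp_true_score([110.0, 106.0], [20.0, 20.0], true_mean=100.0, true_var=80.0)
print(round(one, 2), round(two, 2))
```

    With one rating the reliability is 80 / (80 + 20) = 0.8, so the prediction shrinks the observed 110 toward the mean of 100. Adding a second rating raises the total precision, so the prediction leans more heavily on the observed scores.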

  14. Photovoltaic and Wind Turbine Integration Applying Cuckoo Search for Probabilistic Reliable Optimal Placement

    OpenAIRE

    R. A. Swief; T. S. Abdel-Salam; Noha H. El-Amary

    2018-01-01

    This paper presents an efficient Cuckoo Search Optimization technique to improve the reliability of electrical power systems. Reliability objective indices such as Energy Not Supplied (ENS), the System Average Interruption Frequency Index (SAIFI), and the System Average Interruption Duration Index (SAIDI) are the main indicators of reliability. The Cuckoo Search Optimization (CSO) technique is applied to optimally place the protection devices, install the distributed generators, and determine the size of ...
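
    The indices the optimizer targets have common definitions (SAIFI and SAIDI as in IEEE Std 1366). SAIFI is total customer interruptions divided by customers served. SAIDI is total customer interruption duration divided by customers served. ENS is the lost load multiplied by outage duration, summed over events. The sketch below computes them for a hypothetical set of outage events. The `Interruption` record is an assumption, and the Cuckoo Search placement itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class Interruption:
    customers: int      # customers interrupted by this event
    duration_h: float   # restoration time in hours
    load_kw: float      # average load not supplied during the event


def saifi(events, customers_served):
    """Interruptions per customer served over the period the events cover."""
    return sum(e.customers for e in events) / customers_served


def saidi(events, customers_served):
    """Interruption hours per customer served."""
    return sum(e.customers * e.duration_h for e in events) / customers_served


def ens_kwh(events):
    """Energy not supplied: load lost multiplied by outage duration, summed."""
    return sum(e.load_kw * e.duration_h for e in events)


# Hypothetical year on a feeder serving 1,000 customers
events = [Interruption(200, 1.5, 400.0), Interruption(50, 4.0, 120.0)]
print(saifi(events, 1000), saidi(events, 1000), ens_kwh(events))
```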

  15. Scoring of the radiological picture of idiopathic interstitial pneumonia: a study to verify the reliability of the method

    International Nuclear Information System (INIS)

    Kocova, Eva; Vanasek, Jiri; Koblizek, Vladimir; Novosad, Jakub; Elias, Pavel; Bartos, Vladimir; Sterclova, Martina

    2015-01-01

    Idiopathic pulmonary fibrosis (IPF) is a clinical form of usual interstitial pneumonia (UIP). Chest computed tomography (CT) has a fundamental role in the multidisciplinary diagnostics. However, it has not been verified whether and how the subjective opinion of radiologists or pneumologists can influence the assessment and overall diagnostic summary. The aim was to verify the reliability of the scoring system. The conformity of the radiological score of high-resolution CT (HRCT) of the lungs in patients with IPF was assessed by a group of radiologists and pneumologists. Personal data were blinded, and the assessment was performed independently using the Dutka/Vasakova scoring system (a modification of the Gay system). The final scores of the individual assessors were then evaluated by means of paired Spearman's correlation and principal component analysis. Two principal components, cumulatively explaining 62% or 73% of the variability in the individual assessors' assessments, were extracted during the analysis. The groups did not differ in terms of either specialty or experience with the assessment of HRCT findings. According to our study, scoring of a radiological image using the Dutka/Vasakova system is a reliable method in the hands of experienced radiologists. Significant differences occur in assessments performed by pneumologists, especially in the evaluation of alveolar changes.

  16. Applying the Upper Integral to the Biometric Score Fusion Problem in the Identification Model

    Directory of Open Access Journals (Sweden)

    Khalid Fakhar

    2015-08-01

    Full Text Available This paper presents a new biometric score fusion approach in an identification system using the upper integral with respect to Sugeno’s fuzzy measure. First, the proposed method considers each individual matcher as a fuzzy set in order to handle uncertainty and imperfection in matching scores. Then, the corresponding fuzzy entropy estimates the reliability of the information provided by each biometric matcher. Next, the fuzzy densities are generated based on rank information and training accuracy. Finally, the results are aggregated using the upper fuzzy integral. Experimental results compared with other fusion methods demonstrate the good performance of the proposed approach.

  17. Reliability, Validity, and Optimal Cutoff Score of the Montreal Cognitive Assessment (Changsha Version) in Ischemic Cerebrovascular Disease Patients of Hunan Province, China

    Science.gov (United States)

    Tu, Qiu-yun; Jin, Hui; Ding, Bin-rong; Yang, Xia; Lei, Zeng-hui; Bai, Song; Zhang, Ying-dong; Tang, Xiang-qi

    2013-01-01

    Background/Aims The goal of this study was to examine the reliability and validity of the Changsha version of the Montreal Cognitive Assessment (MoCA-CS) in ischemic cerebrovascular disease patients of Hunan Province, China, and to explore the optimal cutoff score for detecting vascular cognitive impairment-no dementia (VCI-ND) and vascular dementia (VD). Methods Three hundred and thirty-eight ischemic cerebrovascular disease patients (131 with normal cognition, 111 with VCI-ND, and 96 with VD) and 132 healthy controls were recruited. All participants underwent examination with the MoCA-CS, Mini-Mental State Examination (MMSE), and other related scales. A detailed neuropsychological battery was used for making a final cognitive diagnosis. SPSS 16.0 statistical software was used for reliability, validity examination, and optimal cutoff score detection. Results Cronbach's α of the MoCA-CS was 0.884, and test-retest and interrater reliability of the MoCA-CS were 0.966 and 0.926, respectively. MoCA-CS scores were highly correlated with MMSE scores (r = 0.867) and simplified intelligence quotients (r = 0.822). The results indicate that 1 point should be added for subjects with fewer than 6 years of education, and that the optimal cutoff score for detecting VCI-ND is 26/27 (sensitivity 96.1%, specificity 75.6%), whereas the optimal cutoff score for detecting VD is 16/17 (sensitivity 92.7%, specificity 96.3%). Conclusion The MoCA-CS has good reliability and validity, and is a useful cognitive screening instrument for detecting VCI in the Chinese population. PMID:23637698

  18. Reliability, Validity, and Optimal Cutoff Score of the Montreal Cognitive Assessment (Changsha Version) in Ischemic Cerebrovascular Disease Patients of Hunan Province, China

    Directory of Open Access Journals (Sweden)

    Qiu-yun Tu

    2013-02-01

    Full Text Available Background/Aims: The goal of this study was to examine the reliability and validity of the Changsha version of the Montreal Cognitive Assessment (MoCA-CS) in ischemic cerebrovascular disease patients of Hunan Province, China, and to explore the optimal cutoff score for detecting vascular cognitive impairment-no dementia (VCI-ND) and vascular dementia (VD). Methods: Three hundred and thirty-eight ischemic cerebrovascular disease patients (131 with normal cognition, 111 with VCI-ND, and 96 with VD) and 132 healthy controls were recruited. All participants underwent examination with the MoCA-CS, Mini-Mental State Examination (MMSE), and other related scales. A detailed neuropsychological battery was used for making a final cognitive diagnosis. SPSS 16.0 statistical software was used for reliability, validity examination, and optimal cutoff score detection. Results: Cronbach’s α of the MoCA-CS was 0.884, and test-retest and interrater reliability of the MoCA-CS were 0.966 and 0.926, respectively. MoCA-CS scores were highly correlated with MMSE scores (r = 0.867) and simplified intelligence quotients (r = 0.822). The results indicate that 1 point should be added for subjects with fewer than 6 years of education, and that the optimal cutoff score for detecting VCI-ND is 26/27 (sensitivity 96.1%, specificity 75.6%), whereas the optimal cutoff score for detecting VD is 16/17 (sensitivity 92.7%, specificity 96.3%). Conclusion: The MoCA-CS has good reliability and validity, and is a useful cognitive screening instrument for detecting VCI in the Chinese population.

  19. Validity and Reliability of the Achilles Tendon Total Rupture Score

    DEFF Research Database (Denmark)

    Ganestam, Ann; Barfod, Kristoffer; Klit, Jakob

    2013-01-01

    The best treatment of acute Achilles tendon rupture remains debated. Patient-reported outcome measures have become cornerstones in treatment evaluations. The Achilles tendon total rupture score (ATRS) has been developed for this purpose but requires additional validation. The purpose of the present study was to validate a Danish translation of the ATRS. The ATRS was translated into Danish according to internationally adopted standards. Of 142 patients, 90 with previous rupture of the Achilles tendon participated in the validity study and 52 in the reliability study. The ATRS showed moderately...... = .07). The limits of agreement were ±18.53. A strong correlation was found between test and retest (intercorrelation coefficient .908); the standard error of measurement was 6.7, and the minimal detectable change was 18.5. The Danish version of the ATRS showed moderately strong criterion validity...
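
    The minimal detectable change follows from the standard error of measurement (SEM). The common 95% formulation is MDC95 = 1.96 × √2 × SEM, where √2 reflects that a test-retest difference carries error from both occasions. A quick Python check, assuming the study used this formulation (the abstract does not say so):

```python
import math


def minimal_detectable_change(sem, z=1.96):
    """Minimal detectable change (MDC) for a test-retest difference.

    A change score carries measurement error from two occasions, hence sqrt(2);
    z = 1.96 gives the conventional 95% level (MDC95).
    """
    return z * math.sqrt(2) * sem


# SEM reported for the Danish ATRS
mdc95 = minimal_detectable_change(6.7)
print(f"MDC95 = {mdc95:.2f}")
```

    This evaluates to about 18.57, in line with the reported 18.5 if the SEM of 6.7 was itself rounded. It also agrees with the ±18.53 limits of agreement, as expected when the SEM is derived from the standard deviation of the test-retest differences (SEM = SD_diff / √2), because then 1.96 × √2 × SEM = 1.96 × SD_diff.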

  20. The Score Reliability of Draw-a-Person Intellectual Ability Test (DAP: IQ) for Rural Malawi Students

    Science.gov (United States)

    Khasu, Denis S.; Williams, Thomas O., Jr.

    2016-01-01

    In this brief article, the reliability of scores for the Draw-A-Person Intellectual Ability Test for Children, Adolescents, and Adults (DAP: IQ; Reynolds & Hickman, 2004) was examined through several analyses with a sample of 147 children from rural Malawi, Africa, using a Chichewa translation of instructions. Cronbach alpha coefficients for…

  1. Reliable categorisation of visual scoring of coronary artery calcification on low-dose CT for lung cancer screening: validation with the standard Agatston score

    Energy Technology Data Exchange (ETDEWEB)

    Huang, Yi-Luan; Wu, Fu-Zong; Wang, Yen-Chi [Kaohsiung Veterans General Hospital, Department of Radiology, Kaohsiung 813 (China); National Yang Ming University, Faculty of Medicine, School of Medicine, Taipei (China); Ju, Yu-Jeng [National Taiwan University, Department of Psychology, Taipei (China); Mar, Guang-Yuan [Kaohsiung Veterans General Hospital, Division of Cardiology, Department of Medicine, Kaohsiung 813 (China); Chuo, Chiung-Chen [Kaohsiung Veterans General Hospital, Department of Radiology, Kaohsiung 813 (China); Lin, Huey-Shyan [Fooyin University, School of Nursing, Kaohsiung (China); Wu, Ming-Ting [Kaohsiung Veterans General Hospital, Department of Radiology, Kaohsiung 813 (China); National Yang Ming University, Faculty of Medicine, School of Medicine, Taipei (China); National Yang Ming University, Institute of Clinical Medicine, Taipei (China)

    2013-05-15

    This study aimed to validate the reliability of the visual coronary artery calcification score (VCACS) on low-dose CT (LDCT) for concurrent screening of CAC and lung cancer. We enrolled 401 subjects receiving LDCT for lung cancer screening and ECG-gated CT for the Agatston score (AS). LDCT was reconstructed with 3- and 5-mm slice thickness (LDCT-3mm and LDCT-5mm, respectively) for VCACS to obtain VCACS-3mm and VCACS-5mm, respectively. After a training session comprising 32 cases, two observers independently performed four-scale VCACS (absent, mild, moderate, severe) of 369 data sets; the results were compared with four-scale AS (0, 1-100, 101-400, >400). CACs were present in 39.6 % (146/369) of subjects. The sensitivity of VCACS-3mm was higher than that of VCACS-5mm (83.6 % versus 74.0 %). The median AS of the 24 false-negative cases in VCACS-3mm was 2.3 (range 1.1-21.1). The false-negative rate for detecting AS ≥ 10 on LDCT-3mm was 1.9 %. VCACS-3mm had higher concordance with AS than VCACS-5mm (κ = 0.813 versus κ = 0.685). An extended test of VCACS-3mm for four junior observers showed high inter-observer reliability (intra-class correlation = 0.90) and good concordance with AS (κ = 0.662-0.747). This study validated the reliability of VCACS on LDCT for lung cancer screening and showed that LDCT-3mm was more feasible than LDCT-5mm for CAD risk stratification. (orig.)
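
    The reported prevalence and sensitivity are mutually consistent, which a short arithmetic check makes visible. The sketch assumes, as the numbers suggest but the abstract does not state outright, that the 24 false negatives are counted against the 146 CAC-positive data sets.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of reference-positive cases that the index test detects."""
    return true_positives / (true_positives + false_negatives)


n_datasets = 369    # LDCT data sets scored by both observers
n_cac = 146         # data sets with CAC present on the reference (Agatston) score
fn_3mm = 24         # false-negative cases reported for VCACS-3mm
tp_3mm = n_cac - fn_3mm

print(f"CAC prevalence: {n_cac / n_datasets:.1%}")                   # 39.6%
print(f"VCACS-3mm sensitivity: {sensitivity(tp_3mm, fn_3mm):.1%}")   # 83.6%
```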

  2. Applying reliability analysis to design electric power systems for More-electric aircraft

    Science.gov (United States)

    Zhang, Baozhu

    The More-Electric Aircraft (MEA) is a type of aircraft that replaces conventional hydraulic and pneumatic systems with electrically powered components. These changes have significantly challenged the aircraft electric power system design. This thesis investigates how reliability analysis can be applied to automatically generate system topologies for the MEA electric power system. We first use a traditional method of reliability block diagrams to analyze the reliability level on different system topologies. We next propose a new methodology in which system topologies, constrained by a set reliability level, are automatically generated. The path-set method is used for analysis. Finally, we interface these sets of system topologies with control synthesis tools to automatically create correct-by-construction control logic for the electric power system.

  3. The use of the SF-36 questionnaire in adult survivors of childhood cancer: evaluation of data quality, score reliability, and scaling assumptions

    Directory of Open Access Journals (Sweden)

    Winter David L

    2006-10-01

    Full Text Available Abstract Background The SF-36 has been used in a number of previous studies that have investigated the health status of childhood cancer survivors, but it has never been evaluated regarding data quality, scaling assumptions, and reliability in this population. As health status among childhood cancer survivors is being increasingly investigated, it is important that the measurement instruments are reliable, validated and appropriate for use in this population. The aim of this paper was to determine whether the SF-36 questionnaire is a valid and reliable instrument in assessing self-perceived health status of adult survivors of childhood cancer. Methods We examined the SF-36 to see how it performed with respect to (1) data completeness, (2) distribution of the scale scores, (3) item-internal consistency, (4) item-discriminant validity, (5) internal consistency, and (6) scaling assumptions. For this investigation we used SF-36 data from a population-based study of 10,189 adult survivors of childhood cancer. Results Overall, missing values ranged per item from 0.5 to 2.9 percent. Ceiling effects were found to be highest in the role limitation-physical (76.7%) and role limitation-emotional (76.5%) scales. All correlations between items and their hypothesised scales exceeded the suggested standard of 0.40 for satisfactory item-consistency. Across all scales, the Cronbach's alpha coefficient of reliability was found to be higher than the suggested value of 0.70. Consistent across all cancer groups, the physical health related scale scores correlated strongly with the Physical Component Summary (PCS) scale scores and weakly with the Mental Component Summary (MCS) scale scores. Also, the mental health and role limitation-emotional scales correlated strongly with the MCS scale score and weakly with the PCS scale score. Moderate to strong correlations with both summary scores were found for the general health perception, energy/vitality, and social functioning
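
    Two of the checks above (scale internal consistency against the 0.70 threshold, and item-scale correlations against 0.40) can be sketched as follows. This is a generic illustration on invented data, not the SF-36 study's analysis, and it does not reproduce the SF-36 item-to-scale assignments. The item check uses a corrected item-total correlation (item versus the sum of the other items in its scale), one common convention; the abstract does not say whether the overlap correction was applied.

```python
import math
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    items: k lists of item scores, each holding one score per respondent.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # scale score per respondent
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(items, j):
    """Correlation of item j with the sum of the other items in its scale."""
    rest = [sum(scores) - scores[j] for scores in zip(*items)]
    return pearson(items[j], rest)


# Invented responses: 3 items answered by 4 respondents
scale = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]]
alpha = cronbach_alpha(scale)
print(f"alpha = {alpha:.3f}, meets 0.70: {alpha > 0.70}")
for j in range(len(scale)):
    r = corrected_item_total(scale, j)
    print(f"item {j + 1}: r = {r:.2f}, meets 0.40: {r > 0.40}")
```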

  4. An analysis of reliability and validity of the papilla index score of implant-supported single crowns of maxillary central incisors

    DEFF Research Database (Denmark)

    Peng, Min; Fei, Wei; Hosseini, Mandana

    2012-01-01

    Objectives: To test the reliability and validity of the papilla index scores of the implant-supported single crowns (ISSCs) of maxillary central incisors. Materials and Methods: Twenty-five patients with 25 ISSCs were included. Two prosthodontists evaluated the papilla index score (PIS) of three...... fill percent (PP) was calculated. The validity of PIS was tested against the corresponding papilla fill percent (PP) by using the Spearman correlation analysis. Results: The intra-observer agreement was >70% in 4/5 and >50% in all observations, the pooled Cohen’s κ was 0.64 and 0.70 for two observers...... inter-observer agreement. The PIS score demonstrated significant correlation with the corresponding PP value (rs = .567, p < .001). Conclusions: The feasibility, reliability and validity of the PIS made the parameter useful for quality control of the peri-implant soft tissue of ISSCs.
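
    Cohen's κ corrects raw percentage agreement for the agreement expected by chance, which is why it is reported alongside percentage agreement here. A sketch of the unweighted statistic; the abstract does not say whether an unweighted or a weighted κ was pooled, and the scores below are invented.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two sets of labels on the same cases.

    Undefined (division by zero) when chance agreement is 1, e.g. when both
    sets contain only one and the same category.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical papilla index scores given by one observer on two occasions
first = [0, 1, 2, 2, 3, 3, 4, 1]
second = [0, 1, 2, 3, 3, 3, 4, 2]
print(f"kappa = {cohen_kappa(first, second):.2f}")
```

    In this toy example, raw agreement is 75%, but about 22% agreement would be expected by chance alone, so κ comes out at 0.68.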

  5. High inter-rater reliability, agreement, and convergent validity of Constant score in patients with clavicle fractures

    DEFF Research Database (Denmark)

    Ban, Ilija; Troelsen, Anders; Kristensen, Morten Tange

    2016-01-01

    BACKGROUND: The Constant score (CS) has been the primary endpoint in most studies on clavicle fractures. However, the CS was not developed to assess patients with clavicle fractures. Our aim was to examine inter-rater reliability and agreement of the CS in patients with clavicle fractures...... standardized CS assessment at a mean of 6.8 weeks (SD, 1.0 weeks) after injury. Reliability and agreement of the CS were determined by 2 raters. The intraclass correlation coefficient (ICC(2,1)), standard error of measurement, minimal detectable change, Cronbach α coefficient, and Pearson correlation coefficient...... were estimated. RESULTS: Inter-rater reliability of the total CS was excellent (intraclass correlation coefficient, 0.94; 95% confidence interval, 0.88-0.97), with no systematic difference between the 2 raters (P = .75). The standard error of measurement (measurement error at the group level) was 4...
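
    ICC(2,1), in Shrout and Fleiss's notation, is the two-way random-effects, absolute-agreement, single-rater coefficient. Because it counts systematic differences between raters as error, it complements the check for systematic rater difference reported in the results. A sketch computing it from two-way ANOVA mean squares; the scores are invented and not taken from the clavicle-fracture study.

```python
from statistics import mean


def icc_2_1(ratings):
    """Shrout-Fleiss ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: n rows (patients), each holding k scores (one per rater).
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between raters
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols                   # residual

    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Invented Constant scores for three patients rated by two raters
print(icc_2_1([[60, 60], [75, 75], [90, 90]]))  # identical ratings
print(icc_2_1([[60, 65], [75, 80], [90, 95]]))  # rater 2 always 5 points higher
```

    The constant 5-point offset leaves the ranking of patients unchanged but lowers ICC(2,1) to about 0.95, whereas a consistency-type ICC would still equal 1.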

  6. Evaluation of Fracture and Osteotomy Union in the Setting of Osteogenesis Imperfecta: Reliability of the Modified Radiographic Union Score for Tibial Fractures (RUST).

    Science.gov (United States)

    Franzone, Jeanne M; Finkelstein, Mark S; Rogers, Kenneth J; Kruse, Richard W

    2017-09-08

    Evaluation of the union of osteotomies and fractures in patients with osteogenesis imperfecta (OI) is a critical component of patient care. Studies of the OI patient population have so far used varied criteria to evaluate bony union. The radiographic union score for tibial fractures (RUST), which was subsequently revised to the modified RUST, is an objective standardized method of evaluating fracture healing. We sought to evaluate the reliability of the modified RUST in the setting of the tibias of patients with OI. Tibial radiographs of 30 patients with OI who had fractures or osteotomies were scored by 3 observers on 2 separate occasions. Each of the 4 cortices was given a score (1=no callus, 2=callus present, 3=bridging callus, and 4=remodeled, fracture not visible) and the modified RUST is the sum of these scores (range, 4 to 16). The interobserver and intraobserver reliabilities were evaluated using intraclass correlation coefficients (ICCs) with 95% confidence intervals. The ICC representing the interobserver reliability for the first iteration of scores was 0.926 (0.864 to 0.962) and for the second series was 0.915 (0.845 to 0.957). The ICCs representing the intraobserver reliability for each of the 3 reviewers for the measurements in series 1 and 2 were 0.860 (0.707 to 0.934), 0.994 (0.986 to 0.997), and 0.974 (0.946 to 0.988). The modified RUST has excellent interobserver and intraobserver reliability in the setting of OI despite challenges related to the poor quality of the bone and its dysplastic nature. The application and routine use of the modified RUST in the OI population will help standardize our evaluation of osteotomy and fracture healing. Level of evidence: Level III, retrospective study of nonconsecutive patients.
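
    The scoring rule described above is simple enough to state as code. A minimal sketch of the modified RUST total as the abstract defines it; the function and dictionary names are illustrative, and the example scores are hypothetical.

```python
RUST_CORTEX_SCORES = {
    1: "no callus",
    2: "callus present",
    3: "bridging callus",
    4: "remodeled, fracture not visible",
}


def modified_rust(cortex_scores):
    """Modified RUST: sum of the 1-4 scores given to each of the 4 cortices (range 4-16)."""
    if len(cortex_scores) != 4:
        raise ValueError("the modified RUST needs exactly four cortex scores")
    for score in cortex_scores:
        if score not in RUST_CORTEX_SCORES:
            raise ValueError(f"cortex score {score} is outside 1-4")
    return sum(cortex_scores)


# Hypothetical tibia: two cortices bridged, one with callus, one remodeled
scores = [3, 3, 2, 4]
print(modified_rust(scores), [RUST_CORTEX_SCORES[s] for s in scores])
```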

  7. An alternative to the balance error scoring system: using a low-cost balance board to improve the validity/reliability of sports-related concussion balance testing.

    Science.gov (United States)

    Chang, Jasper O; Levy, Susan S; Seay, Seth W; Goble, Daniel J

    2014-05-01

    Recent guidelines advise sports medicine professionals to use balance tests to assess sensorimotor status in the management of concussions. The present study sought to determine whether a low-cost balance board could provide a valid, reliable, and objective means of performing this balance testing. The design comprised criterion validity testing relative to a gold standard and 7-day test-retest reliability, carried out in a university biomechanics laboratory with thirty healthy young adults. Balance ability was assessed on 2 days separated by 1 week using (1) a gold standard measure (i.e., a scientific-grade force plate), (2) a low-cost Nintendo Wii Balance Board (WBB), and (3) the Balance Error Scoring System (BESS). Validity of the WBB center of pressure path length and BESS scores was determined relative to the force plate data. Test-retest reliability was established based on intraclass correlation coefficients. Composite scores for the WBB had excellent validity (r = 0.99) and test-retest reliability (R = 0.88). Both the validity (r = 0.10-0.52) and test-retest reliability (r = 0.61-0.78) were lower for the BESS. These findings demonstrate that a low-cost balance board can provide improved balance testing accuracy/reliability compared with the BESS. This approach provides a potentially more valid/reliable, yet affordable, means of assessing sports-related concussion compared with current methods.
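
    The WBB outcome validated against the force plate is center of pressure (COP) path length. It is commonly computed as the summed Euclidean distance between successive COP samples, as in the sketch below. The abstract does not state the sampling rate, filtering, or trial duration, and reading raw COP coordinates from a Wii Balance Board requires software not described here.

```python
import math


def cop_path_length(xs, ys):
    """Total distance travelled by the center of pressure (COP).

    xs, ys: COP coordinates sampled in time order; the result is in the same
    units (e.g. cm) and grows with postural sway.
    """
    return sum(
        math.hypot(x1 - x0, y1 - y0)
        for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:])
    )


# Invented trace: (0, 0) -> (3, 4) -> (3, 0)
print(cop_path_length([0.0, 3.0, 3.0], [0.0, 4.0, 0.0]))  # 9.0
```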

  8. Photovoltaic and Wind Turbine Integration Applying Cuckoo Search for Probabilistic Reliable Optimal Placement

    Directory of Open Access Journals (Sweden)

    R. A. Swief

    2018-01-01

    Full Text Available This paper presents an efficient Cuckoo Search Optimization technique to improve the reliability of electrical power systems. Various reliability objective indices such as Energy Not Supplied, System Average Interruption Frequency Index, and System Average Interruption Duration Index are the main indices indicating reliability. The Cuckoo Search Optimization (CSO) technique is applied to optimally place the protection devices, install the distributed generators, and determine the size of distributed generators in radial feeders for reliability improvement. Distributed generators affect reliability, system power losses, and the voltage profile. The volatile behaviour of both photovoltaic cells and wind turbine farms affects the values and the selection of protection devices and the allocation of distributed generators. To improve reliability, the reconfiguration will take place before installing both protection devices and distributed generators. Assessment of consumer power system reliability is a vital part of distribution system behaviour and development. Distribution system reliability calculation will rely on probabilistic reliability indices, which can predict the disruption profile of a distribution system based on the volatile behaviour of added generators and load behaviour. The validity of the proposed algorithm has been tested using a standard IEEE 69 bus system.
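
    The objective indices named above have standard definitions (IEEE Std 1366 and the usual load-point formulation for radial feeders): SAIFI = Σλ_iN_i / ΣN_i, SAIDI = ΣU_iN_i / ΣN_i, and ENS = ΣL_iU_i, where λ_i is the failure rate, U_i the annual outage time, N_i the number of customers, and L_i the average load at load point i. Below is a sketch of an evaluation function that an optimizer such as CSO could call for each candidate placement. The `LoadPoint` fields and the numbers are hypothetical, and the paper's probabilistic modelling of PV and wind output is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class LoadPoint:
    customers: int          # N_i: customers supplied from this load point
    failure_rate: float     # lambda_i: interruptions per year
    unavailability: float   # U_i: outage hours per year
    avg_load_kw: float      # L_i: average load in kW


def reliability_indices(points):
    """SAIFI (interruptions/customer/yr), SAIDI (h/customer/yr) and ENS (kWh/yr)."""
    total_customers = sum(p.customers for p in points)
    saifi = sum(p.failure_rate * p.customers for p in points) / total_customers
    saidi = sum(p.unavailability * p.customers for p in points) / total_customers
    ens = sum(p.avg_load_kw * p.unavailability for p in points)
    return {"SAIFI": saifi, "SAIDI": saidi, "ENS": ens}


# Hypothetical two-load-point feeder
feeder = [LoadPoint(100, 0.5, 2.0, 50.0), LoadPoint(300, 1.0, 4.0, 150.0)]
print(reliability_indices(feeder))
```

    In a placement study, the optimizer would rebuild the load-point data for each candidate arrangement of protection devices and generators (which changes each λ_i and U_i) and score it with some combination of these indices. The abstract does not state how the paper weights them.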

  9. The Portuguese version of the Outcome Questionnaire (OQ-45): Normative data, reliability, and clinical significance cut-offs scores.

    Science.gov (United States)

    Machado, Paulo P P; Fassnacht, Daniel B

    2015-12-01

    The Outcome Questionnaire (OQ-45) is one of the most extensively used standardized self-report instruments to monitor psychotherapy outcomes. The questionnaire is designed specifically for the assessment of change during psychotherapy treatments. Therefore, it is crucial to provide norms and clinical cut-off values for clinicians and researchers. The current study aims to provide norms, reliability indices, and clinical cut-off values for the Portuguese version of the scale. Data from two large non-clinical samples (high school/university, N = 1,669; community, N = 879) and one clinical sample (n = 201) were used to investigate psychometric properties and derive normative data for all OQ-45 subscales and the total score. Significant and substantial differences were found for all subscales between the clinical and non-clinical samples. The Portuguese version also showed adequate reliabilities (internal consistency, test-retest), which were comparable to those of the original version. To assess individual clinical change, clinical cut-off values and reliable change indices were calculated, allowing clinicians and researchers to monitor and evaluate clients' individual change. The Portuguese version of the OQ-45 is a reliable instrument, with Portuguese norms and cut-off scores comparable to those of the original version. This allows clinicians and researchers to use this instrument for evaluating change and outcome in psychotherapy. This study provides norms for non-clinical and clinical Portuguese samples and investigates the reliability (internal consistency and test-retest) of the OQ-45. Cut-off values and reliable change index are provided allowing clinicians to evaluate clinical change and clients' response to treatment, monitoring the quality of mental health care services. These can be used, in routine clinical practice, as benchmarks for treatment progress and to empirically base clinical decisions such as continuation of treatment or considering
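
    The abstract does not spell out its formulas, but clinical cut-offs and reliable change indices for the OQ-45 are usually derived with the Jacobson and Truax (1991) method, so this sketch assumes that method. Which SD and reliability coefficient feed the RCI (clinical or non-clinical sample, internal consistency or test-retest) varies between studies, and all numbers below are invented, not the Portuguese norms. On the OQ-45, higher scores indicate more distress, so improvement corresponds to a negative RCI.

```python
import math


def reliable_change_index(pre, post, sd, reliability):
    """Jacobson-Truax RCI: score change divided by the SE of a difference score."""
    sem = sd * math.sqrt(1 - reliability)      # standard error of measurement
    s_diff = math.sqrt(2 * sem ** 2)           # SE of the difference of two scores
    return (post - pre) / s_diff


def clinical_cutoff(mean_clin, sd_clin, mean_nonclin, sd_nonclin):
    """Jacobson-Truax criterion c: SD-weighted midpoint between the two populations."""
    return (sd_clin * mean_nonclin + sd_nonclin * mean_clin) / (sd_clin + sd_nonclin)


def classify(pre, post, sd, reliability, cutoff):
    """Jacobson-Truax categories, assuming lower scores mean less distress."""
    rci = reliable_change_index(pre, post, sd, reliability)
    if rci <= -1.96:
        return "recovered" if post < cutoff else "improved"
    if rci >= 1.96:
        return "deteriorated"
    return "no reliable change"


# Invented values, not the Portuguese norms
c = clinical_cutoff(mean_clin=80, sd_clin=20, mean_nonclin=50, sd_nonclin=10)
print(c, classify(pre=90, post=55, sd=20, reliability=0.84, cutoff=c))
```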

  10. Spousal concordance and reliability of the 'Prudence Score' as a summary of diet and lifestyle.

    Science.gov (United States)

    Parekh, Sanjoti; King, David; Owen, Neville; Jamrozik, Konrad

    2009-08-01

    This paper describes a composite 'Prudence Score' summarising self-reported behavioural risk factors for non-communicable diseases. If proved robust, the 'Prudence score' might be used widely to encourage large numbers of individuals to adopt and maintain simple, healthy changes in their lifestyle. We calculated the 'Prudence Score' based on responses collected in late 2006 to a postal questionnaire sent to 225 adult patients aged 25 to 75 years identified from the records of two general medical practices in Brisbane, Australia. Participants completed the behavioural, dietary and lifestyle items in relation to their spouse as well as themselves. The spouse or partner of each addressee completed their own copy of the study questionnaire. Kappa scores for spousal concordance with probands' reports (n = 45 pairs) on diet-related items varied from 0.35 (for vegetable intake) to 0.77 (for usual type of milk consumed). Spousal concordance values for other behaviours were 0.67 (physical activity), 0.82 (alcohol intake) and 1.0 (smoking habits). Kappa scores for test-retest reliability (n = 53) varied between 0.47 (vegetable intake) and 0.98 (smoking habits). The veracity of self-reported data is a challenge for studies of behavioural change. Our results indicate moderate to substantial agreement from life partners regarding individuals' self-reports for most of the behavioural risk items included in the 'Prudence Score'. This increases confidence that key aspects of diet and lifestyle can be assessed by self-report. The 'Prudence Score' potentially has wide application as a simple and robust tool for health promotion programs.

  11. Reliability of Scores Obtained from Self-, Peer-, and Teacher-Assessments on Teaching Materials Prepared by Teacher Candidates

    Science.gov (United States)

    Nalbantoglu Yilmaz, Funda

    2017-01-01

    This study aims to determine the reliability of scores obtained from self-, peer-, and teacher-assessments in terms of teaching materials prepared by teacher candidates. The study group of this research consists of 56 teacher candidates. In the scope of the research, teacher candidates were asked to develop teaching material related to their study.…

  12. Reliability and validity of the new Tanaka B Intelligence Scale scores: a group intelligence test.

    Science.gov (United States)

    Uno, Yota; Mizukami, Hitomi; Ando, Masahiko; Yukihiro, Ryoji; Iwasaki, Yoko; Ozaki, Norio

    2014-01-01

    The present study evaluated the reliability and concurrent validity of the new Tanaka B Intelligence Scale, which is an intelligence test that can be administered to groups within a short period of time. The new Tanaka B Intelligence Scale and Wechsler Intelligence Scale for Children-Third Edition were administered to 81 subjects (mean age ± SD 15.2 ± 0.7 years) residing in a juvenile detention home; reliability was assessed using Cronbach's alpha coefficient, and concurrent validity was assessed using the one-way analysis of variance intraclass correlation coefficient. Moreover, receiver operating characteristic analysis for screening for individuals who have a deficit in intellectual function (an FIQ < 70) was performed. In addition, stratum-specific likelihood ratios for detection of intellectual disability were calculated. The Cronbach's alpha for the new Tanaka B Intelligence Scale IQ (BIQ) was 0.86, and the intraclass correlation coefficient with FIQ was 0.83. Receiver operating characteristic analysis demonstrated an area under the curve of 0.89 (95% CI: 0.85-0.96). In addition, the stratum-specific likelihood ratio for the BIQ≤65 stratum was 13.8 (95% CI: 3.9-48.9), and the stratum-specific likelihood ratio for the BIQ≥76 stratum was 0.1 (95% CI: 0.03-0.4). Thus, intellectual disability could be ruled out or determined. The present results demonstrated that the new Tanaka B Intelligence Scale score had high reliability and concurrent validity with the Wechsler Intelligence Scale for Children-Third Edition score. Moreover, the post-test probability for the BIQ could be calculated when screening for individuals who have a deficit in intellectual function. The new Tanaka B Intelligence Test is convenient and can be administered within a variety of settings. This enables evaluation of intellectual development even in settings where performing intelligence tests has previously been difficult.

  13. Validity and reliability of Thai version of the Foot and Ankle Outcome Score in patients with arthritis of the foot and ankle.

    Science.gov (United States)

    Angthong, Chayanin

    2016-12-01

    Although the Foot and Ankle Outcome Score (FAOS) is commonly used in several languages for a variety of foot disorders, it has not been validated specifically for foot and ankle arthritic conditions. The aims of the present study were to translate the original English FAOS into Thai and to evaluate the validity and reliability of the Thai version of the FAOS for foot and ankle arthritic conditions. The original FAOS was translated into Thai using forward-backward translation. The Thai FAOS and validated Thai Short Form-36 (SF-36®) questionnaires were distributed to 44 Thai patients suffering from arthritis of the foot and ankle to complete. For validation, Thai FAOS scores were correlated with SF-36 scores. Test-retest reliability and internal consistency were also analyzed in this study. The Thai FAOS score demonstrated sufficient correlation with SF-36 total score in Pain (Pearson's correlation coefficient (r)=0.45, p=0.002), Symptoms (r=0.45, p=0.002), Activities of Daily Living (ADL) (r=0.47, p=0.001), and Quality of Life (QOL) (r=0.38, p=0.011) subscales. The Sports and Recreational Activities (Sports & Rec) subscale did not correlate significantly with the SF-36® (r=0.20, p=0.20). Cronbach's alpha, a measure of internal consistency, for the five subscales was as follows: Pain, 0.94 (p … validity for the evaluation of foot and ankle arthritis. Although reliability was satisfactory for the major subscale ADL, it was not sufficient for the minor subscales. Our findings suggest that it can be used as a disease-specific instrument to evaluate foot and ankle arthritis and can complement other reliable outcome surveys. Copyright © 2015 European Foot and Ankle Society. Published by Elsevier Ltd. All rights reserved.

  14. Standards and reliability in evaluation: when rules of thumb don't apply.

    Science.gov (United States)

    Norcini, J J

    1999-10-01

    The purpose of this paper is to identify situations in which two rules of thumb in evaluation do not apply. The first rule is that all standards should be absolute. When selection decisions are being made or when classroom tests are given, however, relative standards may be better. The second rule of thumb is that every test should have a reliability of .80 or better. Depending on the circumstances, though, the standard error of measurement, the consistency of pass/fail classifications, and the domain-referenced reliability coefficients may be better indicators of reproducibility.

  15. Malingering in Toxic Exposure. Classification Accuracy of Reliable Digit Span and WAIS-III Digit Span Scaled Scores

    Science.gov (United States)

    Greve, Kevin W.; Springer, Steven; Bianchini, Kevin J.; Black, F. William; Heinly, Matthew T.; Love, Jeffrey M.; Swift, Douglas A.; Ciota, Megan A.

    2007-01-01

    This study examined the sensitivity and false-positive error rate of reliable digit span (RDS) and the WAIS-III Digit Span (DS) scaled score in persons alleging toxic exposure and determined whether error rates differed from published rates in traumatic brain injury (TBI) and chronic pain (CP). Data were obtained from the files of 123 persons…

  16. Reliability and validity of the new Tanaka B Intelligence Scale scores: a group intelligence test.

    Directory of Open Access Journals (Sweden)

    Yota Uno

    Full Text Available OBJECTIVE: The present study evaluated the reliability and concurrent validity of the new Tanaka B Intelligence Scale, which is an intelligence test that can be administered to groups within a short period of time. METHODS: The new Tanaka B Intelligence Scale and Wechsler Intelligence Scale for Children-Third Edition were administered to 81 subjects (mean age ± SD 15.2 ± 0.7 years) residing in a juvenile detention home; reliability was assessed using Cronbach's alpha coefficient, and concurrent validity was assessed using the one-way analysis of variance intraclass correlation coefficient. Moreover, receiver operating characteristic analysis for screening for individuals who have a deficit in intellectual function (an FIQ < 70) was performed. In addition, stratum-specific likelihood ratios for detection of intellectual disability were calculated. RESULTS: The Cronbach's alpha for the new Tanaka B Intelligence Scale IQ (BIQ) was 0.86, and the intraclass correlation coefficient with FIQ was 0.83. Receiver operating characteristic analysis demonstrated an area under the curve of 0.89 (95% CI: 0.85-0.96). In addition, the stratum-specific likelihood ratio for the BIQ≤65 stratum was 13.8 (95% CI: 3.9-48.9), and the stratum-specific likelihood ratio for the BIQ≥76 stratum was 0.1 (95% CI: 0.03-0.4). Thus, intellectual disability could be ruled out or determined. CONCLUSION: The present results demonstrated that the new Tanaka B Intelligence Scale score had high reliability and concurrent validity with the Wechsler Intelligence Scale for Children-Third Edition score. Moreover, the post-test probability for the BIQ could be calculated when screening for individuals who have a deficit in intellectual function. The new Tanaka B Intelligence Test is convenient and can be administered within a variety of settings. This enables evaluation of intellectual development even in settings where performing intelligence tests has previously been difficult.
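
    The abstract notes that a post-test probability can be calculated from the BIQ. With stratum-specific likelihood ratios, this is Bayes' theorem in odds form: post-test odds = pre-test odds × LR. A sketch; the 20% pre-test probability is an invented example value, since the abstract does not report the prevalence of FIQ < 70 in its sample.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Stratum-specific likelihood ratios reported for the BIQ
LR_BIQ_LE_65 = 13.8
LR_BIQ_GE_76 = 0.1

pre = 0.20  # hypothetical prevalence of FIQ < 70 in the screened group
print(f"BIQ <= 65: {post_test_probability(pre, LR_BIQ_LE_65):.0%}")
print(f"BIQ >= 76: {post_test_probability(pre, LR_BIQ_GE_76):.1%}")
```

    With this assumed 20% pre-test probability, a BIQ of 65 or below raises the probability of intellectual disability to roughly 78%, while a BIQ of 76 or above lowers it to about 2.4%. This is the sense in which intellectual disability can be determined or ruled out.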

  17. The reliability of the Glasgow Coma Scale: a systematic review.

    Science.gov (United States)

    Reith, Florence C M; Van den Brande, Ruben; Synnot, Anneliese; Gruen, Russell; Maas, Andrew I R

    2016-01-01

    The Glasgow Coma Scale (GCS) provides a structured method for assessment of the level of consciousness. Its derived sum score is applied in research and adopted in intensive care unit scoring systems. Controversy exists on the reliability of the GCS. The aim of this systematic review was to summarize evidence on the reliability of the GCS. A literature search was undertaken in MEDLINE, EMBASE and CINAHL. Observational studies that assessed the reliability of the GCS, expressed by a statistical measure, were included. Methodological quality was evaluated with the consensus-based standards for the selection of health measurement instruments checklist and its influence on results considered. Reliability estimates were synthesized narratively. We identified 52 relevant studies that showed significant heterogeneity in the type of reliability estimates used, patients studied, setting and characteristics of observers. Methodological quality was good (n = 7), fair (n = 18) or poor (n = 27). In good quality studies, kappa values were ≥0.6 in 85%, and all intraclass correlation coefficients indicated excellent reliability. Poor quality studies showed lower reliability estimates. Reliability for the GCS components was higher than for the sum score. Factors that may influence reliability include education and training, the level of consciousness and type of stimuli used. Only 13% of studies were of good quality and inconsistency in reported reliability estimates was found. Although the reliability was adequate in good quality studies, further improvement is desirable. From a methodological perspective, the quality of reliability studies needs to be improved. From a clinical perspective, a renewed focus on training/education and standardization of assessment is required.
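    Kappa, the statistic reported by many of the reviewed studies, corrects observed agreement for the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e). A minimal Python sketch of unweighted Cohen's kappa for two observers; the GCS motor ratings are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same category by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical GCS motor scores (1-6) from two observers for eight patients.
obs1 = [6, 5, 5, 4, 6, 3, 5, 6]
obs2 = [6, 5, 4, 4, 6, 3, 5, 5]
print(round(cohens_kappa(obs1, obs2), 3))
```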

  18. A G-function-based reliability-based design methodology applied to a cam roller system

    International Nuclear Information System (INIS)

    Wang, W.; Sui, P.; Wu, Y.T.

    1996-01-01

    Conventional reliability-based design optimization methods treat the reliability function as an ordinary function and apply existing mathematical programming techniques to solve the design problem. As a result, the conventional approach requires nested loops with respect to the g-function and is very time-consuming. A new reliability-based design method is proposed in this paper that deals with the g-function directly instead of the reliability function. This approach has the potential to significantly reduce the number of g-function calculations, since it requires only one full reliability analysis per design iteration. A cam roller system in a typical high-pressure fuel injection diesel engine is designed using both the proposed and the conventional approach. The proposed method is much more efficient for this application.
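    For readers new to the g-function (limit-state function) terminology: failure occurs when g(X) < 0, and reliability is P[g(X) ≥ 0]. The Python sketch below estimates the failure probability by plain Monte Carlo for a hypothetical capacity-minus-load g-function with assumed normal inputs. It only illustrates the quantity a single reliability analysis evaluates; it is not the authors' G-function-based design method or their cam roller model.

```python
import math
import random
from statistics import NormalDist


def g(capacity: float, load: float) -> float:
    """Hypothetical limit-state function: failure when load exceeds capacity."""
    return capacity - load


def failure_probability_mc(n_samples: int, seed: int = 1) -> float:
    """Monte Carlo estimate of P[g < 0] with normal capacity and load (assumed distributions)."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        capacity = rng.gauss(10.0, 1.0)  # assumed mean 10, SD 1
        load = rng.gauss(7.0, 1.0)       # assumed mean 7, SD 1
        if g(capacity, load) < 0.0:
            failures += 1
    return failures / n_samples


# For a linear g with independent normal inputs the exact answer is Phi(-beta),
# where beta = (mu_C - mu_L) / sqrt(sd_C**2 + sd_L**2) is the reliability index.
beta = (10.0 - 7.0) / math.sqrt(1.0**2 + 1.0**2)
exact = NormalDist().cdf(-beta)
estimate = failure_probability_mc(200_000)
print(f"Monte Carlo: {estimate:.4f}  exact: {exact:.4f}  reliability: {1 - estimate:.4f}")
```

    A nested-loop optimizer would repeat an analysis like this for every candidate design, which is the cost the abstract's approach aims to reduce.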

  19. Reliability and sensitivity to change of the OMERACT rheumatoid arthritis magnetic resonance imaging score in a multireader, longitudinal setting

    DEFF Research Database (Denmark)

    Haavardsholm, EA; Østergaard, Mikkel; Kvan, NP

    2005-01-01

    OBJECTIVE: To assess the intra- and interreader reliability and the sensitivity to change of the Outcome Measures in Rheumatology Clinical Trials (OMERACT) Rheumatoid Arthritis Magnetic Resonance Imaging Score (RAMRIS) system on digital images of the wrist joints of patients with early or establi...

  20. Reliability of a retail food store survey and development of an accompanying retail scoring system to communicate survey findings and identify vendors for healthful food and marketing initiatives.

    Science.gov (United States)

    Ghirardelli, Alyssa; Quinn, Valerie; Sugerman, Sharon

    2011-01-01

    To develop a retail grocery instrument with weighted scoring to be used as an indicator of the food environment. Twenty-six retail food stores in low-income areas in California. Observational. Inter-rater reliability for grocery store survey instrument. Description of store scoring methodology weighted to emphasize availability of healthful food. Type A intra-class correlation coefficients (ICC) with an absolute agreement definition or a κ test for measures using ranges as categories. Measures of availability and price of fruits and vegetables performed well in reliability testing (κ = 0.681-0.800). Items for vegetable quality were better than for fruit (ICC 0.708 vs 0.528). Kappa scores indicated low to moderate agreement (0.372-0.674) on external store marketing measures and higher scores for internal store marketing. "Next to" the checkout counter was more reliable than "within 6 feet." Health departments using the store scoring system reported it as the most useful communication of neighborhood findings. There was good reliability of the measures among the research pairs. The local store scores can show the need to bring in resources and to provide access to fruits and vegetables and other healthful food. Copyright © 2011 Society for Nutrition Education. Published by Elsevier Inc. All rights reserved.
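    A "Type A" ICC with an absolute-agreement definition corresponds, in McGraw and Wong's notation, to a two-way model in which systematic differences between raters count as disagreement. A minimal Python sketch of the single-rater form, ICC(A,1), computed from two-way ANOVA mean squares; the store scores are hypothetical, and the abstract does not say whether the single- or average-measure form was reported.

```python
def icc_a1(ratings):
    """ICC(A,1): two-way model, absolute agreement, single rater.

    `ratings` is a list of rows, one per subject (store), each holding one score per rater.
    """
    n = len(ratings)          # subjects
    k = len(ratings[0])       # raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n, k >= 2")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Rater (column) variance enters the denominator, so a constant offset lowers the ICC.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical availability scores from two raters visiting the same five stores.
scores = [[8, 9], [5, 5], [7, 6], [3, 4], [9, 9]]
print(round(icc_a1(scores), 3))
```

    Two raters who always differ by a constant would have perfect consistency but an absolute-agreement ICC below 1, which is why the agreement definition matters for a survey meant to be used by different field teams.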

  1. A practical guide to propensity score analysis for applied clinical research.

    Science.gov (United States)

    Lee, Jaehoon; Little, Todd D

    2017-11-01

    Observational studies are often the only viable option in many clinical settings, especially when it is unethical or infeasible to randomly assign participants to different treatment regimens. In such cases, propensity score (PS) analysis can be applied to account for possible selection bias and thereby address questions of causal inference. Many PS methods exist, yet few guidelines are available to aid applied researchers in their conduct and evaluation of a PS analysis. In this article we give an overview of available techniques for PS estimation and application, balance diagnostics, treatment effect estimation, and sensitivity assessment, as well as recent advances. We also offer a tutorial that can be used to emulate the steps of PS analysis. Our goal is to provide information that will bring PS analysis within the reach of applied clinical researchers and practitioners. Copyright © 2017 Elsevier Ltd. All rights reserved.
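    As one concrete instance of the PS workflow (estimate the score, then estimate the treatment effect), the Python sketch below uses inverse probability of treatment weighting with a single discrete covariate, so the propensity score can be estimated as the treated share within each covariate level. This is a simplification for illustration: in practice the PS is usually estimated with a model such as logistic regression, and the article covers other PS methods. The data are hypothetical.

```python
from collections import defaultdict


def ipw_ate(records):
    """Inverse-probability-weighted average treatment effect.

    `records` holds (covariate, treated, outcome) tuples with treated in {0, 1}.
    The propensity score e(x) = P(treated | x) is estimated by the treated share
    within each level of a single discrete covariate (a simplifying assumption).
    """
    counts = defaultdict(lambda: [0, 0])  # x -> [number treated, number of units]
    for x, t, _ in records:
        counts[x][0] += t
        counts[x][1] += 1
    ps = {x: n_treated / n_units for x, (n_treated, n_units) in counts.items()}
    if any(e in (0.0, 1.0) for e in ps.values()):
        raise ValueError("positivity violated: a covariate level has no treated or no control units")

    weighted_sum = 0.0
    for x, t, y in records:
        e = ps[x]
        # Treated units are weighted by 1/e(x), controls by 1/(1 - e(x)).
        weighted_sum += t * y / e - (1 - t) * y / (1 - e)
    return weighted_sum / len(records)


# Hypothetical data: units with x = 1 are more often treated and have higher outcomes.
data = [
    (0, 1, 3), (0, 0, 1), (0, 0, 1), (0, 0, 1),
    (1, 1, 5), (1, 1, 5), (1, 1, 5), (1, 0, 3),
]
treated_y = [y for _, t, y in data if t == 1]
control_y = [y for _, t, y in data if t == 0]
naive = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
print(f"naive difference: {naive:.2f}, IPW estimate: {ipw_ate(data):.2f}")
```

    Within each covariate level the treated-minus-control difference is 2, and the weighting recovers that value, whereas the unadjusted difference of 3 is inflated by the covariate's link to both treatment and outcome.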

  2. Reliability analysis applied to structural tests

    Science.gov (United States)

    Diamond, P.; Payne, A. O.

    1972-01-01

    The application of reliability theory to predict, from structural fatigue test data, the risk of failure of a structure under service conditions because its load-carrying capability is progressively reduced by the extension of a fatigue crack, is considered. The procedure is applicable to both safe-life and fail-safe structures and, for a prescribed safety level, it will enable an inspection procedure to be planned or, if inspection is not feasible, it will evaluate the life to replacement. The theory has been further developed to cope with the case of structures with initial cracks, such as can occur in modern high-strength materials which are susceptible to the formation of small flaws during the production process. The method has been applied to a structure of high-strength steel and the results are compared with those obtained by the current life estimation procedures. This has shown that the conventional methods can be unconservative in certain cases, depending on the characteristics of the structure and the design operating conditions. The suitability of the probabilistic approach to the interpretation of the results from full-scale fatigue testing of aircraft structures is discussed and the assumptions involved are examined.

  3. Test Reliability at the Individual Level

    Science.gov (United States)

    Hu, Yueqin; Nesselroade, John R.; Erbacher, Monica K.; Boker, Steven M.; Burt, S. Alexandra; Keel, Pamela K.; Neale, Michael C.; Sisk, Cheryl L.; Klump, Kelly

    2016-01-01

    Reliability has a long history as one of the key psychometric properties of a test. However, a given test might not measure people equally reliably. Test scores from some individuals may have considerably greater error than others. This study proposed two approaches using intraindividual variation to estimate test reliability for each person. A simulation study suggested that the parallel tests approach and the structural equation modeling approach recovered the simulated reliability coefficients. Then in an empirical study, where forty-five females were measured daily on the Positive and Negative Affect Schedule (PANAS) for 45 consecutive days, separate estimates of reliability were generated for each person. Results showed that reliability estimates of the PANAS varied substantially from person to person. The methods provided in this article apply to tests measuring changeable attributes and require repeated measures across time on each individual. This article also provides a set of parallel forms of PANAS. PMID:28936107
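    A rough illustration of the parallel tests idea, under stated assumptions: for one person measured daily on two parallel half-forms, the within-person correlation of the half-form scores across days estimates half-form reliability, and the Spearman-Brown formula steps it up to the full form. This Python sketch, including the step-up and the example scores, is a simplification, not the authors' exact procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def person_reliability(form_a, form_b):
    """Full-form reliability for one person from daily scores on two parallel half-forms."""
    r_half = pearson(form_a, form_b)   # correlation across days, within this person
    if r_half <= -1.0:
        raise ValueError("Spearman-Brown step-up is undefined for r = -1")
    return 2 * r_half / (1 + r_half)   # Spearman-Brown step-up for doubled length


# Hypothetical daily half-form scores for two people over six days.
people = {
    "P1": ([10, 12, 9, 14, 11, 13], [11, 12, 10, 13, 11, 14]),
    "P2": ([10, 11, 10, 12, 11, 10], [10, 12, 11, 11, 10, 11]),
}
for pid, (a, b) in people.items():
    print(pid, round(person_reliability(a, b), 2))
```

    The two hypothetical people yield clearly different reliability estimates from the same instrument, which is the kind of person-to-person variation the study reports for the PANAS.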

  4. THE RELIABILITY OF THE MANKIN SCORE FOR OSTEOARTHRITIS

    NARCIS (Netherlands)

    van der Sluijs, J.A.; Geesink, R.G.T.; van der Linden, A.J.; Bulstra, S.K.; Kuijer, Roelof; Drukker, J

    For the histopathological classification of the severity of osteoarthritic lesions of cartilage, the Mankin score is frequently used. A necessary constraint on the validity of this scoring system is the consistency with which cartilage lesions are classified. The intra- and interobserver agreement

  5. Test-retest reliability at the item level and total score level of the Norwegian version of the Spinal Cord Injury Falls Concern Scale (SCI-FCS).

    Science.gov (United States)

    Roaldsen, Kirsti Skavberg; Måøy, Åsa Blad; Jørgensen, Vivien; Stanghelle, Johan Kvalvik

    2016-05-01

    Translation of the Spinal Cord Injury Falls Concern Scale (SCI-FCS), and investigation of test-retest reliability at the item level and total-score level. Translation, adaptation and test-retest study. A specialized rehabilitation setting in Norway. Fifty-four wheelchair users with a spinal cord injury. The median age of the cohort was 49 years, and the median number of years after injury was 13. Interventions/measurements: The SCI-FCS was translated and back-translated according to guidelines. Individuals answered the SCI-FCS twice over the course of one week. We investigated item-level test-retest reliability using Svensson's rank-based statistical method for disagreement analysis of paired ordinal data. At the total-score level, we analyzed relative reliability with the intraclass correlation coefficient (ICC(2,1)), absolute reliability (measurement error) with the standard error of measurement (SEM) and the smallest detectable change (SDC), and internal consistency with Cronbach's alpha. All items showed satisfactory percentage agreement (≥69%) between test and retest. There were small but non-negligible systematic disagreements among three items; we found an 11-13% higher chance of a lower second score. There was no disagreement due to random variance. The test-retest agreement (ICC(2,1)) was excellent (0.83). The SEM was 2.6 (12%), and the SDC was 7.1 (32%). Cronbach's alpha was high (0.88). The Norwegian SCI-FCS is highly reliable for wheelchair users with chronic spinal cord injuries.
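    The SEM and SDC are commonly linked to reliability and to each other by SEM = SD · √(1 - ICC) and SDC = 1.96 · √2 · SEM, where SD is the standard deviation of the total scores. The abstract does not state which formulas the authors used, so this Python sketch shows only that common convention; the SD of 10 and ICC of 0.84 in the first call are hypothetical.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from the score SD and a reliability coefficient."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def sdc_from_sem(sem: float, z: float = 1.96) -> float:
    """Smallest detectable change for an individual (95% level by default).

    The sqrt(2) accounts for measurement error in both the test and the retest score.
    """
    return z * math.sqrt(2.0) * sem


print(round(sem_from_icc(sd=10.0, icc=0.84), 2))   # hypothetical SD and ICC
print(round(sdc_from_sem(2.6), 2))                  # SEM of 2.6 as reported in the abstract
```

    Applied to the reported SEM of 2.6, the second formula gives roughly 7.2, in the neighbourhood of the reported SDC of 7.1; the abstract does not say which convention or unrounded values produced its figure.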

  6. ERP Reliability Analysis (ERA) Toolbox: An open-source toolbox for analyzing the reliability of event-related brain potentials.

    Science.gov (United States)

    Clayson, Peter E; Miller, Gregory A

    2017-01-01

    Generalizability theory (G theory) provides a flexible, multifaceted approach to estimating score reliability. G theory's approach to estimating score reliability has important advantages over classical test theory that are relevant for research using event-related brain potentials (ERPs). For example, G theory does not require parallel forms (i.e., equal means, variances, and covariances), can handle unbalanced designs, and provides a single reliability estimate for designs with multiple sources of error. This monograph provides a detailed description of the conceptual framework of G theory using examples relevant to ERP researchers, presents the algorithms needed to estimate ERP score reliability, and provides a detailed walkthrough of newly-developed software, the ERP Reliability Analysis (ERA) Toolbox, that calculates score reliability using G theory. The ERA Toolbox is open-source, Matlab software that uses G theory to estimate the contribution of the number of trials retained for averaging, group, and/or event types on ERP score reliability. The toolbox facilitates the rigorous evaluation of psychometric properties of ERP scores recommended elsewhere in this special issue. Copyright © 2016 Elsevier B.V. All rights reserved.
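    To make the G-theory idea concrete, here is a minimal one-facet (persons × trials) G study and D study in Python. The ERA Toolbox itself is Matlab software and handles richer designs, so this is an independent sketch with hypothetical ERP amplitudes, not its algorithm. Variance components come from two-way ANOVA expected mean squares, and the relative generalizability coefficient for a mean over n' trials is var_p / (var_p + var_res / n').

```python
def g_study(scores):
    """Variance components for a crossed persons x trials design (one score per cell).

    Returns (var_person, var_residual); the residual confounds the
    person-by-trial interaction with random error.
    """
    n_p, n_t = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_t)
    p_means = [sum(row) / n_t for row in scores]
    t_means = [sum(row[j] for row in scores) / n_p for j in range(n_t)]

    ss_p = n_t * sum((m - grand) ** 2 for m in p_means)
    ss_t = n_p * sum((m - grand) ** 2 for m in t_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ms_p = ss_p / (n_p - 1)
    ms_res = (ss_tot - ss_p - ss_t) / ((n_p - 1) * (n_t - 1))

    var_p = max((ms_p - ms_res) / n_t, 0.0)   # negative estimates are set to zero
    return var_p, ms_res


def g_coefficient(var_p, var_res, n_trials):
    """Relative generalizability coefficient for a mean over n_trials trials (D study)."""
    return var_p / (var_p + var_res / n_trials)


# Hypothetical single-trial ERP amplitudes (microvolts): 4 participants x 3 trials.
amps = [[1.0, 2.0, 3.0], [2.0, 2.0, 4.0], [4.0, 5.0, 5.0], [3.0, 4.0, 6.0]]
vp, vr = g_study(amps)
for n in (1, 3, 10, 30):
    print(n, round(g_coefficient(vp, vr, n), 3))
```

    The D-study loop shows how score reliability grows with the number of trials retained for averaging, one of the questions the toolbox is designed to answer.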

  7. The Reliability and Validity of Weighted Composite Scores.

    Science.gov (United States)

    Kane, Michael; Case, Susan

    The scores on two distinct tests (e.g., essay and objective) are often combined into a composite score, which is used to make decisions. The validity of the observed composite can sometimes be evaluated relative to a separate criterion. In cases where no criterion is available, the observed composite has generally been evaluated in terms of its…
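    One standard way to obtain the reliability of a weighted composite from its parts assumes that measurement errors are uncorrelated across components: reliability = 1 - Σ w_i² s_i² (1 - r_i) / var_C, where var_C is the variance of the weighted sum. The abstract is truncated, so this Python sketch illustrates the general idea rather than Kane and Case's specific treatment; the essay/objective figures are hypothetical.

```python
def composite_reliability(weights, sds, reliabilities, corr):
    """Reliability of a weighted composite, assuming uncorrelated errors across components.

    corr[i][j] is the observed correlation between components i and j (1 on the diagonal).
    """
    k = len(weights)
    # Variance of the composite: sum over all pairs of w_i * w_j * cov_ij.
    var_c = sum(
        weights[i] * weights[j] * sds[i] * sds[j] * corr[i][j]
        for i in range(k)
        for j in range(k)
    )
    # Error variance of the composite: weighted sum of component error variances.
    err_c = sum(
        (weights[i] * sds[i]) ** 2 * (1.0 - reliabilities[i]) for i in range(k)
    )
    return 1.0 - err_c / var_c


# Hypothetical essay (less reliable) and objective (more reliable) scores.
for w in [(0.5, 0.5), (0.2, 0.8), (0.8, 0.2)]:
    rel = composite_reliability(w, sds=(10.0, 10.0), reliabilities=(0.6, 0.9),
                                corr=[[1.0, 0.5], [0.5, 1.0]])
    print(w, round(rel, 3))
```

    Shifting weight toward the more reliable component raises composite reliability, one reason the choice of weights matters when composites drive decisions.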

  8. High intertester reliability of the cumulated ambulation score for the evaluation of basic mobility in patients with hip fracture

    DEFF Research Database (Denmark)

    Kristensen, Morten Tange; Andersen, Lene; Bech-Jensen, Rie

    2009-01-01

    OBJECTIVE: To examine the intertester reliability of the three activities of the Cumulated Ambulation Score (CAS) and the total CAS, and to define limits for the smallest change in basic mobility that indicates a real change in patients with hip fracture. DESIGN: An intertester reliability study. SETTING: An acute 20-bed orthopaedic hip fracture unit. SUBJECTS: Fifty consecutive patients with a median age of 83 (25-75% quartile, 68-86) years. INTERVENTIONS: The CAS, which describes the patient's independence in three activities - (1) getting in and out of bed, (2) sit to stand from a chair, and (3...

  9. Validation of use of subsets of teeth when applying the total mouth periodontal score (TMPS) system in dogs.

    Science.gov (United States)

    Harvey, Colin E; Laster, Larry; Shofer, Frances S

    2012-01-01

    A total mouth periodontal score (TMPS) system in dogs has been described previously. Use of buccal and palatal/lingual surfaces of all teeth requires observation and recording of 120 gingivitis scores and 120 periodontitis scores. Although the result is a reliable, repeatable assessment of the extent of periodontal disease in the mouth, observing and recording 240 data points is time-consuming. Using data from a previously reported study of periodontal disease in dogs, correlation analysis was used to determine whether any of seven different subsets of teeth can generate TMPS subset gingivitis and periodontitis scores that are highly correlated with TMPS all-site, all-teeth scores. Overall, gingivitis scores were less highly correlated than periodontitis scores. The minimal tooth set with a significant intra-class correlation (≥ 0.9, means of right and left sides) for both gingivitis scores and attachment loss measurements consisted of the buccal surface of the maxillary third incisor, canine, third premolar, fourth premolar and first molar teeth, and the mandibular canine, third premolar, fourth premolar and first molar teeth on one side (9 teeth, 15 root sites). Use of this subset of teeth, which reduces the number of data points per dog from 240 to 30 for gingivitis and periodontitis at each scoring episode, is recommended when calculating the gingivitis and periodontitis scores using the TMPS system.

  10. Reliability, validity and responsiveness of the German self-reported foot and ankle score (SEFAS) in patients with foot or ankle surgery.

    Science.gov (United States)

    Arbab, Dariusch; Kuhlmann, Katharina; Schnurr, Christoph; Bouillon, Bertil; Lüring, Christian; König, Dietmar

    2017-10-10

    Patient-reported outcome measures are a critical tool in evaluating the efficacy of orthopedic procedures and are increasingly used in clinical trials to assess outcomes of health care. The intention of this study was to develop and culturally adapt a German version of the Self-reported Foot and Ankle Score (SEFAS) and to evaluate its reliability, validity and responsiveness. Forward and backward translation was performed according to the Cross Cultural Adaptation of Self-Reported Measure guidelines. The German SEFAS was investigated in 177 consecutive patients, who completed the German SEFAS, the Foot and Ankle Outcome Score (FAOS), the Short-Form 36 (SF-36) and numeric rating scales for pain and disability (NRS) before surgery; 118 patients completed them again 6 months after foot or ankle surgery. Test-retest reliability, internal consistency, floor and ceiling effects, construct validity and minimal important change were analyzed. The German SEFAS demonstrated excellent test-retest reliability, with an ICC of 0.97. A Cronbach's alpha (α) of 0.89 demonstrated strong internal consistency. No floor or ceiling effects were observed for the German version of the SEFAS. As hypothesized, SEFAS correlated strongly with FAOS and SF-36 domains. It showed moderate (ES/SRM > 0.5) responsiveness between preoperative assessment and postoperative follow-up. The German version of the SEFAS demonstrated good psychometric properties. It proved to be a valid and reliable instrument for use in foot and ankle patients. DRKS00007585.
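    The responsiveness indices above are commonly defined as the effect size, ES = mean change / SD at baseline, and the standardized response mean, SRM = mean change / SD of the change scores. A minimal Python sketch with hypothetical pre- and postoperative scores; the abstract does not give the authors' exact computation.

```python
from statistics import mean, stdev


def responsiveness(pre, post):
    """Return (effect size, standardized response mean) for paired pre/post scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two paired observations")
    change = [b - a for a, b in zip(pre, post)]
    es = mean(change) / stdev(pre)       # change scaled by baseline variability
    srm = mean(change) / stdev(change)   # change scaled by variability of the change itself
    return es, srm


# Hypothetical SEFAS totals before and 6 months after surgery.
pre_op = [10, 12, 14]
post_op = [14, 15, 19]
es, srm = responsiveness(pre_op, post_op)
print(f"ES = {es:.2f}, SRM = {srm:.2f}")
```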

  11. Reliability concepts applied to cutting tool change time

    Energy Technology Data Exchange (ETDEWEB)

    Patino Rodriguez, Carmen Elena, E-mail: cpatino@udea.edu.c [Department of Industrial Engineering, University of Antioquia, Medellin (Colombia); Department of Mechatronics and Mechanical Systems, Polytechnic School, University of Sao Paulo, Sao Paulo (Brazil); Francisco Martha de Souza, Gilberto [Department of Mechatronics and Mechanical Systems, Polytechnic School, University of Sao Paulo, Sao Paulo (Brazil)

    2010-08-15

    This paper presents a reliability-based analysis for calculating critical tool life in machining processes. It is possible to determine the running time for each tool involved in the process by obtaining the operations sequence for the machining procedure. Usually, the reliability of an operation depends on three independent factors: operator, machine-tool and cutting tool. The reliability of a part manufacturing process is mainly determined by the cutting time for each job and by the sequence of operations, defined by the series configuration. An algorithm is presented to define when the cutting tool must be changed. The proposed algorithm is used to evaluate the reliability of a manufacturing process composed of turning and drilling operations. The reliability of the turning operation is modeled based on data presented in the literature, and from experimental results, a statistical distribution of drilling tool wear was defined, and the reliability of the drilling process was modeled.
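    Under the series configuration described above, a process succeeds only if every element succeeds, so with independent elements the process reliability is the product of the element reliabilities. The Python sketch below assumes Weibull reliability curves R(t) = exp(-(t/η)^β) for each cutting tool (hypothetical parameters, not the paper's fitted distributions) and finds by bisection the longest cutting time that keeps the series reliability at or above a target, one simple way to set a tool change time.

```python
import math


def weibull_reliability(t: float, eta: float, beta: float) -> float:
    """Probability that a tool with Weibull(eta, beta) life survives t minutes of cutting."""
    return math.exp(-((t / eta) ** beta))


def series_reliability(t: float, tools) -> float:
    """Series system of independent tools: all must survive, so reliabilities multiply."""
    return math.prod(weibull_reliability(t, eta, beta) for eta, beta in tools)


def change_time(target: float, tools, tol: float = 1e-9) -> float:
    """Largest cutting time t with series_reliability(t) >= target (found by bisection)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target reliability must be in (0, 1)")
    lo, hi = 0.0, 1.0
    while series_reliability(hi, tools) >= target:   # grow the bracket
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if series_reliability(mid, tools) >= target:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical tools: (eta in minutes, beta) for a turning insert and a drill.
tools = [(120.0, 2.5), (80.0, 1.8)]
print(f"change tools after about {change_time(0.90, tools):.1f} min of cutting")
```

    Because reliabilities multiply, the change time for the combined process is shorter than the change time any single tool would justify on its own.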

  12. Reliability concepts applied to cutting tool change time

    International Nuclear Information System (INIS)

    Patino Rodriguez, Carmen Elena; Francisco Martha de Souza, Gilberto

    2010-01-01

    This paper presents a reliability-based analysis for calculating critical tool life in machining processes. It is possible to determine the running time for each tool involved in the process by obtaining the operations sequence for the machining procedure. Usually, the reliability of an operation depends on three independent factors: operator, machine-tool and cutting tool. The reliability of a part manufacturing process is mainly determined by the cutting time for each job and by the sequence of operations, defined by the series configuration. An algorithm is presented to define when the cutting tool must be changed. The proposed algorithm is used to evaluate the reliability of a manufacturing process composed of turning and drilling operations. The reliability of the turning operation is modeled based on data presented in the literature, and from experimental results, a statistical distribution of drilling tool wear was defined, and the reliability of the drilling process was modeled.

  13. SIGI: score-based identification of genomic islands

    Directory of Open Access Journals (Sweden)

    Merkl Rainer

    2004-03-01

    Full Text Available Abstract Background Genomic islands can be observed in many microbial genomes. These stretches of DNA have a conspicuous composition with regard to sequence or encoded functions. Genomic islands are assumed to be frequently acquired via horizontal gene transfer. For the analysis of genome structure and the study of horizontal gene transfer, it is necessary to reliably identify and characterize these islands. Results A scoring scheme on codon frequencies, $\mathrm{Score}_{G1G2}(cdn) = \log\left(f_{G2}(cdn) / f_{G1}(cdn)\right)$, was utilized. To analyse genes of a species G1 and to test their relatedness to species G2, scores were determined by applying this log-odds formula to the mean codon frequencies of the two genomes. A non-redundant set of nearly 400 codon usage tables comprising microbial species was derived; its members were used in turn as G2. Genes having at least one score value above a species-specific and dynamically determined cut-off value were analysed further. By means of cluster analysis, genes were identified that comprise clusters of statistically significant size. These clusters were predicted as genomic islands. Finally, and individually for each of these genes, the taxonomical relation among the species responsible for significant scores was interpreted. The validity of the approach and its limitations were made plausible by an extensive analysis of natural genes and synthetic ones aimed at modelling the process of gene amelioration. Conclusions The method reliably identifies genomic islands and the likely origin of alien genes.
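    The codon score is a log-odds ratio between two genomes' codon frequencies. A minimal Python sketch, assuming (for illustration only) that a gene's score is the mean per-codon score over its codons, and using made-up frequency tables for a toy two-codon alphabet; SIGI's species-specific cut-offs and cluster analysis are not reproduced.

```python
import math


def codon_scores(freq_g1, freq_g2):
    """Per-codon log-odds: log(f_G2(cdn) / f_G1(cdn))."""
    return {c: math.log(freq_g2[c] / freq_g1[c]) for c in freq_g1}


def gene_score(sequence, scores):
    """Mean codon score of an in-frame coding sequence (assumed aggregation)."""
    usable = len(sequence) - len(sequence) % 3
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    return sum(scores[c] for c in codons) / len(codons)


# Hypothetical frequencies for a toy two-codon alphabet (real tables cover 64 codons).
f_host = {"GCT": 0.7, "GCG": 0.3}    # G1: the genome under analysis
f_donor = {"GCT": 0.2, "GCG": 0.8}   # G2: a candidate donor species
s = codon_scores(f_host, f_donor)
print(round(gene_score("GCGGCGGCT", s), 3))   # positive: codons relatively more frequent in G2
```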

  14. Reliability analysis of visual ranking of coronary artery calcification on low-dose CT of the thorax for lung cancer screening: comparison with ECG-gated calcium scoring CT.

    Science.gov (United States)

    Kim, Yoon Kyung; Sung, Yon Mi; Cho, So Hyun; Park, Young Nam; Choi, Hye-Young

    2014-12-01

    Coronary artery calcification (CAC) is frequently detected on low-dose CT (LDCT) of the thorax. Concurrent assessment of CAC and lung cancer screening using LDCT is beneficial in terms of cost and radiation dose reduction. The aim of our study was to evaluate the reliability of visual ranking of positive CAC on LDCT compared to Agatston score (AS) on electrocardiogram (ECG)-gated calcium scoring CT. We studied 576 patients who were consecutively registered for health screening and underwent both LDCT and ECG-gated calcium scoring CT. We excluded subjects with an AS of zero. The final study cohort included 117 patients with CAC (97 men; mean age, 53.4 ± 8.5 years). AS was used as the gold standard (mean score 166.0; range 0.4-3,719.3). Two board-certified radiologists and two radiology residents participated in an observer performance study. Visual ranking of CAC was performed according to four categories (1-10, 11-100, 101-400, and 401 or higher) for coronary artery disease risk stratification. Weighted kappa statistics were used to measure the reliability of visual ranking of CAC on LDCT. The reliability of visual ranking of CAC on LDCT compared to ECG-gated calcium scoring CT was excellent for board-certified radiologists and good for radiology residents. A high degree of association was observed, with 71.6% of visual rankings in the same category as the Agatston category and 98.9% varying by no more than one category. Visual ranking of positive CAC on LDCT is reliable for predicting AS rank categorization.
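    The agreement figures above (same category vs. at most one category apart) can be computed by mapping each Agatston score to the four risk categories. A Python sketch with hypothetical scores and readings; assigning positive scores below 1 (the cohort's minimum was 0.4) to the first category is an assumption.

```python
def agatston_category(score: float) -> int:
    """Map a positive Agatston score to categories 1-4 (1-10, 11-100, 101-400, >400).

    Positive scores up to 10, including fractional values below 1, go to category 1 (assumption).
    """
    if score <= 0:
        raise ValueError("the study excluded zero scores; expected a positive score")
    if score <= 10:
        return 1
    if score <= 100:
        return 2
    if score <= 400:
        return 3
    return 4


def agreement(visual_ranks, agatston_scores):
    """Share of exact category matches and of matches within one category."""
    diffs = [abs(v - agatston_category(s)) for v, s in zip(visual_ranks, agatston_scores)]
    n = len(diffs)
    return sum(d == 0 for d in diffs) / n, sum(d <= 1 for d in diffs) / n


# Hypothetical reader rankings on LDCT vs. reference Agatston scores.
visual = [1, 2, 2, 3, 4, 3, 1, 4]
scores = [5.2, 45.0, 120.0, 250.0, 900.0, 80.0, 0.4, 380.0]
exact, within_one = agreement(visual, scores)
print(f"same category: {exact:.1%}, within one category: {within_one:.1%}")
```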

  15. Using G-Theory to Enhance Evidence of Reliability and Validity for Common Uses of the Paulhus Deception Scales.

    Science.gov (United States)

    Vispoel, Walter P; Morris, Carrie A; Kilinc, Murat

    2018-01-01

    We applied a new approach to Generalizability theory (G-theory) involving parallel splits and repeated measures to evaluate common uses of the Paulhus Deception Scales based on polytomous and four types of dichotomous scoring. G-theory indices of reliability and validity accounting for specific-factor, transient, and random-response measurement error supported use of polytomous over dichotomous scores as contamination checks; as control, explanatory, and outcome variables; as aspects of construct validation; and as indexes of environmental effects on socially desirable responding. Polytomous scoring also provided results for flagging faking as dependable as those when using dichotomous scoring methods. These findings argue strongly against the nearly exclusive use of dichotomous scoring for the Paulhus Deception Scales in practice and underscore the value of G-theory in demonstrating this. We provide guidelines for applying our G-theory techniques to other objectively scored clinical assessments, for using G-theory to estimate how changes to a measure might improve reliability, and for obtaining software to conduct G-theory analyses free of charge.

  16. MRI-based radiologic scoring system for extent of brain injury in children with hemiplegia.

    Science.gov (United States)

    Shiran, S I; Weinstein, M; Sirota-Cohen, C; Myers, V; Ben Bashat, D; Fattal-Valevski, A; Green, D; Schertz, M

    2014-12-01

    Brain MR imaging is recommended in children with cerebral palsy. Descriptions of MR imaging findings lack uniformity, due to the absence of a validated quantitative approach. We developed a quantitative scoring method for brain injury based on anatomic MR imaging and examined its reliability and its validity in relation to motor function in children with hemiplegia. Twenty-seven children with hemiplegia underwent MR imaging (T1, T2-weighted sequences, DTI) and motor assessment (Manual Ability Classification System, Gross Motor Functional Classification System, Assisting Hand Assessment, Jebsen Taylor Test of Hand Function, and Children's Hand Experience Questionnaire). A scoring system devised in our center was applied to all scans. The radiologic score covered 4 domains: number of affected lobes, volume and type of white matter injury, extent of gray matter damage, and major white matter tract injury. Inter- and intrarater reliability was evaluated and the relationship between radiologic score and motor assessments determined. Mean total radiologic score was 11.3 ± 4.5 (range 4-18). Good inter- (ρ = 0.909, P […] classification systems (ρ = 0.708, P […] high inter- and intrarater reliability and significant associations with manual ability classification systems and motor evaluations. This score provides a standardized radiologic assessment of brain injury extent in hemiplegic patients with predominantly unilateral injury, allowing comparison between groups, and providing an additional tool for counseling families. © 2014 by American Journal of Neuroradiology.
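    The reported ρ values are Spearman rank correlations. A minimal Python sketch that ranks both variables (average ranks for ties) and takes the Pearson correlation of the ranks; the radiologic scores and classification levels are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0   # mean of positions i..j, 1-based
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical total radiologic scores and manual ability levels (higher = more impaired).
radiologic = [4, 7, 9, 11, 12, 15, 18]
manual_level = [1, 1, 2, 2, 3, 4, 4]
print(round(spearman_rho(radiologic, manual_level), 3))
```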

  17. Establishing Reliable Cognitive Change in Children with Epilepsy: The Procedures and Results for a Sample with Epilepsy

    Science.gov (United States)

    van Iterson, Loretta; Augustijn, Paul B.; de Jong, Peter F.; van der Leij, Aryan

    2013-01-01

    The goal of this study was to investigate reliable cognitive change in epilepsy by developing computational procedures to determine reliable change index scores (RCIs) for the Dutch Wechsler Intelligence Scales for Children. First, RCIs were calculated based on stability coefficients from a reference sample. Then, these RCIs were applied to a…
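    A reliable change index in the Jacobson-Truax tradition divides the observed retest difference by the standard error of a difference score, derived from the test's SD and a stability (test-retest) coefficient; |RCI| > 1.96 is conventionally read as reliable change at the 5% level. The abstract is truncated and the authors' procedures may differ (for example, by correcting for practice effects), so this Python sketch, with a hypothetical SD of 15 and stability coefficient of 0.85, shows only the basic form.

```python
import math


def reliable_change_index(score_1: float, score_2: float, sd: float, r_xx: float) -> float:
    """Basic Jacobson-Truax RCI: retest difference over the SE of a difference score."""
    sem = sd * math.sqrt(1.0 - r_xx)       # standard error of measurement
    se_diff = math.sqrt(2.0 * sem ** 2)    # two measurements, each with error
    return (score_2 - score_1) / se_diff


# Hypothetical IQ scores at two assessments, scale SD 15, stability coefficient 0.85.
rci = reliable_change_index(100, 88, sd=15.0, r_xx=0.85)
print(f"RCI = {rci:.2f}; reliable change: {abs(rci) > 1.96}")
```

    Under these assumptions a 12-point drop does not reach the 1.96 threshold, whereas a 30-point drop would.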

  18. Divorce and Child Behavior Problems: Applying Latent Change Score Models to Life Event Data

    Science.gov (United States)

    Malone, Patrick S.; Lansford, Jennifer E.; Castellino, Domini R.; Berlin, Lisa J.; Dodge, Kenneth A.; Bates, John E.; Pettit, Gregory S.

    2004-01-01

    Effects of parents' divorce on children's adjustment have been studied extensively. This article applies new advances in trajectory modeling to the problem of disentangling the effects of divorce on children's adjustment from related factors such as the child's age at the time of divorce and the child's gender. Latent change score models were used…

  19. Improving the validity of quantitative measures in applied linguistics research

    NARCIS (Netherlands)

    Purpura, J.E.; Brown, J.D.; Schoonen, R.

    2015-01-01

    In empirical applied linguistics research it is essential that the key variables are operationalized in a valid and reliable way, and that the scores are treated appropriately, allowing for a proper testing of the hypotheses under investigation. The current article addresses several theoretical and

  20. The Americleft Speech Project: A Training and Reliability Study.

    Science.gov (United States)

    Chapman, Kathy L; Baylis, Adriane; Trost-Cardamone, Judith; Cordero, Kelly Nett; Dixon, Angela; Dobbelsteyn, Cindy; Thurmes, Anna; Wilson, Kristina; Harding-Bell, Anne; Sweeney, Triona; Stoddard, Gregory; Sell, Debbie

    2016-01-01

    To describe the results of two reliability studies and to assess the effect of training on interrater reliability scores. The first study (1) examined interrater and intrarater reliability scores (weighted and unweighted kappas) and (2) compared interrater reliability scores before and after training on the use of the Cleft Audit Protocol for Speech-Augmented (CAPS-A) with British English-speaking children. The second study examined interrater and intrarater reliability on a modified version of the CAPS-A (CAPS-A Americleft Modification) with American and Canadian English-speaking children. Finally, comparisons were made between the interrater and intrarater reliability scores obtained for Study 1 and Study 2. The participants were speech-language pathologists from the Americleft Speech Project. In Study 1, interrater reliability scores improved for 6 of the 13 parameters following training on the CAPS-A protocol. Comparison of the reliability results for the two studies indicated lower scores for Study 2 compared with Study 1. However, this appeared to be an artifact of the kappa statistic that occurred due to insufficient variability in the reliability samples for Study 2. When percent agreement scores were also calculated, the ratings appeared similar across Study 1 and Study 2. The findings of this study suggested that improvements in interrater reliability could be obtained following a program of systematic training. However, improvements were not uniform across all parameters. Acceptable levels of reliability were achieved for those parameters most important for evaluation of velopharyngeal function.
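    The kappa artifact described above arises because kappa corrects for chance agreement estimated from the raters' marginal distributions: when nearly all samples fall into one category, chance agreement is high, and kappa can be low even though percent agreement is high. The Python sketch below computes percent agreement and a linearly weighted kappa for ordinal ratings on hypothetical low-variability data; the study's weighting scheme is not stated, so linear weights are an assumption.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of samples where both raters gave the same rating."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def linear_weighted_kappa(a, b, categories):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1)."""
    n, k = len(a), len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    count_a, count_b = Counter(a), Counter(b)
    # Mean weighted disagreement actually observed.
    observed = sum(abs(idx[x] - idx[y]) for x, y in zip(a, b)) / (n * (k - 1))
    # Mean weighted disagreement expected from the marginals alone.
    expected = sum(
        count_a[ci] * count_b[cj] * abs(i - j)
        for i, ci in enumerate(categories)
        for j, cj in enumerate(categories)
    ) / (n * n * (k - 1))
    if expected == 0:
        raise ValueError("kappa undefined: no disagreement expected by chance")
    return 1.0 - observed / expected


# Hypothetical ratings on a 0-2 scale where almost every sample is rated 0.
rater_1 = [0] * 18 + [1, 0]
rater_2 = [0] * 18 + [0, 1]
print(f"agreement = {percent_agreement(rater_1, rater_2):.0%}, "
      f"weighted kappa = {linear_weighted_kappa(rater_1, rater_2, [0, 1, 2]):.2f}")
```

    Here 90% of the ratings agree, yet kappa is close to zero, the pattern the authors attribute to insufficient variability in the Study 2 samples.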

  1. Modified personal interviews: resurrecting reliable personal interviews for admissions?

    Science.gov (United States)

    Hanson, Mark D; Kulasegaram, Kulamakan Mahan; Woods, Nicole N; Fechtig, Lindsey; Anderson, Geoff

    2012-10-01

    Traditional admissions personal interviews provide flexible faculty-student interactions but are plagued by low inter-interview reliability. Axelson and Kreiter (2009) retrospectively showed that multiple independent sampling (MIS) may improve reliability of personal interviews; thus, the authors incorporated MIS into the admissions process for medical students applying to the University of Toronto's Leadership Education and Development Program (LEAD). They examined the reliability and resource demands of this modified personal interview (MPI) format. In 2010-2011, LEAD candidates submitted written applications, which were used to screen for participation in the MPI process. Selected candidates completed four brief (10-12 minute) independent MPIs, each with a different interviewer. The authors blueprinted MPI questions to (i.e., aligned them with) leadership attributes, and interviewers assessed candidates' eligibility on a five-point Likert-type scale. The authors analyzed inter-interview reliability using generalizability theory. Sixteen candidates submitted applications; 10 proceeded to the MPI stage. Reliability of the written application components was 0.75. The MPI process had overall inter-interview reliability of 0.79. Correlation between the written application and MPI scores was 0.49. A decision study showed acceptable reliability of 0.74 with only three MPIs scored using one global rating. Furthermore, a traditional admissions interview format would take 66% more time than the MPI format. The MPI format, used during the LEAD admissions process, achieved high reliability with minimal faculty resources. The MPI format's reliability and effective resource use were possible through MIS and employment of expert interviewers. MPIs may be useful for other admissions tasks.
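    A decision study projects reliability for a different number of interviews. The study's decision study was based on generalizability theory with a single global rating, so it is not reproduced here; as a rough cross-check, the simpler Spearman-Brown prophecy formula, which assumes the interviews are parallel, can be sketched in Python:

```python
def spearman_brown(reliability: float, k_from: int, k_to: int) -> float:
    """Project the reliability of a mean over k_from parallel interviews to k_to interviews."""
    m = k_to / k_from   # lengthening (m > 1) or shortening (m < 1) factor
    return m * reliability / (1.0 + (m - 1.0) * reliability)


# Start from the reported inter-interview reliability of 0.79 for four MPIs.
for k in (1, 2, 3, 4, 6):
    print(k, round(spearman_brown(0.79, 4, k), 2))
```

    Under this parallel-interview assumption, three interviews give about 0.74 and a single interview about 0.48, close to the authors' decision-study figure for three MPIs even though their model differs, and a reminder of why sampling several independent interviewers raises reliability.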

  2. A flexible latent class approach to estimating test-score reliability

    NARCIS (Netherlands)

    van der Palm, D.W.; van der Ark, L.A.; Sijtsma, K.

    2014-01-01

    The latent class reliability coefficient (LCRC) is improved by using the divisive latent class model instead of the unrestricted latent class model. This results in the divisive latent class reliability coefficient (DLCRC), which, unlike the LCRC, avoids making subjective decisions about the best solution

  3. Assessment of the reliability and consistency of the "malnutrition inflammation score" (MIS) in Mexican adults with chronic kidney disease for diagnosis of protein-energy wasting syndrome (PEW).

    Science.gov (United States)

    González-Ortiz, Ailema Janeth; Arce-Santander, Celene Viridiana; Vega-Vega, Olynka; Correa-Rotter, Ricardo; Espinosa-Cuevas, María de Los Angeles

    2014-10-04

    The protein-energy wasting syndrome (PEW) is a condition of malnutrition, inflammation, anorexia and wasting of body reserves resulting from inflammatory and non-inflammatory conditions in patients with chronic kidney disease (CKD). One way of assessing PEW, extensively described in the literature, is the Malnutrition Inflammation Score (MIS). This diagnostic-test study assessed the reliability and consistency of the MIS for the diagnosis of PEW in Mexican adults with CKD on hemodialysis (HD). A sample of 45 adults with CKD on HD was analyzed during June-July 2014. The instrument was applied on two occasions; test-retest reliability was calculated using the intraclass correlation coefficient (ICC), and the internal consistency of the questionnaire was analyzed using Cronbach's α coefficient. A weighted kappa test was used to estimate the validity of the instrument, and the result was subsequently compared with the Bilbrey nutritional index (BNI). The reliability of the questionnaire in the patient sample was ICC = 0.829. Agreement between MIS observations was considered adequate (κ = 0.585, p < 0.001); compared with the BNI, κ = 0.114 (p < 0.001). To estimate the trend, a correlation test was performed; the r² correlation coefficient was 0.488 (p < 0.001). The MIS has adequate reliability and validity for diagnosing PEW in the population with chronic kidney disease on HD. Copyright AULA MEDICA EDICIONES 2014. Published by AULA MEDICA. All rights reserved.
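
A weighted kappa gives partial credit when two ratings fall in nearby ordinal categories. The abstract does not say which weighting scheme was used. The sketch below therefore offers linear and quadratic weights and assumes ratings coded 0..k-1. The function name and the example ratings are hypothetical.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, n_categories, weighting="linear"):
    """Cohen's weighted kappa for two sets of ordinal ratings coded 0..n_categories-1.

    weighting="linear" penalizes disagreements in proportion to their distance;
    weighting="quadratic" penalizes them by squared distance.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    if any(not 0 <= r < n_categories for r in list(rater_a) + list(rater_b)):
        raise ValueError("ratings must be coded 0..n_categories-1")
    n, k = len(rater_a), n_categories

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 for the most distant categories
        return d if weighting == "linear" else d * d

    joint = Counter(zip(rater_a, rater_b))
    marg_a, marg_b = Counter(rater_a), Counter(rater_b)
    cells = [(i, j) for i in range(k) for j in range(k)]

    # Weighted disagreement actually observed vs. expected from the marginals alone.
    observed = sum(disagreement(i, j) * joint[(i, j)] for i, j in cells) / n
    expected = sum(disagreement(i, j) * marg_a[i] * marg_b[j] for i, j in cells) / n**2
    if expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - observed / expected


# Hypothetical ratings on a three-level ordinal scale (0 = none, 1 = mild, 2 = severe).
first = [0, 1, 2, 2, 1, 0]
second = [0, 1, 2, 1, 1, 0]
print(round(weighted_kappa(first, second, 3), 3))
```

For this toy example, the single one-step disagreement yields a linearly weighted kappa of 0.8. Quadratic weights penalize larger disagreements more heavily, and with them the statistic behaves much like an ICC.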

  4. Translation, adaptation and inter-rater reliability of the administration manual for the Fugl-Meyer assessment.

    Science.gov (United States)

    Michaelsen, Stella M; Rocha, André S; Knabben, Rodrigo J; Rodrigues, Luciano P; Fernandes, Claudia G C

    2011-01-01

    The reliability of the Brazilian version of the Fugl-Meyer Assessment (FMA) was recently assessed using scores assigned by a single evaluator based on observations made while applying the test. When different raters apply the scale, reliability may depend on how they interpret the assessment sheet. In such cases, a clear administration manual is essential to ensure the test is applied homogeneously. This study aimed to translate and adapt the French Canadian version of the FMA administration manual into Brazilian Portuguese and to evaluate inter-rater reliability when different evaluators apply the FMA based on the information in the manual. Eighteen adults (59±10 years) with chronic hemiparesis (38±35 months after a stroke) took part: eight patients in the first part of the study and 10 in the second. Based on the results of part 1, an adapted version was developed in which information and photos were added to illustrate the positions of the patient and evaluator. Inter-rater reliability was assessed using the intraclass correlation coefficient (ICC). The reliability of the FMA based on the adapted manual was excellent for the total motor scores of the upper limbs (ICC = 0.98) and lower limbs (ICC = 0.90), as well as for movement sense (ICC = 0.98) and upper- and lower-limb passive range of motion (ICC = 0.84 and 0.90, respectively). Reliability was moderate for tactile sensitivity (ICC = 0.75). The joint pain assessment showed low reliability. Except for the pain assessment, application of the FMA based on the adapted Brazilian Portuguese manual showed adequate inter-rater reliability.
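
Inter-rater ICCs are usually computed from a two-way ANOVA of a subjects × raters table. The abstract does not specify the ICC form. The sketch below assumes the Shrout and Fleiss ICC(2,1), which is the two-way random-effects, absolute-agreement, single-rater form. The function name is hypothetical, and the example table is the classic six-subject, four-judge example from Shrout and Fleiss (1979), not FMA data.

```python
def icc_2_1(scores):
    """Shrout-Fleiss ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` holds one row per subject and one column per rater.
    """
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two subjects and two raters")
    n, k = len(scores), len(scores[0])
    if any(len(row) != k for row in scores):
        raise ValueError("every subject needs a score from every rater")

    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Six subjects rated by four judges: the worked example from Shrout and Fleiss (1979).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(f"ICC(2,1) = {icc_2_1(ratings):.2f}")
```

This table gives about 0.29. The value is low because the judges differ systematically in their score levels, and absolute agreement counts that against them.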

  5. Coronary calcium screening with dual-source CT: reliability of ungated, high-pitch chest CT in comparison with dedicated calcium-scoring CT

    Energy Technology Data Exchange (ETDEWEB)

    Hutt, Antoine; Faivre, Jean-Baptiste; Remy, Jacques; Remy-Jardin, Martine [CHRU et Universite de Lille, Department of Thoracic Imaging, Hospital Calmette (EA 2694), Lille (France); Duhamel, Alain; Deken, Valerie [CHRU et Universite de Lille, Department of Biostatistics (EA 2694), Lille (France); Molinari, Francesco [Centre Hospitalier General de Tourcoing, Department of Radiology, Tourcoing (France)

    2016-06-15

    This study investigated the reliability of ungated, high-pitch dual-source CT for coronary artery calcium (CAC) screening. One hundred and eighty-five smokers underwent a dual-source CT examination with acquisition of two image sets during the same session: (a) an ungated, high-pitch, high-temporal-resolution acquisition over the entire thorax (i.e., chest CT); and (b) a prospectively ECG-triggered acquisition over the cardiac cavities (i.e., cardiac CT). The sensitivity and specificity of chest CT for detecting positive CAC scores were 96.4% and 100%, respectively. Inter-technique agreement on the quantitative CAC score was excellent (ICC = 0.986). The mean difference between the two techniques was 11.27, representing 1.81% of the average of the two techniques. Inter-technique agreement in categorizing patients into the four severity ranks was excellent (weighted kappa = 0.95; 95% CI 0.93-0.98). Inter-technique differences in quantitative CAC scores did not correlate with BMI (r = 0.05, p = 0.575) or heart rate (r = -0.06, p = 0.95); 87.2% of these differences were explained by differences at the level of the right coronary artery (RCA: 0.8718; LAD: 0.1008; LCx: 0.0139; LM: 0.0136). Ungated, high-pitch dual-source CT is a reliable imaging mode for CAC screening under the conditions of routine chest CT examinations.
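
The reported mean difference, expressed as a percentage of the average of the two techniques, is a Bland-Altman style bias summary. The sketch below computes three quantities from paired scores: the bias, the bias as a percentage of the mean of both techniques, and the conventional 95% limits of agreement (bias ± 1.96 SD of the differences). It is a minimal illustration that assumes paired scores from the two acquisitions. The function name and the scores are hypothetical, and the authors' exact computation is not given.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Bias, bias as % of the mean of both methods, and 95% limits of agreement."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    averages = [(a + b) / 2 for a, b in zip(method_a, method_b)]

    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    overall_mean = mean(averages)
    if overall_mean == 0:
        raise ValueError("percentage bias is undefined when the mean is zero")
    bias_pct = 100 * bias / overall_mean
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return bias, bias_pct, limits


# Hypothetical CAC scores for the same five patients from the two acquisitions.
cardiac_ct = [0, 12, 150, 400, 980]
chest_ct = [0, 10, 140, 390, 950]
bias, bias_pct, (low, high) = bland_altman(cardiac_ct, chest_ct)
print(f"bias = {bias:.1f} ({bias_pct:.2f}% of mean), 95% LoA = [{low:.1f}, {high:.1f}]")
```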

  6. Improving the Validity of Quantitative Measures in Applied Linguistics Research

    Science.gov (United States)

    Purpura, James E.; Brown, James Dean; Schoonen, Rob

    2015-01-01

    In empirical applied linguistics research it is essential that the key variables are operationalized in a valid and reliable way, and that the scores are treated appropriately, allowing for a proper testing of the hypotheses under investigation. The current article addresses several theoretical and practical issues regarding the use of measurement…

  7. Construct validity and reliability of the Finnish version of the Knee Injury and Osteoarthritis Outcome Score.

    Science.gov (United States)

    Multanen, Juhani; Honkanen, Mikko; Häkkinen, Arja; Kiviranta, Ilkka

    2018-05-22

    The Knee Injury and Osteoarthritis Outcome Score (KOOS) is a commonly used knee assessment and outcome tool in both clinical work and research. However, it has not been formally translated and validated in Finnish. The purpose of this study was to translate and culturally adapt the KOOS questionnaire into Finnish and to determine its validity and reliability in middle-aged Finnish patients with knee injuries. The KOOS was translated and culturally adapted from English into Finnish. Subsequently, 59 patients with knee injuries completed the Finnish version of the KOOS, the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), the Short-Form 36 Health Survey (SF-36) and a Numeric Pain Rating Scale (Pain-NRS). The same KOOS questionnaire was re-administered 2 weeks later. The psychometric assessment of the Finnish KOOS tested construct validity and reliability (internal consistency, test-retest reliability and measurement error). Floor and ceiling effects were also examined. The cross-cultural adaptation revealed only minor cultural differences and was well received by the patients. For construct validity, moderate to high Spearman correlation coefficients were found between the KOOS subscales and the WOMAC, SF-36 and Pain-NRS subscales. Cronbach's alpha ranged from 0.79 to 0.96 across subscales, indicating acceptable internal consistency. Test-retest reliability was good to excellent, with intraclass correlation coefficients ranging from 0.73 to 0.86 for all KOOS subscales. The minimal detectable change ranged from 17 to 34 at the individual level and from 2 to 4 at the group level. No floor or ceiling effects were observed. This study yielded an appropriately translated and culturally adapted Finnish version of the KOOS, which demonstrated good validity and reliability. Our data indicate that the Finnish version of the KOOS is suitable for assessing the knee status of Finnish patients with different knee complaints. Further studies are needed to
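
Cronbach's alpha, as reported for the KOOS subscales, compares the sum of the item variances with the variance of the total score: alpha = k / (k - 1) × (1 - Σ item variances / total variance). The sketch below implements that standard formula. The function name and the example responses are hypothetical, not KOOS data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a subscale.

    `responses` holds one row per respondent, each with the scores for the k items.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("every respondent needs the same k >= 2 items")
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0-4 item responses from five respondents on a four-item subscale.
subscale = [
    [2, 3, 2, 3],
    [4, 4, 3, 4],
    [1, 1, 2, 1],
    [3, 2, 3, 3],
    [0, 1, 1, 0],
]
print(round(cronbach_alpha(subscale), 2))
```

The variances use the same (n - 1) denominator for the items and the total, so the ratio does not depend on that choice.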

  8. Validity and reliability of the Dutch version of the Copenhagen Hip And Groin Outcome Score (HAGOS-NL) in patients with hip pathology.

    Directory of Open Access Journals (Sweden)

    Hilde Giezen

    The Copenhagen Hip And Groin Outcome Score (HAGOS) was developed to assess disease-specific consequences in young to middle-aged, physically active hip and/or groin patients. The study aimed to determine the validity and reliability of the Dutch version of the HAGOS (HAGOS-NL) for middle-aged patients with hip complaints. To assess validity, 117 participants completed five questionnaires: the HAGOS-NL, the international Hip Outcome Tool (iHOT-12NL), the Hip disability and Osteoarthritis Outcome Score (HOOS), the RAND-36 Health Survey and the Tegner activity scale. Structural validity was determined by confirmatory factor analysis. Construct validity was analyzed by formulating predefined hypotheses regarding relationships between the HAGOS-NL and subscales of the iHOT-12NL, HOOS, RAND-36 and Tegner activity scale. The HAGOS-NL was completed again by 67 patients to explore test-retest reliability. Reliability was assessed in terms of Cronbach's alpha, the intraclass correlation coefficient (ICC), the standard error of measurement (SEM) and the minimal detectable change (MDC). The Bland and Altman method was used to explore absolute agreement. Factor analysis confirmed that the HAGOS-NL consists of six subscales. All hypotheses were confirmed, indicating good construct validity. Internal consistency was good, with Cronbach's alpha values ranging from 0.89 to 0.98. Test-retest reliability was considered good, with ICC values of 0.80 and higher. The SEM ranged from 6.6 to 12.3, and the MDC ranged from 18.3 to 34.1 at the individual level and from 2.3 to 4.4 at the group level. Bland and Altman analyses showed no bias. The HAGOS-NL is a reliable and valid instrument for measuring pain, physical functioning and quality of life in middle-aged patients with hip complaints.
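
The SEM and MDC values above are linked by a widely used convention: MDC95 = 1.96 × √2 × SEM at the individual level. The √2 appears because both the test and the retest carry measurement error. The group-level value is commonly the individual MDC divided by √n, and SEM is often estimated as SD × √(1 - ICC). The sketch below assumes these conventions; the abstract does not state which formulas were used, and the helper names are hypothetical.

```python
import math

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement estimated as SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc_individual(sem: float) -> float:
    """MDC95 for one patient: 1.96 * sqrt(2) * SEM."""
    return Z_95 * math.sqrt(2.0) * sem


def mdc_group(sem: float, n: int) -> float:
    """Group-level MDC95: the individual MDC divided by sqrt(n)."""
    return mdc_individual(sem) / math.sqrt(n)


for sem in (6.6, 12.3):  # lowest and highest SEM reported for the HAGOS-NL
    print(f"SEM {sem}: individual MDC95 = {mdc_individual(sem):.1f}")
```

With the reported SEM range, the individual-level formula reproduces the reported 18.3 and 34.1. The abstract does not state which n underlies the group-level values, so `mdc_group` is not applied to them here.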

  9. Validation and Reliability of a Smartphone Application for the International Prostate Symptom Score Questionnaire: A Randomized Repeated Measures Crossover Study

    Science.gov (United States)

    Shim, Sung Ryul; Sun, Hwa Yeon; Ko, Young Myoung; Chun, Dong-Il; Yang, Won Jae

    2014-01-01

    Background: Smartphone-based assessment may be a useful diagnostic and monitoring tool for patients. There have been many attempts to create smartphone diagnostic tools for clinical use in various medical fields, but few have demonstrated scientific validity. Objective: The purpose of this study was to develop a smartphone application of the International Prostate Symptom Score (IPSS) and to demonstrate its validity and reliability. Methods: From June 2012 to May 2013, a total of 1581 male participants (≥40 years old), with or without lower urinary tract symptoms (LUTS), visited our urology clinic via the health improvement center at Soonchunhyang University Hospital (Republic of Korea) and were enrolled in this study. A randomized repeated-measures crossover design was employed using a smartphone application of the IPSS and the conventional paper form of the IPSS. A paired t test was conducted under a non-inferiority hypothesis. For the reliability test, the intraclass correlation coefficient (ICC) was measured. Results: The total IPSS score (P=.289) and each IPSS item (P=.157-1.000) showed no differences between the paper and smartphone versions of the IPSS. The mild, moderate, and severe LUTS groups showed no differences between the two versions of the IPSS. A significant correlation was noted in the total group (ICC=.935, P […] smartphones could participate. Conclusions: The validity and reliability of the smartphone application version were comparable to those of the conventional paper version of the IPSS. The smartphone application of the IPSS could be an effective method for measuring lower urinary tract symptoms. PMID:24513507

  10. Rhythm and Melody Tasks for School-Aged Children With and Without Musical Training: Age-Equivalent Scores and Reliability

    Directory of Open Access Journals (Sweden)

    Kierla Ireland

    2018-04-01

    Measuring musical abilities in childhood can be challenging. When music training and maturation occur simultaneously, it is difficult to separate the effects of specific experience from age-based changes in cognitive and motor abilities. The goal of this study was to develop age-equivalent scores for two measures of musical ability that could be reliably used with school-aged children (7–13) with and without musical training. The children's Rhythm Synchronization Task (c-RST) and the children's Melody Discrimination Task (c-MDT) were adapted from adult tasks developed and used in our laboratories. The c-RST is a motor task in which children listen and then try to synchronize their taps with the notes of a woodblock rhythm while it plays twice in a row. The c-MDT is a perceptual task in which the child listens to two melodies and decides whether the second was the same or different. We administered these tasks to 213 children in music camps (musicians, n = 130) and science camps (non-musicians, n = 83). We also measured children's paced tapping, non-paced tapping, and phonemic discrimination as baseline motor and auditory abilities. We estimated internal-consistency reliability for both tasks and compared children's performance to results from studies with adults. As expected, musically trained children outperformed those without music lessons, scores decreased as difficulty increased, and older children performed best. Using non-musicians as a reference group, we generated a set of age-based z-scores and used them to predict task performance with additional years of training. Years of lessons significantly predicted performance on both tasks, over and above the effect of age. We also assessed the relation between musicians' scores on the music tasks, baseline tasks, auditory working memory, and non-verbal reasoning. Unexpectedly, musician children outperformed non-musicians in two of three baseline tasks. The c-RST and c-MDT fill an important need for

  11. Rhythm and Melody Tasks for School-Aged Children With and Without Musical Training: Age-Equivalent Scores and Reliability.

    Science.gov (United States)

    Ireland, Kierla; Parker, Averil; Foster, Nicholas; Penhune, Virginia

    2018-01-01

    Measuring musical abilities in childhood can be challenging. When music training and maturation occur simultaneously, it is difficult to separate the effects of specific experience from age-based changes in cognitive and motor abilities. The goal of this study was to develop age-equivalent scores for two measures of musical ability that could be reliably used with school-aged children (7-13) with and without musical training. The children's Rhythm Synchronization Task (c-RST) and the children's Melody Discrimination Task (c-MDT) were adapted from adult tasks developed and used in our laboratories. The c-RST is a motor task in which children listen and then try to synchronize their taps with the notes of a woodblock rhythm while it plays twice in a row. The c-MDT is a perceptual task in which the child listens to two melodies and decides whether the second was the same or different. We administered these tasks to 213 children in music camps (musicians, n = 130) and science camps (non-musicians, n = 83). We also measured children's paced tapping, non-paced tapping, and phonemic discrimination as baseline motor and auditory abilities. We estimated internal-consistency reliability for both tasks and compared children's performance to results from studies with adults. As expected, musically trained children outperformed those without music lessons, scores decreased as difficulty increased, and older children performed best. Using non-musicians as a reference group, we generated a set of age-based z-scores and used them to predict task performance with additional years of training. Years of lessons significantly predicted performance on both tasks, over and above the effect of age. We also assessed the relation between musicians' scores on the music tasks, baseline tasks, auditory working memory, and non-verbal reasoning. Unexpectedly, musician children outperformed non-musicians in two of three baseline tasks. The c-RST and c-MDT fill an important need for researchers
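
Age-based z-scores express a child's raw score relative to same-age children in a reference group, here the non-musicians. The sketch below assumes that scores are grouped by whole year of age and that each age's reference mean and SD come directly from non-musicians. The abstract does not give the authors' exact norming procedure, and the function names and data are hypothetical.

```python
from statistics import mean, stdev


def build_age_norms(reference):
    """Map age -> (mean, SD) from (age, score) pairs for the reference group.

    Ages with fewer than two reference children are skipped, because an SD
    cannot be estimated from a single score.
    """
    by_age = {}
    for age, score in reference:
        by_age.setdefault(age, []).append(score)
    return {age: (mean(s), stdev(s)) for age, s in by_age.items() if len(s) >= 2}


def age_z(age, score, norms):
    """z-score of `score` relative to same-age reference children."""
    ref_mean, ref_sd = norms[age]
    if ref_sd == 0:
        raise ValueError(f"reference scores for age {age} have no spread")
    return (score - ref_mean) / ref_sd


# Hypothetical (age, task score) pairs for non-musician reference children.
non_musicians = [(8, 10), (8, 12), (8, 14), (9, 13), (9, 15), (9, 17)]
norms = build_age_norms(non_musicians)
print(age_z(8, 16, norms), age_z(9, 16, norms))
```

The same raw score of 16 gives z = 2.0 for an 8-year-old but only z = 0.5 for a 9-year-old. This is how age-referenced scores separate maturation from training effects.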

  12. Reliability and validity of the foot and ankle outcome score: a validation study from Iran.

    Science.gov (United States)

    Negahban, Hossein; Mazaheri, Masood; Salavati, Mahyar; Sohani, Soheil Mansour; Askari, Marjan; Fanian, Hossein; Parnianpour, Mohamad

    2010-05-01

    The aims of this study were to culturally adapt and validate the Persian version of the Foot and Ankle Outcome Score (FAOS) and to present data on its psychometric properties for patients with different foot and ankle problems. The Persian version of the FAOS was developed after a standard forward-backward translation and cultural adaptation process. The sample included 93 patients with foot and ankle disorders who were asked to complete two questionnaires: the FAOS and the Short-Form 36 Health Survey (SF-36). To determine test-retest reliability, 60 randomly chosen patients completed the FAOS again 2 to 6 days after the first administration. Test-retest reliability and internal consistency were assessed using the intraclass correlation coefficient (ICC) and Cronbach's alpha, respectively. Spearman's rank correlation was used to evaluate the convergent and divergent validity of the FAOS against similar and dissimilar concepts of the SF-36. Dimensionality was determined by assessing item-subscale correlations corrected for overlap. Test-retest reliability was very high for all FAOS subscales, with ICCs ranging from 0.92 to 0.96. Most subscales exceeded the minimum Cronbach's alpha of 0.70. For the main hypotheses stated a priori between FAOS and SF-36 subscales, Spearman's correlation coefficients for convergent construct validity ranged from 0.32 to 0.58. For dimensionality, most items exceeded the minimum Spearman's correlation coefficient of 0.40. In conclusion, the Persian version of the FAOS seems suitable for Iranian patients with various foot and ankle problems, especially lateral ankle sprain. Future studies are needed to provide stronger evidence of its psychometric properties for patients with different foot and ankle problems.
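
An item-subscale correlation "corrected for overlap" correlates each item with the sum of the other items in its subscale. Using the other items only stops the item from inflating the correlation by correlating with itself. The sketch below uses Spearman's rho, computed as the Pearson correlation of average ranks, to match the 0.40 criterion described above. The helper names and responses are hypothetical; SciPy's `spearmanr` would give the same rho for this data.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def corrected_item_subscale(responses, item):
    """Spearman correlation of one item with the sum of the *other* items,
    so the item does not correlate with itself (correction for overlap)."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    return spearman(item_scores, rest_scores)


# Hypothetical 0-4 responses: five patients, three items in one subscale.
subscale = [[1, 2, 1], [3, 3, 2], [2, 2, 2], [4, 3, 4], [0, 1, 1]]
for i in range(3):
    rho = corrected_item_subscale(subscale, i)
    print(f"item {i}: rho = {rho:.2f} (criterion in the abstract: 0.40)")
```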

  13. Applying reliability centered maintenance analysis principles to inservice testing

    International Nuclear Information System (INIS)

    Flude, J.W.

    1994-01-01

    Federal regulations require nuclear power plants to use inservice test (IST) programs to ensure the operability of safety-related equipment. IST programs are based on American Society of Mechanical Engineers (ASME) Boiler and Pressure Vessel Code requirements. Many of these plants also use Reliability Centered Maintenance (RCM) to optimize system maintenance. However, ASME Code requirements are hard to change: the process for requesting authority to use an alternative strategy is long and expensive, and the difficulty of obtaining this authority makes applying RCM methods to safety-related systems not cost-effective. An ASME research task force on Risk-Based Inservice Testing is investigating changes to the Code that would allow plants to apply RCM methods to maintenance strategy selection for safety-related systems. The task force is working closely with the Codes and Standards sections to develop a process related to the RCM process. Someday, plants will be able to use this process to develop more efficient and safer maintenance strategies.

  14. Reliability of sonographic assessment of tendinopathy in tennis elbow.

    Science.gov (United States)

    Poltawski, Leon; Ali, Syed; Jayaram, Vijay; Watson, Tim

    2012-01-01

    This study assessed the reliability of sonographic scales used to quantify the extent of pathology and hyperaemia in the common extensor tendon in people with tennis elbow, and computed their minimum detectable change. The lateral elbows of 19 people with tennis elbow were assessed sonographically twice, 1-2 weeks apart. Greyscale and power Doppler images were recorded for subsequent rating of abnormalities. Tendon thickening, hypoechogenicity, fibrillar disruption and calcification were each rated on four-point scales, and the scores were summed to provide an overall rating of structural abnormality; hyperaemia was scored on a five-point scale. Inter-rater reliability was established using the intraclass correlation coefficient (ICC) to compare scores assigned independently to the same set of images by a radiologist and by a physiotherapist with training in musculoskeletal imaging. Test-retest reliability was assessed by comparing the scores assigned by the physiotherapist to images recorded at the two sessions. The minimum detectable change (MDC) was calculated from the test-retest reliability data. ICC values for inter-rater reliability ranged from 0.35 (95% CI: 0.05, 0.60) for fibrillar disruption to 0.77 (0.55, 0.88) for the overall greyscale score, and 0.89 (0.79, 0.95) for hyperaemia. Test-retest reliability ranged from 0.70 (0.48, 0.84) for tendon thickening to 0.82 (0.66, 0.90) for the overall greyscale score and 0.86 (0.73, 0.93) for calcification. The MDC was 2.0/12 for the greyscale total score and 1.1/5 for the hyperaemia score. The sonographic scoring system used in this study may be used reliably to quantify tendon abnormalities and change over time. A relatively inexperienced imager can conduct the assessment and use the rating scales reliably.

  15. Automatic Sleep Scoring in Normals and in Individuals with Neurodegenerative Disorders According to New International Sleep Scoring Criteria

    DEFF Research Database (Denmark)

    Jensen, Peter S.; Sørensen, Helge Bjarup Dissing; Leonthin, Helle

    2010-01-01

    The aim of this study was to develop a fully automatic sleep scoring algorithm based on a reproduction of the new international sleep scoring criteria from the American Academy of Sleep Medicine. A biomedical signal processing algorithm was developed, allowing for automatic sleep depth … Based on an observed reliability of the manual scorer of 92.5% (Cohen's kappa: 0.87) in the normal group and 85.3% (Cohen's kappa: 0.73) in the abnormal group, this study concluded that although the developed algorithm could score normal sleep with an accuracy around the manual inter-scorer reliability, it failed to accurately score abnormal sleep as encountered in the Parkinson disease/multiple system atrophy patients.
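
The abstract reports both raw agreement (for example, 92.5%) and Cohen's kappa (0.87). Kappa is lower because it subtracts the agreement expected by chance, which depends on how often each scorer uses each stage. The sketch below computes both quantities for two scorers' epoch labels. The function name and the labels are hypothetical; the stage names follow the AASM convention.

```python
from collections import Counter


def agreement_and_kappa(scorer_1, scorer_2):
    """Return (observed agreement, Cohen's kappa) for two scorers' epoch labels."""
    if len(scorer_1) != len(scorer_2) or not scorer_1:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(scorer_1)
    p_observed = sum(a == b for a, b in zip(scorer_1, scorer_2)) / n

    # Chance agreement: for each stage, the product of the two scorers'
    # marginal proportions, summed over stages.
    counts_1, counts_2 = Counter(scorer_1), Counter(scorer_2)
    p_chance = sum(counts_1[s] * counts_2[s] for s in counts_1) / n**2
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return p_observed, (p_observed - p_chance) / (1 - p_chance)


# Hypothetical 30-second epochs labelled with AASM stages (W, N1, N2, N3, R).
scorer_1 = ["W", "N1", "N2", "N2", "N3", "R", "N2", "W"]
scorer_2 = ["W", "N2", "N2", "N2", "N3", "R", "N2", "W"]
p_o, kappa = agreement_and_kappa(scorer_1, scorer_2)
print(f"agreement = {p_o:.1%}, kappa = {kappa:.2f}")
```

Here 87.5% raw agreement corresponds to a kappa of about 0.83. The gap widens when one stage, such as N2, dominates the recording.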

  16. Automatic sleep scoring in normals and in individuals with neurodegenerative disorders according to new international sleep scoring criteria

    DEFF Research Database (Denmark)

    Jensen, Peter S; Sorensen, Helge B D; Jennum, Poul

    2010-01-01

    The aim of this study was to develop a fully automatic sleep scoring algorithm based on a reproduction of the new international sleep scoring criteria from the American Academy of Sleep Medicine. A biomedical signal processing algorithm was developed, allowing for automatic sleep depth … Based on an observed reliability of the manual scorer of 92.5% (Cohen's kappa: 0.87) in the normal group and 85.3% (Cohen's kappa: 0.73) in the abnormal group, this study concluded that although the developed algorithm could score normal sleep with an accuracy around the manual inter-scorer reliability, it failed to accurately score abnormal sleep as encountered in the Parkinson disease/multiple system atrophy patients.

  17. A method to evaluate performance reliability of individual subjects in laboratory research applied to work settings.

    Science.gov (United States)

    1978-10-01

    This report presents a method that may be used to evaluate the reliability of performance of individual subjects, particularly in applied laboratory research. The method is based on analysis of variance of a tasks-by-subjects data matrix, with all sc...

  18. The validity, reliability and normative scores of the parent, teacher and self report versions of the Strengths and Difficulties Questionnaire in China

    Directory of Open Access Journals (Sweden)

    Coghill David

    2008-04-01

    Background: The Strengths and Difficulties Questionnaire (SDQ) has become one of the most widely used measurement tools in child and adolescent mental health work across the globe. The SDQ was originally developed and validated in the UK, and while its reliability and validity have been replicated in several countries, important cross-cultural issues have been raised. We describe normative data, reliability and validity of the Chinese translation of the SDQ (parent, teacher and self-report versions) in a large group of children from Shanghai. Methods: The SDQ was administered to the parents and teachers of students aged 3 to 17 years from 12 of Shanghai's 19 districts, and to those young people aged 11 to 17 years. Retest data were collected from parents and teachers for 45 students six weeks later. Data were analysed to describe normative scores, bandings and cut-offs for normal, borderline and abnormal scores. Reliability was assessed through analyses of internal consistency, inter-rater agreement and temporal stability. Structural, convergent and discriminant validity were assessed. Results: Full parent and teacher data were available for 1965 subjects and self-report data for 690 subjects. Normative data for this Chinese urban population, with bandings and cut-offs for borderline and abnormal scores, are described. Principal components analysis indicated partial agreement with the original five-factor subscale structure; however, this appears to hold more strongly for the Prosocial Behaviour, Hyperactivity – Inattention and Emotional Symptoms subscales than for the Conduct Problems and Peer Problems subscales. Internal consistency, as measured by Cronbach's α coefficient, was generally low, ranging between 0.30 and 0.83, with only the parent and teacher Hyperactivity – Inattention and teacher Prosocial Behaviour subscales having α > 0.7. Inter-rater correlations were similar to those reported previously (range 0.23 – 0

  19. Reliability Generalization of the Alcohol Use Disorder Identification Test.

    Science.gov (United States)

    Shields, Alan L.; Caruso, John C.

    2002-01-01

    Evaluated the reliability of scores from the Alcohol Use Disorders Identification Test (AUDIT; J. Saunders and others, 1993) in a reliability generalization study based on 17 empirical journal articles. Results show AUDIT scores to be generally reliable for basic assessment.

  20. Validity and Reliability Study of Bahasa Malaysia Version of Voice Handicap Index-10.

    Science.gov (United States)

    Ong, Fei Ming; Husna Nik Hassan, Nik Fariza; Azman, Mawaddah; Sani, Abdullah; Mat Baki, Marina

    2018-05-21

    This study aimed to determine the validity and reliability of the Bahasa Malaysia version of the Voice Handicap Index-10 (mVHI-10). This cross-sectional study was carried out in the Otorhinolaryngology, Head and Neck Surgery Department of Universiti Kebangsaan Malaysia Medical Centre (UKMMC) from June 2015 to May 2016. The mVHI-10 was produced following a rigorous forward and backward translation. One hundred participants, including 50 healthy volunteers (17 male, 33 female) and 50 patients with voice disorders (26 male, 24 female), were recruited to complete the mVHI-10 before flexible laryngoscopic examination and acoustic analysis. The mVHI-10 was repeated after 2 weeks via telephone interview or clinic visit. Its reliability and validity were assessed using intraclass correlation. The test-retest reliability for the total mVHI-10 and each item score was high, with Cronbach's alpha > 0.90. The total mVHI-10 score and domain scores were significantly higher (P […] Kaiser-Meyer-Olkin measure was 0.92, indicating excellent construct validity. There was a significant positive correlation between the mVHI-10 score and the jitter and shimmer results (P < 0.001). The present study showed good reliability and validity of the mVHI-10 when applied to both healthy volunteers and patients with voice disorders. We recommend the use of the mVHI-10 in daily clinical practice among the Bahasa Malaysia-speaking population. Copyright © 2018 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  1. Translation and Adaptation of Knee Injury and Osteoarthritis Outcome Score (KOOS) into Persian and Testing Persian Version Reliability Among Iranians with Osteoarthritis

    Directory of Open Access Journals (Sweden)

    Solaleh Saraei-Pour

    2007-04-01

    Objective: To develop a reliable tool for measuring health-related quality of life among Iranians with knee osteoarthritis (OA) by translating and culturally adapting the Knee injury and Osteoarthritis Outcome Score (KOOS) into Persian and testing the reliability and internal consistency of the Iranian version. Materials & Methods: This was a non-experimental methodological study. The KOOS was translated and culturally adapted to the Persian language and culture in three phases following the IQOLA project. To examine test-retest reliability, the Iranian version of the KOOS was completed twice, at an interval of at least two days and at most one week, by 30 Iranians with knee OA who had been referred by physicians, with a physiotherapy order, to the Municipality and 110 physiotherapy clinics of Tehran. A convenience, non-probability sampling design was used. Psychometric evaluation: the questionnaire data were scored and analyzed with SPSS software for test-retest reliability, absolute reliability, and subscale and item internal consistency. Results: Internal consistency, calculated with Cronbach's alpha, was high for all subscales (at least 0.76) except the "symptom" subscale, which was moderate, showing that the items of each subscale measured the same construct. Item internal consistency after correction for overlap was higher than the optimal value (0.4) except for the items of the "symptom" subscale, demonstrating good item internal consistency. ICC and SEM, used to evaluate test-retest and absolute reliability respectively, showed that all subscales had good test-retest reliability (0.7) and very good absolute reliability: the highest SEM calculated for the Persian version was 7.44, which is less than the Minimal Perceptible Clinical Improvement (MPCI) of 8 to 10 estimated for the KOOS questionnaire. Conclusion: With the Persian

  2. The Reliability of Clock Drawing Test Scoring Systems Modeled on the Normative Data in Healthy Aging and Nonamnestic Mild Cognitive Impairment.

    Science.gov (United States)

    Mazancova, Adela Fendrych; Nikolai, Tomas; Stepankova, Hana; Kopecek, Miloslav; Bezdicek, Ondrej

    2017-10-01

    The Clock Drawing Test (CDT) is a commonly used tool in clinical practice and research for cognitive screening among older adults. The main goal of the present study was to analyze the interrater reliability of three different CDT scoring systems (by Shulman et al., Babins et al., and Cohen et al.). We used a clock with a predrawn circle. The CDT was evaluated by three independent raters based on the normative data set of healthy older and very old adults and patients with nonamnestic mild cognitive impairment (naMCI; N = 438; aged 61-94). We confirmed a high interrater reliability measured by the intraclass correlation coefficients (ICCs): Shulman ICC = .809, Babins ICC = .894, and Cohen ICC = .862, all p < .001. We found that age and education levels have a significant effect on CDT performance, yet there was no influence of gender. Finally, the scoring systems differentiated between naMCI and age- and education-matched controls: Shulman's area under the receiver operating characteristic curve (AUC) = .84, Cohen AUC = .71, all p < .001; and a slightly lower discriminative ability was shown by Babins: AUC = .65, p = .012.
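The abstract reports ICCs for three raters without naming the ICC model. As an assumption, the sketch below computes the Shrout and Fleiss ICC(2,1) (two-way random effects, absolute agreement, single rater) from a subjects × raters table using two-way ANOVA mean squares; the study may have used a different ICC form, and the ratings are invented.

```python
def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score given to subject i by rater j.
    """
    n = len(ratings)     # number of subjects
    k = len(ratings[0])  # number of raters
    grand = _mean(x for row in ratings for x in row)
    row_means = [_mean(row) for row in ratings]
    col_means = [_mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Three hypothetical raters scoring four clocks
print(round(icc_2_1([[3, 3, 4], [5, 5, 5], [2, 3, 2], [6, 6, 6]]), 3))
```

Because ICC(2,1) measures absolute agreement, a rater who is systematically one point higher than the others lowers the coefficient even when the rank order of subjects is identical.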

  3. Automatic sleep scoring in normals and in individuals with neurodegenerative disorders according to new international sleep scoring criteria

    DEFF Research Database (Denmark)

    Jensen, Peter S.; Sørensen, Helge Bjarup Dissing; Jennum, P. J.

    2010-01-01

    Introduction: Reliable polysomnographic classification is the basis for evaluation of sleep disorders in neurological diseases. Aim: To develop a fully automatic sleep scoring algorithm on the basis of a reproduction of new international sleep scoring criteria from the American Academy of Sleep Medicine (AASM). Methods: A biomedical signal processing algorithm was developed, allowing for automatic sleep depth quantification of routine polysomnographic (PSG) recordings through feature extraction, supervised probabilistic Bayesian classification, and heuristic rule-based smoothing. The performance … Conclusion: The developed algorithm was capable of scoring normal sleep with an accuracy around the manual inter-scorer reliability, but it failed to score accurately the abnormal sleep encountered in the PD/MSA patients, owing to the abnormal micro- and macrostructure pattern in these patients.

  4. Interobserver variability of the neurological optimality score

    NARCIS (Netherlands)

    Monincx, W. M.; Smolders-de Haas, H.; Bonsel, G. J.; Zondervan, H. A.

    1999-01-01

    To assess the interobserver reliability of the neurological optimality score. The neurological optimality score of 21 full term healthy, neurologically normal newborn infants was determined by two well trained observers. The interclass correlation coefficient was 0.31. Kappa for optimality (score of
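For an optimal/non-optimal classification by two observers, agreement is usually summarized with Cohen's kappa, which corrects observed agreement for the agreement expected by chance from each observer's marginal proportions. A minimal sketch, assuming unweighted kappa; the observer data are made up.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal counts, summed over categories
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical optimal (1) / non-optimal (0) classifications by two observers
obs1 = [1, 1, 0, 1, 0, 1, 1, 0]
obs2 = [1, 0, 0, 1, 1, 1, 1, 0]
print(round(cohens_kappa(obs1, obs2), 3))  # 0.467
```

Here the observers agree on 75% of infants, but because 53% agreement is expected by chance alone, kappa is only about 0.47.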

  5. Basic Concepts in Classical Test Theory: Tests Aren't Reliable, the Nature of Alpha, and Reliability Generalization as a Meta-analytic Method.

    Science.gov (United States)

    Helms, LuAnn Sherbeck

    This paper discusses the fact that reliability is about scores and not tests and how reliability limits effect sizes. The paper also explores the classical reliability coefficients of stability, equivalence, and internal consistency. Stability is concerned with how stable test scores will be over time, while equivalence addresses the relationship…
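The claim that reliability limits effect sizes can be illustrated with Spearman's classical attenuation formula, r_observed = r_true × √(r_xx × r_yy). This formula is standard classical test theory and is not quoted from the paper itself; the sketch below and its reliability values are illustrative only.

```python
import math


def attenuated_correlation(true_r: float, rel_x: float, rel_y: float) -> float:
    """Observed correlation implied by a true-score correlation and two score reliabilities."""
    return true_r * math.sqrt(rel_x * rel_y)


def disattenuated_correlation(observed_r: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation (the inverse of the function above)."""
    return observed_r / math.sqrt(rel_x * rel_y)


# Even a perfect true relation (r = 1) cannot be observed above sqrt(0.81 * 1.0) = 0.9
print(attenuated_correlation(1.0, 0.81, 1.0))
```

Because the reliabilities belong to the scores obtained in a particular sample, the same test can cap effect sizes differently in different samples, which is the sense in which reliability is a property of scores rather than tests.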

  6. Conditional Standard Errors of Measurement for Scale Scores.

    Science.gov (United States)

    Kolen, Michael J.; And Others

    1992-01-01

    A procedure is described for estimating the reliability and conditional standard errors of measurement of scale scores incorporating the discrete transformation of raw scores to scale scores. The method is illustrated using a strong true score model, and practical applications are described. (SLD)

  7. Gestational Weight Gain-for-Gestational Age Z-Score Charts Applied across U.S. Populations.

    Science.gov (United States)

    Leonard, Stephanie A; Hutcheon, Jennifer A; Bodnar, Lisa M; Petito, Lucia C; Abrams, Barbara

    2018-03-01

    Gestational weight gain may be a modifiable contributor to infant health outcomes, but the effect of gestational duration on gestational weight gain has limited the identification of optimal weight gain ranges. Recently developed z-score and percentile charts can be used to classify gestational weight gain independent of gestational duration. However, racial/ethnic variation in gestational weight gain and the possibility that optimal weight gain differs among racial/ethnic groups could affect the generalizability of the z-score charts. The objectives of this study were (1) to apply the weight gain z-score charts in two different U.S. populations as an assessment of generalizability and (2) to determine whether race/ethnicity modifies the weight gain range associated with minimal risk of preterm birth. The study sample included over 4 million live, singleton births in California (2007-2012) and Pennsylvania (2003-2013). We implemented a noninferiority margin approach in stratified subgroups to determine weight gain ranges for which the adjusted predicted marginal risk of preterm birth (gestation […] gain between California and Pennsylvania births, and among several racial/ethnic groups in California. The optimal ranges decreased as severity of prepregnancy obesity increased in all groups. The findings support the use of weight gain z-score charts for studying gestational age-dependent outcomes in diverse U.S. populations and do not support weight gain recommendations tailored to race/ethnicity. © 2017 John Wiley & Sons Ltd.

  8. Assessment of reliability, validity, responsiveness and minimally important change of the German Hip dysfunction and osteoarthritis outcome score (HOOS) in patients with osteoarthritis of the hip.

    Science.gov (United States)

    Arbab, Dariusch; van Ochten, Johannes H M; Schnurr, Christoph; Bouillon, Bertil; König, Dietmar

    2017-12-01

    Patient-reported outcome measures are a critical tool in evaluating the efficacy of orthopedic procedures. The intention of this study was to evaluate the reliability, validity, responsiveness and minimally important change of the German version of the Hip dysfunction and osteoarthritis outcome score (HOOS). The German HOOS was investigated in 251 consecutive patients before and 6 months after total hip arthroplasty. All patients completed the HOOS, the Oxford Hip Score (OHS), the Short-Form 36 (SF-36) and numeric scales for pain and disability. Test-retest reliability, internal consistency, floor and ceiling effects, construct validity and minimal important change were analyzed. The German HOOS demonstrated excellent test-retest reliability, with intraclass correlation coefficient values > 0.7. Cronbach's alpha values demonstrated strong internal consistency. As hypothesized, HOOS subscales correlated strongly with corresponding OHS and SF-36 domains. All subscales showed excellent responsiveness (effect size/standardized response means > 0.8) between preoperative assessment and postoperative follow-up. The HOOS and all subdomains showed changes larger than the minimal detectable change, which indicates true change. The German version of the HOOS demonstrated good psychometric properties. It proved to be a valid, reliable and responsive instrument for use in patients with hip osteoarthritis undergoing total hip replacement.
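The comparison of observed change against the minimal detectable change can be sketched with the widely used formula MDC95 = 1.96 × √2 × SEM. The abstract does not state which formula the authors applied, so treat this as an assumption; the SEM and the pre/post scores below are hypothetical.

```python
import math


def mdc95(sem: float) -> float:
    """Minimal detectable change at 95% confidence: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem


def is_true_change(pre: float, post: float, sem: float) -> bool:
    """Treat a pre/post difference as real change only if it exceeds MDC95."""
    return abs(post - pre) > mdc95(sem)


# Hypothetical subscale with SEM = 5 points -> MDC95 of roughly 13.9 points
print(round(mdc95(5.0), 1))  # 13.9
print(is_true_change(40, 80, 5.0))  # True
```

The √2 factor reflects that a change score combines the measurement error of two administrations (before and after surgery).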

  9. The possibilities of applying a risk-oriented approach to the NPP reliability and safety enhancement problem

    Science.gov (United States)

    Komarov, Yu. A.

    2014-10-01

    An analysis and some generalizations of approaches to risk assessments are presented. Interconnection between different interpretations of the "risk" notion is shown, and the possibility of applying the fuzzy set theory to risk assessments is demonstrated. A generalized formulation of the risk assessment notion is proposed in applying risk-oriented approaches to the problem of enhancing reliability and safety in nuclear power engineering. The solution of problems using the developed risk-oriented approaches aimed at achieving more reliable and safe operation of NPPs is described. The results of studies aimed at determining the need (advisability) to modernize/replace NPP elements and systems are presented together with the results obtained from elaborating the methodical principles of introducing the repair concept based on the equipment technical state. The possibility of reducing the scope of tests and altering the NPP systems maintenance strategy is substantiated using the risk-oriented approach. A probabilistic model for estimating the validity of boric acid concentration measurements is developed.

  10. Translation, reliability, and clinical utility of the Melbourne Assessment 2.

    Science.gov (United States)

    Gerber, Corinna N; Plebani, Anael; Labruyère, Rob

    2017-10-12

    The aims were to (i) provide a German translation of the Melbourne Assessment 2 (MA2), a quantitative test to measure unilateral upper limb function in children with neurological disabilities and (ii) to evaluate its reliability and aspects of clinical utility. After its translation into German and approval of the back translation by the original authors, the MA2 was performed and videotaped twice with 30 children with neuromotor disorders. For each participant, two raters scored the video of the first test for inter-rater reliability. To determine test-retest reliability, one rater additionally scored the video of the second test while the other rater repeated the scoring of the first video to evaluate intra-rater reliability. Time needed for rater training, test administration, and scoring was recorded. The four subscale scores showed excellent intra-, inter-rater, and test-retest reliability with intraclass correlation coefficients of 0.90-1.00 (95%-confidence intervals 0.78-1.00). Score items revealed substantial to almost perfect intra-rater reliability (weighted kappa κw = 0.66-1.00) for the more affected side. Score item inter-rater and test-retest reliability of the same extremity were, with one exception, moderate to almost perfect (κw = 0.42-0.97; κw = 0.40-0.89). Furthermore, the MA2 was feasible and acceptable for patients and clinicians. The MA2 showed excellent subscale and moderate to almost perfect score item reliability. Implications for Rehabilitation: There is a lack of high-quality studies about psychometric properties of upper limb measurement tools in the neuropediatric population. The Melbourne Assessment 2 is a promising tool for reliable measurement of unilateral upper limb movement quality in the neuropediatric population. The Melbourne Assessment 2 is acceptable and practicable to therapists and patients for routine use in clinical care.

  11. Inter-device reliability of an automatic-scoring actigraph for measuring sleep in healthy adults

    Directory of Open Access Journals (Sweden)

    Matthew Driller

    2016-07-01

    Full Text Available Actigraphy has become a common method of measuring sleep due to its non-invasive, cost-effective nature. An actigraph (Readiband™) that utilizes automatic scoring algorithms has been used in research, but its inter-device reliability has yet to be evaluated. A total of 77 nights of sleep data from 11 healthy adult participants was collected while participants concomitantly wore two Readiband™ actigraphs attached together (ACT1 and ACT2). Sleep indices including total sleep time (TST), sleep latency (SL), sleep efficiency (SE%), wake after sleep onset (WASO), total time in bed (TTB), wake episodes per night (WE), sleep onset variance (SOV) and wake variance (WV) were assessed between the two devices using mean differences, 95% levels of agreement, intraclass correlation coefficients (ICC), typical error of measurement (TEM) and coefficient of variation (CV%) analysis. There were no significant differences between devices for any of the measured sleep variables (p > 0.05). TST, SE, SL, TTB, SOV and WV all resulted in very high ICCs (>0.90), with WASO and WE resulting in high ICCs between devices (0.85 and 0.80, respectively). Mean differences of −2.1 and 0.2 min for TST and SL were associated with a low TEM between devices (9.5 and 3.8 min, respectively). SE resulted in a 0.3% mean difference between devices. The Readiband™ is a reliable tool for researchers using multiple devices of this brand in sleep studies to assess basic measures of sleep quality and quantity in healthy adult populations.
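The abstract does not define TEM or CV%. A common definition, assumed here, takes the typical error as the standard deviation of the paired between-device differences divided by √2, and CV% as that typical error expressed relative to the grand mean of both devices (other studies compute CV per participant and then average it). The sleep values below are hypothetical.

```python
import math
from statistics import mean, stdev


def typical_error(device_a, device_b):
    """Typical error of measurement: SD of paired differences divided by sqrt(2)."""
    diffs = [a - b for a, b in zip(device_a, device_b)]
    return stdev(diffs) / math.sqrt(2)


def cv_percent(device_a, device_b):
    """Typical error as a percentage of the grand mean of both devices."""
    return 100 * typical_error(device_a, device_b) / mean(list(device_a) + list(device_b))


# Hypothetical total sleep time (min) from two devices worn on the same nights
act1 = [412, 398, 455, 371, 430]
act2 = [415, 396, 449, 377, 428]
print(round(typical_error(act1, act2), 2), round(cv_percent(act1, act2), 2))
```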

  12. External Validation and Evaluation of Reliability and Validity of the Modified Seoul National University Renal Stone Complexity Scoring System to Predict Stone-Free Status After Retrograde Intrarenal Surgery.

    Science.gov (United States)

    Park, Juhyun; Kang, Minyong; Jeong, Chang Wook; Oh, Sohee; Lee, Jeong Woo; Lee, Seung Bae; Son, Hwancheol; Jeong, Hyeon; Cho, Sung Yong

    2015-08-01

    The modified Seoul National University Renal Stone Complexity scoring system (S-ReSC-R) for retrograde intrarenal surgery (RIRS) was developed as a tool to predict stone-free rate (SFR) after RIRS. We externally validated the S-ReSC-R. We retrospectively reviewed 159 patients who underwent RIRS. The S-ReSC-R was assigned from 1 to 12 according to the location and number of sites involved. Stone-free status was defined as no evidence of a stone or clinically insignificant residual fragments smaller than 2 mm. Interobserver and test-retest reliabilities were evaluated. Statistical performance of the prediction model was assessed by its predictive accuracy, predictive probability, and clinical usefulness. Overall SFR was 73.0%. The SFRs were 86.7%, 70.2%, and 48.6% in low-score (1-2), intermediate-score (3-4), and high-score (5-12) groups, respectively (p […] S-ReSC-R revealed an area under the curve (AUC) of 0.731 (95% CI 0.650-0.813). The AUC of the three-tiered S-ReSC-R was 0.701 (95% CI 0.609-0.794). The calibration plot showed that the predicted probability of SFR had a concordance comparable to that of observed frequency. The Hosmer-Lemeshow goodness-of-fit test revealed a p-value of 0.01 for the S-ReSC-R and 0.90 for the three-tiered S-ReSC-R. Interobserver and test-retest reliabilities revealed an almost perfect level of agreement. The present study confirmed the value of the S-ReSC-R for predicting SFR following RIRS in an independent cohort. Interobserver and test-retest reliabilities confirmed that S-ReSC-R was reliable and valid.
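An AUC such as the 0.731 reported above equals the probability that a randomly chosen patient with the outcome has a higher score than a randomly chosen patient without it, with ties counting one half (the Mann-Whitney formulation). A minimal sketch follows; treating "not stone-free" as the outcome is an assumption, based on the falling SFRs across score groups above, and the scores are invented.

```python
def auc_mann_whitney(scores_with_outcome, scores_without_outcome):
    """AUC as P(score with outcome > score without outcome); ties count 0.5."""
    pairs = len(scores_with_outcome) * len(scores_without_outcome)
    if pairs == 0:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in scores_with_outcome:
        for q in scores_without_outcome:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / pairs


# Hypothetical S-ReSC-R scores: residual stones (outcome) vs stone-free
residual = [6, 4, 9, 3, 5]
stone_free = [1, 2, 3, 2, 4, 1]
print(round(auc_mann_whitney(residual, stone_free), 3))
```

The pairwise loop is O(n·m), which is fine for a cohort of this size; rank-based formulas give the same value faster for large samples.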

  13. Sway Area and Velocity Correlated With MobileMat Balance Error Scoring System (BESS) Scores.

    Science.gov (United States)

    Caccese, Jaclyn B; Buckley, Thomas A; Kaminski, Thomas W

    2016-08-01

    The Balance Error Scoring System (BESS) is often used for sport-related concussion balance assessment. However, moderate intratester and intertester reliability may cause low initial sensitivity, suggesting that a more objective balance assessment method is needed. The MobileMat BESS was designed for objective BESS scoring, but the outcome measures must be validated with reliable balance measures. Thus, the purpose of this investigation was to compare MobileMat BESS scores to linear and nonlinear measures of balance. Eighty-eight healthy collegiate student-athletes (age: 20.0 ± 1.4 y, height: 177.7 ± 10.7 cm, mass: 74.8 ± 13.7 kg) completed the MobileMat BESS. MobileMat BESS scores were compared with 95% area, sway velocity, approximate entropy, and sample entropy. MobileMat BESS scores were significantly correlated with 95% area for single-leg (r = .332) and tandem firm (r = .474), and double-leg foam (r = .660); and with sway velocity for single-leg (r = .406) and tandem firm (r = .601), and double-leg (r = .575) and single-leg foam (r = .434). MobileMat BESS scores were not correlated with approximate or sample entropy. MobileMat BESS scores were low to moderately correlated with linear measures, suggesting the ability to identify changes in the center of mass-center of pressure relationship, but not higher-order processing associated with nonlinear measures. These results suggest that the MobileMat BESS may be a clinically useful tool that provides objective linear balance measures.

  14. Standardized Ki67 Diagnostics Using Automated Scoring--Clinical Validation in the GeparTrio Breast Cancer Study.

    Science.gov (United States)

    Klauschen, Frederick; Wienert, Stephan; Schmitt, Wolfgang D; Loibl, Sibylle; Gerber, Bernd; Blohmer, Jens-Uwe; Huober, Jens; Rüdiger, Thomas; Erbstößer, Erhard; Mehta, Keyur; Lederer, Bianca; Dietel, Manfred; Denkert, Carsten; von Minckwitz, Gunter

    2015-08-15

    Scoring proliferation through Ki67 immunohistochemistry is an important component in predicting response to chemotherapy in patients with breast cancer. However, recent studies have cast doubt on the reliability of "visual" Ki67 scoring in the multicenter setting, particularly in the lower, yet clinically important, proliferation range. Therefore, accurate and standardized Ki67 scoring is pivotal both in routine diagnostics and in larger multicenter studies. We validated a novel fully automated Ki67 scoring approach that relies on only minimal a priori knowledge of cell properties and requires no training data for calibration. We applied our approach to 1,082 breast cancer samples from the neoadjuvant GeparTrio trial and compared the performance of automated and manual Ki67 scoring. The three groups of autoKi67 as defined by low (≤ 15%), medium (15.1%-35%), and high (>35%) automated scores showed pCR rates of 5.8%, 16.9%, and 29.5%, respectively. AutoKi67 was significantly linked to prognosis with overall and progression-free survival P values P(OS) […] cancer that correlated with clinical endpoints and is deployable in routine diagnostics. It may thus help to solve recently reported reliability concerns in Ki67 diagnostics. ©2014 American Association for Cancer Research.

  15. Reliability of histologic assessment in patients with eosinophilic oesophagitis.

    Science.gov (United States)

    Warners, M J; Ambarus, C A; Bredenoord, A J; Verheij, J; Lauwers, G Y; Walsh, J C; Katzka, D A; Nelson, S; van Viegen, T; Furuta, G T; Gupta, S K; Stitt, L; Zou, G; Parker, C E; Shackelton, L M; D'Haens, G R; Sandborn, W J; Dellon, E S; Feagan, B G; Collins, M H; Jairath, V; Pai, R K

    2018-04-01

    The validity of the eosinophilic oesophagitis (EoE) histologic scoring system (EoEHSS) has been demonstrated, but only preliminary reliability data exist. Formally assess the reliability of the EoEHSS and additional histologic features. Four expert gastrointestinal pathologists independently reviewed slides from adult patients with EoE (N = 45) twice, in random order, using standardised training materials and scoring conventions for the EoEHSS and additional histologic features agreed upon during a modified Delphi process. Intra- and inter-rater reliability for scoring the EoEHSS, a visual analogue scale (VAS) of overall histopathologic disease severity, and additional histologic features were assessed using intra-class correlation coefficients (ICCs). Almost perfect intra-rater reliability was observed for the composite EoEHSS scores and the VAS. Inter-rater reliability was also almost perfect for the composite EoEHSS scores and substantial for the VAS. Of the EoEHSS items, eosinophilic inflammation was associated with the highest ICC estimates and consistent with almost perfect intra- and inter-rater reliability. With the exception of dyskeratotic epithelial cells and surface epithelial alteration, ICC estimates for the remaining EoEHSS items were above the benchmarks for substantial intra-rater, and moderate inter-rater reliability. Estimation of peak eosinophil count and number of lamina propria eosinophils were associated with the highest ICC estimates among the exploratory items. The composite EoEHSS and most component items are associated with substantial reliability when assessed by central pathologists. Future studies should assess responsiveness of the score to change after a therapeutic intervention to facilitate its use in clinical trials. © 2018 John Wiley & Sons Ltd.

  16. Alberta Stroke Program Early CT Score applied to CT angiography source images is a strong predictor of futile recanalization in acute ischemic stroke

    International Nuclear Information System (INIS)

    Kawiorski, Michal M.; Alonso de Lecinana, Maria; Martinez-Sanchez, Patricia; Fuentes, Blanca; Sanz-Cuesta, Borja E.; Marin, Begona; Ruiz-Ares, Gerardo; Diez-Tejedor, Exuperio; Garcia-Pastor, Andres; Diaz-Otero, Fernando; Calleja, Patricia; Lourido, Daniel; Vicente, Agustina; Fandino, Eduardo; Sierra-Hidalgo, Fernando

    2016-01-01

    Reliable predictors of poor clinical outcome despite successful revascularization might help select patients with acute ischemic stroke for thrombectomy. We sought to determine whether baseline Alberta Stroke Program Early CT Score (ASPECTS) applied to CT angiography source images (CTA-SI) is useful in predicting futile recanalization. Data are from the FUN-TPA study registry (ClinicalTrials.gov; NCT02164357) including patients with acute ischemic stroke due to proximal arterial occlusion in anterior circulation, undergoing reperfusion therapies. Baseline non-contrast CT and CTA-SI-ASPECTS, time-lapse to image acquisition, occurrence, and timing of recanalization were recorded. Outcome measures were NIHSS at 24 h, symptomatic intracranial hemorrhage, modified Rankin scale score, and mortality at 90 days. Futile recanalization was defined when successful recanalization was associated with poor functional outcome (death or disability). Included were 110 patients, baseline NIHSS 17 (IQR 12; 20), treated with intravenous thrombolysis (IVT; 45 %), primary mechanical thrombectomy (MT; 16 %), or combined IVT + MT (39 %). Recanalization rate was 71 %, median delay of 287 min (225; 357). Recanalization was futile in 28 % of cases. In an adjusted model, baseline CTA-SI-ASPECTS was inversely related to the odds of futile recanalization (OR 0.5; 95 % CI 0.3-0.7), whereas NCCT-ASPECTS was not (OR 0.8; 95 % CI 0.5-1.2). A score ≤5 in CTA-SI-ASPECTS was the best cut-off to predict futile recanalization (sensitivity 35 %; specificity 97 %; positive predictive value 86 %; negative predictive value 77 %). CTA-SI-ASPECTS strongly predicts futile recanalization and could be a valuable tool for treatment decisions regarding the indication of revascularization therapies. (orig.)
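The four operating characteristics quoted for the CTA-SI-ASPECTS ≤5 cut-off all come from one 2×2 table of dichotomized score against outcome (here, futile recanalization). The sketch below shows the standard definitions; the class name is hypothetical and the counts in the example are invented, not the registry's data.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts for a dichotomized score (e.g. CTA-SI-ASPECTS <= 5) against an outcome."""
    tp: int  # test positive, outcome present
    fp: int  # test positive, outcome absent
    fn: int  # test negative, outcome present
    tn: int  # test negative, outcome absent

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# Hypothetical counts, not taken from the FUN-TPA registry
table = TwoByTwo(tp=8, fp=2, fn=2, tn=8)
print(table.sensitivity(), table.specificity(), table.ppv(), table.npv())  # 0.8 0.8 0.8 0.8
```

Unlike sensitivity and specificity, PPV and NPV also depend on how common the outcome is in the cohort (28% futile recanalization above), so they may not transfer directly to populations with a different prevalence.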

  17. Alberta Stroke Program Early CT Score applied to CT angiography source images is a strong predictor of futile recanalization in acute ischemic stroke

    Energy Technology Data Exchange (ETDEWEB)

    Kawiorski, Michal M.; Alonso de Lecinana, Maria [Hospital Universitario La Paz, IdiPAZ, Universidad Autonoma de Madrid, Madrid (Spain); Hospital Universitario Ramon y Cajal, IRYCIS, Universidad de Alcala de Henares, Madrid (Spain); Martinez-Sanchez, Patricia; Fuentes, Blanca; Sanz-Cuesta, Borja E.; Marin, Begona; Ruiz-Ares, Gerardo; Diez-Tejedor, Exuperio [Hospital Universitario La Paz, IdiPAZ, Universidad Autonoma de Madrid, Madrid (Spain); Garcia-Pastor, Andres; Diaz-Otero, Fernando [Hospital Universitario Gregorio Maranon, IiSGM, Universidad Complutense de Madrid, Madrid (Spain); Calleja, Patricia [Hospital Universitario 12 de Octubre, Universidad Autonoma de Madrid, Madrid (Spain); Lourido, Daniel; Vicente, Agustina; Fandino, Eduardo [Hospital Universitario Ramon y Cajal, IRYCIS, Universidad de Alcala de Henares, Madrid (Spain); Sierra-Hidalgo, Fernando [Hospital Universitario 12 de Octubre, Universidad Autonoma de Madrid, Madrid (Spain); Hospital Universitario Infanta Leonor, Universidad Complutense de Madrid, Madrid (Spain)

    2016-05-15

    Reliable predictors of poor clinical outcome despite successful revascularization might help select patients with acute ischemic stroke for thrombectomy. We sought to determine whether baseline Alberta Stroke Program Early CT Score (ASPECTS) applied to CT angiography source images (CTA-SI) is useful in predicting futile recanalization. Data are from the FUN-TPA study registry (ClinicalTrials.gov; NCT02164357) including patients with acute ischemic stroke due to proximal arterial occlusion in anterior circulation, undergoing reperfusion therapies. Baseline non-contrast CT and CTA-SI-ASPECTS, time-lapse to image acquisition, occurrence, and timing of recanalization were recorded. Outcome measures were NIHSS at 24 h, symptomatic intracranial hemorrhage, modified Rankin scale score, and mortality at 90 days. Futile recanalization was defined when successful recanalization was associated with poor functional outcome (death or disability). Included were 110 patients, baseline NIHSS 17 (IQR 12; 20), treated with intravenous thrombolysis (IVT; 45 %), primary mechanical thrombectomy (MT; 16 %), or combined IVT + MT (39 %). Recanalization rate was 71 %, median delay of 287 min (225; 357). Recanalization was futile in 28 % of cases. In an adjusted model, baseline CTA-SI-ASPECTS was inversely related to the odds of futile recanalization (OR 0.5; 95 % CI 0.3-0.7), whereas NCCT-ASPECTS was not (OR 0.8; 95 % CI 0.5-1.2). A score ≤5 in CTA-SI-ASPECTS was the best cut-off to predict futile recanalization (sensitivity 35 %; specificity 97 %; positive predictive value 86 %; negative predictive value 77 %). CTA-SI-ASPECTS strongly predicts futile recanalization and could be a valuable tool for treatment decisions regarding the indication of revascularization therapies. (orig.)

  18. Hypertension Knowledge-Level Scale (HK-LS): A Study on Development, Validity and Reliability

    Directory of Open Access Journals (Sweden)

    Cemalettin Kalyoncu

    2012-03-01

    Full Text Available This study was conducted to develop a scale to measure knowledge about hypertension among Turkish adults. The Hypertension Knowledge-Level Scale (HK-LS) was generated based on content, face, and construct validity, internal consistency, test re-test reliability, and discriminative validity procedures. The final scale had 22 items with six sub-dimensions. The scale was applied to 457 individuals aged ≥18 years, and 414 of them were re-evaluated for test-retest reliability. The six sub-dimensions encompassed 60.3% of the total variance. Cronbach alpha coefficients were 0.82 for the entire scale and 0.92, 0.59, 0.67, 0.77, 0.72, and 0.76 for the sub-dimensions of definition, medical treatment, drug compliance, lifestyle, diet, and complications, respectively. The scale ensured internal consistency in reliability and construct validity, as well as stability over time. Significant relationships were found between knowledge score and age, gender, educational level, and history of hypertension of the participants. No correlation was found between knowledge score and working at an income-generating job. The present scale, developed to measure the knowledge level of hypertension among Turkish adults, was found to be valid and reliable.
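The Cronbach alpha coefficients reported for the whole scale and for each sub-dimension follow α = k/(k − 1) × (1 − Σσ²_item / σ²_total) for k items. A minimal sketch follows, with invented item scores; population variances are used for both terms, which gives the same α as sample variances because the n/(n − 1) factors cancel in the ratio.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """items[i][r] = score of respondent r on item i; all item lists must be equally long."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))


# Hypothetical 3-item sub-dimension answered by 5 respondents
sub_dimension = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 5],
]
print(round(cronbach_alpha(sub_dimension), 2))
```

Because α rises with the number of items, short sub-dimensions (such as the medical treatment sub-dimension with α = 0.59 above) tend to show lower values than the full 22-item scale.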

  19. Hypertension Knowledge-Level Scale (HK-LS): a study on development, validity and reliability.

    Science.gov (United States)

    Erkoc, Sultan Baliz; Isikli, Burhanettin; Metintas, Selma; Kalyoncu, Cemalettin

    2012-03-01

    This study was conducted to develop a scale to measure knowledge about hypertension among Turkish adults. The Hypertension Knowledge-Level Scale (HK-LS) was generated based on content, face, and construct validity, internal consistency, test re-test reliability, and discriminative validity procedures. The final scale had 22 items with six sub-dimensions. The scale was applied to 457 individuals aged ≥ 18 years, and 414 of them were re-evaluated for test-retest reliability. The six sub-dimensions encompassed 60.3% of the total variance. Cronbach alpha coefficients were 0.82 for the entire scale and 0.92, 0.59, 0.67, 0.77, 0.72, and 0.76 for the sub-dimensions of definition, medical treatment, drug compliance, lifestyle, diet, and complications, respectively. The scale ensured internal consistency in reliability and construct validity, as well as stability over time. Significant relationships were found between knowledge score and age, gender, educational level, and history of hypertension of the participants. No correlation was found between knowledge score and working at an income-generating job. The present scale, developed to measure the knowledge level of hypertension among Turkish adults, was found to be valid and reliable.

  20. A Novel Risk Scoring System Reliably Predicts Readmission Following Pancreatectomy

    Science.gov (United States)

    Valero, Vicente; Grimm, Joshua C.; Kilic, Arman; Lewis, Russell L.; Tosoian, Jeffrey J.; He, Jin; Griffin, James; Cameron, John L.; Weiss, Matthew J.; Vollmer, Charles M.; Wolfgang, Christopher L.

    2015-01-01

    Background Postoperative readmissions have been proposed by Medicare as a quality metric and may impact provider reimbursement. Since readmission following pancreatectomy is common, we sought to identify factors associated with readmission in order to establish a predictive risk scoring system (RSS). Study Design A retrospective analysis of 2,360 pancreatectomies performed at nine high-volume pancreatic centers between 2005 and 2011 was performed. Forty-five factors strongly associated with readmission were identified. To derive and validate an RSS, the population was randomly divided into two cohorts in a 4:1 ratio. A multivariable logistic regression model was constructed and scores were assigned based on the relative odds ratio of each independent predictor. A composite Readmission After Pancreatectomy (RAP) score was generated and then stratified to create risk groups. Results Overall, 464 (19.7%) patients were readmitted within 90 days. Eight pre- and postoperative factors, including prior myocardial infarction (OR 2.03), ASA Class ≥ 3 (OR 1.34), dementia (OR 6.22), hemorrhage (OR 1.81), delayed gastric emptying (OR 1.78), surgical site infection (OR 3.31), sepsis (OR 3.10) and short length of stay (OR 1.51), were independently predictive of readmission. The 32-point RAP score generated from the derivation cohort was highly predictive of readmission in the validation cohort (AUC 0.72). The low (0-3), intermediate (4-7) and high risk (>7) groups correlated to 11.7%, 17.5% and 45.4% observed readmission rates, respectively (p […] readmission following pancreatectomy. Identification of patients with increased risk of readmission using the RAP score will allow efficient resource allocation aimed at attenuating readmission rates. It also has potential to serve as a new metric for comparative research and quality assessment. PMID:25797757

  1. Estimating Between-Person and Within-Person Subscore Reliability with Profile Analysis.

    Science.gov (United States)

    Bulut, Okan; Davison, Mark L; Rodriguez, Michael C

    2017-01-01

    Subscores are of increasing interest in educational and psychological testing due to their diagnostic function for evaluating examinees' strengths and weaknesses within particular domains of knowledge. Previous studies about the utility of subscores have mostly focused on the overall reliability of individual subscores and ignored the fact that subscores should be distinct and have added value over the total score. This study introduces a profile reliability approach that partitions the overall subscore reliability into within-person and between-person subscore reliability. The estimation of between-person reliability and within-person reliability coefficients is demonstrated using subscores from number-correct scoring, unidimensional and multidimensional item response theory scoring, and augmented scoring approaches via a simulation study and a real data study. The effects of various testing conditions, such as subtest length, correlations among subscores, and the number of subtests, are examined. Results indicate that there is a substantial trade-off between within-person and between-person reliability of subscores. Profile reliability coefficients can be useful in determining the extent to which subscores provide distinct and reliable information under various testing conditions.

  2. The role and reliability of the Psychopathy Checklist-Revised in U.S. sexually violent predator evaluations: a case law survey.

    Science.gov (United States)

    DeMatteo, David; Edens, John F; Galloway, Meghann; Cox, Jennifer; Smith, Shannon Toney; Formon, Dana

    2014-06-01

    The civil commitment of offenders as sexually violent predators (SVPs) is a highly contentious area of U.S. mental health law. The Psychopathy Checklist-Revised (PCL-R) is frequently used in mental health evaluations in these cases to aid legal decision making. Although generally perceived to be a useful assessment tool in applied settings, recent research has raised questions about the reliability of PCL-R scores in SVP cases. In this report, we review the use of the PCL-R in SVP trials identified as part of a larger project investigating its role in U.S. case law. After presenting data on how the PCL-R is used in SVP cases, we examine the reliability of scores reported in these cases. We located 214 cases involving the PCL-R, 88 of which included an actual score and 29 of which included multiple scores. In the 29 cases with multiple scores, the intraclass correlation coefficient for a single evaluator for the PCL-R scores was only .58, and only 41.4% of the difference scores were within 1 standard error of measurement unit. The average score reported by prosecution experts was significantly higher than the average score reported by defense-retained experts, and prosecution experts reported PCL-R scores of 30 or above in nearly 50% of the cases, compared with less than 10% of the cases for defense witnesses (κ = .29). In conjunction with other recently published findings demonstrating the unreliability of PCL-R scores in applied settings, our results raise questions as to whether this instrument should be admitted into SVP proceedings.

  3. Clinical outcome scoring of intra-articular calcaneal fractures

    NARCIS (Netherlands)

    Schepers, Tim; Heetveld, Martin J.; Mulder, Paul G. H.; Patka, Peter

    2008-01-01

    Outcome reporting of intra-articular calcaneal fractures is inconsistent. This study aimed to identify the most cited outcome scores in the literature and to analyze their reliability and validity. A systematic literature search identified 34 different outcome scores. The most cited outcome score

  4. Reliability of cortical lesion detection on double inversion recovery MRI applying the MAGNIMS-Criteria in multiple sclerosis patients within a 16-months period.

    Directory of Open Access Journals (Sweden)

    Tobias Djamsched Faizy

    Full Text Available In patients with multiple sclerosis (MS), Double Inversion Recovery (DIR) magnetic resonance imaging (MRI) can be used to identify cortical lesions (CLs). We sought to evaluate the reliability of CL detection on DIR longitudinally at multiple subsequent time-points, applying the MAGNIMS scoring criteria for CLs. Twenty-six MS patients received a 3T MRI (Siemens Skyra) with DIR at 12 time-points (TPs) within a 16-month period. Scans were assessed in random order by two different raters. Both raters separately marked all CLs on each scan, and total lesion numbers were obtained for each scan TP and patient. After a retrospective re-evaluation, the number of consensus CLs (conL) was defined as the total number of CLs on which both raters finally agreed. CL volumes, relative signal intensities and CL locations were determined. Both ratings (conL vs. non-consensus scoring) were compared for further analysis. A total of n = 334 CLs were identified by both raters in 26 MS patients, with initial agreement of both raters on 160 of the 334 CLs (κ = 0.48). After the retrospective re-evaluation, consensus agreement increased to 233 of 334 CLs (κ = 0.69). 93.8% of conL were visible in at least 2 consecutive TPs, and 74.7% of conL were visible in all 12 consecutive TPs. ConL had greater mean lesion volumes and higher mean signal intensities than lesions detected by only one of the raters (p<0.05). More CLs in the frontal, parietal, temporal and occipital lobes were identified by both raters than by only one of the raters (p<0.05). After a first assessment, slightly less than half of the CLs were considered reliably detectable on longitudinal DIR images. A retrospective re-evaluation notably increased the consensus agreement. However, this finding is limited by the fact that retrospective evaluation steps might not be practicable in clinical routine. Lesions that were not reliably
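The κ values above express agreement between the two raters after correcting for chance. The abstract does not say how κ was computed for lesion detection, where "no lesion" cases are hard to enumerate. The sketch below therefore shows only the generic Cohen's κ for two raters who label the same list of items; the function name and example labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    if expected == 1:
        return 1.0  # degenerate case: both raters used one identical category
    return (observed - expected) / (1 - expected)

# Hypothetical lesion / no-lesion calls (1 = lesion) for four candidate sites
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```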

  5. MODIFIED ALVARADO SCORING AS A DIAGNOSTIC TOOL IN ACUTE APPENDICITIS- A PROSPECTIVE STUDY

    Directory of Open Access Journals (Sweden)

    V. K. Arun Kumar

    2017-02-01

    Full Text Available BACKGROUND Acute appendicitis is the commonest community-acquired intra-abdominal infection. Acute appendicitis and its associated complications are a significant source of morbidity and sometimes mortality. The Modified Alvarado Scoring System (MASS) has been reported to be a cheap and quick diagnostic tool in patients with acute appendicitis. Varying diagnostic accuracy has been observed when the score has been applied to different populations and clinical settings. The purpose of this study was to evaluate the diagnostic value of the Modified Alvarado Scoring System in patients with acute appendicitis in our setting. The aim of the study is to evaluate the efficacy of the modified Alvarado score as a diagnostic tool in acute appendicitis, as the diagnosis of appendicitis depends on the onset of symptoms and the subjective interpretation of the physical examination. MATERIALS AND METHODS This was a prospective study carried out at the Pondicherry Institute of Medical Science from November 2013 to May 2015. The study included 50 patients diagnosed with acute appendicitis and admitted to General Surgery. RESULTS In this study, a total of 50 patients were taken up for surgery based on clinical and radiological diagnosis. Applied to adult patients with acute appendicitis, the modified Alvarado score had a sensitivity of only 60% and a specificity of only 40%, showing that it was not efficient in diagnosing acute appendicitis. The positive predictive value in our study was 80%, marginally lower than the 87.5% reported in the literature. The negative appendicectomy rate in this study was 12%. CONCLUSION The Alvarado score is a non-invasive, safe diagnostic procedure that is simple, fast, reliable and repeatable; it can be used in all conditions, without expensive and complicated supportive diagnostic methods. The Alvarado score increases the diagnostic certainty of clinical examination in diagnosis of

  6. Reliability and validity of internalized stigmatization scale in psoriasis

    Directory of Open Access Journals (Sweden)

    Erkan Alpsoy

    2015-03-01

    Full Text Available Background and design. Internalized stigma involves endorsing negative feelings and beliefs, such as insignificance, shame and withdrawal, triggered by applying negative stereotypes to oneself. The Internalized Stigma Scale has not been applied to psoriasis patients. We aimed to evaluate the reliability and validity of the Internalized Stigma Scale in psoriasis patients. Materials and Methods. 100 consecutive volunteer psoriasis patients (48 female, 52 male; age 40.59±15.44 years) were enrolled in the study. PASI and BSA were evaluated by a physician (A.B.). Patients responded contemporaneously to the Psoriasis Internalized Stigma Scale (PISS), DQoL, and Perceived Health Status (PHS; a single-item self-rated general health question on which Likert scores 1, 2 and 3 were classified as “from fair to very poor” and 4 and 5 as “good”). Results. Cronbach's alpha coefficient of the PISS subscales was 0.83 for alienation, 0.70 for stereotype endorsement, 0.70 for perceived discrimination, 0.84 for social withdrawal and 0.68 for stigma resistance; for the total scale it was 0.89. Mean PISS and DQoL scores were 58.8±12.6 and 10.0±9.4, respectively. PISS was significantly correlated with the patients' DQoL scores (r=0.726, p=0.001) and with disease duration (r=0.209, p=0.047). There was no significant relationship between PASI or BSA and PISS. Mean DQoL scores in patients reporting their PHS as “from fair to very poor” and “good” were 12.1±7.3 and 5.0±4.3, respectively. Mean PISS values in patients reporting their PHS as “from fair to very poor” were significantly higher than in patients reporting their PHS as “good” (p=0.001). Conclusion. PISS can be used as a reliable and valid tool for assessing internalized stigmatization in psoriasis patients. Our results indicate a high level of stigmatization in psoriasis patients. Low DQoL scores show a correlation with increased levels of
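Cronbach's α, reported above for each PISS subscale and for the total scale, compares the sum of the individual item variances with the variance of the total score: α = k/(k−1) · (1 − Σσ²ᵢ / σ²ₜₒₜₐₗ). A minimal Python sketch, assuming a complete respondents-by-items matrix with no missing answers (the function name and data are hypothetical):

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """responses: list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer all k items")
    # Variance of each item across respondents
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    # Variance of the respondents' total scores
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Three hypothetical respondents answering two Likert items
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))  # 0.667
```

Population and sample variances give the same α as long as the same estimator is used in both numerator and denominator.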

  7. Evaluation of revised trauma score in polytraumatized patients

    International Nuclear Information System (INIS)

    Ahmad, H.N.

    2004-01-01

    Objective: To determine the prognostic value and reliability of the revised trauma score (RTS) in polytraumatized patients. Subjects and Methods: Thirty adult road traffic accident patients who sustained multisystem injuries from high-energy blunt trauma were managed according to the advanced trauma life support (ATLS) protocols, and the RTS was calculated from their first set of data. Each patient's score was compared with the final outcome at discharge from hospital. Results: The revised trauma score was found to be a reliable predictor of prognosis in polytraumatized patients but a potentially weak predictor in patients with severe injury involving a single anatomical region. The higher the RTS, the better the prognosis of the polytrauma patient, and vice versa. A revised trauma score <8 was an indicator of severe injury with high mortality and morbidity; overall mortality in polytraumatized patients was 26.66%. However, RTS-6 was associated with 50% mortality. Conclusion: The revised trauma score is a reliable indicator of prognosis in polytraumatized patients and can therefore be used for field and emergency room triage. (author)
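The abstract does not give the RTS formula or say which variant was used. The Python sketch below assumes the commonly published coding, in which Glasgow Coma Scale (GCS), systolic blood pressure (SBP) and respiratory rate (RR) are each coded 0-4. From those codes it returns two values: the weighted RTS (weights 0.9368, 0.7326 and 0.2908, range 0 to about 7.84) and the unweighted triage sum of the codes (range 0-12). Function names are hypothetical.

```python
def code_gcs(gcs):
    """Glasgow Coma Scale (3-15) -> coded value 0-4."""
    if not 3 <= gcs <= 15:
        raise ValueError("GCS must be between 3 and 15")
    if gcs >= 13:
        return 4
    if gcs >= 9:
        return 3
    if gcs >= 6:
        return 2
    if gcs >= 4:
        return 1
    return 0

def code_sbp(sbp):
    """Systolic blood pressure (mmHg) -> coded value 0-4."""
    if sbp > 89:
        return 4
    if sbp >= 76:
        return 3
    if sbp >= 50:
        return 2
    if sbp >= 1:
        return 1
    return 0

def code_rr(rr):
    """Respiratory rate (breaths/min) -> coded value 0-4."""
    if 10 <= rr <= 29:
        return 4
    if rr > 29:
        return 3
    if rr >= 6:
        return 2
    if rr >= 1:
        return 1
    return 0

def revised_trauma_score(gcs, sbp, rr):
    """Return (weighted RTS, unweighted triage sum of the coded values)."""
    g, s, r = code_gcs(gcs), code_sbp(sbp), code_rr(rr)
    weighted = 0.9368 * g + 0.7326 * s + 0.2908 * r
    return weighted, g + s + r

# A hypothetical patient with normal physiology gets the maximum on both scales
print(revised_trauma_score(15, 120, 16))
```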

  8. Methodology for risk assessment and reliability applied for pipeline engineering design and industrial valves operation

    Energy Technology Data Exchange (ETDEWEB)

    Silveira, Dierci [Universidade Federal Fluminense (UFF), Volta Redonda, RJ (Brazil). Escola de Engenharia Industrial e Metalurgia. Lab. de Sistemas de Producao e Petroleo e Gas], e-mail: dsilveira@metal.eeimvr.uff.br; Batista, Fabiano [CICERO, Rio das Ostras, RJ (Brazil)

    2009-07-01

    Two kinds of situations may be distinguished when estimating the operating reliability of maneuvering industrial valves and the probability of undesired events in pipelines and industrial plants: situations in which the risk is identified in repetitive cycles of operations, and situations in which there is a permanent hazard due to project configurations introduced by decisions during the engineering design stage. Estimating reliability based on the influence of design options requires the choice of a numerical index, which may include a composite of human operating parameters based on biomechanics and ergonomics data. We first consider the design conditions under which plant or pipeline operator reliability concepts can be applied when operating industrial valves, and then describe in detail the ergonomics and biomechanics risks that would lend themselves to engineering design database development and human reliability modeling and assessment. This engineering design database development and reliability modeling are based on a group of engineering design and biomechanics parameters likely to lead to over-exertion forces and working postures, which are themselves associated with the functioning of a particular plant or pipeline. This ergonomics- and biomechanics-based approach, applied to the more common industrial valve positions in the plant layout, is proposed through the development of a methodology to assess physical effort and operator reach, combining various elementary operating situations. These procedures can be combined with genetic algorithm modeling and with the four elements of man-machine systems: the individual, the task, the machinery and the environment. The proposed methodology should be viewed not as competing with traditional reliability and risk assessment but rather as complementary, since it provides parameters related to physical effort values for valve operation and to workspace design and usability. (author)

  9. Construct Validity and Reliability of Structured Assessment of endoVascular Expertise in a Simulated Setting

    DEFF Research Database (Denmark)

    Bech, B; Lönn, L; Falkenberg, M

    2011-01-01

    Objectives To study the construct validity and reliability of a novel endovascular global rating scale, Structured Assessment of endoVascular Expertise (SAVE). Design A clinical, experimental study. Materials Twenty physicians with endovascular experience ranging from complete novices to highly....... Validity was analysed by correlating experience with performance results. Reliability was analysed according to generalisability theory. Results The mean score on the 29 items of the SAVE scale correlated well with clinical experience (R = 0.84, P ... with clinical experience (R = -0.53, P validity and reliability of assessment with the SAVE scale was high when applied to performances in a simulation setting with advanced realism. No ceiling effect...

  10. The Structured Interview & Scoring Tool-Massachusetts Alzheimer's Disease Research Center (SIST-M): development, reliability, and cross-sectional validation of a brief structured clinical dementia rating interview.

    Science.gov (United States)

    Okereke, Olivia I; Copeland, Maura; Hyman, Bradley T; Wanggaard, Taylor; Albert, Marilyn S; Blacker, Deborah

    2011-03-01

    The Clinical Dementia Rating (CDR) and CDR Sum-of-Boxes can be used to grade mild but clinically important cognitive symptoms of Alzheimer disease. However, sensitive clinical interview formats are lengthy. To develop a brief instrument for obtaining CDR scores and to assess its reliability and cross-sectional validity. Using legacy data from expanded interviews conducted among 347 community-dwelling older adults in a longitudinal study, we identified 60 questions (from a possible 131) about cognitive functioning in daily life using clinical judgment, inter-item correlations, and principal components analysis. Items were selected in 1 cohort (n=147), and a computer algorithm for generating CDR scores was developed in this same cohort and re-run in a replication cohort (n=200) to evaluate how well the 60 items retained information from the original 131 items. Short interviews based on the 60 items were then administered to 50 consecutively recruited older individuals, with no symptoms or mild cognitive symptoms, at an Alzheimer's Disease Research Center. Clinical Dementia Rating scores based on short interviews were compared with those from independent long interviews. In the replication cohort, agreement between short and long CDR interviews ranged from κ=0.65 to 0.79, with κ=0.76 for Memory, κ=0.77 for global CDR, and intraclass correlation coefficient for CDR Sum-of-Boxes=0.89. In the cross-sectional validation, short interview scores were slightly lower than those from long interviews, but good agreement was observed for global CDR and Memory (κ≥0.70) as well as for CDR Sum-of-Boxes (intraclass correlation coefficient=0.73). The Structured Interview & Scoring Tool-Massachusetts Alzheimer's Disease Research Center is a brief, reliable, and sensitive instrument for obtaining CDR scores in persons with symptoms along the spectrum of mild cognitive change.

  11. Anterior Cruciate Ligament OsteoArthritis Score (ACLOAS)

    DEFF Research Database (Denmark)

    Roemer, Frank W; Frobell, Richard; Lohmander, Stefan

    2014-01-01

    OBJECTIVE: To develop a whole joint scoring system, the Anterior Cruciate Ligament OsteoArthritis Score (ACLOAS), for magnetic resonance imaging (MRI)-based assessment of acute anterior cruciate ligament (ACL) injury and follow-up of structural sequelae, and to assess its reliability. DESIGN...

  12. Reliability assessment of AOSpine thoracolumbar spine injury classification system and Thoracolumbar Injury Classification and Severity Score (TLICS) for thoracolumbar spine injuries: results of a multicentre study.

    Science.gov (United States)

    Kaul, Rahul; Chhabra, Harvinder Singh; Vaccaro, Alexander R; Abel, Rainer; Tuli, Sagun; Shetty, Ajoy Prasad; Das, Kali Dutta; Mohapatra, Bibhudendu; Nanda, Ankur; Sangondimath, Gururaj M; Bansal, Murari Lal; Patel, Nishit

    2017-05-01

    The aim of this multicentre study was to determine whether the recently introduced AOSpine Classification and Injury Severity System has better interrater and intrarater reliability than the existing Thoracolumbar Injury Classification and Severity Score (TLICS) for thoracolumbar spine injuries. Clinical and radiological data of 50 consecutive patients admitted to a single centre with a diagnosis of acute traumatic thoracolumbar spine injury were distributed as a PowerPoint presentation to eleven attending spine surgeons from six different institutions, who classified them according to both classifications. After 6 weeks, the cases were randomly rearranged and sent again to the same surgeons for re-classification. Interobserver and intraobserver reliability for each component of TLICS and the new AOSpine classification were evaluated using the Fleiss kappa coefficient (k value) and Spearman rank-order correlation. Moderate interrater and intrarater reliability was seen for grading fracture type and integrity of the posterior ligamentous complex (fracture type: k = 0.43 ± 0.01 and 0.59 ± 0.16, respectively; PLC: k = 0.47 ± 0.01 and 0.55 ± 0.15, respectively), and fair to moderate reliability (k = 0.29 ± 0.01 interobserver and 0.44 ± 0.10 intraobserver) for the total TLICS score. Moderate interrater (k = 0.59 ± 0.01) and substantial intrarater reliability (k = 0.68 ± 0.13) was seen for grading fracture type regardless of subtype according to the AOSpine classification. Near-perfect interrater and intrarater agreement was seen for neurological status in both classification systems. The recently proposed AOSpine classification has better reliability for identifying fracture morphology than the existing TLICS. Additional studies are clearly necessary concerning the application of these classification systems across multiple physicians at different levels of training and trauma centers to evaluate not
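Fleiss' κ extends chance-corrected agreement to more than two raters, such as the eleven surgeons here. It requires each case to receive the same number of ratings. A minimal Python sketch, assuming the ratings have already been tallied into a cases-by-categories count table (the function name and counts are hypothetical):

```python
def fleiss_kappa(counts):
    """counts[i][j]: number of raters who put case i into category j.

    Every case must be rated by the same number of raters (at least two).
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each case needs the same number (>= 2) of ratings")
    n_cats = len(counts[0])
    # Share of all assignments that fall into each category
    p_j = [sum(row[j] for row in counts) / (n_cases * n_raters) for j in range(n_cats)]
    # Per-case agreement: fraction of rater pairs that agree on that case
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_cases
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)

# Two hypothetical cases, three raters, two fracture categories
print(fleiss_kappa([[3, 0], [0, 3]]))  # 1.0 (perfect agreement)
```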

  13. Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score.

    Science.gov (United States)

    Lee, Hayan; Schatz, Michael C

    2012-08-15

    Genome resequencing and short read mapping are two of the primary tools of genomics and are used for many important applications. The current state-of-the-art in mapping uses the quality values and mapping quality scores to evaluate the reliability of the mapping. These attributes, however, are assigned to individual reads and do not directly measure the problematic repeats across the genome. Here, we present the Genome Mappability Score (GMS) as a novel measure of the complexity of resequencing a genome. The GMS is a weighted probability that any read could be unambiguously mapped to a given position and thus measures the overall composition of the genome itself. We have developed the Genome Mappability Analyzer to compute the GMS of every position in a genome. It leverages the parallelism of cloud computing to analyze large genomes, and enabled us to identify the 5-14% of the human, mouse, fly and yeast genomes that are difficult to analyze with short reads. We examined the accuracy of the widely used BWA/SAMtools polymorphism discovery pipeline in the context of the GMS, and found that discovery errors are dominated by false negatives, especially in regions with poor GMS. These errors are fundamental to the mapping process and cannot be overcome by increasing coverage. As such, the GMS should be considered in every resequencing project to pinpoint the 'dark matter' of the genome, including known clinically relevant variations in these regions. The source code and profiles of several model organisms are available at http://gma-bio.sourceforge.net
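The abstract defines the GMS as a weighted probability that a read covering a position can be mapped unambiguously, but it does not give the weighting. The Python sketch below swaps in a much simpler proxy and is not the GMS itself. It treats reads as exact-match k-mers of read length k and scores each position by the fraction of covering k-mers that occur only once in the sequence. It ignores sequencing errors, mismatches, reverse complements and base qualities. The function name and toy genome are hypothetical.

```python
from collections import Counter

def kmer_uniqueness(genome: str, k: int):
    """Per-position fraction of covering k-mers that occur exactly once.

    A simplified exact-match mappability proxy, not the published GMS.
    """
    n = len(genome)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(genome)")
    starts = range(n - k + 1)
    counts = Counter(genome[i:i + k] for i in starts)
    unique = [counts[genome[i:i + k]] == 1 for i in starts]
    scores = []
    for pos in range(n):
        # k-mers starting between max(0, pos-k+1) and min(pos, n-k) cover pos
        first, last = max(0, pos - k + 1), min(pos, n - k)
        covering = unique[first:last + 1]
        scores.append(sum(covering) / len(covering))
    return scores

# The repeated "ACGT" makes both ends of this toy genome unmappable with k = 4
print([round(s, 2) for s in kmer_uniqueness("ACGTACGT", 4)])
# [0.0, 0.5, 0.67, 0.75, 0.75, 0.67, 0.5, 0.0]
```

Low-scoring positions correspond loosely to the "dark matter" regions the authors describe, where reads cannot be placed uniquely.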

  14. Reliability of Holistic Scoring for the 1985 MCAT Essays.

    Science.gov (United States)

    Mitchell, Karen J.; Anderson, Judith A.

    A pilot essay was included in the 1985 Spring and Fall administrations of the Medical College Admission Test. A sample of 320 of the essays written by Fall examinees who had expressed an interest in allopathic medicine was used to calculate interrater reliability estimates. Sixteen of 20 readers who had been trained by White's suggestions for…

  15. Reliability and Model Fit

    Science.gov (United States)

    Stanley, Leanne M.; Edwards, Michael C.

    2016-01-01

    The purpose of this article is to highlight the distinction between the reliability of test scores and the fit of psychometric measurement models, reminding readers why it is important to consider both when evaluating whether test scores are valid for a proposed interpretation and/or use. It is often the case that an investigator judges both the…

  16. Introducing the HOPE (Hypospadias Objective Penile Evaluation)-score : A validation study of an objective scoring system for evaluating cosmetic appearance in hypospadias patients

    NARCIS (Netherlands)

    van der Toorn, Fred; de Jong, Tom P. V. M.; de Gier, Robert P. E.; Callewaert, Piet R. H.; van der Horst, Eric H. J. R.; Steffens, Martijn G.; Hoebeke, Piet; Nijman, Rien J. M.; Bush, Nicol C.; Wolffenbuttel, Katja P.; van den Heijkant, Marleen M. C.; van Capelle, Jan-Willem; Wildhagen, Mark; Timman, Reinier; van Busschbach, Jan J. V.

    2013-01-01

    Objective: To determine the reliability and internal validity of the Hypospadias Objective Penile Evaluation (HOPE)-score, a newly developed scoring system assessing the cosmetic outcome in hypospadias. Patients and methods: The HOPE scoring system incorporates all surgically-correctable items:

  17. Neurology objective structured clinical examination reliability using generalizability theory.

    Science.gov (United States)

    Blood, Angela D; Park, Yoon Soo; Lukas, Rimas V; Brorson, James R

    2015-11-03

    This study examines factors affecting reliability, or consistency of assessment scores, from an objective structured clinical examination (OSCE) in neurology through generalizability theory (G theory). Data include assessments from a multistation OSCE taken by 194 medical students at the completion of a neurology clerkship. Facets evaluated in this study include cases, domains, and items. Domains refer to areas of skill (or constructs) that the OSCE measures. G theory is used to estimate variance components associated with each facet, derive reliability, and project the number of cases required to obtain a reliable (consistent, precise) score. Reliability using G theory is moderate (Φ coefficient = 0.61, G coefficient = 0.64). Performance is similar across cases but differs by the particular domain, such that the majority of variance is attributed to the domain. Projections in reliability estimates reveal that students need to participate in 3 OSCE cases in order to increase reliability beyond the 0.70 threshold. This novel use of G theory in evaluating an OSCE in neurology provides meaningful measurement characteristics of the assessment. Differing from prior work in other medical specialties, the cases students were randomly assigned did not influence their OSCE score; rather, scores varied in expected fashion by domain assessed. © 2015 American Academy of Neurology.
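The study used several facets (cases, domains, items). As a simplified illustration only, the Python sketch below assumes a single facet, with persons fully crossed with cases and one score per cell. It estimates variance components from two-way ANOVA mean squares (G study), then projects the relative G coefficient and the absolute Φ coefficient for a chosen number of cases (D study). Finally, it searches for the smallest number of cases that reaches a target such as 0.70. All names and numbers are hypothetical.

```python
def g_study(scores):
    """scores[p][c]: score of person p on case c (persons x cases, crossed).

    Returns variance components (persons, cases, residual pc,e).
    """
    n_p, n_c = len(scores), len(scores[0])
    if n_p < 2 or n_c < 2:
        raise ValueError("need at least two persons and two cases")
    grand = sum(map(sum, scores)) / (n_p * n_c)
    person_means = [sum(row) / n_c for row in scores]
    case_means = [sum(row[c] for row in scores) / n_p for c in range(n_c)]
    ss_p = n_c * sum((m - grand) ** 2 for m in person_means)
    ss_c = n_p * sum((m - grand) ** 2 for m in case_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ms_p = ss_p / (n_p - 1)
    ms_c = ss_c / (n_c - 1)
    ms_res = (ss_tot - ss_p - ss_c) / ((n_p - 1) * (n_c - 1))
    # Negative variance estimates are conventionally truncated at zero
    var_p = max((ms_p - ms_res) / n_c, 0.0)
    var_c = max((ms_c - ms_res) / n_p, 0.0)
    return var_p, var_c, ms_res

def d_study(var_p, var_c, var_res, n_cases):
    """Projected G (relative) and Phi (absolute) coefficients for n_cases."""
    g = var_p / (var_p + var_res / n_cases)
    phi = var_p / (var_p + (var_c + var_res) / n_cases)
    return g, phi

def cases_needed(var_p, var_c, var_res, target=0.70, max_cases=50):
    """Smallest number of cases whose projected Phi reaches the target."""
    for n in range(1, max_cases + 1):
        if d_study(var_p, var_c, var_res, n)[1] >= target:
            return n
    return None

# Hypothetical components: equal person and residual variance, no case effect
print(cases_needed(1.0, 0.0, 1.0))  # 3
```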

  18. The reliability, validity and responsiveness of the Dutch version of the Oxford elbow score

    Directory of Open Access Journals (Sweden)

    Patka Peter

    2011-07-01

    Full Text Available Abstract Background The Oxford elbow score (OES) is an English questionnaire that measures the patients' subjective experience of elbow surgery. The OES comprises three domains: elbow function, pain, and social-psychological effects. This questionnaire can be completed by the patient and used as an outcome measure after elbow surgery. The aim of this study was to develop and evaluate the Dutch version of the translated OES for reliability, validity and responsiveness with respect to patients after elbow trauma and surgery. Methods The 12 items of the English-language OES were translated into Dutch and then back-translated; the back-translated questionnaire was then compared to the original English version. The OES Dutch version was completed by 69 patients (group A), 60 of whom had an elbow luxation, four an elbow fracture and five epicondylitis. QuickDASH, the visual analogue pain scale (VAS) and the Mayo Elbow Performance Index (MEPI) were also completed to examine the convergent validity of the OES in group A. To calculate the test-retest reliability and responsiveness of the OES, this questionnaire was completed three times by 43 different patients (group B). An average of 52 days elapsed between therapy and the administration of the third OES (SD = 24.1). Results The Cronbach's α coefficients for the function, pain and social-psychological domains were 0.90, 0.87 and 0.90, respectively. The intra-class correlation coefficients for the domains were 0.87 for function, 0.89 for pain and 0.87 for social-psychological. The standardised response means for the domains were 0.69, 0.46 and 0.60, respectively, and the minimal detectable changes were 27.6, 21.7 and 24.0, respectively. The convergent validity for the function, pain and social-psychological domains, measured as the Spearman's correlation of the OES domains with the MEPI, was 0.68, 0.77 and 0.77, respectively. The Spearman's correlations of the OES domains with QuickDASH were

  19. The ACTA PORT-score for predicting perioperative risk of blood transfusion for adult cardiac surgery.

    Science.gov (United States)

    Klein, A A; Collier, T; Yeates, J; Miles, L F; Fletcher, S N; Evans, C; Richards, T

    2017-09-01

    A simple and accurate scoring system to predict risk of transfusion for patients undergoing cardiac surgery is lacking. We identified independent risk factors associated with transfusion by performing univariate analysis, followed by logistic regression. We then simplified the score to an integer-based system and tested it using the area under the receiver operator characteristic (AUC) statistic with a Hosmer-Lemeshow goodness-of-fit test. Finally, the scoring system was applied to the external validation dataset and the same statistical methods applied to test the accuracy of the ACTA-PORT score. Several factors were independently associated with risk of transfusion, including age, sex, body surface area, logistic EuroSCORE, preoperative haemoglobin and creatinine, and type of surgery. In our primary dataset, the score accurately predicted risk of perioperative transfusion in cardiac surgery patients with an AUC of 0.76. The external validation confirmed accuracy of the scoring method with an AUC of 0.84 and good agreement across all scores, with a minor tendency to under-estimate transfusion risk in very high-risk patients. The ACTA-PORT score is a reliable, validated tool for predicting risk of transfusion for patients undergoing cardiac surgery. This and other scores can be used in research studies for risk adjustment when assessing outcomes, and might also be incorporated into a Patient Blood Management programme. © The Author 2017. Published by Oxford University Press on behalf of the British Journal of Anaesthesia. All rights reserved. For Permissions, please email: journals.permissions@oup.com
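The AUC values quoted above (0.76 in the primary dataset, 0.84 in external validation) measure how well the integer score ranks transfused patients above non-transfused ones. One standard way to compute the AUC is the Mann-Whitney pairwise formulation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counting one half. A minimal Python sketch, assuming plain lists of scores and binary outcomes (the function name and data are hypothetical):

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical risk scores and transfusion outcomes (1 = transfused)
print(auc([1, 2, 3, 4], [0, 1, 0, 1]))  # 0.75
```

The quadratic loop is fine for illustration; for large cohorts a rank-based computation gives the same value faster.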

  20. Reliability, validity, and minimal detectable change of the push-off test scores in assessing upper extremity weight-bearing ability.

    Science.gov (United States)

    Mehta, Saurabh P; George, Hannah R; Goering, Christian A; Shafer, Danielle R; Koester, Alan; Novotny, Steven

    2017-11-01

    Clinical measurement study. The push-off test (POT) was recently conceived and found to be reliable and valid for assessing weight bearing through the injured wrist or elbow. However, further research with a larger sample can lend credence to the preliminary findings supporting the use of the POT. This study examined the interrater reliability, construct validity, and measurement error of the POT in patients with wrist conditions. Participants with musculoskeletal (MSK) wrist conditions were recruited. Performance on the POT, grip strength, and isometric strength of the wrist extensors were assessed. The shortened version of the Disabilities of the Arm, Shoulder and Hand and a numeric pain rating scale were completed. The intraclass correlation coefficient assessed interrater reliability of the POT. Pearson correlation coefficients (r) examined the concurrent relationships between the POT and other measures. The standard error of measurement and the minimal detectable change at the 90% confidence level were assessed as the measurement error and the index of true change for the POT. A total of 50 participants with different elbow or wrist conditions (age: 48.1 ± 16.6 years) were included in this study. The results strongly supported the interrater reliability of the POT in patients with wrist MSK conditions (intraclass correlation coefficient: 0.96 and 0.93 for the affected and unaffected sides, respectively). The POT showed convergent relationships with grip strength on the injured side (r = 0.89) and wrist extensor strength (r = 0.7). The POT showed a small standard error of measurement (1.9 kg). The minimal detectable change at the 90% confidence level for the POT was 4.4 kg for the sample. This study provides additional evidence to support the reliability and validity of the POT. It is the first study to provide values for the measurement error and true change of POT scores in patients with wrist MSK conditions. Further research should examine the
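The reported 4.4 kg is consistent with the widely used relation MDC₉₀ = 1.645 × √2 × SEM applied to the 1.9 kg SEM. The abstract does not state the formula, so treat this as an assumption. The √2 reflects that a change score carries measurement error from two sessions. A minimal Python sketch (the function name is hypothetical):

```python
import math

def minimal_detectable_change(sem, z=1.645):
    """MDC = z * sqrt(2) * SEM; z = 1.645 for 90% confidence, 1.96 for 95%."""
    if sem < 0:
        raise ValueError("SEM cannot be negative")
    return z * math.sqrt(2) * sem

print(round(minimal_detectable_change(1.9), 1))  # 4.4 (kg)
```

A patient's POT change smaller than this value cannot be distinguished from measurement error at the chosen confidence level.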

  1. TEST-RETEST RELIABILITY OF THE CLOSED KINETIC CHAIN UPPER EXTREMITY STABILITY TEST (CKCUEST) IN ADOLESCENTS: RELIABILITY OF CKCUEST IN ADOLESCENTS.

    Science.gov (United States)

    de Oliveira, Valéria M A; Pitangui, Ana C R; Nascimento, Vinícius Y S; da Silva, Hítalo A; Dos Passos, Muana H P; de Araújo, Rodrigo C

    2017-02-01

    The Closed Kinetic Chain Upper Extremity Stability Test (CKCUEST) has been proposed as an option to assess upper limb function and stability; however, few studies support the use of this test in adolescents. The purpose of the present study was to investigate the intersession reliability and agreement of three CKCUEST scores in adolescents and to establish clinimetric values for this test. Test-retest reliability. Twenty-five healthy adolescents of both sexes were evaluated. The subjects performed the CKCUEST on two occasions, one week apart. An intraclass correlation coefficient (ICC(3,3)) two-way mixed model with a 95% confidence interval was used to determine intersession reliability. A Bland-Altman plot was used to analyze the agreement between assessments. The presence of systematic error was evaluated with a one-sample t test. The difference between the evaluation and re-evaluation was examined using a paired-sample t test. The level of significance was set at 0.05. Standard errors of measurement and minimal detectable changes were calculated. The intersession reliability of the average touches score, normalized score and power score was 0.68, 0.68 and 0.87; the standard errors of measurement were 2.17, 1.35 and 6.49; and the minimal detectable changes were 6.01, 3.74 and 17.98, respectively. The presence of systematic error (p … test with moderate to excellent reliability when used with adolescents. The CKCUEST is a measurement with moderate to excellent reliability for adolescents. 2b.
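The agreement analysis above combines a Bland-Altman plot with a one-sample t test on the session differences. The quantities behind both can be sketched in Python as follows: the bias is the mean test-retest difference, the 95% limits of agreement are bias ± 1.96 SD of the differences, and the t statistic tests whether the bias differs from zero. The function names and scores are hypothetical.

```python
import math
from statistics import mean, stdev

def bland_altman(first, second):
    """Bias and 95% limits of agreement between two measurement sessions."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [b - a for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def systematic_error_t(first, second):
    """One-sample t statistic testing whether the mean difference is zero."""
    diffs = [b - a for a, b in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical touch counts at test and retest for four adolescents
test, retest = [10, 12, 14, 16], [11, 12, 16, 17]
print(bland_altman(test, retest), systematic_error_t(test, retest))
```

The t statistic is compared with a t distribution on n − 1 degrees of freedom; a significant result suggests a learning or fatigue effect between sessions.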

  2. The Truth about Scores Children Achieve on Tests.

    Science.gov (United States)

    Brown, Jonathan R.

    1989-01-01

    The importance of using the standard error of measurement (SEm) in determining reliability in test scores is emphasized. The SEm is compared to the hypothetical true score for standardized tests, and procedures for calculation of the SEm are explained.
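
    In classical test theory the standard error of measurement is usually computed as SEm = SD × √(1 − r), where SD is the standard deviation of observed scores and r is the test's reliability coefficient. The Python sketch below, assuming that formula and normally distributed error, turns an observed score into an approximate band likely to contain the true score; the test parameters in the example are hypothetical.

```python
from math import sqrt


def sem(sd: float, reliability: float) -> float:
    """Classical test theory: SEm = SD * sqrt(1 - r_xx)."""
    return sd * sqrt(1.0 - reliability)


def true_score_band(observed: float, sd: float, reliability: float, z: float = 1.96):
    """Approximate band around an observed score (95% by default, normal error assumed)."""
    e = sem(sd, reliability)
    return observed - z * e, observed + z * e


# Hypothetical standardized test: SD = 15, reliability = 0.91, observed score = 100
print(round(sem(15, 0.91), 2))  # 4.5
lo, hi = true_score_band(100, 15, 0.91)
print(round(lo, 1), round(hi, 1))  # 91.2 108.8
```

    Centering the band on the observed score is a common simplification; some treatments center it instead on an estimated true score that is regressed toward the group mean.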

  3. Inter-rater Reliability for Metrics Scored in a Binary Fashion-Performance Assessment for an Arthroscopic Bankart Repair.

    Science.gov (United States)

    Gallagher, Anthony G; Ryu, Richard K N; Pedowitz, Robert A; Henn, Patrick; Angelo, Richard L

    2018-05-02

    To determine the inter-rater reliability (IRR) of a procedure-specific checklist scored in a binary fashion for the evaluation of surgical skill, and whether it meets the minimum level of agreement (≥0.8 between 2 raters) required for high-stakes assessment. In a prospective, randomized, and blinded fashion, and after detailed assessment training, 10 Arthroscopy Association of North America Master/Associate Master faculty arthroscopic surgeons (in 5 pairs) with an average of 21 years of surgical experience assessed the video-recorded 3-anchor arthroscopic Bankart repair performance of 44 postgraduate year 4 or 5 residents from 21 Accreditation Council for Graduate Medical Education orthopaedic residency training programs from across the United States. No paired scores of resident surgeon performance evaluated by the 5 teams of faculty assessors dropped below the 0.8 IRR level (mean = 0.93; range 0.84-0.99; standard deviation = 0.035). A one-way analysis of variance comparing the 5 assessor groups showed no significant difference between the groups (P = .205). Pearson's product-moment correlation coefficient revealed a strong and statistically significant negative correlation, that is, -0.856 (P […] fashion meet the need and can show a high (>80%) IRR. Copyright © 2018 Arthroscopy Association of North America. Published by Elsevier Inc. All rights reserved.

  4. Performance of a novel clinical score, the Pediatric Asthma Severity Score (PASS), in the evaluation of acute asthma.

    Science.gov (United States)

    Gorelick, Marc H; Stevens, Molly W; Schultz, Theresa R; Scribano, Philip V

    2004-01-01

    To evaluate the reliability, validity, and responsiveness of a new clinical asthma score, the Pediatric Asthma Severity Score (PASS), in children aged 1 through 18 years in an acute clinical setting. This was a prospective cohort study of children treated for acute asthma at two urban pediatric emergency departments (EDs). A total of 852 patients were enrolled at one site and 369 at the second site. Clinical findings were assessed at the start of the ED visit, after one hour of treatment, and at the time of disposition. Peak expiratory flow rate (PEFR) (for patients aged 6 years and older) and pulse oximetry were also measured. Composite scores including three, four, or five clinical findings were evaluated, and the three-item score (wheezing, prolonged expiration, and work of breathing) was selected as the PASS. Interobserver reliability for the PASS was good to excellent (kappa = 0.72 to 0.83). There was a significant correlation between PASS and PEFR (r = 0.27 to 0.37) and pulse oximetry (r = 0.29 to 0.41) at various time points. The PASS was able to discriminate between those patients who did and did not require hospitalization, with area under the receiver operating characteristic curve of 0.82. Finally, the PASS was shown to be responsive, with a 48% relative increase in score from start to end of treatment and an overall effect size of 0.62, indicating a moderate to large effect. This clinical score, the PASS, based on three clinical findings, is a reliable and valid measure of asthma severity in children and shows both discriminative and responsive properties. The PASS may be a useful tool to assess acute asthma severity for clinical and research purposes.
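
    The PASS record summarizes discrimination between children who did and did not require hospitalization as an area under the receiver operating characteristic curve of 0.82. One standard way to compute an empirical AUC, sketched below in Python, is the Mann-Whitney formulation: the proportion of (admitted, discharged) pairs in which the admitted child has the higher score, with ties counted as one half. The function name and scores are invented for illustration and are not study data.

```python
def auc(positive_scores, negative_scores):
    """Empirical ROC AUC as the Mann-Whitney probability.

    Probability that a randomly chosen positive case (e.g. an admitted child)
    scores higher than a randomly chosen negative case; ties count one half.
    """
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical severity scores, not PASS study data
admitted = [5, 4, 6, 3, 5]
discharged = [2, 3, 1, 4, 2, 0]
print(round(auc(admitted, discharged), 3))  # 0.933
```

    The double loop is quadratic in sample size, which is fine for a sketch; rank-based implementations compute the same value in O(n log n).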

  5. Validity and Reliability of Baseline Testing in a Standardized Environment.

    Science.gov (United States)

    Higgins, Kathryn L; Caze, Todd; Maerlender, Arthur

    2017-08-11

    The Immediate Postconcussion Assessment and Cognitive Testing (ImPACT) is a computerized neuropsychological test battery commonly used to determine cognitive recovery from concussion by comparing post-injury scores with baseline scores. This model is based on the premise that ImPACT baseline test scores are a valid and reliable measure of optimal cognitive function at baseline. Growing evidence suggests that this premise may not be accurate and that a large contributor to invalid and unreliable baseline test scores may be the protocol and environment in which baseline tests are administered. This study examined the effects of a standardized environment and administration protocol on the reliability and performance validity of athletes' baseline test scores on ImPACT by comparing scores obtained in two different group-testing settings. Baseline data from 361 cohort-matched Division 1 collegiate athletes were assessed using a variety of indicators of potential performance invalidity; internal reliability was also examined. Thirty-one to thirty-nine percent of the baseline cases had at least one indicator of low performance validity, but there were no significant differences in validity indicators based on the environment in which testing was conducted. Internal consistency reliability scores were in the acceptable to good range, with no significant differences between administration conditions. These results suggest that athletes may be reliably performing at levels lower than their best effort would produce. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  6. Clinical Outcome Scoring of Intra-articular Calcaneal Fractures

    NARCIS (Netherlands)

    T. Schepers (Tim); M.J. Heetveld (Martin); P.G.H. Mulder (Paul); P. Patka (Peter)

    2008-01-01

    Outcome reporting of intra-articular calcaneal fractures is inconsistent. This study aimed to identify the most cited outcome scores in the literature and to analyze their reliability and validity. A systematic literature search identified 34 different outcome scores. The most cited

  7. Study on Feasibility of Applying Function Approximation Moment Method to Achieve Reliability-Based Design Optimization

    International Nuclear Information System (INIS)

    Huh, Jae Sung; Kwak, Byung Man

    2011-01-01

    Robust optimization and reliability-based design optimization are among the methodologies employed to take into account the uncertainties of a system at the design stage. Applying such methodologies to industrial problems requires accurate and efficient methods for estimating statistical moments and failure probability; in addition, the results of the sensitivity analysis, which determine the search direction during the optimization process, must also be accurate. The aim of this study is to incorporate the function approximation moment method into the sensitivity analysis formulation, which is expressed in integral form, to verify the accuracy of the sensitivity results, and to solve a typical reliability-based design optimization problem. These results are compared with those of other moment methods, and the feasibility of the function approximation moment method is verified. The integral-form sensitivity formula is an efficient way to evaluate sensitivity because no additional function evaluations are needed once the failure probability or statistical moments have been calculated

  8. Grant Peer Review: Improving Inter-Rater Reliability with Training.

    Science.gov (United States)

    Sattler, David N; McKnight, Patrick E; Naney, Linda; Mathis, Randy

    2015-01-01

    This study developed and evaluated a brief training program for grant reviewers that aimed to increase inter-rater reliability, rating scale knowledge, and effort to read the grant review criteria. Enhancing reviewer training may improve the reliability and accuracy of research grant proposal scoring and funding recommendations. Seventy-five Public Health professors from U.S. research universities watched the training video we produced and scored proposal summary descriptions using the National Institutes of Health scoring criteria. For both novice and experienced reviewers, the training video increased scoring accuracy (the percentage of scores that reflect the true rating scale values), inter-rater reliability, and the amount of time spent reading the review criteria compared to the no-video condition. The increase in reliability for experienced reviewers is notable because it is commonly assumed that reviewers, especially those with experience, have a good understanding of the grant review rating scale. The findings suggest that both experienced and novice reviewers who had not received the type of training developed in this study may not adequately understand the definitions and meaning of each value of the rating scale, and that experienced reviewers may overestimate their knowledge of the rating scale. The results underscore the benefits of and need for specialized peer reviewer training.

  9. Automated Quantification of the Landing Error Scoring System With a Markerless Motion-Capture System.

    Science.gov (United States)

    Mauntel, Timothy C; Padua, Darin A; Stanley, Laura E; Frank, Barnett S; DiStefano, Lindsay J; Peck, Karen Y; Cameron, Kenneth L; Marshall, Stephen W

    2017-11-01

    The Landing Error Scoring System (LESS) can be used to identify individuals with an elevated risk of lower extremity injury. A limitation of the LESS is that raters identify movement errors from video replay, which is time-consuming and may therefore limit its use by clinicians. A markerless motion-capture system may be capable of automating LESS scoring, thereby removing this obstacle. To determine the reliability of an automated markerless motion-capture system for scoring the LESS. Cross-sectional study. United States Military Academy. A total of 57 healthy, physically active individuals (47 men, 10 women; age = 18.6 ± 0.6 years, height = 174.5 ± 6.7 cm, mass = 75.9 ± 9.2 kg). Participants completed 3 jump-landing trials that were recorded by standard video cameras and a depth camera. Their movement quality was evaluated by expert LESS raters (standard video recording) using the LESS rubric and by software that automates LESS scoring (depth-camera data). We recorded an error for a LESS item if it was present on at least 2 of 3 jump-landing trials. We calculated κ statistics, prevalence- and bias-adjusted κ (PABAK) statistics, and percentage agreement for each LESS item. Interrater reliability was evaluated between the 2 expert rater scores and between a consensus expert score and the markerless motion-capture system score. We observed reliability between the 2 expert LESS raters (average κ = 0.45 ± 0.35, average PABAK = 0.67 ± 0.34; percentage agreement = 0.83 ± 0.17). The markerless motion-capture system had similar reliability with consensus expert scores (average κ = 0.48 ± 0.40, average PABAK = 0.71 ± 0.27; percentage agreement = 0.85 ± 0.14). However, reliability was poor for 5 LESS items in both LESS score comparisons. A markerless motion-capture system had the same level of reliability as expert LESS raters, suggesting that an automated system can accurately assess movement. Therefore, clinicians can use
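
    The LESS record reports Cohen's κ, PABAK, and percentage agreement for binary item scores, with an item counted as an error when present on at least 2 of 3 trials. A minimal Python sketch of those quantities for two raters follows; the function names and rating vectors are hypothetical. For two categories, PABAK reduces to 2p_o − 1, where p_o is observed agreement, which is why it can stay high when κ collapses for a rarely observed error.

```python
def item_error(trials):
    """Score a LESS item as an error (1) if present on at least 2 of 3 trials."""
    return int(sum(trials) >= 2)


def agreement_stats(rater_a, rater_b):
    """Percentage agreement, Cohen's kappa and PABAK for two raters' 0/1 codes."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal rate of coding "error present".
    pa = sum(rater_a) / n
    pb = sum(rater_b) / n
    p_e = pa * pb + (1 - pa) * (1 - pb)
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")
    pabak = 2 * p_o - 1  # prevalence- and bias-adjusted kappa, two categories
    return p_o, kappa, pabak


a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
print(tuple(round(x, 2) for x in agreement_stats(a, b)))  # (0.8, 0.6, 0.6)

# A rare error: 90% agreement, yet kappa is 0 while PABAK stays at 0.8
rare_a = [0] * 9 + [1]
rare_b = [0] * 10
print(tuple(round(x, 2) for x in agreement_stats(rare_a, rare_b)))  # (0.9, 0.0, 0.8)
```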

  10. Quantification of clinical scores through physiological recordings in low-responsive patients: a feasibility study

    Directory of Open Access Journals (Sweden)

    Wieser Martin

    2012-05-01

    Clinical scores represent the gold standard in characterizing the clinical condition of patients in a vegetative or minimally conscious state. However, they suffer from problems of sensitivity, specificity, subjectivity, and inter-rater reliability. In this feasibility study, objective measures including physiological and neurophysiological signals are used to quantify the clinical state of 13 low-responsive patients. A linear regression method was applied in nine patients to obtain fixed regression coefficients describing the clinical state. The statistical model was extended and evaluated with four patients from another hospital. A linear mixed-models approach was introduced to handle the challenges of data sets obtained from different locations. Using backward linear regression, 12 variables were sufficient to explain 74.4% of the variability in the change of the clinical scores. Variables based on event-related potentials and the electrocardiogram account for most of the variability. These preliminary results are promising, considering that this is the first attempt to describe the clinical state of low-responsive patients in such a global and quantitative way. This new model, based on objective measurements, could complement the clinical scores in order to increase diagnostic reliability. Nevertheless, more patients are needed to confirm the conclusions of a statistical model with 12 variables.

  11. Extension of the lod score: the mod score.

    Science.gov (United States)

    Clerget-Darpoux, F

    2001-01-01

    In 1955 Morton proposed the lod score method both for testing linkage between loci and for estimating the recombination fraction between them. If a disease is controlled by a gene at one of these loci, the lod score computation requires the prior specification of an underlying model that assigns the probabilities of genotypes from the observed phenotypes. To address the case of linkage studies for diseases with unknown mode of inheritance, we suggested (Clerget-Darpoux et al., 1986) extending the lod score function to a so-called mod score function. In this function, the variables are both the recombination fraction and the disease model parameters. Maximizing the mod score function over all these parameters amounts to maximizing the probability of marker data conditional on the disease status. In the absence of linkage, the mod score conforms to a chi-square distribution, with extra degrees of freedom in comparison to the lod score function (MacLean et al., 1993). The mod score is asymptotically maximum for the true disease model (Clerget-Darpoux and Bonaïti-Pellié, 1992; Hodge and Elston, 1994). Consequently, the power to detect linkage through the mod score will be highest when the space of models over which the maximization is performed includes the true model. On the other hand, one must avoid overparametrization of the model space. For example, when the approach is applied to affected sibpairs, only two constrained disease model parameters should be used (Knapp et al., 1994) for the mod score maximization. It is also important to emphasize the existence of a strong correlation between the disease gene location and the disease model. Consequently, there is poor resolution of the location of the susceptibility locus when the disease model at this locus is unknown. Of course, this is true regardless of the statistics used. The mod score may also be applied in a candidate gene strategy to model the potential effect of this gene in the disease. Since, however, it
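
    Morton's lod score compares the likelihood of the data at recombination fraction θ with the likelihood under no linkage (θ = 0.5). For the simplest case, assumed here, of N phase-known, fully informative meioses with R recombinants, LOD(θ) = log₁₀[θ^R (1 − θ)^(N−R) / 0.5^N]. The Python sketch below evaluates and maximizes it over a grid of θ values. The mod score described in the abstract additionally maximizes over disease-model parameters, which requires a full pedigree likelihood and is not shown; the function names are illustrative.

```python
from math import log10


def lod(theta: float, n_meioses: int, n_recombinants: int) -> float:
    """LOD(theta) = log10[theta^R * (1 - theta)^(N - R) / 0.5^N], phase-known meioses."""
    r = n_recombinants
    nr = n_meioses - n_recombinants
    if r and theta == 0.0:
        return float("-inf")  # recombinants observed, so theta = 0 is impossible
    log_l = 0.0
    if r:
        log_l += r * log10(theta)
    if nr:
        log_l += nr * log10(1.0 - theta)
    return log_l - n_meioses * log10(0.5)


def max_lod(n_meioses: int, n_recombinants: int, steps: int = 50):
    """Maximize the LOD over an evenly spaced grid of theta in [0, 0.5]."""
    grid = [i * 0.5 / steps for i in range(steps + 1)]
    best = max(grid, key=lambda t: lod(t, n_meioses, n_recombinants))
    return best, lod(best, n_meioses, n_recombinants)


theta_hat, score = max_lod(10, 2)
print(theta_hat, round(score, 2))  # 0.2 0.84
print(round(lod(0.0, 10, 0), 2))   # 3.01
```

    The maximizing θ equals R/N here, as expected for this simple binomial likelihood; by convention a maximum LOD of 3 or more is taken as evidence for linkage.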

  12. Geographically Weighted Logistic Regression Applied to Credit Scoring Models

    Directory of Open Access Journals (Sweden)

    Pedro Henrique Melo Albuquerque

    This study used real data from a Brazilian financial institution on transactions involving Consumer Direct Credit (CDC), granted to clients residing in the Distrito Federal (DF), to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR) techniques. The aims were: to verify whether the factors that influence credit risk differ according to the borrower’s geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models developed via the two techniques were the AICc informational criterion, the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF were distinct in their variables and coefficients (parameters), leading to the conclusion that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies presented very close results, in terms of predictive power and financial losses for the institution, and the study demonstrated the viability of using the GWLR technique to develop credit scoring models for the target population in the study.
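
    GWLR fits a separate logistic regression at each location, down-weighting observations by their distance from it. The Python/NumPy sketch below is a minimal illustration under stated assumptions: a Gaussian kernel with a fixed, user-supplied bandwidth and Newton-Raphson (IRLS) fitting. The abstract does not specify the kernel, bandwidth selection, or fitting routine used in the study, and the function names are hypothetical.

```python
import numpy as np


def weighted_logistic(X, y, w, iters=25):
    """Weighted logistic regression by Newton-Raphson (IRLS).

    Maximizes sum_i w_i * [y_i * log(p_i) + (1 - y_i) * log(1 - p_i)].
    """
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        hess = X.T @ (X * (w * p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def gaussian_kernel(coords, center, bandwidth):
    """Assumed weighting: Gaussian kernel of Euclidean distance to `center`."""
    d2 = np.sum((coords - center) ** 2, axis=1)
    return np.exp(-0.5 * d2 / bandwidth**2)


def gwlr(X, y, coords, bandwidth):
    """One local coefficient vector per observation location (rows of coords)."""
    return np.array([
        weighted_logistic(X, y, gaussian_kernel(coords, c, bandwidth))
        for c in coords
    ])
```

    With a very large bandwidth every weight approaches 1 and each local fit collapses to the global logistic regression, a convenient sanity check. With small bandwidths the effective local sample shrinks and the Newton steps can diverge if a neighbourhood is perfectly separable; in practice the bandwidth is usually tuned by a criterion such as AICc or cross-validation.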

  13. Major influence of interobserver reliability on polytrauma identification with the Injury Severity Score (ISS): Time for a centralised coding in trauma registries?

    Science.gov (United States)

    Maduz, Roman; Kugelmeier, Patrick; Meili, Severin; Döring, Robert; Meier, Christoph; Wahl, Peter

    2017-04-01

    The Abbreviated Injury Scale (AIS) and the Injury Severity Score (ISS) are increasingly widely used to assess trauma burden and to perform interhospital benchmarking through trauma registries. Since 2015, public resource allocation in Switzerland is even to be derived from such data. Because every trauma centre is responsible for its own coding and data input, this study aimed to evaluate the interobserver reliability of AIS and ISS coding. Interobserver reliability of the AIS and ISS was analysed in a cohort of 50 consecutive severely injured patients treated in 2012 at our institution, coded retrospectively by 3 independent, specifically trained observers. With a cutoff of ISS ≥16, only 38/50 patients (76%) were uniformly identified as polytraumatised or not. Raising the cutoff to ≥20 increased this to 41/50 patients (82%). A difference in the AIS of ≥1 was present in 261 (16%) of possible codes. Excluding the vast majority of uninjured body regions, uniformly identical AIS severity values were attributed in 67/193 (35%) body regions, or 318/579 (55%) possible observer pairings. Too often, injury severity is identified neither correctly nor consistently when using the AIS. This leads to incorrect identification of severely injured patients using the ISS. Centralised coding is recommended to improve consistency before scores based on the AIS are used for interhospital benchmarking and resource allocation in the treatment of severely injured patients. Copyright © 2017. Published by Elsevier Ltd.
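
    The sensitivity of the ISS to coding disagreements follows from how it is built from AIS codes. Under the standard ISS definition, the highest AIS in each of six body regions is taken, the three largest are squared and summed, and any AIS of 6 sets the ISS to 75. The Python sketch below (region labels and function name are illustrative) shows how a one-point AIS disagreement between two hypothetical observers can move the same patient across the ISS ≥16 polytrauma cutoff used in the study.

```python
ISS_REGIONS = ("head_neck", "face", "chest", "abdomen", "extremities", "external")


def iss(max_ais_by_region: dict[str, int]) -> int:
    """Injury Severity Score from the highest AIS in each ISS body region.

    ISS = sum of squares of the three highest regional AIS values;
    any AIS of 6 sets the ISS to 75 by convention. Uninjured regions may be omitted.
    """
    unknown = set(max_ais_by_region) - set(ISS_REGIONS)
    if unknown:
        raise ValueError(f"unknown regions: {sorted(unknown)}")
    scores = sorted(max_ais_by_region.values(), reverse=True)
    if scores and scores[0] == 6:
        return 75
    return sum(s * s for s in scores[:3])


# Two hypothetical observers coding the same patient; they differ by one AIS point in the chest.
observer_a = {"chest": 3, "abdomen": 2, "extremities": 2}
observer_b = {"chest": 2, "abdomen": 2, "extremities": 2}
for name, coding in (("A", observer_a), ("B", observer_b)):
    score = iss(coding)
    print(name, score, "polytrauma" if score >= 16 else "not polytrauma")
# A 17 polytrauma
# B 12 not polytrauma
```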

  14. Cross-cultural adaptation and validation of the reliability of the Thai version of the Hip disability and Osteoarthritis Outcome Score (HOOS).

    Science.gov (United States)

    Trathitiphan, Warayos; Paholpak, Permsak; Sirichativapee, Winai; Wisanuyotin, Taweechok; Laupattarakasem, Pat; Sukhonthamarn, Kamolsak; Jeeravipoolvarn, Polasak; Kosuwon, Weerachai

    2016-10-01

    HOOS was developed as an extension of the Western Ontario and McMaster Universities' Osteoarthritis Index questionnaire for measuring symptoms and functional limitations related to the hip(s) of patients with osteoarthritis. To determine the validity and reliability of the Thai version of the Hip disability and Osteoarthritis Outcome Score (HOOS) vis-à-vis hip osteoarthritis, the original HOOS was translated into Thai according to international recommendations. Patients with hip osteoarthritis (n = 57; 25 males) were asked to complete the Thai version of HOOS twice, with a 3-week interval between administrations. The test-retest reliability was analyzed using the intraclass correlation coefficient (ICC). Internal consistency was analyzed using Cronbach's alpha, while construct validity was tested by comparing the Thai HOOS with the Thai modified SF-36 and calculating Spearman's rank correlation coefficients. The Thai HOOS showed good reliability (i.e., the ICC was greater than 0.9 in all five subscales). All Cronbach's alpha values showed that the Thai HOOS had high internal consistency (Cronbach's alpha greater than 0.8), especially for the pain and ADL subscales (0.89 and 0.90, respectively). All five subscales of the Thai HOOS showed moderate Spearman's rank correlations with the Bodily Pain subscale of the Thai SF-36. The pain subscale of the Thai HOOS had a high correlation with the Vitality and Social Function subscales of the Thai SF-36 (r = 0.55 and 0.54), with which the symptom subscale had a moderate correlation. The Thai version of HOOS had excellent internal consistency, excellent test-retest reliability, and good construct validity. It can be used as a reliable tool for assessing quality of life for patients with hip osteoarthritis in Thailand.
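
    The internal consistency values above are Cronbach's alpha, α = k/(k − 1) × (1 − Σσ²ᵢ/σ²ₜ), where k is the number of items, σ²ᵢ the variance of item i, and σ²ₜ the variance of respondents' total scores. The Python sketch below computes it from a respondents × items table using sample variances; the function name and response matrix are hypothetical, not HOOS data.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items table.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical: 5 respondents answering 3 items of one subscale
data = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [3, 3, 4], [5, 4, 5]]
print(round(cronbach_alpha(data), 2))  # 0.94
```

    Alpha rises when items covary strongly, because the total-score variance then exceeds the sum of the item variances by more.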

  15. Inter- and intra-rater reliability of nasal auscultation in daycare children.

    Science.gov (United States)

    Santos, Rita; Silva Alexandrino, Ana; Tomé, David; Melo, Cristina; Mesquita Montes, António; Costa, Daniel; Pinto Ferreira, João

    2018-02-01

    The aim of this study was to assess the intra- and inter-rater reliability of nasal auscultation and to analyze ear and respiratory clinical condition according to nasal auscultation. Cross-sectional study performed in 125 children aged up to 3 years attending daycare centers. Nasal auscultation, tympanometry, and the Paediatric Respiratory Severity Score (PRSS) were applied to all children. Nasal sounds were classified by an expert panel in order to determine the intra- and inter-rater reliability of nasal auscultation. The classification of nasal sounds was assessed against tympanometric and PRSS values. Nasal auscultation revealed substantial inter-rater (K=0.75) and intra-rater (K=0.69; K=0.61 and K=0.72) reliability. Children with a "non-obstructed" classification revealed a lower peak pressure (t=-3.599, P […] auscultation revealed substantial intra- and inter-rater reliability. Nasal auscultation exhibited important differences according to ear and respiratory clinical conditions. Nasal auscultation in pediatrics appears to be a novel topic as well as a simple method that can be used to identify early signs of nasopharyngeal obstruction.

  16. Funding Medical Research Projects: Taking into Account Referees' Severity and Consistency through Many-Faceted Rasch Modeling of Projects' Scores.

    Science.gov (United States)

    Tesio, Luigi; Simone, Anna; Grzeda, Mariuzs T; Ponzio, Michela; Dati, Gabriele; Zaratin, Paola; Perucca, Laura; Battaglia, Mario A

    2015-01-01

    The funding policy of research projects often relies on scores assigned by a panel of experts (referees). The non-linear nature of raw scores and the severity and inconsistency of individual raters may generate unfair numeric project rankings. Rasch measurement (many-facets version, MFRM) provides a valid alternative to scoring. MFRM was applied to the scores achieved by 75 research projects on multiple sclerosis submitted in response to a previous annual call by FISM (Italian Foundation for Multiple Sclerosis). This allowed the impact of MFRM on the funding scenario to be simulated a posteriori. The applications were each scored by 2 to 4 independent referees (total = 131) on a 10-item, 0-3 rating scale called FISM-ProQual-P. The rotation plan ensured "connection" of all pairs of projects through at least 1 shared referee. The questionnaire satisfactorily fulfilled the stringent criteria of Rasch measurement for psychometric quality (unidimensionality, reliability, and data-model fit). Arbitrarily, 2 acceptability thresholds were set: a raw score of 21/30 and the equivalent Rasch measure of 61.5/100. When the cut-off was switched from score to measure, 8 out of 18 acceptable projects had to be rejected, while 15 rejected projects became eligible for funding. Some referees, of various severity, were grossly inconsistent (z-std fit indexes less than -1.9 or greater than 1.9). The FISM-ProQual-P questionnaire seems a valid and reliable scale. MFRM may help the decision-making process for allocating funds to MS research projects and also in other fields. In repeated assessment exercises it can help select reliable referees. Their severity can be steadily calibrated, thus obviating the need to connect them with other referees assessing the same projects.
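
    In a many-facet Rasch model, the log-odds of a rating in category k rather than k − 1 is modeled additively in logits, for example project quality minus item difficulty minus referee severity minus the k-th category threshold. The Python sketch below assumes this common rating-scale parameterization, a 0-3 category scale as in FISM-ProQual-P, and invented parameter values; it shows how the same project receives a lower expected rating from a more severe referee, which is the effect MFRM adjusts for. Parameter estimation is not shown, and the function names are hypothetical.

```python
from math import exp


def mfrm_category_probs(quality, difficulty, severity, thresholds):
    """Category probabilities under a many-facet Rasch rating-scale model.

    log(P_k / P_{k-1}) = quality - difficulty - severity - thresholds[k - 1]
    for categories k = 1..m, with m = len(thresholds); all values in logits.
    """
    base = quality - difficulty - severity
    numerators = [1.0]  # category 0: exp(0)
    cumulative = 0.0
    for tau in thresholds:
        cumulative += base - tau
        numerators.append(exp(cumulative))
    total = sum(numerators)
    return [v / total for v in numerators]


def expected_rating(probs):
    """Model-expected score on the 0..m category scale."""
    return sum(k * p for k, p in enumerate(probs))


# Invented values: one project and item, rated by a lenient and a severe referee
thresholds = [-1.0, 0.0, 1.0]
lenient = expected_rating(mfrm_category_probs(1.0, 0.0, -0.5, thresholds))
severe = expected_rating(mfrm_category_probs(1.0, 0.0, 0.5, thresholds))
print(round(lenient, 2), round(severe, 2))
```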

  17. Human reliability

    International Nuclear Information System (INIS)

    Embrey, D.E.

    1987-01-01

    Concepts and techniques of human reliability have been developed and are used mostly in probabilistic risk assessment. In this context, the major application of human reliability assessment has been to identify the human errors that have a significant effect on the overall safety of the system and to quantify the probability of their occurrence. Some of the major issues within human reliability studies are reviewed, and it is shown how these are applied to the assessment of human failures in systems. This is done under the following headings: models of human performance used in human reliability assessment, the nature of human error, classification of errors in man-machine systems, practical aspects, human reliability modelling in complex situations, quantification and examination of human reliability, judgement-based approaches, holistic techniques, and decision analytic approaches. (UK)

  18. Cross-cultural adaptation of Kerlan-Jobe Orthopaedic Clinic shoulder and elbow score: Reliability and validity in Turkish-speaking overhead athletes.

    Science.gov (United States)

    Turgut, Elif; Tunay, Volga Bayrakci

    2018-03-09

    Kerlan-Jobe Orthopaedic Clinic Shoulder and Elbow Score (KJOC-SES) is a subjective assessment tool to measure functional status of the upper extremities in overhead athletes. The aim was to translate and culturally adapt the KJOC-SES and to evaluate the psychometric properties of the Turkish version (KJOC-SES-Tr) in overhead athletes. The forward and back-translation method was followed. One hundred and twenty-three overhead athletes completed the KJOC-SES-Tr, the Disabilities of the Arm, Shoulder, and Hand (DASH), and the American Shoulder and Elbow Surgeons Evaluation Form (ASES). Participants were assigned to one of the following subgroups: asymptomatic (playing without pain) or symptomatic (playing with pain, or not playing due to pain). Internal consistency, reliability, construct validity, discriminant validity, and content validity of the KJOC-SES-Tr were tested. The test-retest reliability of the KJOC-SES-Tr was excellent, with an intraclass correlation coefficient of 0.93. There was a strong correlation between the KJOC-SES-Tr and both the DASH and the ASES, indicating that construct validity was good for all participants. Results of the KJOC-SES-Tr differed significantly between subgroups and categories of athletes. The floor and ceiling effects were acceptable for symptomatic athletes. The KJOC-SES-Tr was shown to be a valid and reliable tool for monitoring the return to sports following injuries in athletes. Copyright © 2018 Turkish Association of Orthopaedics and Traumatology. Production and hosting by Elsevier B.V. All rights reserved.

  19. Standardization, Validity and Reliability Study of Gülhane Aphasia Test-2 (GAT-2)

    Directory of Open Access Journals (Sweden)

    İlknur Maviş

    2007-04-01

    OBJECTIVE: Gülhane Aphasia Test-2 (GAT-2) has been developed to show the presence of the language disorder aphasia and to give the clinician information about accompanying speech disorders such as apraxia and dysarthria. OBJECTIVE: The aim of the study was to report the standardization, validity, and reliability study of GAT-2. METHODS: Ten healthy individuals were initially tested in a pilot study. A total of 134 healthy individuals were included in the standardization study, and 30 individuals with aphasia and 11 individuals with right brain injury were included in the validation study. Between-group differences in GAT-2 scores and the effects of age, years of education, and sex were examined. GAT-2 cut-off scores were calculated from the scores of healthy individuals. GAT-2 test-retest reliability and inter-observer reliability were calculated. RESULTS: Healthy individuals’ GAT-2 scores differed significantly from those of aphasic patients, but not from those of patients with right brain injury. Healthy individuals’ GAT-2 scores were not affected by sex or age but were affected by years of education, so cut-off scores were calculated according to years of education. GAT-2 scores of aphasic patients were not affected by age, sex, or years of education. Test-retest reliability, inter-observer reliability, and internal consistency results showed that GAT-2 is a highly reliable aphasia screening test. CONCLUSION: GAT-2 was found to be a standardized, highly reliable, and valid aphasia test for Turkish stroke patients with aphasia.

  20. MRI inter-reader and intra-reader reliabilities for assessing injury morphology and posterior ligamentous complex integrity of the spine according to the thoracolumbar injury classification system and severity score

    International Nuclear Information System (INIS)

    Lee, Guen Young; Lee, Joon Woo; Choi, Seung Woo; Lim, Hyun Jin; Sun, Hye Young; Kang, Yu Suhn; Kang, Heung Sik; Chai, Jee Won; Kim, Su Jin

    2015-01-01

    To evaluate spine magnetic resonance imaging (MRI) inter-reader and intra-reader reliabilities using the thoracolumbar injury classification system and severity score (TLICS) and to analyze the effects of reader experience on reliability and the possible reasons for discordant interpretations. Six radiologists (two senior radiologists, two junior radiologists, and two residents) independently scored 100 MRI examinations of thoracolumbar spine injuries to assess injury morphology and posterior ligamentous complex (PLC) integrity according to the TLICS. Inter-reader and intra-reader agreements were determined and analyzed according to the number of years of radiologist experience. Inter-reader agreement between the six readers was moderate (k = 0.538 for the first and 0.537 for the second review) for injury morphology and fair to moderate (k = 0.440 for the first and 0.389 for the second review) for PLC integrity. No significant difference in inter-reader agreement was observed according to the number of years of radiologist experience. Intra-reader agreements showed a wide range (k = 0.538-0.822 for injury morphology and 0.423-0.616 for PLC integrity). Agreement was achieved in 44 (first review) and 45 (second review) examinations for injury morphology, and in 41 (first review) and 38 (second review) examinations for PLC integrity. A positive correlation was detected between injury morphology score and PLC integrity. The reliability of MRI for assessing thoracolumbar spinal injuries according to the TLICS was moderate for injury morphology and fair to moderate for PLC integrity, and may not be influenced by radiologists' experience.

  1. Reliability of four experimental mechanical pain tests in children

    DEFF Research Database (Denmark)

    Søe, Ann-Britt Langager; Thomsen, Lise L; Tornoe, Birte

    2013-01-01

    In order to study pain in children, it is necessary to determine whether pain measurement tools used in adults are reliable in children. The aim of this study was to explore the intrasession reliability of pressure pain thresholds (PPT) in healthy children. A further aim was to study the intersession reliability of the following four tests: (1) Total Tenderness Score; (2) PPT; (3) Visual Analog Scale score at suprapressure pain threshold; and (4) area under the curve (stimulus-response functions for pressure versus pain).

  2. Cross-cultural adaptation and validation of the Portuguese version of the Oxford Shoulder Score (OSS).

    Science.gov (United States)

    Gonçalves, Rui Soles; Caldeira, Carolina Quintal; Rodrigues, Mónica Vieira; Felícia, Sabine Cardoso; Cavalheiro, Luís Manuel; Ferreira, Pedro Lopes

    2018-03-08

    To translate and culturally adapt the Oxford Shoulder Score (OSS) to the European Portuguese language, and to test its reliability (internal consistency, reproducibility and measurement error) and validity (construct validity). The OSS Portuguese version was obtained through translations, back-translations, consensus panels, clinical review and cognitive pre-test. Portuguese OSS, Disabilities of the Arm, Shoulder and Hand (DASH) questionnaires, and the visual analogue scales of pain at rest [VAS rest] and during movement [VAS movement] were applied to 111 subjects with shoulder pain (degenerative or inflammatory disorders) and recommended for physical therapy. A clinical and sociodemographic questionnaire was also applied. The reliability was good, with a Cronbach's alpha coefficient of 0.90, an intraclass correlation coefficient (ICC) of 0.92, a standard error of measurement (SEM) of 2.59 points and a smallest detectable change (SDC) of 7.18 points. Construct validity was supported by the confirmation of three initial hypotheses involving expected significant correlation between OSS and other measures (DASH, VAS rest and VAS movement) and between OSS and the number of days of work absenteeism. The Portuguese OSS version presented suitable psychometric properties, in terms of reliability (internal consistency, reproducibility and measurement error) and validity (construct validity).
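    The reported SEM and SDC are linked by the usual individual-level formula SDC = 1.96 × √2 × SEM. The √2 reflects that a change score involves two measurements. Plugging in the reported SEM gives 1.96 × 1.414 × 2.59 ≈ 7.18, matching the reported SDC, and the sketch below reproduces that arithmetic. The sketch also includes a helper that derives SEM from a standard deviation and the ICC (SEM = SD × √(1 − ICC)). That formula is a common convention included as an assumption, because the abstract does not say how the SEM was obtained.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement via the common SEM = SD * sqrt(1 - ICC) convention."""
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_change(sem: float, z: float = 1.96) -> float:
    """Individual-level SDC = z * sqrt(2) * SEM; sqrt(2) because a change involves two scores."""
    return z * math.sqrt(2.0) * sem


# SEM reported for the Portuguese OSS
print(round(smallest_detectable_change(2.59), 2))  # 7.18
```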

  3. D-score: a search engine independent MD-score.

    Science.gov (United States)

    Vaudel, Marc; Breiter, Daniela; Beck, Florian; Rahnenführer, Jörg; Martens, Lennart; Zahedi, René P

    2013-03-01

    While peptides carrying PTMs are routinely identified in gel-free MS, the localization of the PTMs onto the peptide sequences remains challenging. Search engine scores of secondary peptide matches have been used in different approaches in order to infer the quality of site inference, by penalizing the localization whenever the search engine similarly scored two candidate peptides with different site assignments. In the present work, we show how the estimation of posterior error probabilities for peptide candidates allows the estimation of a PTM score called the D-score, for multiple search engine studies. We demonstrate the applicability of this score to three popular search engines: Mascot, OMSSA, and X!Tandem, and evaluate its performance using an already published high resolution data set of synthetic phosphopeptides. For those peptides with phosphorylation site inference uncertainty, the number of spectrum matches with correctly localized phosphorylation increased by up to 25.7% when compared to using Mascot alone, although the actual increase depended on the fragmentation method used. Since this method relies only on search engine scores, it can be readily applied to the scoring of the localization of virtually any modification at no additional experimental or in silico cost. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. Interphalangeal Osteoarthritis Radiographic Simplified (iOARS) score: a radiographic method to detect osteoarthritis of the interphalangeal finger joints based on its histopathological alterations.

    Science.gov (United States)

    Sunk, Ilse-Gerlinde; Amoyo-Minar, Love; Stamm, Tanja; Haider, Stefanie; Niederreiter, Birgit; Supp, Gabriela; Soleiman, Afschin; Kainberger, Franz; Smolen, Josef S; Bobacz, Klaus

    2014-11-01

    To develop a radiographic score for assessment of hand osteoarthritis (OA) that is based on histopathological alterations of the distal (DIP) and proximal (PIP) interphalangeal joints. DIP and PIP joints were obtained from cadavers (n=40). Plain radiographs of these joints were taken. Joint samples were prepared for histological analysis; cartilage damage was graded according to the Mankin scoring system. A 2×2 Fisher's exact test was applied to define those radiographic features most likely to be associated with histological alterations. Receiver operating characteristic curves were analysed to determine radiographic thresholds. Intraclass correlation coefficients (ICC) estimated intra- and inter-reader variability. Spearman's correlation was applied to examine the relationship between our score and histopathological changes. Differences between groups were determined by a Student's t test. The Interphalangeal Osteoarthritis Radiographic Simplified (iOARS) score is presented. The score is based on histopathological changes of DIP and PIP joints and follows a simple dichotomy: whether OA is present or not. The iOARS score relies on three equally ranked radiographic features (osteophytes, joint space narrowing and subchondral sclerosis). For both DIP and PIP joints, the presence of any one of these radiographic features indicates interphalangeal OA. Sensitivity and specificity for DIP joints were 92.3% and 90.9%, respectively, and 75% and 100% for PIP joints. All readers were able to reproduce their own readings in DIP and PIP joints after 4 weeks. The overall agreement between the three readers was good; ICCs ranged from 0.586 to 0.945. Additionally, outcomes of the iOARS score in a hand OA cohort revealed a higher prevalence of interphalangeal joint OA compared with the Kellgren and Lawrence score. The iOARS score is uniquely based on histopathological alterations of the interphalangeal joints in order to reliably determine OA of the DIP and PIP joints radiographically. Its high ...
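    The iOARS decision rule is a simple dichotomy: a joint is scored as osteoarthritic when any one of the three radiographic features is present. The Python sketch below encodes that rule and computes sensitivity and specificity against a histological reference. The `JointRadiograph` type, the function names and the example joints are hypothetical and serve only to illustrate the calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JointRadiograph:
    osteophytes: bool
    joint_space_narrowing: bool
    subchondral_sclerosis: bool


def ioars_positive(joint: JointRadiograph) -> bool:
    """Dichotomous rule: OA is scored present if any one of the three features is seen."""
    return joint.osteophytes or joint.joint_space_narrowing or joint.subchondral_sclerosis


def sensitivity_specificity(predicted: list[bool], reference: list[bool]) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)
    fn = sum(not p and r for p, r in pairs)
    tn = sum(not p and not r for p, r in pairs)
    fp = sum(p and not r for p, r in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical joints with a histological reference (True = OA on histology)
joints = [
    JointRadiograph(True, False, False),
    JointRadiograph(False, False, False),
    JointRadiograph(False, True, True),
    JointRadiograph(False, False, False),
]
histology = [True, True, True, False]
calls = [ioars_positive(j) for j in joints]
print(sensitivity_specificity(calls, histology))  # (0.6666666666666666, 1.0)
```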

  5. Validation of the Simplified Motor Score in patients with traumatic ...

    African Journals Online (AJOL)

    Background. This study used data from a large prospectively entered database to assess the efficacy of the motor score (M score) component of the Glasgow Coma Scale (GCS) and the Simplified Motor Score (SMS) in predicting overall outcome in patients with traumatic brain injury (TBI). Objective. To safely and reliably ...

  6. Reliability of a Retail Food Store Survey and Development of an Accompanying Retail Scoring System to Communicate Survey Findings and Identify Vendors for Healthful Food and Marketing Initiatives

    Science.gov (United States)

    Ghirardelli, Alyssa; Quinn, Valerie; Sugerman, Sharon

    2011-01-01

    Objective: To develop a retail grocery instrument with weighted scoring to be used as an indicator of the food environment. Participants/Setting: Twenty-six retail food stores in low-income areas in California. Intervention: Observational. Main Outcome Measure(s): Inter-rater reliability for grocery store survey instrument. Description of store…

  7. Reliable computer systems.

    Science.gov (United States)

    Wear, L L; Pinkert, J R

    1993-11-01

    In this article, we looked at some decisions that apply to the design of reliable computer systems. We began with a discussion of several terms such as testability, then described some systems that call for highly reliable hardware and software. The article concluded with a discussion of methods that can be used to achieve higher reliability in computer systems. Reliability and fault tolerance in computers probably will continue to grow in importance. As more and more systems are computerized, people will want assurances about the reliability of these systems, and their ability to work properly even when sub-systems fail.

  8. Reliability of Modern Scores to Predict Long-Term Mortality After Isolated Aortic Valve Operations.

    Science.gov (United States)

    Barili, Fabio; Pacini, Davide; D'Ovidio, Mariangela; Ventura, Martina; Alamanni, Francesco; Di Bartolomeo, Roberto; Grossi, Claudio; Davoli, Marina; Fusco, Danilo; Perucci, Carlo; Parolari, Alessandro

    2016-02-01

    Contemporary scores for estimating perioperative death have been proposed to also predict long-term death. The aim of the study was to evaluate the performance of the updated European System for Cardiac Operative Risk Evaluation II, the Society of Thoracic Surgeons Predicted Risk of Mortality score, and the Age, Creatinine, Left Ventricular Ejection Fraction score for predicting long-term mortality in a contemporary cohort of patients undergoing isolated aortic valve replacement (AVR). We also sought to develop, for each score, a simple algorithm based on predicted perioperative risk to predict long-term survival. Complete data on 1,444 patients who underwent isolated AVR in a 7-year period were retrieved from three prospective institutional databases and linked with the Italian Tax Register Information System. Data were evaluated with performance analyses and time-to-event semiparametric regression. Survival was 83.0% ± 1.1% at 5 years and 67.8% ± 1.9% at 8 years. Discrimination and calibration of all three scores worsened for prediction of death at 1 year and 5 years. Nonetheless, a significant relationship was found between long-term survival and quartiles of scores (p [...] System for Cardiac Operative Risk Evaluation II, 1.34 (95% CI, 1.28 to 1.40) for the Society of Thoracic Surgeons score, and 1.08 (95% CI, 1.06 to 1.10) for the Age, Creatinine, Left Ventricular Ejection Fraction score. The predicted risk generated by the European System for Cardiac Operative Risk Evaluation II, the Society of Thoracic Surgeons score, and the Age, Creatinine, Left Ventricular Ejection Fraction score cannot also be considered a direct estimate of the long-term risk of death. Nonetheless, the three scores can be used to derive an estimate of the long-term risk of death in patients who undergo isolated AVR with the use of a simple algorithm. Copyright © 2016 The Society of Thoracic Surgeons. Published by Elsevier Inc. All rights reserved.

  9. Development of the siriraj clinical asthma score.

    Science.gov (United States)

    Vichyanond, Pakit; Veskitkul, Jittima; Rienmanee, Nuanphong; Pacharn, Punchama; Jirapongsananuruk, Orathai; Visitsunthorn, Nualanong

    2013-09-01

    Acute asthmatic attacks in children commonly occur despite the introduction of effective controllers such as inhaled corticosteroids and leukotriene modifiers. Treatment of an acute asthmatic attack requires proper evaluation of attack severity and appropriate selection of medical therapy. In children, lung function is difficult to measure during an acute attack, so clinical asthma scoring may aid physicians in deciding on further treatment and admission. We enrolled 70 children aged 1 to 12 years (mean ± SD = 51.5 ± 31.8 months) with acute asthmatic attacks into the study. Twelve selected asthma severity items were assessed by 2 independent observers prior to administration of salbutamol nebulization (up to 3 doses at 20-minute intervals). Decisions on further therapy and admission were made by the emergency department physician. Three different scoring systems were constructed from the items with the best validity. Sensitivity, specificity and accuracy of these scores were assessed. Inter-rater reliability was assessed for each score. Previous scoring systems were also reviewed and reported. Three severity items had poor validity, i.e., cyanosis, depressed cerebral function, and I:E ratio (p > 0.05). Three items had poor inter-rater reliability, i.e., breath sound quality, air entry, and I:E ratio. These items were omitted and three new clinical scores were constructed from the remaining items. The clinical scoring system comprising retractions, dyspnea, O2 saturation, respiratory rate and wheezing (score range 0-10) gave the best accuracy and inter-rater variability and was chosen for clinical use as the Siriraj Clinical Asthma Score (SCAS). A clinical asthma score that is simple, relatively easy to administer and has good validity and variability is essential for the treatment of acute asthma in children. Several good candidate scores have been introduced in the past. We described the development of the Siriraj Clinical Asthma Score (SCAS) in ...

  10. Reliability of the CARE rule and the HEART score to rule out an acute coronary syndrome in non-traumatic chest pain patients.

    Science.gov (United States)

    Moumneh, Thomas; Richard-Jourjon, Vanessa; Friou, Emilie; Prunier, Fabrice; Soulie-Chavignon, Caroline; Choukroun, Jacques; Mazet-Guilaumé, Betty; Riou, Jérémie; Penaloza, Andréa; Roy, Pierre-Marie

    2018-03-02

    In patients consulting in the Emergency Department (ED) for chest pain, a HEART score ≤ 3 has been shown to rule out an acute coronary syndrome (ACS) with a low risk of major adverse cardiac event (MACE) occurrence. A negative CARE rule (≤ 1), which comprises the first four elements of the HEART score, may have similar rule-out reliability without requiring a troponin assay. We aimed to prospectively assess the performance of the CARE rule and of the HEART score to predict MACE in a chest pain population. Prospective two-center non-interventional study. Patients admitted to the ED for non-traumatic chest pain were included and followed up at 6 weeks. The main study endpoint was the 6-week rate of MACE (myocardial infarction, coronary angioplasty, coronary bypass, and sudden unexplained death). A total of 641 patients were included, of whom 9.5% presented a MACE at 6 weeks. The CARE rule was negative for 31.2% of patients, and none of them presented a MACE during follow-up [0% (95% confidence interval: 0.0-1.9%)]. The HEART score was ≤ 3 for 63.0% of patients, and none of them presented a MACE during follow-up [0% (0.0-0.9%)]. With an incidence below 2% in the negative group, the CARE rule seemed able to safely rule out a MACE without any biological test for one-third of patients with chest pain, and the HEART score for another third with a single troponin assay.
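    With zero observed events, the upper confidence bound is what carries the rule-out claim. The abstract does not name the interval method, but a Wilson score interval is one standard choice. With group sizes back-calculated from the reported percentages (about 200 CARE-negative and 404 HEART ≤ 3 patients, an approximation), it gives upper bounds of about 1.9% and 0.9%, in line with the reported values. The Python sketch below shows that calculation; treat the match with the authors' method as an assumption.

```python
import math


def wilson_interval(events: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion events / n."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("need n > 0 and 0 <= events <= n")
    p_hat = events / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Group sizes back-calculated from the reported percentages of 641 patients (approximate)
groups = {"CARE negative": round(0.312 * 641), "HEART <= 3": round(0.630 * 641)}
for label, n in groups.items():
    low, high = wilson_interval(0, n)
    print(f"{label}: 0/{n} MACE, 95% CI {100 * low:.1f}-{100 * high:.1f}%")
```

    For zero events the Wilson upper bound reduces to z² / (n + z²). This explains why the bound shrinks roughly in proportion to 1/n as the negative group grows.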

  11. Applying Computational Scoring Functions to Assess Biomolecular Interactions in Food Science: Applications to the Estrogen Receptors

    Directory of Open Access Journals (Sweden)

    Francesca Spyrakis

    2016-10-01

    Thus, key computational medicinal chemistry methods like molecular dynamics can be used to decipher protein flexibility and to obtain stable models for docking and scoring in food-related studies, and virtual screening is increasingly being applied to identify molecules with potential to act as endocrine disruptors, food mycotoxins, and new nutraceuticals [3,4,5]. All of these methods and simulations are based on protein-ligand interaction phenomena, and represent the basis for any subsequent modification of the targeted receptor's or enzyme's physiological activity. We describe here the energetics of binding of biological complexes, providing a survey of the most common and successful algorithms used in evaluating these energetics, and we report case studies in which computational techniques have been applied to food science issues. In particular, we explore a handful of studies involving the estrogen receptors for which we have a long-term interest.

  12. A Categorical Instrument for Scoring Second Language Writing Skills.

    Science.gov (United States)

    Brown, James Dean; Bailey, Kathleen M.

    1984-01-01

    Discusses a study of the reliability of a categorical instrument for evaluating compositions written by upper intermediate university English as a second language students. The instrument tests organization, logical development of ideas, grammar, mechanics, and style. Results indicate that the scoring instrument is moderately reliable. (SED)

  13. Reliability and validity of a modified MEDFICTS dietary fat screener in South African schoolchildren are determined by use and outcome measures.

    Science.gov (United States)

    Wenhold, Friedeburg Anna Maria; MacIntyre, Una Elizabeth; Rheeder, Paul

    2014-06-01

    In South Africa, noncommunicable diseases and obesity are increasing and also affect children. No validated assessment tools for fat intake are available. To determine test-retest reliability and relative validity of a pictorial modified meats, eggs, dairy, fried foods, fats in baked goods, convenience foods, table fats, and snacks (MEDFICTS) dietary fat screener. We determined test-retest reliability and diagnostic accuracy with the modified MEDFICTS as the index test and a 3-day weighed food record and parental completion of the screener as primary and secondary reference methods, respectively. Grade-six learners (aged 12 years, 4 months) in an urban, middle-class school (n=93) and their parents (n=72). Portion size, frequency of intake, final score, and classification of fat intake of the modified MEDFICTS, and percent energy from fat, saturated fatty acids, and cholesterol of the food record. For categorical data, agreement was based on kappa statistics, McNemar's test for symmetry, and diagnostic performance parameters. Continuous data were analyzed with correlations, mean differences, the Bland-Altman method, and receiver operating characteristics. The classification of fat intake by the modified MEDFICTS was test-retest reliable. Final scores of the group did not differ between administrations (P=0.86). The correlation of final scores between administrations was significant for girls only (r=0.58; P=0.01). Reliability of portion size and frequency of intake scores depended on the food category. For girls the screener final score was significantly (P [...] 90%), but chance-corrected agreement between the classifications was poor. Parents did not agree with their children. Test-retest reliability and relative validity of a modified MEDFICTS dietary fat screener in South African schoolchildren depended on the use and outcome measures applied. Copyright © 2014 Academy of Nutrition and Dietetics. Published by Elsevier Inc. All rights reserved.

  14. Unsupervised deep learning applied to breast density segmentation and mammographic risk scoring

    DEFF Research Database (Denmark)

    Kallenberg, Michiel Gijsbertus J.; Petersen, Peter Kersten; Nielsen, Mads

    2016-01-01

    Mammographic risk scoring has commonly been automated by extracting a set of handcrafted features from mammograms, and relating the responses directly or indirectly to breast cancer risk. We present a method that learns a feature hierarchy from unlabeled data. When the learned features are used as the input to a simple classifier, two different tasks can be addressed: i) breast density segmentation, and ii) scoring of mammographic texture. The proposed model learns features at multiple scales. To control the model's capacity, a novel sparsity regularizer is introduced that incorporates both lifetime and population sparsity. We evaluated our method on three different clinical datasets. Our state-of-the-art results show that the learned breast density scores have a very strong positive relationship with manual ones, and that the learned texture scores are predictive of breast cancer. The model is easy...

  15. [Validating the Spanish version of the Nursing Activities Score].

    Science.gov (United States)

    Sánchez-Sánchez, M M; Arias-Rivera, S; Fraile-Gamo, M P; Thuissard-Vasallo, I J; Frutos-Vivar, F

    2015-01-01

    Validating workload scores ensures that they are appropriate for the purpose for which they were developed. To validate the Spanish version of the Nursing Activities Score (NAS). Observational and prospective study. A total of 1,045 patients admitted to a medical-surgical unit and a serious burns unit in 2006 were included. The nurse in charge assessed patient workloads with the Nine Equivalents of Nursing Manpower Use Score and the NAS. To assess the internal consistency of the NAS measurements, item-test correlations, Cronbach's α, and Cronbach's α corrected by omitting each item in turn were calculated. Intraobserver and interobserver reliability were assessed with the intraclass correlation coefficient by viewing recordings, and kappa (interobserver reliability) was estimated. For the analysis of internal validity, a principal components factor analysis was performed. Convergent validity was assessed using the Spearman correlation coefficient between the values obtained from the Nine Equivalents of Nursing Manpower Use Score and the Spanish-NAS scales. For internal consistency, 164 questionnaires were analysed and a Cronbach's α of 0.373 was calculated. The intraclass correlation coefficient for intraobserver reliability was 0.837 (95% CI: 0.466-0.950) and 0.662 (95% CI: 0.033-0.882) for interobserver reliability. The estimated kappa was 0.371. For internal validity, exploratory factor analysis showed that the first item explained 58.9% of the variance of the questionnaire. For convergent validity, 1,006 questionnaires were included and a Spearman correlation coefficient of 0.746 was observed. The psychometric properties of Spanish-NAS are acceptable. Copyright © 2014 Elsevier España, S.L.U. y SEEIUC. All rights reserved.
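    The internal-consistency figure above is a Cronbach's α. A minimal Python sketch of the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of total score), is shown below. It uses sample variances and a hypothetical score matrix; it is not the study's data or software.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha; scores[i][j] is respondent i's score on item j."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*scores))  # transpose to per-item columns
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - item_variance_sum / total_variance)


# Hypothetical 3 respondents x 2 items
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))  # 0.667
```

    A low α, such as the 0.373 reported here, means the item variances account for most of the total-score variance, so the items covary weakly.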

  16. Reliability, construct and criterion validity of the KIDSCREEN-10 score: a short measure for children and adolescents’ well-being and health-related quality of life

    Science.gov (United States)

    Erhart, Michael; Rajmil, Luis; Herdman, Michael; Auquier, Pascal; Bruil, Jeanet; Power, Mick; Duer, Wolfgang; Abel, Thomas; Czemy, Ladislav; Mazur, Joanna; Czimbalmos, Agnes; Tountas, Yannis; Hagquist, Curt; Kilroe, Jean

    2010-01-01

    Background To assess the criterion and construct validity of the KIDSCREEN-10 well-being and health-related quality of life (HRQoL) score, a short version of the KIDSCREEN-52 and KIDSCREEN-27 instruments. Methods The child self-report and parent report versions of the KIDSCREEN-10 were tested in a sample of 22,830 European children and adolescents aged 8–18 and their parents (n = 16,237). Correlation with the KIDSCREEN-52 and associations with other generic HRQoL measures, physical and mental health, and socioeconomic status were examined. Score differences by age, gender, and country were investigated. Results Correlations between the 10-item KIDSCREEN score and KIDSCREEN-52 scales ranged from r = 0.24 to 0.72 (r = 0.27–0.72) for the self-report version (proxy-report version). Coefficients below r = 0.5 were observed for the KIDSCREEN-52 dimensions Financial Resources and Being Bullied only. Cronbach alpha was 0.82 (0.78), test–retest reliability was ICC = 0.70 (0.67) for the self- (proxy-)report version. Correlations between other self-completed children's HRQoL questionnaires and the KIDSCREEN-10 ranged from r = 0.43 to r = 0.63 for the KIDSCREEN children self-report and r = 0.22–0.40 for the KIDSCREEN parent proxy report. Known group differences in HRQoL between physically/mentally healthy and ill children were observed in the KIDSCREEN-10 self and proxy scores. Associations with self-reported psychosomatic complaints were r = −0.52 (−0.36) for the KIDSCREEN-10 self-report (proxy-report). Statistically significant differences in KIDSCREEN-10 self and proxy scores were found by socioeconomic status, age, and gender. Conclusions Our results indicate that the KIDSCREEN-10 provides a valid measure of a general HRQoL factor in children and adolescents, but the instrument does not represent well most of the single dimensions of the original KIDSCREEN-52. Test–retest reliability was slightly below a priori defined thresholds. PMID:20668950

  17. A reliable parameter to standardize the scoring of stem cell spheres.

    Directory of Open Access Journals (Sweden)

    Xiaochen Zhou

    Sphere formation assay is widely used in the selection and enrichment of normal stem cells or cancer stem cells (CSCs), also known as tumor-initiating cells (TICs), based on their ability to grow in serum-free suspension culture for clonal proliferation. However, there is no standardized parameter to accurately score the spheres, a score that should reflect both the number and the size of the spheres. Here we define a novel parameter, designated the Standardized Sphere Score (SSS), which is expressed as the total volume of selected spheres divided by the number of cells initially plated. SSS was validated in the quantification of both tumor spheres from cancer cell lines and embryonic bodies (EB) from mouse embryonic stem cells with high sensitivity and reproducibility.
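    The SSS definition above is a single ratio. The Python sketch below computes it under two assumptions that are not stated in the abstract. First, each sphere is treated as a perfect sphere of its measured diameter. Second, "selected" spheres are those above a hypothetical size cut-off, `min_diameter_um`.

```python
import math


def standardized_sphere_score(diameters_um: list[float], cells_plated: int,
                              min_diameter_um: float = 0.0) -> float:
    """SSS = total volume of selected spheres / number of cells initially plated.

    Assumptions (not from the source): every sphere is treated as a perfect
    sphere of its measured diameter, and "selected" means diameter >= a
    hypothetical cut-off, min_diameter_um. Result is in cubic micrometres per plated cell.
    """
    if cells_plated <= 0:
        raise ValueError("cells_plated must be positive")
    selected = [d for d in diameters_um if d >= min_diameter_um]
    total_volume = sum(math.pi / 6.0 * d ** 3 for d in selected)  # (4/3) * pi * (d/2)**3
    return total_volume / cells_plated


# Hypothetical well: 1000 cells plated, two spheres measured, 50 um cut-off
print(round(standardized_sphere_score([100.0, 40.0], 1000, min_diameter_um=50.0), 1))  # 523.6
```

    Because volume grows with the cube of diameter, a few large spheres dominate the score. This is how the SSS reflects sphere size as well as sphere number.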

  18. Validity of GRE General Test scores and TOEFL scores for graduate admission to a technical university in Western Europe

    Science.gov (United States)

    Zimmermann, Judith; von Davier, Alina A.; Buhmann, Joachim M.; Heinimann, Hans R.

    2018-01-01

    Graduate admission has become a critical process in tertiary education, whereby selecting valid admissions instruments is key. This study assessed the validity of Graduate Record Examination (GRE) General Test scores for admission to Master's programmes at a technical university in Europe. We investigated the indicative value of GRE scores for the Master's programme grade point average (GGPA) with and without the addition of the undergraduate GPA (UGPA) and the TOEFL score, and of GRE scores for study completion and Master's thesis performance. GRE scores explained 20% of the variation in the GGPA, while an additional 7% was explained by the TOEFL score and 3% by the UGPA. Contrary to common belief, the GRE quantitative reasoning score showed little explanatory power. GRE scores were also weakly related to study progress but not to thesis performance. Nevertheless, GRE and TOEFL scores were found to be sensible admissions instruments. Rigorous methodology was used to obtain highly reliable results.

  19. Severity score system for progressive myelopathy: development and validation of a new clinical scale

    Directory of Open Access Journals (Sweden)

    R.M. Castilhos

    2012-07-01

    Progressive myelopathies can be secondary to inborn errors of metabolism (IEM), such as mucopolysaccharidosis, mucolipidosis, and adrenomyeloneuropathy. The available scale, the Japanese Orthopaedic Association (JOA) score, was validated only for degenerative vertebral diseases. Our objective is to propose and validate a new scale addressing progressive myelopathies and to present validating data for the JOA in these diseases. A new scale, the Severity Score System for Progressive Myelopathy (SSPROM), was developed covering motor disability, sphincter dysfunction, spasticity, and sensory losses. Inter- and intra-rater reliabilities were measured. External validation was tested by applying the JOA, the Expanded Disability Status Scale (EDSS), the Barthel index, and the Osame Motor Disability Score. Thirty-eight patients participated in the study: 17 with adrenomyeloneuropathy, 3 with mucopolysaccharidosis I, 3 with mucopolysaccharidosis IV, 2 with mucopolysaccharidosis VI, 2 with mucolipidosis, and 11 with human T-cell lymphotropic virus type-1 (HTLV-1)-associated myelopathy. The mean ± SD SSPROM and JOA scores were 74.6 ± 11.4 and 12.4 ± 2.3, respectively. Construct validity (JOA: r = 0.84, P < 0.0001; EDSS: r = -0.83, P < 0.0001; Barthel: r = 0.56, P < 0.002; Osame: r = -0.94, P < 0.0001) and reliability (intra-rater: r = 0.83, P < 0.0001; inter-rater: r = 0.94, P < 0.0001) were demonstrated for SSPROM. The metric properties of the JOA were similar to those found for SSPROM. Several clinimetric requirements were met for both the SSPROM and JOA scales. Since SSPROM has a wider range, it should be useful for follow-up studies of IEM myelopathies.

  20. Validity, Reliability and Standardization Study of the Language Assessment Test for Aphasia

    Directory of Open Access Journals (Sweden)

    Bülent Toğram

    2012-09-01

    OBJECTIVE: Aphasia assessment is the first step towards well-founded language therapy. Language tests need to consider cultural as well as typological linguistic aspects of a given language. This study was designed to determine the standardization, validity and reliability of the Language Assessment Test for Aphasia, which consists of eight subtests: spontaneous speech and language, auditory comprehension, repetition, naming, reading, grammar, speech acts, and writing. METHODS: The test was administered to 282 healthy participants and 92 aphasic participants in age-, education- and gender-matched groups. The validity of the test was investigated with analyses of content, construct and criterion-related validity. For reliability, internal consistency, stability and equivalence reliability were analysed. The influence of variables on healthy participants' subtest scores, test score and language score was examined. Norms and cut-off scores based on the language score were determined according to significant differences. RESULTS: The group with aphasia performed much lower than healthy participants on subtest, test and language scores. The test scores of the healthy group were mostly affected by age and educational level but not by gender. Based on these significant differences, age and educational-level groupings were determined for both groups. Taking age and educational level into account, reference values for the cut-off scores were presented. CONCLUSION: The test was found to be a highly reliable and valid aphasia test for Turkish-speaking aphasic patients, either in Turkey or in other Turkish communities around the world.

  1. Administration and scoring variance on the ADAS-Cog.

    Science.gov (United States)

    Connor, Donald J; Sabbagh, Marwan N

    2008-11-01

    The Alzheimer's Disease Assessment Scale - Cognitive (ADAS-Cog) is the most commonly used primary outcome instrument in clinical trials for treatments of dementia. Variations in forms, administration procedures and scoring rules, along with rater turnover and intra-rater drift may decrease the reliability of the instrument. A survey of possible variations in the ADAS-Cog was administered to 26 volunteer raters at a clinical trials meeting. Results indicate notable protocol variations in the forms used, administration procedures, and scoring rules. Since change over time is used to determine treatment effect in clinical trials, standardizing the instrument's ambiguities and addressing common problems will greatly increase the instrument's reliability and thereby enhance its sensitivity to treatment effects.

  2. Structural reliability analysis applied to pipeline risk analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gardiner, M. [GL Industrial Services, Loughborough (United Kingdom)]; Mendes, Renato F.; Donato, Guilherme V.P. [PETROBRAS S.A., Rio de Janeiro, RJ (Brazil)]

    2009-07-01

    Quantitative Risk Assessment (QRA) of pipelines requires two main components to be provided. These are models of the consequences that follow from some loss of containment incident, and models for the likelihood of such incidents occurring. This paper describes how PETROBRAS have used Structural Reliability Analysis for the second of these, to provide pipeline- and location-specific predictions of failure frequency for a number of pipeline assets. This paper presents an approach to estimating failure rates for liquid and gas pipelines, using Structural Reliability Analysis (SRA) to analyze the credible basic mechanisms of failure such as corrosion and mechanical damage. SRA is a probabilistic limit state method: for a given failure mechanism it quantifies the uncertainty in parameters to mathematical models of the load-resistance state of a structure and then evaluates the probability of load exceeding resistance. SRA can be used to benefit the pipeline risk management process by optimizing in-line inspection schedules, and as part of the design process for new construction in pipeline rights of way that already contain multiple lines. A case study is presented to show how the SRA approach has recently been used on PETROBRAS pipelines and the benefits obtained from it. (author)
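    SRA, as described above, evaluates the probability that load exceeds resistance for a given failure mechanism. The minimal Python sketch below assumes, purely for illustration, that resistance and load are independent normal random variables with made-up parameters. It estimates the failure probability by Monte Carlo sampling and checks the estimate against the closed-form result for this special case. Real pipeline models for corrosion or mechanical damage use mechanism-specific limit-state functions that the abstract does not give.

```python
import random
from statistics import NormalDist


def failure_probability_mc(mu_r: float, sd_r: float, mu_l: float, sd_l: float,
                           n: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(load > resistance) with independent normal variables."""
    rng = random.Random(seed)
    failures = sum(rng.gauss(mu_l, sd_l) > rng.gauss(mu_r, sd_r) for _ in range(n))
    return failures / n


def failure_probability_exact(mu_r: float, sd_r: float, mu_l: float, sd_l: float) -> float:
    """Closed form for the same model: the safety margin R - L is itself normal."""
    margin = NormalDist(mu_r - mu_l, (sd_r ** 2 + sd_l ** 2) ** 0.5)
    return margin.cdf(0.0)


# Hypothetical limit state: resistance ~ N(10, 1), load ~ N(7, 1), arbitrary units
print(failure_probability_exact(10, 1, 7, 1))  # about 0.017
print(failure_probability_mc(10, 1, 7, 1))     # close to the exact value
```

    Sampling generalizes to non-normal or nonlinear limit states where no closed form exists. The exact function serves only as a sanity check.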

  3. Reliability of provocative tests of motion sickness susceptibility

    Science.gov (United States)

    Calkins, D. S.; Reschke, M. F.; Kennedy, R. S.; Dunlop, W. P.

    1987-01-01

    Test-retest reliability values were derived from motion sickness susceptibility scores obtained from two successive exposures to each of three tests: (1) Coriolis sickness sensitivity test; (2) staircase velocity movement test; and (3) parabolic flight static chair test. The reliability of the three tests ranged from 0.70 to 0.88. Normalizing values from predictors with skewed distributions improved the reliability.
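
    The abstract does not say which coefficient or normalization was used. A minimal sketch, assuming test-retest reliability is the Pearson correlation between the two exposures and that "normalizing" means a log transform of positive, right-skewed scores (the `log_normalize` option is a hypothetical illustration):

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def retest_reliability(first, second, log_normalize=False):
    """Correlate scores from two successive exposures of the same subjects.

    log_normalize (hypothetical option) log-transforms right-skewed,
    strictly positive scores before correlating them.
    """
    if log_normalize:
        first = [math.log(v) for v in first]
        second = [math.log(v) for v in second]
    return pearson_r(first, second)


# Hypothetical susceptibility scores for five subjects on two test days.
day_1 = [2, 3, 5, 9, 20]
day_2 = [3, 3, 6, 12, 15]
print(retest_reliability(day_1, day_2), retest_reliability(day_1, day_2, log_normalize=True))
```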

  4. Evaluation of Cardiovascular Risk Scores Applied to NASA's Astronaut Corps

    Science.gov (United States)

    Jain, I.; Charvat, J. M.; VanBaalen, M.; Lee, L.; Wear, M. L.

    2014-01-01

    In an effort to improve cardiovascular disease (CVD) risk prediction, this analysis evaluates and compares the applicability of multiple CVD risk scores to the NASA Astronaut Corps which is extremely healthy at selection.

  5. The scoring of movements in sleep.

    Science.gov (United States)

    Walters, Arthur S; Lavigne, Gilles; Hening, Wayne; Picchietti, Daniel L; Allen, Richard P; Chokroverty, Sudhansu; Kushida, Clete A; Bliwise, Donald L; Mahowald, Mark W; Schenck, Carlos H; Ancoli-Israel, Sonia

    2007-03-15

    The International Classification of Sleep Disorders (ICSD-2) has separated sleep-related movement disorders into simple, repetitive movement disorders (such as periodic limb movements in sleep [PLMS], sleep bruxism, and rhythmic movement disorder) and parasomnias (such as REM sleep behavior disorder and disorders of partial arousal, e.g., sleep walking, confusional arousals, night terrors). Many of the parasomnias are characterized by complex behaviors in sleep that appear purposeful, goal directed and voluntary but are outside the conscious awareness of the individual and therefore inappropriate. All of the sleep-related movement disorders described here have specific polysomnographic findings. For the purposes of developing and/or revising specifications and polysomnographic scoring rules, the AASM Scoring Manual Task Force on Movements in Sleep reviewed background literature and executed evidence grading of 81 relevant articles obtained by a literature search of published articles between 1966 and 2004. Subsequent evidence grading identified limited evidence for reliability and/or validity for polysomnographic scoring criteria for periodic limb movements in sleep, REM sleep behavior disorder, and sleep bruxism. Published scoring criteria for rhythmic movement disorder, excessive fragmentary myoclonus, and hypnagogic foot tremor/alternating leg muscle activation were empirical and based on descriptive studies. The literature review disclosed no published evidence defining clinical consequences of excessive fragmentary myoclonus or hypnagogic foot tremor/alternating leg muscle activation. Because of limited or absent evidence for reliability and/or validity, a standardized RAND/UCLA consensus process was employed for recommendation of specific rules for the scoring of sleep-associated movements.

  6. Do medical students’ scores using different assessment instruments predict their scores in clinical reasoning using a computer-based simulation?

    Directory of Open Access Journals (Sweden)

    Fida M

    2015-02-01

    Full Text Available Mariam Fida (1), Salah Eldin Kassab (2); (1) Department of Molecular Medicine, College of Medicine and Medical Sciences, Arabian Gulf University, Manama, Bahrain; (2) Department of Medical Education, Faculty of Medicine, Suez Canal University, Ismailia, Egypt. Purpose: The development of clinical problem-solving skills evolves over time and requires structured training and background knowledge. Computer-based case simulations (CCS) have been used for teaching and assessment of clinical reasoning skills. However, previous studies examining the psychometric properties of CCS as an assessment tool have been controversial. Furthermore, studies reporting the integration of CCS into problem-based medical curricula have been limited. Methods: This study examined the psychometric properties of using CCS software (DxR Clinician) for assessment of medical students (n=130) studying in a problem-based, integrated multisystem module (Unit IX) during the academic year 2011–2012. Internal consistency reliability of CCS scores was calculated using Cronbach's alpha statistics. The relationships between students' scores in CCS components (clinical reasoning, diagnostic performance, and patient management) and their scores in other examination tools at the end of the unit, including multiple-choice questions, short-answer questions, objective structured clinical examination (OSCE), and real patient encounters, were analyzed using stepwise hierarchical linear regression. Results: Internal consistency reliability of CCS scores was high (α=0.862). Inter-item correlations between students' scores in different CCS components and their scores in CCS and other test items were statistically significant. Regression analysis indicated that OSCE scores predicted 32.7% and 35.1% of the variance in clinical reasoning and patient management scores, respectively (P<0.01). Multiple-choice question scores, however, predicted only 15.4% of the variance in diagnostic performance scores (P<0.01), while
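
    The "percentage of variance predicted" figures above are R² values. The study used stepwise hierarchical regression with several predictors; the sketch below covers only the single-predictor case, computing R² = 1 - SS_res/SS_tot from an ordinary least-squares fit. The scores are hypothetical.

```python
def r_squared(x, y):
    """Share of variance in y explained by an OLS line on a single predictor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical OSCE scores (x) and clinical-reasoning scores (y).
osce = [55, 62, 70, 74, 81, 90]
reasoning = [48, 60, 58, 71, 69, 85]
print(f"R^2 = {r_squared(osce, reasoning):.3f}")
```

    With several predictors entered in blocks, the hierarchical analysis reports the increase in R² as each block is added; that needs a multiple-regression solver rather than this closed form.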

  7. Validity and Reliability of the Korean Version of the Hyperthyroidism Symptom Scale.

    Science.gov (United States)

    Lee, Jie Eun; Lee, Dong Hwa; Oh, Tae Jung; Kim, Kyoung Min; Choi, Sung Hee; Lim, Soo; Park, Young Joo; Park, Do Joon; Jang, Hak Chul; Moon, Jae Hoon

    2018-03-01

    Thyrotoxicosis is a common disease resulting from an excess of thyroid hormones, which affects many organ systems. The clinical symptoms and signs are relatively nonspecific and can vary depending on age, sex, comorbidities, and the duration and cause of the disease. Several symptom rating scales have been developed in an attempt to assess these symptoms objectively and have been applied to diagnosis or to evaluation of the response to treatment. The aim of this study was to assess the reliability and validity of the Korean version of the hyperthyroidism symptom scale (K-HSS). Twenty-eight thyrotoxic patients and 10 healthy subjects completed the K-HSS at baseline and after follow-up at Seoul National University Bundang Hospital. The correlation between K-HSS scores and thyroid function was analyzed. K-HSS scores were compared between baseline and follow-up in patient and control groups. Cronbach's α coefficient was calculated to demonstrate the internal consistency of K-HSS. The mean age of the participants was 34.7±9.8 years and 13 (34.2%) were men. K-HSS scores demonstrated a significant positive correlation with serum free thyroxine concentration and decreased significantly with improved thyroid function. K-HSS scores were highest in subclinically thyrotoxic subjects, lower in patients who were euthyroid after treatment, and lowest in the control group at follow-up, but these differences were not significant. Cronbach's α coefficient for the K-HSS was 0.86. The K-HSS is a reliable and valid instrument for evaluating symptoms of thyrotoxicosis in Korean patients. Copyright © 2018 Korean Endocrine Society.
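
    Cronbach's α, used above for internal consistency, is α = k/(k - 1) · (1 - Σσ²_item / σ²_total) for k items. A minimal sketch with hypothetical item responses (not K-HSS data), using sample variances:

```python
import statistics


def cronbach_alpha(responses):
    """responses[i][j] = score of respondent i on item j (complete data)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    items = list(zip(*responses))  # transpose to one tuple per item
    sum_item_var = sum(statistics.variance(item) for item in items)
    total_var = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical 4-item responses from five respondents (0-4 scale).
data = [
    [3, 4, 3, 4],
    [1, 1, 2, 1],
    [2, 3, 2, 2],
    [4, 4, 3, 4],
    [0, 1, 1, 0],
]
print(f"alpha = {cronbach_alpha(data):.2f}")
```

    When items are perfectly correlated, the total-score variance equals k times the summed item variances and α reaches 1; when items are uncorrelated, the two quantities match and α falls to 0.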

  8. Validity and Reliability of the Korean Version of the Hyperthyroidism Symptom Scale

    Directory of Open Access Journals (Sweden)

    Jie-Eun Lee

    2018-03-01

    Full Text Available Background Thyrotoxicosis is a common disease resulting from an excess of thyroid hormones, which affects many organ systems. The clinical symptoms and signs are relatively nonspecific and can vary depending on age, sex, comorbidities, and the duration and cause of the disease. Several symptom rating scales have been developed in an attempt to assess these symptoms objectively and have been applied to diagnosis or to evaluation of the response to treatment. The aim of this study was to assess the reliability and validity of the Korean version of the hyperthyroidism symptom scale (K-HSS). Methods Twenty-eight thyrotoxic patients and 10 healthy subjects completed the K-HSS at baseline and after follow-up at Seoul National University Bundang Hospital. The correlation between K-HSS scores and thyroid function was analyzed. K-HSS scores were compared between baseline and follow-up in patient and control groups. Cronbach's α coefficient was calculated to demonstrate the internal consistency of K-HSS. Results The mean age of the participants was 34.7±9.8 years and 13 (34.2%) were men. K-HSS scores demonstrated a significant positive correlation with serum free thyroxine concentration and decreased significantly with improved thyroid function. K-HSS scores were highest in subclinically thyrotoxic subjects, lower in patients who were euthyroid after treatment, and lowest in the control group at follow-up, but these differences were not significant. Cronbach's α coefficient for the K-HSS was 0.86. Conclusion The K-HSS is a reliable and valid instrument for evaluating symptoms of thyrotoxicosis in Korean patients.

  9. Validity and Reliability of the Korean Version of the Hyperthyroidism Symptom Scale

    Science.gov (United States)

    Lee, Dong Hwa

    2018-01-01

    Background Thyrotoxicosis is a common disease resulting from an excess of thyroid hormones, which affects many organ systems. The clinical symptoms and signs are relatively nonspecific and can vary depending on age, sex, comorbidities, and the duration and cause of the disease. Several symptom rating scales have been developed in an attempt to assess these symptoms objectively and have been applied to diagnosis or to evaluation of the response to treatment. The aim of this study was to assess the reliability and validity of the Korean version of the hyperthyroidism symptom scale (K-HSS). Methods Twenty-eight thyrotoxic patients and 10 healthy subjects completed the K-HSS at baseline and after follow-up at Seoul National University Bundang Hospital. The correlation between K-HSS scores and thyroid function was analyzed. K-HSS scores were compared between baseline and follow-up in patient and control groups. Cronbach's α coefficient was calculated to demonstrate the internal consistency of K-HSS. Results The mean age of the participants was 34.7±9.8 years and 13 (34.2%) were men. K-HSS scores demonstrated a significant positive correlation with serum free thyroxine concentration and decreased significantly with improved thyroid function. K-HSS scores were highest in subclinically thyrotoxic subjects, lower in patients who were euthyroid after treatment, and lowest in the control group at follow-up, but these differences were not significant. Cronbach's α coefficient for the K-HSS was 0.86. Conclusion The K-HSS is a reliable and valid instrument for evaluating symptoms of thyrotoxicosis in Korean patients. PMID:29589389

  10. Measuring reliable change in cognition using the Edinburgh Cognitive and Behavioural ALS Screen (ECAS).

    Science.gov (United States)

    Crockford, Christopher; Newton, Judith; Lonergan, Katie; Madden, Caoifa; Mays, Iain; O'Sullivan, Meabhdh; Costello, Emmet; Pinto-Grau, Marta; Vajda, Alice; Heverin, Mark; Pender, Niall; Al-Chalabi, Ammar; Hardiman, Orla; Abrahams, Sharon

    2018-02-01

    Cognitive impairment affects approximately 50% of people with amyotrophic lateral sclerosis (ALS). Research has indicated that impairment may worsen with disease progression. The Edinburgh Cognitive and Behavioural ALS Screen (ECAS) was designed to measure neuropsychological functioning in ALS, with its alternate forms (ECAS-A, B, and C) allowing for serial assessment over time. The aim of the present study was to establish reliable change scores for the alternate forms of the ECAS, and to explore practice effects and test-retest reliability of the ECAS's alternate forms. Eighty healthy participants were recruited, with 57 completing two and 51 completing three assessments. Participants were administered alternate versions of the ECAS serially (A-B-C) at four-month intervals. Intra-class correlation analysis was employed to explore test-retest reliability, while analysis of variance was used to examine the presence of practice effects. Reliable change indices (RCI) and regression-based methods were utilized to establish change scores for the ECAS alternate forms. Test-retest reliability was excellent for ALS Specific, ALS Non-Specific, and ECAS Total scores of the combined ECAS A, B, and C (all > .90). No significant practice effects were observed over the three testing sessions. RCI and regression-based methods produced similar change scores. The alternate forms of the ECAS possess excellent test-retest reliability in a healthy control sample, with no significant practice effects. The use of conservative RCI scores is recommended. Therefore, a change of ≥8, ≥4, and ≥9 for ALS Specific, ALS Non-Specific, and ECAS Total score is required for reliable change.
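
    One common form of the reliable change index is that of Jacobson and Truax: SEM = SD·√(1 - r), S_diff = √2·SEM, and RCI = (X2 - X1)/S_diff, with |RCI| ≥ 1.96 treated as reliable change. The abstract does not state which RCI variant or practice correction the authors applied, so this is a sketch under that assumption, with hypothetical SD and reliability values rather than ECAS norms:

```python
import math


def standard_error_of_difference(sd_baseline, reliability):
    """S_diff = sqrt(2) * SEM, with SEM = SD * sqrt(1 - r)."""
    sem = sd_baseline * math.sqrt(1 - reliability)  # standard error of measurement
    return math.sqrt(2) * sem


def reliable_change_index(score_1, score_2, sd_baseline, reliability):
    """Change between two sessions in units of S_diff; |RCI| >= 1.96 counts as reliable."""
    return (score_2 - score_1) / standard_error_of_difference(sd_baseline, reliability)


def reliable_change_threshold(sd_baseline, reliability, z=1.96):
    """Smallest raw change that reaches the chosen z criterion."""
    return z * standard_error_of_difference(sd_baseline, reliability)


# Hypothetical values: baseline SD of 10 points, test-retest reliability of 0.91.
print(reliable_change_threshold(10, 0.91))
print(reliable_change_index(100, 110, 10, 0.91))
```

    Rounding the threshold up to the next whole point gives a cut-off in the same form as the "change of ≥ N points" figures quoted above.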

  11. Psychometrics Matter in Health Behavior: A Long-term Reliability Generalization Study.

    Science.gov (United States)

    Pickett, Andrew C; Valdez, Danny; Barry, Adam E

    2017-09-01

    Despite numerous calls for increased understanding and reporting of reliability estimates, social science research, including the field of health behavior, has been slow to respond and adopt such practices. Therefore, we offer a brief overview of reliability and common reporting errors; we then perform analyses to examine and demonstrate the variability of reliability estimates by sample and over time. Using meta-analytic reliability generalization, we examined the variability of coefficient alpha scores for a well-designed, consistent, nationwide health study, covering a span of nearly 40 years. For each year and sample, reliability varied. Furthermore, reliability was predicted by a sample characteristic that differed among age groups within each administration. We demonstrated that reliability is influenced by the methods and individuals from which a given sample is drawn. Our work echoes previous calls that psychometric properties, particularly reliability of scores, are important and must be considered and reported before drawing statistical conclusions.

  12. Reliability and Validity of the Greek Migraine Disability Assessment (MIDAS) Questionnaire.

    Science.gov (United States)

    Oikonomidi, Theodora; Vikelis, Michail; Artemiadis, Artemios; Chrousos, George P; Darviri, Christina

    2018-03-01

    The Migraine Disability Assessment (MIDAS) Questionnaire is a reliable and valid instrument for migraine-related disability. Such a tool is needed to quantify migraine-related disability in the Greek population. This validation study aims to assess the test-retest reliability, internal consistency, item discriminant validity and convergent validity of the Greek translation of the MIDAS. Adults diagnosed with migraine completed the MIDAS Questionnaire on two occasions 3 weeks apart to assess reliability, and completed the RAND-36 to assess validity. Participants (n = 152) had a median MIDAS score of 24 and mostly severe disability (58% were grade IV). The test-retest reliability analysis (N = 59) revealed excellent reliability for the total score. Internal consistency was α = 0.71 for the initial completion and α = 0.82 for the retest completion. For item discriminant validity, the correlations between each question and the total score were significant, with high correlations for questions 2-5 (range 0.67 ≤ r ≤ 0.79; p MIDAS score tended to have better wellbeing. Psychometric properties are comparable with those of other published validation studies of the MIDAS and the original. Findings on question 1 show that missing work/school days may be closely related to increased affect issues. The Greek version of the MIDAS Questionnaire has good reliability and validity. This study allowed for cross-cultural comparability of research findings.

  13. Evaluating the test-retest reliability of symptom indices associated with the ImPACT post-concussion symptom scale (PCSS).

    Science.gov (United States)

    Merritt, Victoria C; Bradson, Megan L; Meyer, Jessica E; Arnett, Peter A

    2018-05-01

    indices beyond the total symptom score from the PCSS is beneficial. Findings from this study can be applied to athlete samples to assess reliable change in symptoms following concussion.

  14. Validity and reliability of Abbreviated Mental Test Score (AMTS) among older Iranians.

    Science.gov (United States)

    Foroughan, Mahshid; Wahlund, Lars-Olof; Jafari, Zahra; Rahgozar, Mehdi; Farahani, Ida G; Rashedi, Vahid

    2017-11-01

    Cognitive impairment is common among older people and is associated with increased morbidity and mortality. The main aim of this study was to evaluate the validity of the Persian version of the Abbreviated Mental Test Score (AMTS) as a screening tool for dementia. Data were obtained from a cross-sectional study. One hundred and one older adults who were members of the Iranian Alzheimer Association and 101 of their siblings were enrolled in this study by convenience sampling. The Diagnostic and Statistical Manual of Mental Disorders, 4th edition, criteria for diagnosing dementia and the Mini-Mental State Examination were used as the study tools. The gathered data were analyzed by the Mann-Whitney U-test, the Kruskal-Wallis test, Spearman's rank correlation coefficient, and the receiver-operating characteristic. The AMTS could successfully differentiate the dementia group from the non-dementia group. Scores were significantly correlated with Diagnostic and Statistical Manual of Mental Disorders diagnosis for dementia and Mini-Mental State Examination scores (P < 0.001). Educational level (P < 0.001) and male sex (P = 0.015) were positively associated with AMTS, whereas (P < 0.001) was negatively associated with AMTS. Total Cronbach's α coefficient was 0.90. The scores 6 and 7 showed the optimum balance between sensitivity (99% and 94%, respectively) and specificity (85% and 86%, respectively). The Persian version of the AMTS is a valid cognitive assessment tool for older Iranian adults and can be used for dementia screening in Iran. © 2017 Japanese Psychogeriatric Society.

  15. Modified Tuck Jump Assessment: Reliability and Training of Raters

    Directory of Open Access Journals (Sweden)

    Craig A. Smith, Nicole J. Chimera, Monica R. Lininger, Meghan Warren

    2017-09-01

    Full Text Available We are writing with regard to “Intra- and inter-rater reliability of the modified tuck jump assessment,” by Fort-Vanmeerhaeghe et al. (2017), published in the Journal of Sports Science & Medicine. The authors reported on the reliability of the modified Tuck Jump Assessment (TJA). The purpose of the article was twofold: to introduce a new scoring methodology and to report on the interrater and intrarater reliability. The authors found the modified TJA to have excellent interrater reliability (ICC = 0.94, 95% CI = 0.88-0.97) and intrarater reliability (rater 1 ICC = 0.94, 95% CI = 0.88-0.9; rater 2 ICC = 0.96, 95% CI = 0.92-0.98) with experienced raters (n = 2) in a sample of 24 elite volleyball athletes. Overall, we found the study to be well conducted and valuable to the field of injury screening; however, the study did not adequately explain how the raters were trained in the modified TJA to improve consistency of scoring, or the modifications of the individual flaw “excessive contact noise at landing.” This information is necessary to improve the clinical utility of the TJA and direct future reliability studies. The TJA has been changed at least three times in the literature: from the initial introduction (Myer et al., 2006) to the most referenced and detailed protocol (Myer et al., 2011) to the publication under discussion (Fort-Vanmeerhaeghe et al., 2017). The initial test protocol was based upon clinical expertise and has evolved over time as new research emerged and problems arose with the original TJA. Initially, the TJA was scored on a visual analog scale (Myer et al., 2006), changed to a dichotomous scale (0 for no flaw or 1 for flaw present) (Myer et al., 2011), and most recently modified using an ordinal scale (Fort-Vanmeerhaeghe et al., 2017). A significant disparity in the reported interrater and intrarater reliability arose with the dichotomously scored TJA, between those involved in the development of the TJA (Herrington et al., 2013
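
    The letter does not specify the ICC model behind the reported values. As an illustration only, the sketch below computes ICC(2,1) in Shrout and Fleiss's notation (two-way random effects, absolute agreement, single rater) from a subjects-by-raters table of hypothetical scores. The second test case shows why absolute agreement matters: two raters who differ by a constant offset are perfectly consistent yet do not agree perfectly.

```python
def icc_2_1(ratings):
    """ratings[i][j] = score given to subject i by rater j (complete table)."""
    n = len(ratings)       # subjects
    k = len(ratings[0])    # raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical TJA totals for five athletes scored by two raters.
print(icc_2_1([[3, 4], [6, 6], [2, 2], [8, 7], [5, 5]]))
```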

  16. An ultrasound score for knee osteoarthritis

    DEFF Research Database (Denmark)

    Riecke, B F; Christensen, R.; Torp-Pedersen, S

    2014-01-01

    OBJECTIVE: To develop standardized musculoskeletal ultrasound (MUS) procedures and scoring for detecting knee osteoarthritis (OA) and test the MUS score's ability to discern various degrees of knee OA, in comparison with plain radiography and the 'Knee injury and Osteoarthritis Outcome Score' (KOOS......) domains as comparators. METHOD: A cross-sectional study of MUS examinations in 45 patients with knee OA. Validity, reliability, and reproducibility were evaluated. RESULTS: MUS examination for knee OA consists of five separate domains assessing (1) predominantly morphological changes in the medial...... coefficients ranging from 0.75 to 0.97 for the five domains. Construct validity was confirmed with statistically significant correlation coefficients (0.47-0.81, P knee OA. In comparison with standing radiographs...

  17. WebScore: An Effective Page Scoring Approach for Uncertain Web Social Networks

    Directory of Open Access Journals (Sweden)

    Shaojie Qiao

    2011-10-01

    Full Text Available To effectively score pages with uncertainty in web social networks, we first proposed a new concept called the transition probability matrix and formally defined the uncertainty in web social networks. Second, we proposed a hybrid page scoring algorithm, called WebScore, based on the PageRank algorithm and three centrality measures: degree, betweenness, and closeness. In particular, WebScore takes full account of the uncertainty of web social networks by computing the transition probability from one page to another. The basic idea of WebScore is to (1) integrate uncertainty into PageRank in order to rank pages accurately, and (2) apply the centrality measures to calculate the importance of pages in web social networks. To verify the performance of WebScore, we developed a web social network analysis system that can partition web pages into distinct groups and score them effectively. Finally, we conducted extensive experiments on real data, and the results show that WebScore scores uncertain pages effectively and at lower time cost than PageRank-based and centrality-measure-based page scoring algorithms.
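
    WebScore also folds in degree, betweenness, and closeness centrality, and the abstract does not give its formulas. The sketch below shows only the standard PageRank component it builds on: power iteration over a row-stochastic transition probability matrix, where transition[i][j] is the probability of moving from page i to page j. The damping factor 0.85 and the toy graph are assumptions for illustration.

```python
def pagerank(transition, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank.

    transition[i][j] is the probability of moving from page i to page j; every
    row must sum to 1 (dangling pages are not handled in this sketch).
    """
    n = len(transition)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Teleport term plus probability mass flowing in along links.
        new_rank = [(1 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                new_rank[j] += damping * rank[i] * transition[i][j]
        if sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical three-page network: page 0 links to 1 and 2; both link back to 0.
print(pagerank([[0, 0.5, 0.5], [1, 0, 0], [1, 0, 0]]))
```

    Uncertain links can be expressed by placing the estimated transition probabilities in the matrix rather than uniform 1/out-degree weights, which is the role the abstract assigns to the transition probability matrix.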

  18. Manual and automatic locomotion scoring systems in dairy cows: a review.

    Science.gov (United States)

    Schlageter-Tello, Andrés; Bokkers, Eddie A M; Koerkamp, Peter W G Groot; Van Hertem, Tom; Viazzi, Stefano; Romanini, Carlos E B; Halachmi, Ilan; Bahr, Claudia; Berckmans, Daniël; Lokhorst, Kees

    2014-09-01

    The objective of this review was to describe, compare and evaluate agreement, reliability, and validity of manual and automatic locomotion scoring systems (MLSSs and ALSSs, respectively) used in dairy cattle lameness research. There are many different types of MLSSs and ALSSs. Twenty-five MLSSs were found in 244 articles. MLSSs use different types of scale (ordinal or continuous) and require observation of different gait and posture traits. The most widely used MLSS (used in 28% of the references) is based on asymmetric gait, reluctance to bear weight, and arched back, and is scored on a five-level scale. Fifteen ALSSs were found that could be categorized according to three approaches: (a) the kinetic approach measures forces involved in locomotion, (b) the kinematic approach measures time and distance of variables associated with limb movement and some specific posture variables, and (c) the indirect approach uses behavioural variables or production variables as indicators for impaired locomotion. Agreement and reliability estimates were rarely reported in articles related to MLSSs. When reported, inappropriate statistical methods such as PABAK and Pearson and Spearman correlation coefficients were commonly used. Some of the most frequently used MLSSs were poorly evaluated for agreement and reliability. Agreement and reliability estimates for the original four-, five- or nine-level MLSS, expressed in percentage of agreement, kappa and weighted kappa, showed large ranges among and sometimes also within articles. After transformation into a two-level scale, agreement and reliability estimates were acceptable (percentage of agreement ≥ 75%; kappa and weighted kappa ≥ 0.6), but still varied widely between articles. Agreement and reliability estimates for ALSSs were not reported in any article. Several ALSSs use MLSSs as a reference for model calibration and validation. However, varying agreement and reliability estimates of MLSSs make a
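
    The review reports agreement as percentage of agreement, kappa, and weighted kappa. A minimal sketch of those three statistics for two observers scoring the same cows on an ordinal locomotion scale; the ratings are hypothetical, and the linear and quadratic weighting schemes are common choices rather than ones the review prescribes.

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of animals given the same score by both raters."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b, categories, weighting=None):
    """Cohen's kappa; weighting=None, "linear", or "quadratic" (weighted kappa)."""
    if weighting not in (None, "linear", "quadratic"):
        raise ValueError("weighting must be None, 'linear' or 'quadratic'")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    observed = [[0.0] * k for _ in range(k)]  # joint proportions of score pairs
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weighting is None:
            return 0.0 if i == j else 1.0
        distance = abs(i - j) / (k - 1)
        return distance if weighting == "linear" else distance ** 2

    obs_dis = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - obs_dis / exp_dis


# Hypothetical five-level locomotion scores from two observers.
obs_1 = [1, 2, 2, 3, 4, 5, 1, 3]
obs_2 = [1, 2, 3, 3, 4, 4, 1, 2]
levels = [1, 2, 3, 4, 5]
print(percent_agreement(obs_1, obs_2), cohen_kappa(obs_1, obs_2, levels),
      cohen_kappa(obs_1, obs_2, levels, weighting="linear"))
```

    Weighted kappa gives partial credit when scores differ by only one level, which is why it tends to exceed unweighted kappa on multi-level scales.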

  19. Reliability and validity of a Swedish language version of the Resilience Scale.

    Science.gov (United States)

    Nygren, Björn; Randström, Kerstin Björkman; Lejonklou, Anna K; Lundman, Beril

    2004-01-01

    The purpose of this study was to test the reliability and validity of the Swedish language version of the Resilience Scale (RS). Participants were 142 adults between 19 and 85 years of age. Internal consistency reliability, stability over time, and construct validity were evaluated using Cronbach's alpha, principal components analysis with varimax rotation and correlations with scores on the Sense of Coherence Scale (SOC) and the Rosenberg Self-Esteem Scale (RSE). The mean score on the RS was 142 (SD = 15). The possible scores on the RS range from 25 to 175, and scores higher than 146 are considered high. The test-retest correlation was .78. Correlations with the SOC and the RSE were .41 (p Self and Life emerged as components from the principal components analysis. These findings provide evidence for the reliability and validity of the Swedish language version of the RS.

  20. Validity and reliability of the Baecke questionnaire for the evaluation of habitual physical activity among people living with HIV/AIDS

    Directory of Open Access Journals (Sweden)

    Florindo Alex Antonio

    2006-01-01

    Full Text Available This study evaluates the validity and reliability of the Baecke questionnaire on habitual physical activity when applied to a population of HIV/AIDS subjects. Validity was determined by comparing measurements for 30 subjects of peak oxygen uptake, peak workload, and energy expenditure with scores for occupational physical activity (OPA), physical exercise in leisure (PEL), leisure and locomotion activities (LLA), and total score (TS). Reliability was determined by testing and retesting 29 subjects at intervals of 15-30 days. Validity was evaluated with the Pearson correlation and reliability analyses were done using the intraclass correlation, paired Student t-test, and Bland-Altman methods. Peak VO2 and peak workload had significant correlation with PEL (r = 0.41; r = 0.43; respectively). Energy expenditure had a significant correlation with OPA (r = 0.64). The intraclass coefficients were 0.70 or more for OPA, PEL and TS. There was no difference in OPA, PEL, LLA and TS between the two evaluations. The Bland-Altman methods showed good agreement between the measurements for all habitual physical activity scores. Results show that the Baecke questionnaire is valid for the evaluation of habitual physical activity among people living with HIV/AIDS.
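
    The Bland-Altman method summarizes test-retest agreement as the mean difference (bias) between the two evaluations and the 95% limits of agreement, bias ± 1.96 × SD of the differences. A minimal sketch with hypothetical paired scores, not the study's data:

```python
import statistics


def bland_altman(first, second, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    differences = [b - a for a, b in zip(first, second)]
    bias = statistics.mean(differences)      # systematic difference between visits
    spread = statistics.stdev(differences)   # sample SD of the differences
    return bias, bias - z * spread, bias + z * spread


# Hypothetical total scores at the first and second evaluation.
first_visit = [7.9, 8.4, 6.1, 9.0, 7.2, 8.8]
second_visit = [8.1, 8.2, 6.5, 8.7, 7.4, 9.1]
print(bland_altman(first_visit, second_visit))
```

    In a full analysis the differences are also plotted against the pair means, to check whether disagreement grows with the size of the score.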

  1. Increasing Active Student Responding in a University Applied Behavior Analysis Course: The Effect of Daily Assessment and Response Cards on End of Week Quiz Scores

    Science.gov (United States)

    Malanga, Paul R.; Sweeney, William J.

    2008-01-01

    The study compared the effects of daily assessment and response cards on average weekly quiz scores in an introduction to applied behavior analysis course. An alternating treatments design (Kazdin 1982, "Single-case research designs." New York: Oxford University Press; Cooper et al. 2007, "Applied behavior analysis." Upper Saddle River:…

  2. Facilitating the Interpretation of English Language Proficiency Scores: Combining Scale Anchoring and Test Score Mapping Methodologies

    Science.gov (United States)

    Powers, Donald; Schedl, Mary; Papageorgiou, Spiros

    2017-01-01

    The aim of this study was to develop, for the benefit of both test takers and test score users, enhanced "TOEFL ITP"® test score reports that go beyond the simple numerical scores that are currently reported. To do so, we applied traditional scale anchoring (proficiency scaling) to item difficulty data in order to develop performance…

  3. Reliability calculations

    International Nuclear Information System (INIS)

    Petersen, K.E.

    1986-03-01

    Risk and reliability analysis is increasingly being used in evaluations of plant safety and plant reliability. The analysis can be performed either during the design process or during operation, with the aim of improving safety or reliability. Owing to plant complexity and to safety and availability requirements, sophisticated tools, which are flexible and efficient, are needed. Such tools have been developed in the last 20 years and they have to be continuously refined to meet the growing requirements. Two different areas of application were analysed. In structural reliability, probabilistic approaches have been introduced in some cases for the calculation of the reliability of structures or components. A new computer program has been developed based upon numerical integration in several variables. In systems reliability, Monte Carlo simulation programs are used, especially in the analysis of very complex systems. To increase the applicability of the programs, variance reduction techniques can be applied to speed up the calculation process. Variance reduction techniques have been studied and procedures for implementation of importance sampling are suggested. (author)
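
    The abstract does not give the author's implementation. As a generic illustration of importance sampling for a rare failure event, the sketch below estimates the tail probability P(X > t) for a standard normal X. Crude Monte Carlo almost never observes such an event. Sampling instead from a density shifted into the failure region, here N(t, 1), and reweighting each sample by the likelihood ratio φ(x)/φ(x - t) = exp(-t·x + t²/2) gives a usable estimate from the same number of samples. The threshold and sample size are arbitrary choices.

```python
import math
import random


def exact_tail_prob(t):
    """P(X > t) for X ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2))


def tail_prob_crude(t, n, seed=0):
    """Crude Monte Carlo: fraction of standard-normal draws exceeding t."""
    rng = random.Random(seed)
    return sum(rng.gauss(0, 1) > t for _ in range(n)) / n


def tail_prob_importance(t, n, seed=0):
    """Importance sampling from N(t, 1), reweighted back to N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(t, 1)  # draw from the shifted sampling density
        if x > t:
            # Likelihood ratio phi(x) / phi(x - t) = exp(-t*x + t^2/2).
            total += math.exp(-t * x + 0.5 * t * t)
    return total / n


t, n = 4.0, 20_000
print(exact_tail_prob(t), tail_prob_crude(t, n), tail_prob_importance(t, n))
```

    With t = 4 the true probability is about 3e-5, so 20,000 crude draws expect fewer than one failure, while the importance-sampling estimate typically lands within a few percent.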

  4. The Berg Balance Scale has high intra- and inter-rater reliability but absolute reliability varies across the scale: a systematic review.

    Science.gov (United States)

    Downs, Stephen; Marquez, Jodie; Chiarelli, Pauline

    2013-06-01

    What is the intra-rater and inter-rater relative reliability of the Berg Balance Scale? What is the absolute reliability of the Berg Balance Scale? Does the absolute reliability of the Berg Balance Scale vary across the scale? Systematic review with meta-analysis of reliability studies. Any clinical population that has undergone assessment with the Berg Balance Scale. Relative intra-rater reliability, relative inter-rater reliability, and absolute reliability. Eleven studies involving 668 participants were included in the review. The relative intrarater reliability of the Berg Balance Scale was high, with a pooled estimate of 0.98 (95% CI 0.97 to 0.99). Relative inter-rater reliability was also high, with a pooled estimate of 0.97 (95% CI 0.96 to 0.98). A ceiling effect of the Berg Balance Scale was evident for some participants. In the analysis of absolute reliability, all of the relevant studies had an average score of 20 or above on the 0 to 56 point Berg Balance Scale. The absolute reliability across this part of the scale, as measured by the minimal detectable change with 95% confidence, varied between 2.8 points and 6.6 points. The Berg Balance Scale has a higher absolute reliability when close to 56 points due to the ceiling effect. We identified no data that estimated the absolute reliability of the Berg Balance Scale among participants with a mean score below 20 out of 56. The Berg Balance Scale has acceptable reliability, although it might not detect modest, clinically important changes in balance in individual subjects. The review was only able to comment on the absolute reliability of the Berg Balance Scale among people with moderately poor to normal balance. Copyright © 2013 Australian Physiotherapy Association. Published by .. All rights reserved.
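
    The abstract reports pooled reliability estimates with 95% confidence intervals but does not describe the pooling model. One common fixed-effect approach for correlation-type coefficients, assumed here purely for illustration, transforms each study's coefficient with Fisher's z = atanh(r), averages with weights n - 3, and back-transforms the interval with tanh. For ICCs these weights are only an approximation, and the study inputs below are hypothetical.

```python
import math


def pool_fisher_z(studies, z_crit=1.96):
    """studies: list of (coefficient, sample_size) pairs, each with n > 3.

    Returns (pooled estimate, lower CI limit, upper CI limit).
    """
    weights = [n - 3 for _, n in studies]          # inverse variance of z
    zs = [math.atanh(r) for r, _ in studies]       # Fisher transform
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    se = 1 / math.sqrt(sum(weights))
    return (
        math.tanh(z_bar),
        math.tanh(z_bar - z_crit * se),
        math.tanh(z_bar + z_crit * se),
    )


# Hypothetical (coefficient, n) pairs from three reliability studies.
print(pool_fisher_z([(0.97, 40), (0.99, 25), (0.98, 60)]))
```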

  5. Reliability And Validity Of Turkish Version Of Motor Activity Log-28

    Directory of Open Access Journals (Sweden)

    Burcu Ersöz Hüseyinsinoğlu

    2011-06-01

    OBJECTIVE: The aim of this study was to adapt the Motor Activity Log-28 (MAL-28) into Turkish and to investigate the reliability and validity of this questionnaire in stroke patients. METHODS: Following the translation of the MAL-28 into Turkish, its reliability and construct validity were examined in 30 stroke patients. For the reliability study, patients were interviewed twice within a three-day period, during which no rehabilitative activities were undertaken. Test-retest reliability was determined using the intraclass correlation coefficient (ICC) and Spearman's correlation coefficient (r); internal consistency was determined by Cronbach's alpha (α). Construct validity was examined by comparing the MAL-28 Quality of Movement (QOM) and Amount of Use (AOU) scales with Wolf Motor Function Test (WMFT) Performance Time (PT) and Functional Ability (FA) scores. In addition, item-to-scale correlations of the AOU and QOM scales were determined, and the correlation between the total scores of the two scales was examined. RESULTS: The AOU and QOM scales of the Turkish MAL-28 were reliable (ICCs of 0.97 and 0.96, respectively) and internally consistent (Cronbach's α of 0.96 for both scales). Test-retest reliability was supported (AOU, r=0.94; QOM, r=0.93). WMFT FA scores correlated with both scales (r=0.63). The correlations of WMFT PT with the AOU and QOM scales were -0.56 and -0.55, respectively. The AOU and QOM scales were highly correlated (r=0.95). CONCLUSION: The findings indicate that the Turkish version of the MAL-28 is reliable and valid in individuals with stroke. Further investigation of its responsiveness is needed before this version is used as a primary outcome measure in clinical trials.
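Internal consistency in studies like this one is summarized with Cronbach's α. As a minimal sketch (not the authors' analysis code), α for k items is computed from the item variances and the variance of the total score, α = k/(k − 1) · (1 − Σ var(item) / var(total)). The item scores below are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[j][i] is respondent i's score on item j."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    # Total score per respondent, summed across items
    totals = [sum(item[i] for item in items) for i in range(n)]
    sum_item_var = sum(variance(item) for item in items)  # sample variances
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))


# Hypothetical scores: 3 items answered by 4 respondents
items = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 2, 4]]
print(f"alpha = {cronbach_alpha(items):.3f}")  # alpha = 0.951
```

Because the same n − 1 denominator appears in every variance, using population variances instead would give the same α.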

  6. First quality score for referral letters in gastroenterology—a validation study

    Science.gov (United States)

    Eskeland, Sigrun Losada; Brunborg, Cathrine; Seip, Birgitte; Wiencke, Kristine; Hovde, Øistein; Owen, Tanja; Skogestad, Erik; Huppertz-Hauss, Gert; Halvorsen, Fred-Arne; Garborg, Kjetil; Aabakken, Lars; de Lange, Thomas

    2016-01-01

    Objective: To create and validate an objective and reliable score to assess referral quality in gastroenterology. Design: An observational multicentre study. Setting and participants: 25 gastroenterologists participated in selecting variables for a Thirty Point Score (TPS) for quality assessment of referrals to gastroenterology specialist healthcare for 9 common indications. From May to September 2014, 7 hospitals from the South-Eastern Norway Regional Health Authority participated in collecting and scoring 327 referrals to a gastroenterologist. Main outcome measure: Correlation between the TPS and a visual analogue scale (VAS) for referral quality. Results: The 327 referrals had an average TPS of 13.2 (range 1–25) and an average VAS of 4.7 (range 0.2–9.5). The reliability of the score was excellent, with an intra-rater intraclass correlation coefficient (ICC) of 0.87 and inter-rater ICC of 0.91. The overall correlation between the TPS and the VAS was moderate (r=0.42), and ranged from fair to substantial for the various indications. Mean agreement was good (ICC=0.47, 95% CI (0.34 to 0.57)), ranging from poor to good. Conclusions: The TPS is reliable, objective and shows good agreement with the subjective VAS. The score may be a useful tool for assessing referral quality in gastroenterology, which is particularly important when evaluating the effect of interventions to improve referral quality. PMID:27855107

  7. The reliability and validity of Cognitive Abilities Screening Instrument C-2.0 in Screening elderly people in Chengdu

    Institute of Scientific and Technical Information of China (English)

    Luo Zhuming; Lu Rong

    2000-01-01

    Background: The Cognitive Abilities Screening Instrument C-2.0 (CASI C-2.0) is grounded in Chinese culture and is designed for people with little or no formal education. Objective: To assess the reliability and validity of the CASI C-2.0, we surveyed 807 persons aged 55 to 100 years with little or no formal education in urban and rural areas of Chengdu. Methods: Phase I: the 807 persons were assessed by different doctors with the CASI C-2.0 and the Mini-Mental State Examination (MMSE). Phase II: those who screened positive in Phase I were assessed by trained field physicians who were blind to the CASI C-2.0 and MMSE scores. The physicians' assessment included a detailed history, a systematic physical examination and a selected group of psychometric tests to establish clinical diagnoses of dementia, Alzheimer's disease and vascular dementia using the ICD-10, NINCDS-ADRDA and DSM-IV criteria, respectively. Phase III: 30 randomly selected persons were retested with the CASI C-2.0 after 3 to 4 weeks. Results: Among the 807 persons, 55 cases of dementia were identified, including 50 cases of probable Alzheimer's disease and 5 cases of vascular dementia. Discussion: Regarding reliability, the test-retest reliability coefficient of the CASI C-2.0 was 0.97 (p<0.001), Pearson's correlation coefficients between the nine item scores and the total score were all above 0.66, Cronbach's α was 0.9056, and the split-half reliability coefficient was 0.8355. Regarding validity, using the ICD-10 criteria as the gold standard and 50 as the cut-off score, the sensitivity, specificity and accuracy of the CASI C-2.0 were 94.5%, 89.5% and 89.8%, respectively, and kappa was 0.5123. Compared with the MMSE, the CASI C-2.0 had better specificity (χ² test, p<0.01), the same sensitivity, and a lower refusal rate (χ² test, p<0.005). Conclusion: We consider that the CASI C-2.0 has excellent reliability and validity and merits wider use in both clinical and epidemiological surveys.
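The reported sensitivity, specificity, accuracy and kappa all follow from a single 2 × 2 table of screening result against the ICD-10 diagnosis. The abstract does not give that table; the counts below are an assumption, back-calculated from the 55 dementia cases among 807 participants, and they reproduce all four reported figures. Kappa compares observed agreement with the agreement expected by chance from the marginal totals.

```python
def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Screening performance against a reference diagnosis from a 2x2 table."""
    n = tp + fn + fp + tn
    p_obs = (tp + tn) / n  # observed agreement (= accuracy)
    # Chance agreement: screen-positive x case rate + screen-negative x non-case rate
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": p_obs,
        "kappa": (p_obs - p_exp) / (1 - p_exp),
    }


# Assumed counts: 55 dementia cases (52 screen-positive), 752 non-cases (79 screen-positive)
m = diagnostic_metrics(tp=52, fn=3, fp=79, tn=673)
print(", ".join(f"{name} {value:.4f}" for name, value in m.items()))
# sensitivity 0.9455, specificity 0.8949, accuracy 0.8984, kappa 0.5123
```

The moderate kappa alongside high accuracy reflects the low prevalence of dementia: most agreement is on negatives, which chance alone would also produce.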

  8. Gait Deviation Index, Gait Profile Score and Gait Variable Score in children with spastic cerebral palsy

    DEFF Research Database (Denmark)

    Rasmussen, Helle Mätzke; Nielsen, Dennis Brandborg; Pedersen, Niels Wisbech

    2015-01-01

    The Gait Deviation Index (GDI) and Gait Profile Score (GPS) are the most widely used summary measures of gait in children with cerebral palsy (CP). However, the reliability and agreement of these indices have not been investigated, limiting their clinimetric quality for research and clinical...... to good reliability with ICCs of 0.4–0.7. The agreement for the GDI and the logarithmically transformed GPS, in terms of the standard error of measurement as a percentage of the grand mean (SEM%), varied from 4.1 to 6.7%, whilst the smallest detectable change in percent (SDC%) ranged from 11.3 to 18...

  9. Intrajudge and Interjudge Reliability of the Stuttering Severity Instrument-Fourth Edition.

    Science.gov (United States)

    Davidow, Jason H; Scott, Kathleen A

    2017-11-08

    The Stuttering Severity Instrument (SSI) is a tool used to measure the severity of stuttering. Previous versions of the instrument have known limitations (e.g., Lewis, 1995). The present study examined the intra- and interjudge reliability of the newest version, the Stuttering Severity Instrument-Fourth Edition (SSI-4) (Riley, 2009). Twelve judges who were trained on the SSI-4 protocol participated. Judges collected SSI-4 data while viewing 4 videos of adults who stutter at Time 1 and 4 weeks later at Time 2. Data were analyzed for intra- and interjudge reliability of the SSI-4 subscores (for Frequency, Duration, and Physical Concomitants), total score, and final severity rating. Intra- and interjudge reliability across the subscores and total score concurred with the manual's reported reliability when reliability was calculated using the methods described in the manual. New calculations of judge agreement produced values that differed from those in the manual for the 3 subscores, total score, and final severity rating, and provided data absent from the manual. Clinicians and researchers who use the SSI-4 should carefully consider the limitations of the instrument. Investigation into the multitasking demands of the instrument may indicate whether collecting data for specific variables separately would improve the intra- and interjudge reliability of those variables.

  10. Reliability and validity of the visual analogue scale for disability in patients with chronic musculoskeletal pain.

    Science.gov (United States)

    Boonstra, Anne M; Schiphorst Preuper, Henrica R; Reneman, Michiel F; Posthumus, Jitze B; Stewart, Roy E

    2008-06-01

    The objective of the study was to determine the reliability and concurrent validity of a visual analogue scale (VAS) for disability as a single-item instrument measuring disability in chronic pain patients. A test-retest design was used for the reliability study and a cross-sectional design for the validity study. The study was set in a general rehabilitation centre and a university rehabilitation centre. The study population consisted of patients over 18 years of age suffering from chronic musculoskeletal pain: 52 patients in the reliability study and 344 patients in the validity study. The main outcome measures were as follows. Reliability study: Spearman's correlation coefficients (rho values) between the test and retest data of the VAS for disability; validity study: rho values of the VAS disability scores with the scores on four domains of the Short-Form Health Survey (SF-36) and with VAS pain scores, and with Roland-Morris Disability Questionnaire scores in chronic low back pain patients. In the reliability study, rho values varied from 0.60 to 0.77; in the validity study, rho values of VAS disability scores with SF-36 domain scores varied from 0.16 to 0.51, with Roland-Morris Disability Questionnaire scores from 0.38 to 0.43, and with VAS pain scores from 0.76 to 0.84. The study concluded that the reliability of the VAS for disability is moderate to good. However, because of its weak correlation with other disability instruments and its strong correlation with the VAS for pain, its validity is questionable.

  11. The Assumption of a Reliable Instrument and Other Pitfalls to Avoid When Considering the Reliability of Data

    Science.gov (United States)

    Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K.

    2012-01-01

    The purpose of this article is to help researchers avoid common pitfalls associated with reliability including incorrectly assuming that (a) measurement error always attenuates observed score correlations, (b) different sources of measurement error originate from the same source, and (c) reliability is a function of instrumentation. To accomplish our purpose, we first describe what reliability is and why researchers should care about it with focus on its impact on effect sizes. Second, we review how reliability is assessed with comment on the consequences of cumulative measurement error. Third, we consider how researchers can use reliability generalization as a prescriptive method when designing their research studies to form hypotheses about whether or not reliability estimates will be acceptable given their sample and testing conditions. Finally, we discuss options that researchers may consider when faced with analyzing unreliable data. PMID:22518107
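Pitfall (a) concerns the classical attenuation formula. Under classical test theory, and only if the measurement errors of X and Y are uncorrelated with each other and with the true scores, the observed correlation equals the true-score correlation times √(rxx · ryy); this is the basis of Spearman's correction for attenuation. A minimal sketch with hypothetical reliabilities. When errors are correlated (for example, through shared method variance), the observed correlation can be inflated rather than attenuated, so the formula does not hold in that case.

```python
import math


def attenuated_r(true_r: float, rel_x: float, rel_y: float) -> float:
    """Observed r implied by classical test theory when errors are uncorrelated."""
    return true_r * math.sqrt(rel_x * rel_y)


def disattenuated_r(obs_r: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation (same uncorrelated-error assumption)."""
    return obs_r / math.sqrt(rel_x * rel_y)


# Hypothetical: true-score correlation 0.50, score reliabilities 0.80 and 0.70
obs = attenuated_r(0.50, 0.80, 0.70)
print(f"observed r = {obs:.3f}")  # observed r = 0.374
print(f"corrected r = {disattenuated_r(obs, 0.80, 0.70):.3f}")  # corrected r = 0.500
```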

  12. Cross-cultural adaptation and validation of Persian Achilles tendon Total Rupture Score.

    Science.gov (United States)

    Ansari, Noureddin Nakhostin; Naghdi, Soofia; Hasanvand, Sahar; Fakhari, Zahra; Kordi, Ramin; Nilsson-Helander, Katarina

    2016-04-01

    To cross-culturally adapt the Achilles tendon Total Rupture Score (ATRS) to Persian and to carry out a preliminary evaluation of the reliability and validity of a Persian ATRS. A cross-sectional and prospective cohort study was conducted to translate and cross-culturally adapt the ATRS to Persian (ATRS-Persian) following established guidelines. Thirty patients with total Achilles tendon rupture and 30 healthy subjects participated in this study. The psychometric properties tested were floor/ceiling effects (responsiveness), internal consistency reliability, test-retest reliability, standard error of measurement (SEM), smallest detectable change (SDC), construct validity, and discriminant validity. Factor analysis was performed to determine the structure of the ATRS-Persian. No floor or ceiling effects were found, supporting the content validity and responsiveness of the ATRS-Persian. Internal consistency was high (Cronbach's α 0.95). Item-total correlations exceeded the acceptable standard of 0.3 for all items (0.58-0.95). Test-retest reliability was excellent (ICC for agreement 0.98). SEM and SDC were 3.57 and 9.9, respectively. Construct validity was supported by significant correlations between the ATRS-Persian total score and the Persian Foot and Ankle Outcome Score (PFAOS) total score and subscales (r = 0.55-0.83). The ATRS-Persian significantly discriminated between patients and healthy subjects. Exploratory factor analysis revealed a single component. The ATRS was cross-culturally adapted to Persian and shown to be a reliable and valid instrument for measuring functional outcomes in Persian patients with Achilles tendon rupture. Level of evidence: II.
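The reported SDC of 9.9 is consistent with the widely used formula SDC = 1.96 × √2 × SEM, the 95% smallest detectable change for an individual measured twice; the abstract does not state which formula was used. SEM itself is often estimated as SD × √(1 − ICC). A minimal sketch; the SD and ICC in the second call are hypothetical.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from the score SD and a reliability coefficient."""
    return sd * math.sqrt(1 - icc)


def smallest_detectable_change(sem: float, z: float = 1.96) -> float:
    """SDC (also called MDC) at the confidence level implied by z; sqrt(2) covers two measurements."""
    return z * math.sqrt(2) * sem


print(round(smallest_detectable_change(3.57), 1))  # 9.9, matching the reported SDC
print(round(sem_from_icc(sd=20.0, icc=0.98), 2))  # 2.83 (hypothetical SD and ICC)
```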

  13. Examining the interrater reliability of the Hare Psychopathy Checklist-Revised across a large sample of trained raters.

    Science.gov (United States)

    Blais, Julie; Forth, Adelle E; Hare, Robert D

    2017-06-01

    The goal of the current study was to assess the interrater reliability of the Psychopathy Checklist-Revised (PCL-R) among a large sample of trained raters (N = 280). All raters completed PCL-R training at some point between 1989 and 2012 and subsequently provided complete coding for the same 6 practice cases. Overall, 3 major conclusions can be drawn from the results: (a) reliability of individual PCL-R items largely fell below any appropriate standards while the estimates for Total PCL-R scores and factor scores were good (but not excellent); (b) the cases representing individuals with high psychopathy scores showed better reliability than did the cases of individuals in the moderate to low PCL-R score range; and (c) there was a high degree of variability among raters; however, rater specific differences had no consistent effect on scoring the PCL-R. Therefore, despite low reliability estimates for individual items, Total scores and factor scores can be reliably scored among trained raters. We temper these conclusions by noting that scoring standardized videotaped case studies does not allow the rater to interact directly with the offender. Real-world PCL-R assessments typically involve a face-to-face interview and much more extensive collateral information. We offer recommendations for new web-based training procedures. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  14. Training less-experienced faculty improves reliability of skills assessment in cardiac surgery.

    Science.gov (United States)

    Lou, Xiaoying; Lee, Richard; Feins, Richard H; Enter, Daniel; Hicks, George L; Verrier, Edward D; Fann, James I

    2014-12-01

    Previous work has demonstrated high inter-rater reliability in the objective assessment of simulated anastomoses among experienced educators. We evaluated the inter-rater reliability of less-experienced educators and the impact of focused training with a video-embedded coronary anastomosis assessment tool. Nine less-experienced cardiothoracic surgery faculty members from different institutions evaluated 2 videos of simulated coronary anastomoses (1 by a medical student and 1 by a resident) at the Thoracic Surgery Directors Association Boot Camp. They then underwent a 30-minute training session using an assessment tool with embedded videos to anchor rating scores for 10 components of coronary artery anastomosis. Afterward, they evaluated 2 videos of a different student and resident performing the task. Components were scored on a 1 to 5 Likert scale, yielding an average composite score. Inter-rater reliabilities of component and composite scores were assessed using intraclass correlation coefficients (ICCs) and overall pass/fail ratings with kappa. All components of the assessment tool exhibited improvement in reliability, with 4 (bite, needle holder use, needle angles, and hand mechanics) improving the most from poor (ICC range, 0.09-0.48) to strong (ICC range, 0.80-0.90) agreement. After training, inter-rater reliabilities for composite scores improved from moderate (ICC, 0.76) to strong (ICC, 0.90) agreement, and for overall pass/fail ratings, from poor (kappa = 0.20) to moderate (kappa = 0.78) agreement. Focused, video-based anchor training facilitates greater inter-rater reliability in the objective assessment of simulated coronary anastomoses. Among raters with less teaching experience, such training may be needed before objective evaluation of technical skills. Published by Elsevier Inc.
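Inter-rater ICCs come in several forms, and the abstract does not say which was used. As one common example, the sketch below computes ICC(2,1) in Shrout and Fleiss's notation (two-way random effects, absolute agreement, single rater) from the two-way ANOVA mean squares: ICC = (MS_R − MS_E) / (MS_R + (k − 1)MS_E + k(MS_C − MS_E)/n). The 6 × 4 rating matrix is purely illustrative.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): rows are subjects (n), columns are raters (k); no missing cells."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Illustrative scores: 6 subjects (rows) rated by 4 raters (columns)
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(f"ICC(2,1) = {icc_2_1(ratings):.2f}")  # ICC(2,1) = 0.29
```

The low value here is driven by large systematic differences between raters (the column effect), which absolute-agreement ICCs penalize; rater training of the kind described above targets exactly this source of disagreement.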

  15. Evaluation of a Lameness Scoring System for Dairy Cows

    DEFF Research Database (Denmark)

    Thomsen, P T; Munksgaard, L; Tøgersen, F A

    2008-01-01

    Lameness is a major problem in dairy production both in terms of reduced production and compromised animal welfare. A 5-point lameness scoring system was developed based on previously published systems, but optimized for use under field conditions. The scoring system included the words "in most...... categories by different observers before or after training. In conclusion, the results suggest that the lameness categories were not equidistant and the scoring system has reasonable reliability in terms of intra- and interobserver agreement...

  16. A study of the reliability of the Nociception Coma Scale.

    Science.gov (United States)

    Riganello, F; Cortese, M D; Arcuri, F; Candelieri, A; Guglielmino, F; Dolce, G; Sannita, W G; Schnakers, C

    2015-04-01

    In this study, we investigated the reliability of the Nociception Coma Scale, which has recently been developed to assess nociception in non-communicative, severely brain-injured patients. Design: Prospective cross-sequential study. Setting: Semi-intensive care unit and long-term brain injury care. Subjects: Forty-four patients diagnosed as being in a vegetative state (n=26) or in a minimally conscious state (n=18). Methods: Patients were assessed by two experts (rater A and rater B) in two consecutive weeks to measure inter-rater agreement and test-retest reliability. Main measures: Total scores and subscores of the Nociception Coma Scale. Results: We performed a total of 176 assessments. In week 2, the inter-rater agreement was moderate for the total scores (k = 0.57) and fair to substantial for the subscores (0.33 ≤ k ≤ 0.62). For rater A, the test-retest reliability was substantial for the total scores (k = 0.66) and moderate to almost perfect for the subscores (0.53 ≤ k ≤ 0.96). The inter-rater agreement was weaker in week 1, and the test-retest reliability was lower for the less experienced rater (rater B). Conclusion: This study provides further evidence of the psychometric qualities of the Nociception Coma Scale. Future studies should assess the impact of practical experience and background on the administration and scoring of the scale. © The Author(s) 2014.
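The abstract reports kappa (k) values for ordinal scale scores without saying whether they were weighted. For ordered categories, a weighted kappa gives partial credit to near-misses; with only two categories it reduces to Cohen's unweighted kappa. A minimal sketch with hypothetical ratings; the function name and the linear/quadratic weighting option are illustrative choices, not the study's method.

```python
def weighted_kappa(r1: list[int], r2: list[int], k: int, power: int = 1) -> float:
    """Cohen's weighted kappa for two raters scoring categories 0..k-1 (k >= 2).

    power=1 gives linear disagreement weights, power=2 quadratic.
    """
    n = len(r1)
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[a][b] += 1 / n  # observed joint proportions
    row = [sum(obs[i]) for i in range(k)]  # rater 1 marginals
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]  # rater 2 marginals
    # Disagreement weight: 0 on the diagonal, 1 for the most distant categories
    w = [[(abs(i - j) / (k - 1)) ** power for j in range(k)] for i in range(k)]
    d_obs = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - d_obs / d_exp


# Hypothetical ordinal scores (0-3) given by two raters to the same 8 patients
rater_a = [0, 1, 2, 3, 2, 1, 0, 3]
rater_b = [0, 1, 3, 3, 2, 0, 0, 2]
print(f"linear weighted kappa = {weighted_kappa(rater_a, rater_b, k=4):.2f}")
```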

  17. Assessing Reliability of Two Versions of Vocabulary Levels Tests in Iranian Context

    Directory of Open Access Journals (Sweden)

    Aso Bayazidi

    2017-02-01

    This study examined the equivalence and reliability of the two versions of the Vocabulary Levels Test in an Iranian context. The study was motivated by the fact that the Vocabulary Levels Test is increasingly used in Iran for both research and pedagogical purposes without having been checked for validity and reliability in this context. The equivalence and reliability of the two versions were examined through the parallel-forms approach to reliability in classical true score theory. Seventy-five intermediate learners of English as a foreign language at the Iran Language Institute took the two versions of the test, with a one-week interval between the two administrations, in a counterbalanced design. To examine the equivalence of the two versions, the means and variances of the scores on the two tests were compared using a paired-sample t-test and a one-way ANOVA, respectively. The difference between the means of the two versions was significant, so the two versions cannot be considered parallel forms. To assess the reliability of the two versions, the correlation between their scores was estimated using the Pearson product-moment correlation. The two versions were highly correlated and are reliable tests. It is concluded that the two versions should not be treated as equivalent in longitudinal and gain-score studies.

  18. IW-Scoring: an Integrative Weighted Scoring framework for annotating and prioritizing genetic variations in the noncoding genome.

    Science.gov (United States)

    Wang, Jun; Dayem Ullah, Abu Z; Chelala, Claude

    2018-01-30

    The vast majority of germline and somatic variations occur in the noncoding part of the genome, only a small fraction of which are believed to be functional. Among the tens of thousands of noncoding variations detectable in each genome, identifying and prioritizing driver candidates with putative functional significance is challenging. To address this, we implemented IW-Scoring, a new Integrative Weighted Scoring model to annotate and prioritise functionally relevant noncoding variations. We evaluate 11 scoring methods and apply an unsupervised spectral approach to select among them for integration into two linear weighted functional scoring schemas, one for known and one for novel variations. IW-Scoring performs stably and at a high level, matching the best predictors on three independent data sets. We demonstrate the robustness of IW-Scoring in identifying recurrent functional mutations in the TERT promoter, as well as disease SNPs in proximity to consensus motifs and with gene regulatory effects. Using follicular lymphoma as a paradigmatic cancer model, we apply IW-Scoring to locate 11 recurrently mutated noncoding regions in 14 follicular lymphoma genomes, and validate 9 of these regions in an extension cohort, including the promoter and enhancer regions of PAX5. Overall, IW-Scoring demonstrates greater versatility in identifying trait- and disease-associated noncoding variants. Scores from IW-Scoring as well as other methods are freely available from http://www.snp-nexus.org/IW-Scoring/. © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Conversion Between Mini-Mental State Examination, Montreal Cognitive Assessment, and Dementia Rating Scale-2 Scores in Parkinson’s Disease

    Science.gov (United States)

    van Steenoven, Inger; Aarsland, Dag; Hurtig, Howard; Chen-Plotkin, Alice; Duda, John E.; Rick, Jacqueline; Chahine, Lama M.; Dahodwala, Nabila; Trojanowski, John Q.; Roalf, David R.; Moberg, Paul J.; Weintraub, Daniel

    2015-01-01

    Cognitive impairment is one of the earliest, most common, and most disabling non-motor symptoms in Parkinson’s disease (PD). Thus, routine screening of global cognitive abilities is important for the optimal management of PD patients. Few global cognitive screening instruments have been developed for or validated in PD patients. The Mini-Mental State Examination (MMSE), Montreal Cognitive Assessment (MoCA), and Dementia Rating Scale-2 (DRS-2) have been used extensively for cognitive screening in both clinical and research settings. Determining how to convert the scores between instruments would facilitate the longitudinal assessment of cognition in clinical settings and the comparison and synthesis of cognitive data in multicenter and longitudinal cohort studies. The primary aim of this study was to apply a simple and reliable algorithm for the conversion of MoCA to MMSE scores in PD patients. A secondary aim was to apply this algorithm for the conversion of DRS-2 to both MMSE and MoCA scores. The cognitive performance of a convenience sample of 360 patients with idiopathic PD was assessed by at least two of these cognitive screening instruments. We then developed conversion scores between the MMSE, MoCA, and DRS-2 using equipercentile equating and log-linear smoothing. The conversion score tables reported here enable direct and easy comparison of three routinely used cognitive screening assessments in PD patients. PMID:25381961
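The conversion tables rest on equipercentile equating: a score on one instrument is mapped to the score on the other instrument that has the same percentile rank. The sketch below shows only the unsmoothed core, using mid-point percentile ranks and linear interpolation (a simplification of standard continuization); it omits the log-linear smoothing the authors applied and assumes integer scores starting at 0. The function names and data are hypothetical.

```python
from bisect import bisect_left


def percentile_ranks(scores: list[int], max_score: int) -> list[float]:
    """Mid-point percentile rank (as a proportion) of each integer score 0..max_score."""
    counts = [0] * (max_score + 1)
    for s in scores:
        counts[s] += 1
    ranks, below = [], 0
    for c in counts:
        ranks.append((below + c / 2) / len(scores))  # below the score + half at it
        below += c
    return ranks


def equate(x: int, pr_x: list[float], pr_y: list[float]) -> float:
    """Form-Y score with the same percentile rank as score x on form X."""
    target = pr_x[x]
    j = bisect_left(pr_y, target)  # first Y score whose rank reaches the target
    if j == 0:
        return 0.0
    if j == len(pr_y):
        return float(len(pr_y) - 1)
    lo, hi = pr_y[j - 1], pr_y[j]
    # Linear interpolation between adjacent integer Y scores (lo < target <= hi)
    return (j - 1) + (target - lo) / (hi - lo)


# Hypothetical data: form Y scores run one point higher than form X scores
form_x = [0, 1, 1, 2, 2, 2, 3, 3, 4]
form_y = [1, 2, 2, 3, 3, 3, 4, 4, 5]
pr_x, pr_y = percentile_ranks(form_x, 5), percentile_ranks(form_y, 5)
print([equate(x, pr_x, pr_y) for x in range(5)])  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

With small samples the raw percentile ranks are jagged, which is why presmoothing the score distributions (as in the study) is normally applied before equating.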

  20. Scoring sacroiliac joints by magnetic resonance imaging. A multiple-reader reliability experiment

    DEFF Research Database (Denmark)

    Landewe, Robert B.M.; Hermann, Kay Geert A; Van Der Heijde, Desiree M.F.M

    2005-01-01

    Magnetic resonance imaging (MRI) of the sacroiliac (SI) joints and the spine is increasingly important in the assessment of inflammatory activity and structural damage in clinical trials with patients with ankylosing spondylitis (AS). We investigated inter-reader reliability and sensitivity...

  1. Modification site localization scoring integrated into a search engine.

    Science.gov (United States)

    Baker, Peter R; Trinidad, Jonathan C; Chalkley, Robert J

    2011-07-01

    Large proteomic data sets identifying hundreds or thousands of modified peptides are becoming increasingly common in the literature. Several methods for assessing the reliability of peptide identifications, at either the individual-peptide or the data-set level, have become established. However, tools for measuring the confidence of modification site assignments are sparse and rarely used. A few tools for estimating the reliability of phosphorylation site assignments have been developed, but they are not integrated into a search engine, so they require a particular search engine's output for a second processing step. They may also require a particular fragmentation method, and most apply only to phosphorylation analysis rather than to post-translational modification analysis in general. In this study, we present the performance of site assignment scoring integrated directly into the search engine Protein Prospector, which allows site assignment reliability to be reported automatically for all modifications present in an identified peptide. It clearly indicates when a site assignment is ambiguous (and, if so, between which residues) and reports an assignment score that can be translated into a reliability measure for individual site assignments.

  2. Features of applying systems approach for evaluating the reliability of cryogenic systems for special purposes

    Directory of Open Access Journals (Sweden)

    E. D. Chertov

    2016-01-01

    Summary. Analysis of cryogenic installations confirms an objective trend toward an increasing number of tasks performed by special-purpose systems. One of the most important directions in the development of cryogenics is the creation of air separation installations that produce oxygen and nitrogen. Modern aviation complexes require large quantities of these gases in both the gaseous and the liquid state. The onboard gas systems used in aircraft of the Russian Federation are subdivided into the oxygen system, the air (nitrogen) system, the neutral gas system and the fire-protection system. The process schemes of ADI are largely determined by the pressure of the compressed air or, more generally, by the refrigeration cycle. In most ADI, the working fluid of the refrigeration cycle is the air being separated; that is, the process and refrigeration cycles in the installation are integrated. On this basis, installations are classified as low-pressure; medium- and high-pressure; with an expander; and with pre-cooling. There is also a small number of ADI types in which the refrigeration and process cycles are separate; these are installations with external cooling. To monitor the technical condition of the BRV hardware in real time and to estimate reliability indicators, the use of multi-agent technologies is proposed. The multi-agent approach is the most suitable for building a decision support system (SPPR) for reliability assessment, because it makes it possible to: distribute information processing across the elements of the system, which increases overall performance; solve the problem of accumulating, storing and reusing knowledge, which can significantly increase the efficiency of reliability assessment; and considerably reduce human intervention in the operation of the system, which saves the decision maker's (PMD) time and does not require special skills to work with the system.

  3. Models on reliability of non-destructive testing

    International Nuclear Information System (INIS)

    Simola, K.; Pulkkinen, U.

    1998-01-01

    The reliability of ultrasonic inspections has been studied in, e.g., the international PISC (Programme for the Inspection of Steel Components) exercises. These exercises have produced a large amount of information on the effect of various factors on inspection reliability. The information obtained from reliability experiments is used to model the dependence of flaw detection probability on various factors and to evaluate the performance of inspection equipment, including sizing accuracy. Experimental information is used most effectively when mathematical models are applied. Here, some statistical models for the reliability of non-destructive tests are introduced. To demonstrate the use of inspection reliability models, they have been applied to the inspection results for intergranular stress corrosion cracking (IGSCC) type flaws in the PISC III exercise (PISC 1995). The models are applied both to the flaw detection frequency data of all inspection teams and to the flaw sizing data of one participating team. (author)
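One widely used hit/miss model for the dependence of detection probability on flaw size (not necessarily one of the models introduced by the authors) is the log-logistic curve POD(a) = 1 / (1 + exp(−(β0 + β1 ln a))), where a is flaw size. In practice β0 and β1 are fitted by logistic regression on hit/miss inspection data; the parameter values below are hypothetical and the size units are arbitrary. Inverting the curve gives the flaw size detected with a chosen probability, such as a90.

```python
import math


def pod(a: float, b0: float, b1: float) -> float:
    """Log-logistic probability of detection for flaw size a > 0."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * math.log(a))))


def size_at_pod(p: float, b0: float, b1: float) -> float:
    """Flaw size whose detection probability is p (inverse of pod)."""
    return math.exp((math.log(p / (1.0 - p)) - b0) / b1)


# Hypothetical fitted parameters (flaw depth in mm, say)
B0, B1 = -3.0, 2.5
a90 = size_at_pod(0.90, B0, B1)
print(f"a90 = {a90:.2f} mm, POD(a90) = {pod(a90, B0, B1):.2f}")
```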

  4. First quality score for referral letters in gastroenterology-a validation study.

    Science.gov (United States)

    Eskeland, Sigrun Losada; Brunborg, Cathrine; Seip, Birgitte; Wiencke, Kristine; Hovde, Øistein; Owen, Tanja; Skogestad, Erik; Huppertz-Hauss, Gert; Halvorsen, Fred-Arne; Garborg, Kjetil; Aabakken, Lars; de Lange, Thomas

    2016-10-08

    To create and validate an objective and reliable score to assess referral quality in gastroenterology. An observational multicentre study. 25 gastroenterologists participated in selecting variables for a Thirty Point Score (TPS) for quality assessment of referrals to gastroenterology specialist healthcare for 9 common indications. From May to September 2014, 7 hospitals from the South-Eastern Norway Regional Health Authority participated in collecting and scoring 327 referrals to a gastroenterologist. Correlation between the TPS and a visual analogue scale (VAS) for referral quality. The 327 referrals had an average TPS of 13.2 (range 1-25) and an average VAS of 4.7 (range 0.2-9.5). The reliability of the score was excellent, with an intra-rater intraclass correlation coefficient (ICC) of 0.87 and inter-rater ICC of 0.91. The overall correlation between the TPS and the VAS was moderate (r=0.42), and ranged from fair to substantial for the various indications. Mean agreement was good (ICC=0.47, 95% CI (0.34 to 0.57)), ranging from poor to good. The TPS is reliable, objective and shows good agreement with the subjective VAS. The score may be a useful tool for assessing referral quality in gastroenterology, which is particularly important when evaluating the effect of interventions to improve referral quality. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.

  5. THE INTRA- AND INTER-RATER RELIABILITY OF THE SOCCER INJURY MOVEMENT SCREEN (SIMS).

    Science.gov (United States)

    McCunn, Robert; Aus der Fünten, Karen; Govus, Andrew; Julian, Ross; Schimpchen, Jan; Meyer, Tim

    2017-02-01

    The growing volume of movement screening research reflects a belief among practitioners and researchers alike that movement quality may be associated with injury risk. However, existing movement screening tools have not considered the sport-specific movement and injury patterns relevant to soccer. The present study introduces the Soccer Injury Movement Screen (SIMS), which has been designed specifically for use in soccer. The purpose of the study was to assess the intra- and inter-rater reliability of the SIMS and to determine its suitability for use in further research. The study used a test-retest design to determine reliability. Twenty-five healthy, recreationally active university students (11 males, 14 females; age 25.5 ± 4.0 years, height 171 ± 9 cm, weight 64.7 ± 12.6 kg) agreed to participate. The SIMS contains five sub-tests: the anterior reach, single-leg deadlift, in-line lunge, single-leg hop for distance and tuck jump. Each movement was scored out of 10 points, and the scores were summed to produce a composite score out of 50. The anterior reach and single-leg hop for distance were scored in real time, while the remaining tests were filmed and scored retrospectively. Three raters conducted the SIMS with each participant on three occasions separated by an average of three and a half days (minimum one day, maximum seven days). Rater 1 re-scored the filmed movements for all participants on all occasions six months later to establish the 'pure' intra-rater (intra-occasion) reliability for those movements. Intraclass correlation coefficient (ICC) values for intra- and inter-rater composite score reliability ranged from 0.66 to 0.72 and from 0.79 to 0.86, respectively. Weighted kappa values representing the intra- and inter-rater reliability of the individual sub-tests ranged from 0.35 to 0.91, indicating fair to almost perfect agreement. Establishing the reliability of the SIMS is a prerequisite for further research seeking to investigate...

  6. Completeness and reliability of mortality data in Viet Nam: Implications for the national routine health management information system.

    Science.gov (United States)

    Hong, Tran Thi; Phuong Hoa, Nguyen; Walker, Sue M; Hill, Peter S; Rao, Chalapati

    2018-01-01

    Mortality statistics form a crucial component of national Health Management Information Systems (HMIS). However, the availability and quality of mortality data at the national level in Viet Nam are limited. This study assessed the completeness of recorded deaths and the reliability of recorded causes of death (COD) in the A6 death registers of the national routine HMIS in Viet Nam. A total of 1477 deaths identified in 2014 were reviewed in two provinces. A capture-recapture method was applied to assess the completeness of the A6 death registers. In total, 1365 household verbal autopsy (VA) interviews were successfully conducted and reviewed by physicians, who assigned multiple and underlying causes of death (UCOD). These UCODs from VA were then compared with the CODs recorded in the A6 death registers, using kappa scores to assess the reliability of the A6 death register diagnoses. The overall completeness of the A6 death registers in the two provinces was 89.3% (95% CI: 87.8-90.8). No COD recorded in the A6 death registers demonstrated good reliability. Reliability was very low for the recording of cardiovascular deaths (kappa for stroke = 0.47 and kappa for ischaemic heart diseases = 0.42) and diabetes (kappa = 0.33). The reporting of deaths due to road traffic accidents, HIV and some cancers showed a moderate level of reliability, with kappa scores ranging between 0.57 and 0.69 (p … Viet Nam.
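
    The abstract does not name the capture-recapture estimator. A minimal two-source sketch, assuming Chapman's bias-corrected Lincoln-Petersen estimator and two sources that capture deaths independently (for example a register and an independent household search), looks like this. The function names and counts are hypothetical.

```python
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected two-source capture-recapture estimate of the true total.

    n1: deaths recorded in source 1 (e.g. the death register)
    n2: deaths found by source 2 (e.g. an independent household search)
    m:  deaths found in both sources after record matching
    Assumes the two sources capture deaths independently of each other.
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def completeness(n_source: int, n_total_est: float) -> float:
    """Fraction of the estimated true total captured by one source."""
    return n_source / n_total_est


# Hypothetical counts for illustration only (not the study's data).
total = chapman_estimate(n1=900, n2=800, m=720)
print(f"estimated deaths = {total:.0f}, register completeness = {completeness(900, total):.1%}")
```

    If the sources are positively dependent (deaths that one source misses are also likely to be missed by the other), this estimator undercounts the true total and so overstates completeness.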

  7. Transcultural adaptation and reliability of the Spanish version of a questionnaire about oral hygiene advice given by dentists in Chile.

    Directory of Open Access Journals (Sweden)

    Ilze Maldupa

    2015-04-01

    Aim: To adapt a questionnaire about oral hygiene advice given by dentists in Chile and to evaluate the validity and reliability of its Spanish version. Materials and methods: A validation study was conducted according to the COSMIN recommendations. The original questionnaire was adapted from English into Spanish using translation, back-translation, expert review and a pilot test with a sample of 56 dentists. The instrument consisted of 3 sections: recommendations for oral hygiene, the relevance given to delivering oral hygiene instruction, and training and experience in delivering oral hygiene recommendations. It was re-administered to 5 of the dentists a week later. Reliability was measured as internal consistency (Cronbach's alpha), test-retest reliability (Cohen's kappa and weighted kappa) and measurement error (limits of agreement, LoA). Content validity was evaluated by experts, and construct validity was assessed through convergent validity (Pearson correlation). Results: A good level of internal consistency was obtained for 5 items (Cronbach's alpha = 0.73). For items on a nominal scale, Cohen's kappa coefficient was 0.80 (95% CI = 0.64 to 0.95), and for ordinal items the weighted kappa coefficient (linear weighting) was 0.76 (95% CI = 0.65 to 0.88). The mean difference between the scores of the two measurements was -1 (standard deviation 2.35). Ninety-five percent of the differences were between -5.7 and 3.7 (LoA = -1 ± 4.7), and the variance of the total score was 29 to 41. A good level of convergent validity (Pearson correlation = 0.63) was obtained. Conclusion: The final questionnaire is valid and reliable for use with Chilean dentists with a profile like those included in this study, in order to identify and quantify the oral hygiene instruction they provide to patients. Future studies should assess the validity and reliability of this adaptation in other Spanish-speaking countries.
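
    The test-retest statistics named above can be sketched briefly. The code below shows unweighted Cohen's kappa for nominal items and Bland-Altman 95 % limits of agreement (mean difference ± 1.96 SD) for total scores. It is a generic illustration, not the authors' analysis code, and the function names are hypothetical; weighted kappa for ordinal items would additionally weight partial disagreements.

```python
from collections import Counter
from statistics import mean, stdev


def cohens_kappa(r1: list, r2: list) -> float:
    """Unweighted Cohen's kappa for two ratings of the same items on a nominal scale."""
    n = len(r1)
    p_observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: product of each category's marginal proportions.
    p_expected = sum(c1[cat] * c2[cat] for cat in set(r1) | set(r2)) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when both ratings use a single category")
    return (p_observed - p_expected) / (1 - p_expected)


def limits_of_agreement(x1: list[float], x2: list[float]) -> tuple[float, float]:
    """Bland-Altman 95 % limits of agreement: mean difference +/- 1.96 SD of differences."""
    diffs = [a - b for a, b in zip(x1, x2)]
    d, sd = mean(diffs), stdev(diffs)
    return d - 1.96 * sd, d + 1.96 * sd


# Hypothetical test-retest answers and total scores.
print(cohens_kappa(["yes", "no", "yes", "yes"], ["yes", "no", "no", "yes"]))
print(limits_of_agreement([20, 25, 30, 28], [21, 27, 29, 30]))
```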

  8. Failure analysis – basic step of applying Reliability Centered Maintenance in general aviation

    Directory of Open Access Journals (Sweden)

    Martin BUGAJ

    2012-01-01

    Performing a reliability analysis on a product or system can involve a number of different analyses to determine how reliable it is. A reliability-centered maintenance program consists of a set of scheduled tasks generated on the basis of the specific reliability characteristics of the equipment they are designed to protect. Complex equipment is composed of a vast number of parts and assemblies. All these items can be expected to fail at one time or another, but some failures have more serious consequences than others. Certain kinds of failures have a direct effect on operating safety, and others affect the operational capability of the equipment. The consequences of a particular failure depend on the design of the item and of the equipment in which it is installed. Although the environment in which the equipment is operated is sometimes an additional factor, the impact of failures on the equipment, and hence their consequences for the operating organization, are established primarily by the equipment designer. Failure consequences are therefore a primary inherent reliability characteristic.

  9. Psychometric properties of a Swedish translation of the VISA-P outcome score for patellar tendinopathy.

    Science.gov (United States)

    Frohm, Anna; Saartok, Tönu; Edman, Gunnar; Renström, Per

    2004-12-18

    Self-administered patient outcome scores are increasingly recommended for evaluating the primary outcome in clinical studies. The VISA-P score, developed at the Victorian Institute of Sport Assessment in Melbourne, Australia, is a questionnaire for patients with patellar tendinopathy in which patients assess the severity of their symptoms, their function and their ability to participate in sport. The aim of this study was to translate the questionnaire into Swedish and to study the reliability and validity of the translated questionnaire and the resulting scores. The questionnaire was translated into Swedish according to internationally recommended guidelines for the cross-cultural adaptation of self-report measures. Reliability and validity were tested in three different populations: healthy students (n = 17), members of the Swedish male national basketball team (n = 17), considered a population at risk, and a group of non-surgically treated patients (n = 17) with clinically diagnosed patellar tendinopathy. The questionnaire was completed by 51 subjects altogether. The translated VISA-P questionnaire showed very good test-retest reliability (ICC = 0.97). The mean (+/- SD) VISA-P score at the first and second test occasions was highest in the healthy student group, 83 (+/- 13) and 81 (+/- 15), respectively. The scores of the basketball players were 79 (+/- 24) and 80 (+/- 23), while the patient group scored significantly (p < 0.05) lower, 48 (+/- 20) and 52 (+/- 19). The translated version of the VISA-P questionnaire was linguistically and culturally equivalent to the original version. The translated score showed good reliability.

  10. Psychometric properties of a Swedish translation of the VISA-P outcome score for patellar tendinopathy

    Directory of Open Access Journals (Sweden)

    Edman Gunnar

    2004-12-01

    Background: Self-administered patient outcome scores are increasingly recommended for evaluating the primary outcome in clinical studies. The VISA-P score, developed at the Victorian Institute of Sport Assessment in Melbourne, Australia, is a questionnaire for patients with patellar tendinopathy in which patients assess the severity of their symptoms, their function and their ability to participate in sport. The aim of this study was to translate the questionnaire into Swedish and to study the reliability and validity of the translated questionnaire and the resulting scores. Methods: The questionnaire was translated into Swedish according to internationally recommended guidelines for the cross-cultural adaptation of self-report measures. Reliability and validity were tested in three different populations: healthy students (n = 17), members of the Swedish male national basketball team (n = 17), considered a population at risk, and a group of non-surgically treated patients (n = 17) with clinically diagnosed patellar tendinopathy. The questionnaire was completed by 51 subjects altogether. Results: The translated VISA-P questionnaire showed very good test-retest reliability (ICC = 0.97). The mean (± SD) VISA-P score at the first and second test occasions was highest in the healthy student group, 83 (± 13) and 81 (± 15), respectively. The scores of the basketball players were 79 (± 24) and 80 (± 23), while the patient group scored significantly (p < 0.05) lower, 48 (± 20) and 52 (± 19). Conclusions: The translated version of the VISA-P questionnaire was linguistically and culturally equivalent to the original version. The translated score showed good reliability.

  11. Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores.

    Science.gov (United States)

    Bastien, Olivier; Maréchal, Eric

    2008-08-07

    ... information hazard rate, and that pairwise sequence alignment scores should follow a Gumbel distribution, whose parameters could have a theoretical rationale. In particular, one parameter corresponds to the information hazard rate. The extreme value distribution of alignment scores, assessed from high-scoring segment pairs following the Karlin-Altschul model, can also be deduced from reliability theory applied to molecular sequences. It reflects the redundancy of information between homologous sequences under functionally conservative pressure. This model also provides a link between concepts of biological sequence analysis and of systems biology.
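
    In the Karlin-Altschul setting mentioned above, the Gumbel (type I extreme value) form gives the chance that the best local alignment between random sequences of lengths m and n scores at least S as 1 - exp(-E), where E = K m n exp(-lambda S). The sketch below evaluates that standard formula. The function names are hypothetical, and the lambda and K values are placeholders, since real values depend on the scoring matrix and gap penalties.

```python
import math


def expected_hits(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expected number of local alignments (HSPs) scoring
    >= score between random sequences of lengths m and n: E = K*m*n*exp(-lambda*S)."""
    return k * m * n * math.exp(-lam * score)


def p_value(score: float, m: int, n: int, lam: float, k: float) -> float:
    """P(best local score >= score) under the Gumbel limit: 1 - exp(-E).

    math.expm1 keeps precision when E is small.
    """
    return -math.expm1(-expected_hits(score, m, n, lam, k))


# Hypothetical constants; real lambda and K are estimated per scoring system.
LAM, K = 0.3, 0.1
for s in (30, 40, 50):
    print(s, expected_hits(s, 300, 300, LAM, K), p_value(s, 300, 300, LAM, K))
```

    For small E the p-value and E are nearly equal, which is why search tools often report E directly.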

  12. Reliability analysis techniques in power plant design

    International Nuclear Information System (INIS)

    Chang, N.E.

    1981-01-01

    An overview of reliability analysis techniques as applied to power plant design is presented. The key terms power plant performance, reliability, availability and maintainability are defined. Reliability modeling, methods of analysis and component reliability data are briefly reviewed. The application of reliability analysis techniques from a design engineering perspective to improve power plant productivity is discussed. (author)
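
    The abstract defines reliability and availability without giving formulas. A minimal sketch of the textbook relations follows. It assumes a constant failure rate, so R(t) = exp(-t / MTBF), and uses inherent availability MTBF / (MTBF + MTTR). It also treats a series train of independent components as available only when all of them are up. Function names and component figures are hypothetical.

```python
import math


def reliability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of operating t hours without failure, assuming a constant
    failure rate (exponential lifetimes), so R(t) = exp(-t / MTBF)."""
    return math.exp(-t_hours / mtbf_hours)


def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time a repairable item is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series_availability(availabilities: list[float]) -> float:
    """Approximate availability of a train that needs every component up,
    assuming components fail and are repaired independently."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


# Hypothetical component data for illustration.
pump = inherent_availability(mtbf_hours=2000, mttr_hours=20)
valve = inherent_availability(mtbf_hours=8000, mttr_hours=8)
print(f"pump A = {pump:.4f}, valve A = {valve:.4f}, "
      f"train A = {series_availability([pump, valve]):.4f}")
```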

  13. The role of test-retest reliability in measuring individual and group differences in executive functioning.

    Science.gov (United States)

    Paap, Kenneth R; Sawi, Oliver

    2016-12-01

    Studies testing for individual or group differences in executive functioning can be compromised by unknown test-retest reliability. Test-retest reliabilities across an interval of about one week were obtained from performance in the antisaccade, flanker, Simon, and color-shape switching tasks. There is a general trade-off between the greater reliability of single mean RT measures, and the greater process purity of measures based on contrasts between mean RTs in two conditions. The individual differences in RT model recently developed by Miller and Ulrich was used to evaluate the trade-off. Test-retest reliability was statistically significant for 11 of the 12 measures, but was of moderate size, at best, for the difference scores. The test-retest reliabilities for the Simon and flanker interference scores were lower than those for switching costs. Standard practice evaluates the reliability of executive-functioning measures using split-half methods based on data obtained in a single day. Our test-retest measures of reliability are lower, especially for difference scores. These reliability measures must also take into account possible day effects that classical test theory assumes do not occur. Measures based on single mean RTs tend to have acceptable levels of reliability and convergent validity, but are "impure" measures of specific executive functions. The individual differences in RT model shows that the impurity problem is worse than typically assumed. However, the "purer" measures based on difference scores have low convergent validity that is partly caused by deficiencies in test-retest reliability. Copyright © 2016 Elsevier B.V. All rights reserved.
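
    Why difference scores (e.g. interference effects or switch costs) tend to be less reliable than single mean RTs can be illustrated with the classical test theory formula for the reliability of D = X - Y. This formula assumes uncorrelated measurement errors, and it is not the Miller and Ulrich model used in the study. The function name and values are hypothetical.

```python
def difference_score_reliability(r_xx: float, r_yy: float, r_xy: float,
                                 sd_x: float = 1.0, sd_y: float = 1.0) -> float:
    """Classical test theory reliability of D = X - Y.

    r_xx, r_yy: reliabilities of the two condition scores (e.g. mean RTs)
    r_xy: observed correlation between them
    sd_x, sd_y: their standard deviations
    Assumes the measurement errors of X and Y are uncorrelated.
    """
    true_var = sd_x**2 * r_xx + sd_y**2 * r_yy - 2 * r_xy * sd_x * sd_y
    total_var = sd_x**2 + sd_y**2 - 2 * r_xy * sd_x * sd_y
    return true_var / total_var


# Hypothetical: two highly reliable, strongly correlated condition means.
print(difference_score_reliability(r_xx=0.9, r_yy=0.9, r_xy=0.8))
```

    With both conditions at 0.9 reliability and correlated at 0.8, the difference score's reliability drops to about 0.5. The stronger the correlation between conditions, the more true-score variance the subtraction removes.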

  14. Reliable and Valid Assessment of Point-of-care Ultrasonography

    DEFF Research Database (Denmark)

    Todsen, Tobias; Tolsgaard, Martin Grønnebæk; Olsen, Beth Härstedt

    2015-01-01

    OBJECTIVE: To explore the reliability and validity of the Objective Structured Assessment of Ultrasound Skills (OSAUS) scale for point-of-care ultrasonography (POC US) performance. BACKGROUND: POC US is increasingly used by clinicians and is an essential part of the management of acute surgical conditions. However, the quality of performance is highly operator-dependent. Therefore, reliable and valid assessment of trainees' ultrasonography competence is needed to ensure patient safety. METHODS: Twenty-four physicians, representing novices, intermediates, and experts in POC US, scanned 4 different...... physicians' OSAUS scores with diagnostic accuracy. RESULTS: The generalizability coefficient was high (0.81) and a D-study demonstrated that 1 assessor and 5 cases would result in similar reliability. The construct validity of the OSAUS scale was supported by a significant difference in the mean scores......

  15. Computed tomography for the detection of distal radioulnar joint instability: normal variation and reliability of four CT scoring systems in 46 patients

    Energy Technology Data Exchange (ETDEWEB)

    Wijffels, Mathieu; Krijnen, Pieta; Schipper, Inger [Leiden University Medical Center, Department of Surgery-Trauma Surgery, P.O. Box 9600, Leiden (Netherlands); Stomp, Wouter; Reijnierse, Monique [Leiden University Medical Center, Department of Radiology, P.O. Box 9600, Leiden (Netherlands)

    2016-11-15

    The diagnosis of distal radioulnar joint (DRUJ) instability is clinically challenging. Computed tomography (CT) may aid in the diagnosis, but the reliability and normal variation of DRUJ translation on CT have not been established in detail. The aim of this study was to evaluate the inter- and intraobserver agreement and normal ranges of CT scoring methods for determining DRUJ translation in both posttraumatic and uninjured wrists. Patients with a conservatively treated, unilateral distal radius fracture were included. CT scans of both wrists were evaluated independently by two readers using the radioulnar line method, subluxation ratio method, epicenter method and radioulnar ratio method. Inter- and intraobserver agreement was assessed, and normal values were determined from the uninjured wrists. Ninety-two wrist CTs (mean age: 56.5 years, SD: 17.0, mean follow-up 4.2 years, SD: 0.5) were evaluated. Interobserver agreement was best for the epicenter method [ICC = 0.73, 95 % confidence interval (CI) 0.65-0.79]. Intraobserver agreement was almost perfect for the radioulnar line method (ICC = 0.82, 95 % CI 0.77-0.87). Each method showed a wide range of normal DRUJ translation. The normal range for the epicenter method was -0.35 to -0.06 in pronation and -0.11 to 0.19 in supination. DRUJ translation on CT in pro- and supination can be reliably evaluated in both normal and posttraumatic wrists, albeit with large normal variation. The epicenter method seems the most reliable. Scanning both wrists might help prevent radiological overdiagnosis of instability. (orig.)

  16. Reliability and validity of the Parenting Scale of Inconsistency.

    Science.gov (United States)

    Yoshizumi, Takahiro; Murase, Satomi; Murakami, Takashi; Takai, Jiro

    2006-08-01

    The purposes of the present study were to develop a Parenting Scale of Inconsistency and to evaluate its initial reliability and validity. The 12 items assess inconsistency in parents' moods, behaviors, and attitudes toward children. In the primary study, 517 participants completed three measures: the new Parenting Scale of Inconsistency, the Parental Bonding Instrument, and the Depression Scale of the General Health Questionnaire. The Parenting Scale of Inconsistency had good test-retest reliability of .85 and internal consistency of .88 (Cronbach coefficient alpha). Construct validity was good, as Inconsistency scores were significantly correlated with the Care and Overprotection scores of the Parental Bonding Instrument and with the Depression scores. Moreover, the relation of Inconsistency scores to a dimension of parenting style distinct from Care and Overprotection suggested that the Parenting Scale of Inconsistency had factorial validity. This scale appears to be a promising measure for examining the relationships between inconsistent parenting and the mental health of children.
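
    Cronbach's coefficient alpha, reported here and in several other records, compares the sum of the item variances with the variance of the total score: alpha = k/(k-1) * (1 - sum of item variances / total-score variance). A minimal sketch follows. The function name and the toy responses are hypothetical, and the same (sample) variance definition is used for items and total so that the ratio is consistent.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha; responses[i][j] is respondent i's score on item j."""
    k = len(responses[0])
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 4-respondent, 3-item data set.
print(cronbach_alpha([[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5]]))
```

    Alpha rises when items covary strongly, because the total-score variance then exceeds the sum of the item variances.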

  17. Interrater and Intrarater Reliability of the Tuck Jump Assessment by Health Professionals of Varied Educational Backgrounds

    Directory of Open Access Journals (Sweden)

    Lisa A. Dudley

    2013-01-01

    Objective. The Tuck Jump Assessment (TJA), a clinical plyometric assessment, identifies 10 jumping and landing technique flaws. The study objective was to investigate TJA interrater and intrarater reliability with raters of different educational and clinical backgrounds. Methods. Forty participants were video recorded performing the TJA using a published protocol and instructions. Five raters of varied educational and clinical backgrounds scored the TJA. The scores for the 10 technique flaws were summed to give the total TJA score. Approximately one month later, 3 raters scored the videos again. Intraclass correlation coefficients determined interrater (5 and 3 raters for the first and second session, respectively) and intrarater (3 raters) reliability. Results. Interrater reliability with 5 raters was poor (ICC = 0.47; 95% confidence interval (CI) 0.33–0.62). Interrater reliability between the 3 raters who completed 2 scoring sessions improved from 0.52 (95% CI 0.35–0.68) for session one to 0.69 (95% CI 0.55–0.81) for session two. Intrarater reliability was poor to moderate, ranging from 0.44 (95% CI 0.22–0.68) to 0.72 (95% CI 0.55–0.84). Conclusion. The published protocol and rater training were insufficient to allow consistent TJA scoring. There may be a learning effect with the TJA, since interrater reliability improved with repetition. TJA instructions and training should be modified and enhanced before clinical implementation.

  18. Design for reliability: NASA reliability preferred practices for design and test

    Science.gov (United States)

    Lalli, Vincent R.

    1994-01-01

    This tutorial summarizes reliability experience from both NASA and industry and reflects engineering practices that support current and future civil space programs. These practices were collected from various NASA field centers and were reviewed by a committee of senior technical representatives from the participating centers (members are listed at the end). The material for this tutorial was taken from the publication issued by the NASA Reliability and Maintainability Steering Committee (NASA Reliability Preferred Practices for Design and Test. NASA TM-4322, 1991). Reliability must be an integral part of the systems engineering process. Although both disciplines must be weighed equally with other technical and programmatic demands, the application of sound reliability principles will be the key to the effectiveness and affordability of America's space program. Our space programs have shown that reliability efforts must focus on the design characteristics that affect the frequency of failure. Herein, we emphasize that these identified design characteristics must be controlled by applying conservative engineering principles.

  19. Conversion between mini-mental state examination, montreal cognitive assessment, and dementia rating scale-2 scores in Parkinson's disease.

    Science.gov (United States)

    van Steenoven, Inger; Aarsland, Dag; Hurtig, Howard; Chen-Plotkin, Alice; Duda, John E; Rick, Jacqueline; Chahine, Lama M; Dahodwala, Nabila; Trojanowski, John Q; Roalf, David R; Moberg, Paul J; Weintraub, Daniel

    2014-12-01

    Cognitive impairment is one of the earliest, most common, and most disabling non-motor symptoms in Parkinson's disease (PD). Thus, routine screening of global cognitive abilities is important for the optimal management of PD patients. Few global cognitive screening instruments have been developed for or validated in PD patients. The Mini-Mental State Examination (MMSE), Montreal Cognitive Assessment (MoCA), and Dementia Rating Scale-2 (DRS-2) have been used extensively for cognitive screening in both clinical and research settings. Determining how to convert the scores between instruments would facilitate the longitudinal assessment of cognition in clinical settings and the comparison and synthesis of cognitive data in multicenter and longitudinal cohort studies. The primary aim of this study was to apply a simple and reliable algorithm for the conversion of MoCA to MMSE scores in PD patients. A secondary aim was to apply this algorithm for the conversion of DRS-2 to both MMSE and MoCA scores. The cognitive performance of a convenience sample of 360 patients with idiopathic PD was assessed by at least two of these cognitive screening instruments. We then developed conversion scores between the MMSE, MoCA, and DRS-2 using equipercentile equating and log-linear smoothing. The conversion score tables reported here enable direct and easy comparison of three routinely used cognitive screening assessments in PD patients. © 2014 International Parkinson and Movement Disorder Society.
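
    Equipercentile equating maps a score on one test to the score on another test that has the same percentile rank. The sketch below is a simplified illustration for integer-scored tests. It uses midpoint percentile ranks and linear interpolation on the target test, and it omits the log-linear smoothing the study applied. The function names, parameters and score lists are hypothetical.

```python
from bisect import bisect_left


def percentile_ranks(scores: list[int], max_score: int) -> list[float]:
    """Midpoint percentile rank (as a proportion) for each integer score 0..max_score."""
    n = len(scores)
    counts = [0] * (max_score + 1)
    for s in scores:
        counts[s] += 1
    ranks, below = [], 0
    for c in counts:
        ranks.append((below + 0.5 * c) / n)  # those below + half of those at the score
        below += c
    return ranks


def equate(x: int, scores_x: list[int], max_x: int,
           scores_y: list[int], max_y: int) -> float:
    """Map score x on test X to the test-Y score with the same percentile rank."""
    p = percentile_ranks(scores_x, max_x)[x]
    pr_y = percentile_ranks(scores_y, max_y)  # non-decreasing in score
    i = bisect_left(pr_y, p)                  # first Y score with rank >= p
    if i == 0:
        return 0.0
    if i > max_y:
        return float(max_y)
    lo, hi = pr_y[i - 1], pr_y[i]             # lo < p <= hi here
    return (i - 1) + (p - lo) / (hi - lo)


# Hypothetical paired samples: test Y uses a doubled score scale.
sx = [0, 1, 1, 2, 2, 2, 3]
sy = [2 * s for s in sx]
print(equate(2, sx, 3, sy, 6))
```

    Without smoothing, conversions from small samples can be jagged, which is the motivation for the log-linear presmoothing the authors report.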

  20. Reliability of the Cooking Task in adults with acquired brain injury.

    Science.gov (United States)

    Poncet, Frédérique; Swaine, Bonnie; Taillefer, Chantal; Lamoureux, Julie; Pradat-Diehl, Pascale; Chevignard, Mathilde

    2015-01-01

    Acquired brain injury (ABI) often leads to deficits in executive functioning (EF) that cause severe and long-standing disabilities in activities of daily living. The Cooking Task is an ecological and valid test of EF involving multi-tasking in a real environment. Given its complex scoring system, it is important to establish the tool's reliability. The objective of the study was to examine the reliability of the Cooking Task (internal consistency, inter-rater and test-retest reliability). A total of 160 patients with ABI (113 men, mean age 37 years, SD = 14.3) were tested using the Cooking Task. For test-retest reliability, patients were assessed by the same rater on two occasions (mean interval 11 days), while two raters independently and simultaneously observed and scored patients' performances to estimate inter-rater reliability. Internal consistency was high for the global scale (Cronbach α = .74). Inter-rater reliability (n = 66) for total errors was also high (ICC = .93); however, test-retest reliability (n = 11) was poor (ICC = .36). In general, the Cooking Task appears to be a reliable tool. The low test-retest results were expected given the importance of EF in the performance of novel tasks.

  1. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed

    DEFF Research Database (Denmark)

    Kottner, Jan; Audigé, Laurent; Brorson, Stig

    2011-01-01

    Results of reliability and agreement studies are intended to provide information about the amount of error inherent in any diagnosis, score, or measurement. The level of reliability and agreement among users of scales, instruments, or classifications is largely unknown. Therefore, there is a need ......, standards, or guidelines for reporting reliability and agreement in the health care and medical field are lacking. The objective was to develop guidelines for reporting reliability and agreement studies....

  2. The six-item Clock Drawing Test – reliability and validity in mild Alzheimer’s disease

    DEFF Research Database (Denmark)

    Jørgensen, Kasper; Kristensen, Maria K; Waldemar, Gunhild

    2015-01-01

    This study presents a reliable, short and practical version of the Clock Drawing Test (CDT) for clinical use and examines its diagnostic accuracy in mild Alzheimer's disease versus elderly nonpatients. Clock drawings from 231 participants were scored independently by four clinical neuropsychologists blind to diagnostic classification. The interrater agreement of individual scoring criteria was analyzed and items with poor or moderate reliability were excluded. The classification accuracy of the resulting scoring system - the six-item CDT - was examined. We explored the effect of further...

  3. Development and reliability of a multi-modality scoring system for evaluation of disease progression in pre-clinical models of osteoarthritis: celecoxib may possess disease-modifying properties.

    Science.gov (United States)

    Panahifar, A; Jaremko, J L; Tessier, A G; Lambert, R G; Maksymowych, W P; Fallone, B G; Doschak, M R

    2014-10-01

    We sought to develop a comprehensive scoring system for evaluating pre-clinical models of osteoarthritis (OA) progression and to use it to evaluate two different classes of drugs for the management of OA. Post-traumatic OA (PTOA) was surgically induced in skeletally mature rats. Rats were randomly divided into three groups receiving glucosamine (high dose of 192 mg/kg), celecoxib (clinical dose) or no treatment. Disease progression was monitored using micro-magnetic resonance imaging (MRI), micro-computed tomography (CT) and histology. Pertinent features such as osteophytes, subchondral sclerosis, joint effusion, bone marrow lesions (BML), cysts, loose bodies and cartilage abnormalities were included in designing a sensitive multi-modality scoring system, termed the rat arthritis knee scoring system (RAKSS). Overall, an inter-observer intraclass correlation coefficient (ICC) greater than 0.750 was achieved for each scored feature. None of the treatments prevented cartilage loss, synovitis, joint effusion, or sclerosis. However, celecoxib significantly reduced osteophyte development compared to placebo. Although signs of inflammation such as synovitis and joint effusion were readily identified at 4 weeks post-operation, we did not detect any BML. We report the development of a sensitive and reliable multi-modality scoring system, the RAKSS, for evaluating OA severity in pre-clinical animal models. Using this scoring system, we found that celecoxib prevented enlargement of osteophytes in this animal model of PTOA and thus may be useful in preventing OA progression. However, it did not show any chondroprotective effect at the recommended dose. In contrast, high-dose glucosamine had no measurable effects. Copyright © 2014 Osteoarthritis Research Society International. Published by Elsevier Ltd. All rights reserved.

  4. Radiosurgery for brain metastases: a score index for predicting prognosis

    International Nuclear Information System (INIS)

    Weltman, Eduardo; Salvajoli, Joao Victor; Brandt, Reynaldo Andre; Morais Hanriot, Rodrigo de; Prisco, Flavio Eduardo; Cruz, Jose Carlos; Oliveira Borges, Sandra Regina de; Wajsbrot, Dalia Ballas

    2000-01-01

    Purpose: To analyze a prognostic score index for patients with brain metastases treated with stereotactic radiosurgery (the Score Index for Radiosurgery in Brain Metastases [SIR]). Methods and Materials: The actuarial survival of 65 patients with brain metastases treated with radiosurgery between July 1993 and December 1997 was retrospectively analyzed. Prognostic factors included age, Karnofsky performance status (KPS), extracranial disease status, number of brain lesions, largest brain lesion volume, lesion site, and whether or not whole brain irradiation was received. The SIR was obtained by summing the scores for the first five of these prognostic factors. Kaplan-Meier actuarial survival curves were calculated for all prognostic factors, for the SIR, and for recursive partitioning analysis (RPA) (RTOG prognostic score) classes. Survival curves of subsets were compared with the log-rank test. The Cox model was used to identify any correlation between prognostic factors, prognostic scores, and survival. Results: Median overall survival from radiosurgery was 6.8 months. On univariate analysis, extracranial disease status, KPS, number of brain lesions, largest brain lesion volume, RPA, and SIR were significantly correlated with prognosis. Median survival for RPA classes 1, 2, and 3 was 20.19 months, 7.75 months, and 3.38 months, respectively (p = 0.0131). Median survival for patients grouped under SIR scores of 1 to 3, 4 to 7, and 8 to 10 was 2.91 months, 7.00 months, and 31.38 months, respectively (p = 0.0001). Using the Cox model, extracranial disease status and KPS demonstrated significant correlation with prognosis (p < 0.0001 and 0.0004, respectively). Multivariate analysis also demonstrated significance for SIR and RPA when tested individually (p = 0.0001 and 0.0040, respectively). When the Cox model was applied to both SIR and RPA, only SIR reached independent significance (p = 0.0004).
Conclusions: Systemic disease status, KPS, SIR, and RPA are reliable prognostic factors for patients...

  5. The RIPASA score for the diagnosis of acute appendicitis: A comparison with the modified Alvarado score.

    Science.gov (United States)

    Díaz-Barrientos, C Z; Aquino-González, A; Heredia-Montaño, M; Navarro-Tovar, F; Pineda-Espinosa, M A; Espinosa de Santillana, I A

    2018-02-06

    Acute appendicitis is the leading cause of surgical emergencies. It remains a difficult diagnosis to make, especially in young people, the elderly, and women of reproductive age, in whom a series of inflammatory conditions can have signs and symptoms similar to those of acute appendicitis. Different scoring systems have been created to increase diagnostic accuracy, and they are inexpensive, noninvasive, and easy to use and reproduce. The modified Alvarado score is probably the most widely used and accepted in emergency services worldwide. The RIPASA score, on the other hand, was formulated in 2010 and has greater sensitivity and specificity. Very few studies conducted in Mexico have compared the different scoring systems for appendicitis. The aim of our article was to compare the modified Alvarado score and the RIPASA score in the diagnosis of patients with abdominal pain and suspected acute appendicitis. An observational, analytic, and prolective study was conducted between July 2002 and February 2014 at the Hospital Universitario de Puebla. The evaluation questionnaires were administered to the patients suspected of having appendicitis. RIPASA score with 8.5 as the optimal cutoff value: area under the ROC curve .595, sensitivity 93.3%, specificity 8.3%, PPV 91.8%, NPV 10.1%. Modified Alvarado score with 6 as the optimal cutoff value: area under the ROC curve .719, sensitivity 75%, specificity 41.6%, PPV 93.7%, NPV 12.5%. The RIPASA score showed no advantages over the modified Alvarado score when applied to patients presenting with suspected acute appendicitis. Copyright © 2018 Asociación Mexicana de Gastroenterología. Published by Masson Doyma México S.A. All rights reserved.
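The four accuracy measures reported for each score at its cutoff all come from a single 2x2 table of score result versus final diagnosis. A minimal Python sketch follows; the counts in the example call are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures for a score dichotomized at a cutoff.

    tp / fn: patients WITH the disease scoring at/above vs below the cutoff
    fp / tn: patients WITHOUT the disease scoring at/above vs below the cutoff
    """
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv": tp / (tp + fp),          # P(disease | positive score)
        "npv": tn / (tn + fn),          # P(no disease | negative score)
    }


# Hypothetical counts, for illustration only.
print(diagnostic_metrics(tp=90, fp=20, fn=10, tn=80))
```

Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of disease in the sample, so in a cohort where most patients do have appendicitis, a high PPV can coexist with a low NPV.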

  6. Relative Merits of Four Methods for Scoring Cloze Tests.

    Science.gov (United States)

    Brown, James Dean

    1980-01-01

    Describes study comparing merits of exact answer, acceptable answer, clozentropy and multiple choice methods for scoring tests. Results show differences among reliability, mean item facility, discrimination and usability, but not validity. (BK)

  7. Reliability and sensitivity to change of the Simple Erosion Narrowing Score compared with the Sharp-van der Heijde method for scoring radiographs in rheumatoid arthritis

    NARCIS (Netherlands)

    Dias, E. M.; Lukas, C.; Landewé, R.; Fatenejad, S.; van der Heijde, D.

    2008-01-01

    To compare the performance of a simplified scoring method for structural damage on radiographs of patients with rheumatoid arthritis (the Simple Erosion Narrowing Score or SENS) with the Sharp-van der Heijde Score (SHS) as reference. We used the radiographic data from the Trial of Etanercept and

  8. Numerical differences between Guttman's reliability coefficients and the GLB

    NARCIS (Netherlands)

    Oosterwijk, P.R.; van der Ark, L.A.; Sijtsma, K.; van der Ark, L.A.; Bolt, D.M; Wang, W.-C.; Douglas, J.A.; Wiberg, M.

    2016-01-01

    For samples smaller than 1000 and tests longer than ten items, the greatest lower bound (GLB) to the reliability is known to be biased and not recommended as a method to estimate test-score reliability. As a first step in finding alternative lower bounds under these conditions, we investigated the

  9. [German validation of the Acute Cystitis Symptom Score].

    Science.gov (United States)

    Alidjanov, J F; Pilatz, A; Abdufattaev, U A; Wiltink, J; Weidner, W; Naber, K G; Wagenlehner, F

    2015-09-01

    The Uzbek version of the Acute Cystitis Symptom Score (ACSS) was developed as a simple self-reporting questionnaire to improve the diagnosis and therapy of women with acute cystitis (AC). The purpose of this work was to validate the ACSS in the German language. The ACSS consists of 18 questions in four subscales: (1) typical symptoms, (2) differential diagnosis, (3) quality of life, and (4) additional circumstances. Translation of the ACSS into German was performed according to international guidelines. For the validation process, 36 German-speaking women (age: 18-90 years), with and without symptoms of AC, were included in the study. Classification of participants into two groups (patients or controls) was based on the presence or absence of typical symptoms and significant bacteriuria (≥ 10^3 CFU/ml). Statistical evaluations of reliability, validity, and predictive ability were performed. ROC curve analysis was performed to assess the sensitivity and specificity of the ACSS and its subscales. The Mann-Whitney U test and t-test were used to compare the scores of the groups. Of the 36 German-speaking women (age: 40 ± 19 years), 19 were diagnosed with AC (patient group), while 17 women served as controls. Cronbach's α for the German ACSS total scale was 0.87. A threshold score of ≥ 6 points in category 1 (typical symptoms) significantly predicted AC (sensitivity 94.7%, specificity 82.4%). There were no significant differences in ACSS scores of patients and controls compared with the original Uzbek version of the ACSS. The German version of the ACSS showed high reliability and validity. Therefore, the German version of the ACSS can be reliably used in clinical practice and research for the diagnosis and therapeutic monitoring of patients suffering from AC.
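Cronbach's α, reported above for the German ACSS total scale, compares the sum of the item variances with the variance of the total score. A minimal standard-library sketch, assuming the data are arranged as one list of respondents' scores per item (a layout chosen here for illustration):

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance).

    items: list of k columns, each a list of the n respondents' scores on that item.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent total score
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical responses: 2 items answered by 4 respondents.
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))
```

Sample variances are used throughout; because the same estimator appears in numerator and denominator, using population variances instead would give the same α.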

  10. Validation and reliability of a Behcet's Syndrome Activity Scale in Korea.

    Science.gov (United States)

    Choi, Hyo Jin; Seo, Mi Ryoung; Ryu, Hee Jung; Baek, Han Joo

    2016-01-01

    We prepared a cross-cultural adaptation of the Behcet's Syndrome Activity Scale (BSAS) and evaluated its reliability and validity in Korea. Fifty patients with Behcet's disease (BD) who attended the Rheumatology Clinic of Gachon University Gil Medical Center were included in this study. The first BSAS questionnaire was administered at each clinic visit, and the second questionnaire was completed at home within 24 hours of the visit. A Behcet's Disease Current Activity Form (BDCAF) and a Behcet's Disease Quality of Life (BDQOL) form were also given to patients. The test-retest reliability was analyzed by intraclass correlation coefficients (ICC). To assess the validity, the total BSAS score was compared with the BDCAF score, the patient/physician global assessment, and the BDQOL by Spearman rank correlation. Twelve males and 38 females were enrolled. The mean age was 48.5 years and the mean disease duration was 6.7 years. Thirty-eight patients (76.0%) returned the questionnaire by mail. For the test-retest reliability, the two assessments were significantly correlated on all 10 items of the BSAS questionnaire (p < 0.05) and the total BSAS score (ICC, 0.925; p < 0.001). The total BSAS score was statistically correlated with the BDQOL, BDCAF, and patient/physician global assessment (p < 0.01). The Korean version of BSAS is a reliable and valid instrument to measure BD activity.

  11. Validity and reliability of a self-administered foot evaluation questionnaire (SAFE-Q).

    Science.gov (United States)

    Niki, Hisateru; Tatsunami, Shinobu; Haraguchi, Naoki; Aoki, Takafumi; Okuda, Ryuzo; Suda, Yasunori; Takao, Masato; Tanaka, Yasuhito

    2013-03-01

    The Japanese Society for Surgery of the Foot (JSSF) is developing a QOL questionnaire instrument for use in pathological conditions related to the foot and ankle. The main body of the outcome instrument (the Self-Administered Foot Evaluation Questionnaire, SAFE-Q version 2) consists of 34 questionnaire items, which provide five subscale scores (1: Pain and Pain-Related; 2: Physical Functioning and Daily Living; 3: Social Functioning; 4: Shoe-Related; and 5: General Health and Well-Being). In addition, the instrument has nine optional questionnaire items that provide a Sports Activity subscale score. The purpose of this study was to evaluate the test-retest reliability of the SAFE-Q. Version 2 of the SAFE-Q was administered to 876 patients and 491 non-patients, and the test-retest reliability was evaluated for 131 patients. In addition, the SF-36 questionnaire and the JSSF Scale scoring form were administered to all of the participants. Subscale scores were scaled such that the final sum of scores ranged from zero (least healthy) to 100 (healthiest). The intraclass correlation coefficients were larger than 0.7 for all of the scores. The means of the five subscale scores were between 60 and 75. The five subscales easily separated patients from non-patients. The coefficients for the correlations of the subscale scores with the scores on the JSSF Scale and the SF-36 subscales were all highly statistically significantly greater than zero (p valid and reliable. In the future, it will be beneficial to test the responsiveness of the SAFE-Q.

  12. Continual Screening of Patients Using mHealth: The Rolling Score Concept Applied to Sleep Medicine.

    Science.gov (United States)

    Zluga, Claudio; Modre-Osprian, Robert; Kastner, Peter; Schreier, Günter

    2016-01-01

    mHealth-based telemonitoring applications are increasingly used for the continual monitoring and individual management of patients. A new approach to risk assessment, called the Rolling Score Concept, uses standardized questionnaires for continual scoring of an individual's health state through electronic patient-reported outcomes (ePRO). Using self-rated questionnaires and adding a specific Time Schedule to each question results in a movement of the questionnaires' scores over time: the Rolling Score. A text-processing pipeline was implemented with the KNIME analytics platform to extract a Score Mapping Rule Set for three standardized screening questionnaires in the field of sleep medicine. A feasibility study was performed in 10 healthy volunteers equipped with an mHealth application on a smartphone and a sleep tracker. Results show that the proposed Rolling Score Concept is feasible and that deviations of scores are in a reasonable range (< 7%), supporting the new approach. However, further studies are required for verification. In addition, parameter quantification could avoid incorrect subjective evaluation by replacing questions with sensor data.

  13. The Screening Test for Emotional Problems-Parent Report (STEP-P): Studies of Reliability and Validity

    Science.gov (United States)

    Erford, Bradley T.; Alsamadi, Silvana C.

    2012-01-01

    Score reliability and validity of parent responses concerning their 10- to 17-year-old students were analyzed using the Screening Test for Emotional Problems-Parent Report (STEP-P), which assesses a variety of emotional problems classified under the Individuals with Disabilities Education Improvement Act. Score reliability, convergent, and…

  14. Reliability of the ECHOWS Tool for Assessment of Patient Interviewing Skills.

    Science.gov (United States)

    Boissonnault, Jill S; Evans, Kerrie; Tuttle, Neil; Hetzel, Scott J; Boissonnault, William G

    2016-04-01

    History taking is an important component of patient/client management. Assessment of student history-taking competency can be achieved via a standardized tool. The ECHOWS tool has been shown to be valid with modest intrarater reliability in a previous study, but that study did not have sufficient power to definitively establish its stability. The purposes of this study were: (1) to assess the reliability of the ECHOWS tool for student assessment of patient interviewing skills and (2) to determine whether the tool discerns between novice and experienced skill levels. A reliability and construct validity assessment was conducted. Three faculty members from the United States and Australia scored videotaped histories from standardized patients taken by students and experienced clinicians from each of these countries. The tapes were scored twice, 3 to 6 weeks apart. Reliability was assessed using intraclass correlation coefficients (ICCs) and repeated measures. Analysis of variance models assessed the ability of the tool to discern between novice and experienced skill levels. The ECHOWS tool showed excellent intrarater reliability (ICC [3,1]=.74-.89) and good interrater reliability (ICC [2,1]=.55) as a whole. The summary of performance (S) section showed poor interrater reliability (ICC [2,1]=.27). There was no statistical difference in performance on the tool between novice and experienced clinicians. A possible ceiling effect may occur when standardized patients are not coached to provide complex and obtuse responses to interviewer questions. Variation in familiarity with the ECHOWS tool and in use of the online training may have influenced scoring of the S section. The ECHOWS tool demonstrates excellent intrarater reliability and moderate interrater reliability. Sufficient training with the tool prior to student assessment is recommended. The S section must evolve in order to provide a more discerning measure of interviewing skills. © 2016 American Physical Therapy
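The ICC [3,1] and ICC [2,1] values above follow the Shrout and Fleiss single-measure notation. Both can be computed from a two-way ANOVA decomposition of a subjects-by-raters table. The sketch below assumes a complete table with no missing ratings; it is an illustration, not the study's analysis code.

```python
def icc(ratings):
    """Shrout-Fleiss single-measure ICCs from an n-subjects x k-raters table.

    Returns (icc_2_1, icc_3_1):
      ICC(2,1): two-way random effects, absolute agreement
      ICC(3,1): two-way mixed effects, consistency
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters/sessions
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_error = ss_total - ss_rows - ss_cols                 # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    icc_2_1 = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    icc_3_1 = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return icc_2_1, icc_3_1


# Hypothetical scores: rater 2 is always exactly one point above rater 1.
print(icc([[1, 2], [2, 3], [3, 4], [4, 5]]))
```

In the hypothetical example, the consistency form ICC(3,1) equals 1, while the absolute-agreement form ICC(2,1) is about 0.77, because a systematic offset between raters counts as disagreement only in the latter.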

  15. Posterior probability of linkage and maximal lod score.

    Science.gov (United States)

    Génin, E; Martinez, M; Clerget-Darpoux, F

    1995-01-01

    To detect linkage between a trait and a marker, Morton (1955) proposed calculating the lod score z(θ₁) at a given value θ₁ of the recombination fraction. If z(θ₁) reaches +3, linkage is concluded. In practice, however, lod scores are calculated for different values of the recombination fraction between 0 and 0.5, and the test is based on the maximum value of the lod score, Zmax. The impact of this deviation from the original test on the probability that linkage does not in fact exist, when linkage was concluded, is documented here. This posterior probability of no linkage can be derived by using Bayes' theorem. It is less than 5% when the lod score at a predetermined θ₁ is used for the test. But for a Zmax of +3, we showed that it can reach 16.4%. Thus, considering a composite alternative hypothesis instead of a single one decreases the reliability of the test. The reliability decreases rapidly when Zmax is less than +3. Given a Zmax of +2.5, there is a 33% chance that linkage does not exist. Moreover, the posterior probability depends not only on the value of Zmax but also jointly on the family structures and on the genetic model. For a given Zmax, the chance that linkage exists may therefore vary.
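The two quantities contrasted above, a lod score at a fixed θ₁ versus its maximum Zmax over a grid of θ values, and the Bayes posterior for the fixed-θ₁ case, can be sketched in Python. This assumes the simplest setting of phase-known, fully informative meioses, and the 2% prior probability of linkage is an illustrative assumption, not a value from the paper. Reproducing the inflated posterior for Zmax (e.g. the 16.4% figure) would require the family structures and genetic model and is not attempted here.

```python
import math


def lod(theta, recombinants, meioses):
    """Lod score z(theta) for phase-known, fully informative meioses.

    log10 of L(theta) / L(0.5), where
    L(theta) = theta**k * (1 - theta)**(n - k) for k recombinants in n meioses.
    Requires 0 < theta <= 0.5.
    """
    k, n = recombinants, meioses
    log_l = k * math.log10(theta) + (n - k) * math.log10(1 - theta)
    log_l_null = n * math.log10(0.5)
    return log_l - log_l_null


def z_max(recombinants, meioses):
    """(Zmax, theta) maximized over the grid theta = 0.01, 0.02, ..., 0.50."""
    grid = [i / 100 for i in range(1, 51)]
    return max((lod(t, recombinants, meioses), t) for t in grid)


def posterior_no_linkage(z, prior_linkage):
    """P(no linkage | data) when z is the lod at ONE prespecified theta_1.

    Bayes' theorem with likelihood ratio 10**z:
    posterior odds of linkage = prior odds * 10**z.
    """
    prior_odds = prior_linkage / (1 - prior_linkage)
    return 1 / (1 + prior_odds * 10 ** z)


# Hypothetical data: 2 recombinants observed in 20 meioses.
z, theta_hat = z_max(2, 20)
print(f"Zmax = {z:.2f} at theta = {theta_hat}")
# Assumed 2% prior probability of linkage (illustrative only).
print(f"P(no linkage | z = 3) = {posterior_no_linkage(3.0, 0.02):.3f}")
```

With the assumed 2% prior, a lod of +3 at a prespecified θ₁ leaves a posterior probability of no linkage just under 5%, consistent with the fixed-θ₁ case described in the abstract.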

  16. Prediction of safety critical software operational reliability from test reliability using testing environment factors

    International Nuclear Information System (INIS)

    Jung, Hoan Sung; Seong, Poong Hyun

    1999-01-01

    Predicting the reliability of safety-critical software has been a critical issue in nuclear engineering. For many years, much research has focused on the quantification of software reliability, and many models have been developed to quantify it. Most software reliability models estimate reliability from failure data collected during testing, assuming that the test environments represent the operational profile well. Users, however, are interested in operational reliability rather than test reliability. Experience shows that operational reliability is higher than test reliability. Assuming that the difference in reliability results from the change of environment from testing to operation, this paper develops testing environment factors, comprising an aging factor and a coverage factor, and uses them to predict the ultimate operational reliability from failure data collected in the testing phase. This is done by incorporating test environments applied beyond the operational profile into the testing environment factors. The application results show that the proposed method can estimate operational reliability accurately. (Author). 14 refs., 1 tab., 1 fig

  17. A Reliability Generalization Study of the Marlowe-Crowne Social Desirability Scale.

    Science.gov (United States)

    Beretvas, S. Natasha; Meyers, Jason L.; Leite, Walter L.

    2002-01-01

    Conducted a reliability generalization study of the Marlowe-Crowne Social Desirability Scale (D. Crowne and D. Marlowe, 1960). Analysis of 93 studies shows that the predicted score reliability for male adolescents was 0.53 and that reliability was lower for men's responses than for women's. Discusses the need for further analysis of the scale. (SLD)

  18. Reliability of Oronasal Fistula Classification.

    Science.gov (United States)

    Sitzman, Thomas J; Allori, Alexander C; Matic, Damir B; Beals, Stephen P; Fisher, David M; Samson, Thomas D; Marcus, Jeffrey R; Tse, Raymond W

    2018-01-01

    Objective Oronasal fistula is an important complication of cleft palate repair that is frequently used to evaluate surgical quality, yet the reliability of fistula classification has never been examined. The objective of this study was to determine the reliability of oronasal fistula classification both within individual surgeons and between multiple surgeons. Design Using intraoral photographs of children with repaired cleft palate, surgeons rated the location of palatal fistulae using the Pittsburgh Fistula Classification System. Intrarater and interrater reliability scores were calculated for each region of the palate. Participants Eight cleft surgeons rated photographs obtained from 29 children. Results Within individual surgeons, reliability for each region of the Pittsburgh classification ranged from moderate to almost perfect (κ = .60-.96). By contrast, reliability between surgeons was lower, ranging from fair to substantial (κ = .23-.70). Between-surgeon reliability was lowest for the junction of the soft and hard palates (κ = .23). Within-surgeon and between-surgeon reliability were almost perfect for the more general classification of fistula in the secondary palate (κ = .95 and κ = .83, respectively). Conclusions This is the first reliability study of fistula classification. We show that the Pittsburgh Fistula Classification System is reliable when used by an individual surgeon, but less reliable when used among multiple surgeons. Comparisons of fistula occurrence among surgeons may be subject to less bias if they use the more general classification of "presence or absence of fistula of the secondary palate" rather than the Pittsburgh Fistula Classification System.
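The κ values above are chance-corrected agreement statistics. For two sets of ratings of the same photographs (for example, one surgeon rating twice), Cohen's kappa can be sketched as below. The abstract does not say which multi-rater statistic was used across the eight surgeons (Fleiss' kappa or averaged pairwise Cohen's kappas are common choices), and the labels in the example are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two non-empty rating lists of equal length")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa undefined: both raters used a single identical category")
    return (observed - expected) / (1 - expected)


# Hypothetical fistula present (1) / absent (0) ratings of four photographs.
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```

Here the raters agree on 3 of 4 photographs, but half of that agreement is expected by chance, so κ is 0.5 rather than 0.75.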

  19. Test-retest reliability and minimal detectable change scores for sit-to-stand-to-sit tests, the six-minute walk test, the one-leg heel-rise test, and handgrip strength in people undergoing hemodialysis.

    Science.gov (United States)

    Segura-Ortí, Eva; Martínez-Olmos, Francisco José

    2011-08-01

    Determining the relative and absolute reliability of outcomes of physical performance tests for people undergoing hemodialysis is necessary to discriminate between the true effects of exercise interventions and the inherent variability of this cohort. The aims of this study were to assess the relative reliability of sit-to-stand-to-sit tests (the STS-10, which measures the time [in seconds] required to complete 10 full stands from a sitting position, and the STS-60, which measures the number of repetitions achieved in 60 seconds), the Six-Minute Walk Test (6MWT), the one-leg heel-rise test, and the handgrip strength test and to calculate minimal detectable change (MDC) scores in people undergoing hemodialysis. This study was a prospective, nonexperimental investigation. Thirty-nine people undergoing hemodialysis at 2 clinics in Spain were contacted. Study participants performed the STS-10 (n=37), the STS-60 (n=37), and the 6MWT (n=36). At one of the settings, the participants also performed the one-leg heel-rise test (n=21) and the handgrip strength test (n=12) on both the right and the left sides. Participants attended 2 testing sessions 1 to 2 weeks apart. High intraclass correlation coefficients (≥.88) were found for all tests, suggesting good relative reliability. The MDC scores at 90% confidence intervals were as follows: 8.4 seconds for the STS-10, 4 repetitions for the STS-60, 66.3 m for the 6MWT, 3.4 kg for handgrip strength (force-generating capacity), 3.7 repetitions for the one-leg heel-rise test with the right leg, and 5.2 repetitions for the one-leg heel-rise test with the left leg. Limitations: A limited sample of patients was used in this study. The STS-10, STS-60, 6MWT, one-leg heel-rise test, and handgrip strength test are reliable outcome measures. The MDC scores at 90% confidence intervals for these tests will help to determine whether a change is due to error or to an intervention.
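The abstract does not state the formula behind its MDC values. One widely used approach derives the standard error of measurement from the test-retest ICC, SEM = SD × √(1 − ICC), and then MDC = z × SEM × √2, where √2 reflects error in both sessions. A sketch under that assumption, with hypothetical SD and ICC inputs:

```python
import math
from statistics import NormalDist


def mdc(sd, icc, confidence=0.90):
    """Minimal detectable change, assuming SEM = SD * sqrt(1 - ICC).

    MDC = z * SEM * sqrt(2); sqrt(2) accounts for measurement error
    in both the test and the retest session.
    """
    sem = sd * math.sqrt(1 - icc)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided z, about 1.645 at 90%
    return z * sem * math.sqrt(2)


# Hypothetical inputs: between-subject SD of 10 units, test-retest ICC of 0.91.
print(round(mdc(10, 0.91), 2))
```

A change smaller than the MDC for a given test would be indistinguishable from measurement error at the chosen confidence level.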

  20. ELUCIDATING BRAIN CONNECTIVITY NETWORKS IN MAJOR DEPRESSIVE DISORDER USING CLASSIFICATION-BASED SCORING.

    Science.gov (United States)

    Sacchet, Matthew D; Prasad, Gautam; Foland-Ross, Lara C; Thompson, Paul M; Gotlib, Ian H

    2014-04-01

    Graph theory is increasingly used in the field of neuroscience to understand the large-scale network structure of the human brain. There is also considerable interest in applying machine learning techniques in clinical settings, for example, to make diagnoses or predict treatment outcomes. Here we used support-vector machines (SVMs), in conjunction with whole-brain tractography, to identify graph metrics that best differentiate individuals with Major Depressive Disorder (MDD) from nondepressed controls. To do this, we applied a novel feature-scoring procedure that incorporates iterative classifier performance to assess feature robustness. We found that small-worldness, a measure of the balance between global integration and local specialization, most reliably differentiated MDD from nondepressed individuals. Post-hoc regional analyses suggested that heightened connectivity of the subcallosal cingulate gyrus (SCG) in MDD contributes to these differences. The current study provides a novel way to assess the robustness of classification features and reveals anomalies in large-scale neural networks in MDD.
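Small-worldness is commonly quantified as σ = (C / C_rand) / (L / L_rand), where C is the average clustering coefficient and L the characteristic path length, each normalized by a random reference graph. The sketch below assumes an unweighted, connected graph given as an adjacency dictionary and takes the reference values C_rand and L_rand as inputs (in practice they are often averaged over degree-matched random graphs). It illustrates the metric only, not the authors' SVM feature-scoring procedure.

```python
from collections import deque
from itertools import combinations


def average_clustering(adj):
    """Mean local clustering coefficient; adj maps each node to a set of neighbours."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # clustering is taken as 0 for nodes with degree < 2
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    lengths = []
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        lengths.extend(d for node, d in dist.items() if node != source)
    return sum(lengths) / len(lengths)


def small_worldness(adj, c_rand, l_rand):
    """sigma = (C / C_rand) / (L / L_rand); sigma > 1 suggests small-world structure."""
    c = average_clustering(adj)
    l = characteristic_path_length(adj)
    return (c / c_rand) / (l / l_rand)


# Hypothetical toy graph (a triangle) with assumed reference values.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(small_worldness(triangle, c_rand=0.5, l_rand=1.0))
```

Tractography-derived brain networks are often weighted; weighted versions of C and L exist but are not shown in this unweighted sketch.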

  1. Evaluation of ideomotor apraxia in patients with stroke: a study of reliability and validity.

    Science.gov (United States)

    Kaya, Kurtulus; Unsal-Delialioglu, Sibel; Kurt, Murat; Altinok, Nermin; Ozel, Sumru

    2006-03-01

    The aim of this study was to determine the reliability and validity of an established ideomotor apraxia test when applied to a Turkish stroke patient population and to healthy controls. The study group comprised 50 patients with right hemiplegia and 36 with left hemiplegia, who had developed the condition as a result of a cerebrovascular accident, and 33 age-matched healthy subjects. The subjects were evaluated for apraxia using an established ideomotor apraxia test. The cut-off value of the test and the interobserver reliability coefficient were determined. Apraxia was found in 54% of patients with right hemiplegia (most cases severe) and in 25% of patients with left hemiplegia (most cases mild). The apraxia scores of patients with right hemiplegia were significantly lower than those of patients with left hemiplegia and of healthy subjects. There was no statistically significant difference between patients with left hemiplegia and healthy subjects. It was shown that the ideomotor apraxia test could distinguish apraxic from non-apraxic subjects. The reliability coefficient among observers in the study was high, and a reliability study of the ideomotor apraxia test was therefore performed.

  2. Validation of microsatellite instability histology scores with Bethesda guidelines in hereditary nonpolyposis colorectal cancer

    Directory of Open Access Journals (Sweden)

    Mustafa Kaya

    2017-01-01

    Conclusions: The MSI scoring systems, MsPath, and PathScore, are reliable systems and effectively correlated with BG for predicting patients who need advanced analysis techniques because of the risk of HNPCC.

  3. Psychometric Properties of Raw and Scale Scores on Mixed-Format Tests

    Science.gov (United States)

    Kolen, Michael J.; Lee, Won-Chan

    2011-01-01

    This paper illustrates that the psychometric properties of scores and scales that are used with mixed-format educational tests can impact the use and interpretation of the scores that are reported to examinees. Psychometric properties that include reliability and conditional standard errors of measurement are considered in this paper. The focus is…

  4. Psychometric Evaluation of the Lower Extremity Computerized Adaptive Test, the Modified Harris Hip Score, and the Hip Outcome Score.

    Science.gov (United States)

    Hung, Man; Hon, Shirley D; Cheng, Christine; Franklin, Jeremy D; Aoki, Stephen K; Anderson, Mike B; Kapron, Ashley L; Peters, Christopher L; Pelt, Christopher E

    2014-12-01

    The applicability and validity of many patient-reported outcome measures in the high-functioning population are not well understood. To compare the psychometric properties of the modified Harris Hip Score (mHHS), the Hip Outcome Score activities of daily living subscale (HOS-ADL) and sports subscale (HOS-sports), and the Lower Extremity Computerized Adaptive Test (LE CAT). The hypothesis was that all instruments would perform well but that the LE CAT would show psychometric superiority because a combination of CAT and a large item bank allows for a high degree of measurement precision. Cohort study (diagnosis); Level of evidence, 2. Data were collected from 472 advanced-age, active participants from the Huntsman World Senior Games in 2012. Validity evidence was examined through item fit, dimensionality, monotonicity, local independence, differential item functioning, person raw score to measure correlation, and instrument coverage (ie, ceiling and floor effects), and reliability evidence was examined through Cronbach alpha and person separation index. All instruments demonstrated good item fit, unidimensionality, monotonicity, local independence, and person raw score to measure correlations. The HOS-ADL had high ceiling effects of 36.02%, and the mHHS had ceiling effects of 27.54%. The LE CAT had ceiling effects of 8.47%, and the HOS-sports had no ceiling effects. None of the instruments had any floor effects. The mHHS had a very low Cronbach alpha of 0.41 and an extremely low person separation index of 0.08. Reliabilities for the LE CAT were excellent and for the HOS-ADL and HOS-sports were good. The LE CAT showed better psychometric properties overall than the HOS-ADL, HOS-sports, and mHHS for the senior population. The mHHS demonstrated pronounced ceiling effects and poor reliabilities that should be of concern. The high ceiling effects for the HOS-ADL were also of concern. The LE CAT was superior in all psychometric aspects examined in this study. Future

  5. Multiple Score Comparison: a network meta-analysis approach to comparison and external validation of prognostic scores

    Directory of Open Access Journals (Sweden)

    Sarah R. Haile

    2017-12-01

    Full Text Available Abstract Background Prediction models and prognostic scores have become increasingly popular in both clinical practice and clinical research settings, for example to aid in risk-based decision making or to control for confounding. In many medical fields, a large number of prognostic scores are available, but practitioners may find it difficult to choose between them due to a lack of external validation as well as a lack of comparisons between them. Methods Borrowing methodology from network meta-analysis, we describe an approach to Multiple Score Comparison meta-analysis (MSC), which permits concurrent external validation and comparisons of prognostic scores using individual patient data (IPD) arising from a large-scale international collaboration. We describe the challenges in adapting network meta-analysis to the MSC setting, for instance the need to explicitly include correlations between the scores on a cohort level, and how to deal with many multi-score studies. We propose first using IPD to compute cohort-level aggregate discrimination or calibration scores, comparing all to a common comparator. Then, standard network meta-analysis techniques can be applied, taking care to consider correlation structures in cohorts with multiple scores. Transitivity, consistency and heterogeneity are also examined. Results We provide a clinical application, comparing prognostic scores for 3-year mortality in patients with chronic obstructive pulmonary disease using data from a large-scale collaborative initiative. We focus on the discriminative properties of the prognostic scores. Our results show clear differences in performance, with ADO and eBODE showing higher discrimination with respect to mortality than the other scores considered. The assumptions of transitivity and local and global consistency were not violated. Heterogeneity was small. 
Conclusions We applied a network meta-analytic methodology to externally validate and concurrently compare the prognostic properties
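The first MSC step described above reduces each cohort's IPD to an aggregate discrimination measure per score. One common such measure is the concordance (c) statistic, equivalent to the area under the ROC curve for a binary outcome. The sketch below assumes a binary outcome (death within 3 years, yes or no) and therefore ignores censoring, which a survival c-index such as Harrell's would handle; the network meta-analytic pooling step is not shown.

```python
def c_statistic(scores, outcomes):
    """Concordance (c) statistic / ROC AUC for a prognostic score.

    outcomes: 1 = event (e.g. death within 3 years), 0 = no event.
    Over all event/non-event pairs, counts how often the event case has the
    higher score; tied scores count as half a concordant pair.
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for ne in non_events:
            if e > ne:
                concordant += 1
            elif e == ne:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical cohort: four patients' scores and 3-year mortality.
print(c_statistic([1, 2, 3, 4], [0, 0, 1, 1]))
```

A value of 0.5 means the score discriminates no better than chance, and 1.0 means every patient who died scored higher than every survivor.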

  6. Dutch validation of the low anterior resection syndrome score.

    Science.gov (United States)

    Hupkens, B J P; Breukink, S O; Olde Reuver Of Briel, C; Tanis, P J; de Noo, M E; van Duijvendijk, P; van Westreenen, H L; Dekker, J W T; Chen, T Y T; Juul, T

    2018-04-21

    The aim of this study was to validate the Dutch translation of the low anterior resection syndrome (LARS) score in a population of Dutch rectal cancer patients. Patients who underwent surgery for rectal cancer received the LARS score questionnaire, a single quality of life (QoL) category question and the European Organization for Research and Treatment of Cancer (EORTC) QLQ-C30 questionnaire. A subgroup of patients received the LARS score twice to assess the test-retest reliability. A total of 165 patients were included in the analysis, identified in six Dutch centres. The response rate was 62.0%. The percentage of patients who reported 'major LARS' was 59.4%. There was a high proportion of patients with a perfect or moderate fit between the QoL category question and the LARS score, showing good convergent validity. The LARS score was able to discriminate between patients with or without neoadjuvant radiotherapy (P = 0.003), between total and partial mesorectal excision (P = 0.008) and between age groups (P = 0.039). There was a statistically significant association between a higher LARS score and impaired function on the global QoL subscale and the physical, role, emotional and social functioning subscales of the EORTC QLQ-C30 questionnaire. The test-retest reliability of the LARS score was good, with an intraclass correlation coefficient of 0.79. The good psychometric properties of the Dutch version of the LARS score are comparable overall to the earlier validations in other countries. Therefore, the Dutch translation can be considered a valid tool for assessing LARS in Dutch rectal cancer patients. Colorectal Disease © 2018 The Association of Coloproctology of Great Britain and Ireland.

  7. Internal Structure of Mini-CEX Scores for Internal Medicine Residents: Factor Analysis and Generalizability

    Science.gov (United States)

    Cook, David A.; Beckman, Thomas J.; Mandrekar, Jayawant N.; Pankratz, V. Shane

    2010-01-01

    The mini-CEX is widely used to rate directly observed resident-patient encounters. Although several studies have explored the reliability of mini-CEX scores, the dimensionality of mini-CEX scores is incompletely understood. Objective: Explore the dimensionality of mini-CEX scores through factor analysis and generalizability analysis. Design:…

  8. Reliability and validity of the upper-body dressing scale in Japanese patients with vascular dementia with hemiparesis.

    Science.gov (United States)

    Endo, Arisa; Suzuki, Makoto; Akagi, Atsumi; Chiba, Naoyuki; Ishizaka, Ikuyo; Matsunaga, Atsuhiko; Fukuda, Michinari

    2015-03-01

    The purpose of this study was to examine the reliability and validity of the Upper-body Dressing Scale (UBDS) for buttoned shirt dressing, which evaluates the learning process of new component actions of upper-body dressing in patients diagnosed with dementia and hemiparesis. This was a preliminary correlational study of concurrent validity and reliability in which 10 vascular dementia patients with hemiparesis were enrolled and assessed repeatedly by six occupational therapists by means of the UBDS and the dressing item of the Functional Independence Measure (FIM). The intraclass correlation coefficient was 0.97 for intra-rater reliability and 0.99 for inter-rater reliability. The correlation between UBDS score and FIM dressing item scores was -0.93. UBDS scores for the components "paralytic hand passed into the sleeve" and "sleeve pulled up beyond the shoulder joint" were worse than the scores for the other components of the task. The UBDS has good reliability and validity for vascular dementia patients with hemiparesis. Further research is needed to investigate the relation between UBDS score and the effect of intervention and to clarify the sensitivity or responsiveness of the scale to clinical change. Copyright © 2014 John Wiley & Sons, Ltd.

  9. The Motivated Strategies for Learning Questionnaire: score validity among medicine residents.

    Science.gov (United States)

    Cook, David A; Thompson, Warren G; Thomas, Kris G

    2011-12-01

    The Motivated Strategies for Learning Questionnaire (MSLQ) purports to measure motivation using the expectancy-value model. Although it is widely used in other fields, this instrument has received little study in health professions education. The purpose of this study was to evaluate the validity of MSLQ scores. We conducted a validity study evaluating the relationships of MSLQ scores to other variables and their internal structure (reliability and factor analysis). Participants included 210 internal medicine and family medicine residents participating in a web-based course on ambulatory medicine at an academic medical centre. Measurements included pre-course MSLQ scores, pre- and post-module motivation surveys, post-module knowledge test and post-module Instructional Materials Motivation Survey (IMMS) scores. Internal consistency was universally high for all MSLQ items together (Cronbach's α = 0.93) and for each domain (α ≥ 0.67). Total MSLQ scores showed statistically significant positive associations with post-test knowledge scores. For example, a 1-point rise in total MSLQ score was associated with a 4.4% increase in post-test scores (β = 4.4; p motivation and satisfaction. Scores on MSLQ domains demonstrated associations that generally aligned with our hypotheses. Self-efficacy and control of learning belief scores demonstrated the strongest domain-specific relationships with knowledge scores (β = 2.9 for both). Confirmatory factor analysis showed a borderline model fit. Follow-up exploratory factor analysis revealed the scores of five factors (self-efficacy, intrinsic interest, test anxiety, extrinsic goals, attribution) demonstrated psychometric and predictive properties similar to those of the original scales. Scores on the MSLQ are reliable and predict meaningful outcomes. However, the factor structure suggests a simplified model might better fit the empiric data. Future research might consider how assessing and responding to motivation could enhance

  10. Reliability-based design of wind turbine blades

    DEFF Research Database (Denmark)

    Toft, Henrik Stensgaard; Sørensen, John Dalsgaard

    2011-01-01

    Reliability-based design of wind turbine blades requires identification of the important failure modes/limit states along with stochastic models for the uncertainties and methods for estimating the reliability. In the present paper it is described how reliability-based design can be applied to wi...

  11. [Results of applying a paediatric early warning score system as a healthcare quality improvement plan].

    Science.gov (United States)

    Rivero-Martín, M J; Prieto-Martínez, S; García-Solano, M; Montilla-Pérez, M; Tena-Martín, E; Ballesteros-García, M M

    2016-06-01

    The aims of this study were to introduce a paediatric early warning score (PEWS) into our daily clinical practice, to evaluate its ability to detect clinical deterioration in admitted children, and to train nursing staff to communicate the information and response effectively. An analysis was performed of the implementation of PEWS in the electronic health records of children (0-15 years) in our paediatric ward from February 2014 to September 2014. The maximum score was 6. Nursing staff reviewed scores >2, and scores >3 were reviewed by both medical and nursing staff. Monitoring indicators: % of admissions with scoring; % of complete data capture; % of scores >3; % of scores >3 reviewed by medical staff; % of changes in treatment due to the warning system; and number of patients who needed Paediatric Intensive Care Unit (PICU) admission or died without an increased warning score. Data were collected from all 931 patients admitted. The scale was measured 7,917 times, 78.8% of them with complete data capture. Very few measurements (1.9%) showed scores >3, and 14% of these led to changes in clinical management (intensified treatment or new diagnostic tests). One patient (scored 2) required PICU admission. There were no deaths. Parental or nursing staff concern was registered in 80% of cases. PEWS are useful to provide a standardised assessment of clinical status in the inpatient setting, using a single scale and implementing data capture. Because of the lack of severe complications requiring PICU admission and of deaths, we will have to use other data to evaluate these scales. Copyright © 2016 SECA. Published by Elsevier Espana. All rights reserved.
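The review thresholds stated above can be written as a small decision rule. This is a minimal sketch, not the ward's implementation: the function name `pews_escalation` and its returned labels are hypothetical, and it encodes only the stated cut-offs (maximum score 6; nursing review above 2; medical and nursing review above 3), not how the individual PEWS items are scored.

```python
def pews_escalation(score: int) -> str:
    """Map a PEWS total to the review level it triggers.

    Hypothetical helper: the labels are illustrative; the thresholds follow
    the abstract (maximum 6, >2 nursing review, >3 medical and nursing review).
    """
    if not 0 <= score <= 6:
        raise ValueError("PEWS total must lie between 0 and 6")
    if score > 3:
        return "medical and nursing review"
    if score > 2:
        return "nursing review"
    return "routine monitoring"


if __name__ == "__main__":
    # Print the action triggered by every possible total.
    for s in range(7):
        print(s, pews_escalation(s))
```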

  12. Interrater and intrarater reliability of the Knosp scale for pituitary adenoma grading.

    Science.gov (United States)

    Mooney, Michael A; Hardesty, Douglas A; Sheehy, John P; Bird, Robert; Chapple, Kristina; White, William L; Little, Andrew S

    2017-05-01

    OBJECTIVE The goal of this study was to determine the interrater and intrarater reliability of the Knosp grading scale for predicting pituitary adenoma cavernous sinus (CS) involvement. METHODS Six independent raters (3 neurosurgery residents, 2 pituitary surgeons, and 1 neuroradiologist) participated in the study. Each rater scored 50 unique pituitary MRI scans (with contrast) of biopsy-proven pituitary adenoma. Reliabilities for the full scale were determined 3 ways: 1) using all 50 scans, 2) using scans with midrange scores versus end scores, and 3) using a dichotomized scale that reflects common clinical practice. The performance of resident raters was compared with that of faculty raters to assess the influence of training level on reliability. RESULTS Overall, the interrater reliability of the Knosp scale was "strong" (0.73, 95% CI 0.56-0.84). However, the percent agreement for all 6 reviewers was only 10% (26% for faculty members, 30% for residents). The reliability of the middle scores (i.e., average rated Knosp Grades 1 and 2) was "very weak" (0.18, 95% CI -0.27 to 0.56) and the percent agreement for all reviewers was only 5%. When the scale was dichotomized into tumors unlikely to have intraoperative CS involvement (Grades 0, 1, and 2) and those likely to have CS involvement (Grades 3 and 4), the reliability was "strong" (0.60, 95% CI 0.39-0.75) and the percent agreement for all raters improved to 60%. There was no significant difference in reliability between residents and faculty (residents 0.72, 95% CI 0.55-0.83 vs faculty 0.73, 95% CI 0.56-0.84). Intrarater reliability was moderate to strong and increased with the level of experience. CONCLUSIONS Although these findings suggest that the Knosp grading scale has acceptable interrater reliability overall, they raise important questions about the "very weak" reliability of the scale's middle grades. By dichotomizing the scale into clinically useful groups, the authors were able to address the poor
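Percent agreement, as used above, can be read as the share of scans on which every rater assigned the same grade; collapsing grades 0-2 and 3-4 into two groups can only keep or raise it, because any scan on which raters agree on the full grade also agrees after collapsing. A minimal sketch under that reading, with hypothetical helper names and made-up ratings (not the study's data); the reported reliability coefficients are a separate statistic and are not computed here.

```python
from typing import Sequence


def dichotomize_knosp(grade: int) -> int:
    """Collapse Knosp grades 0-4 into 0 (grades 0-2, CS involvement unlikely)
    or 1 (grades 3-4, CS involvement likely), following the abstract."""
    if grade not in range(5):
        raise ValueError("Knosp grade must be an integer from 0 to 4")
    return 1 if grade >= 3 else 0


def percent_all_agree(ratings: Sequence[Sequence[int]]) -> float:
    """Percentage of scans (rows) on which every rater (column) gave the same value."""
    agree = sum(1 for row in ratings if len(set(row)) == 1)
    return 100.0 * agree / len(ratings)


# Hypothetical ratings: 4 scans x 3 raters (illustrative only).
ratings = [
    [1, 2, 2],
    [3, 4, 4],
    [0, 0, 0],
    [2, 3, 2],
]
full = percent_all_agree(ratings)
binary = percent_all_agree([[dichotomize_knosp(g) for g in row] for row in ratings])
print(f"full scale: {full:.0f}%, dichotomized: {binary:.0f}%")
```

With these toy ratings, agreement rises from 25% on the full scale to 75% after dichotomization, mirroring the direction (not the values) of the reported change.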

  13. Mammography image assessment; validity and reliability of current scheme

    International Nuclear Information System (INIS)

    Hill, C.; Robinson, L.

    2015-01-01

    Mammographers currently score their own images according to criteria set out by Regional Quality Assurance. The criteria used are based on the ‘Perfect, Good, Moderate, Inadequate’ (PGMI) marking criteria established by the National Health Service Breast Screening Programme (NHSBSP) in their Quality Assurance Guidelines of 2006 [1]. This document discusses the validity and reliability of the current mammography image assessment scheme. Commencing with a critical review of the literature, this document sets out to highlight problems with the national approach to the use of marking schemes. The findings suggest that the ‘PGMI’ scheme is flawed in terms of reliability and validity and is not universally applied across the UK. There also appear to be differences in the schemes used by trainees and qualified mammographers. Initial recommendations are to be made in collaboration with colleagues within the National Health Service Breast Screening Programme (NHSBSP), Higher Education Centres, the College of Radiographers and the Royal College of Radiologists in order to identify a mammography image appraisal scheme that is fit for purpose.
    Highlights:
    • Currently no robust, evidence-based marking tools are in use for the assessment of images in mammography.
    • Is the current system valid, reliable and robust?
    • How can the current image assessment tool be improved?
    • Should students and qualified mammographers use the same tool?
    • What marking criteria are available for image assessment?

  14. A clinical assessment tool used for physiotherapy students--is it reliable?

    Science.gov (United States)

    Lewis, Lucy K; Stiller, Kathy; Hardy, Frances

    2008-01-01

    Educational institutions providing professional programs such as physiotherapy must provide high-quality student assessment procedures. To ensure that assessment is consistent, assessment tools should have an acceptable level of reliability. There is a paucity of research evaluating the reliability of clinical assessment tools used for physiotherapy students. This study evaluated the inter- and intrarater reliability of an assessment tool used for physiotherapy students during a clinical placement. Five clinical educators and one academic participated in the study. Each rater independently marked 22 student written assessments that had been completed by students after viewing a videotaped patient physiotherapy assessment. The raters repeated the marking process 7 weeks later, with the assessments provided in a randomised order. The interrater reliability (Intraclass Correlation Coefficient) for the total scores was 0.32, representing a poor level of reliability. A high level of intrarater reliability (percentage agreement) was found for the clinical educators, with a difference in section scores of one mark or less on 93.4% of occasions. Further research should be undertaken to reevaluate the reliability of this clinical assessment tool following training. The reliability of clinical assessment tools used in other areas of physiotherapy education should be formally measured rather than assumed.

  15. Reliability of IOTA score and ADNEX model in the screening of ovarian malignancy in postmenopausal women.

    Science.gov (United States)

    Nohuz, Erdogan; De Simone, Luisa; Chêne, Gautier

    2018-04-28

    The IOTA (International Ovarian Tumor Analysis) group has developed the ADNEX (Assessment of Different NEoplasias in the adneXa) model to predict the risk that an ovarian mass is benign, borderline or malignant. This study aimed to test the reliability of these risk prediction models in improving the performance of pelvic ultrasound and discriminating between benign and malignant cysts. Postmenopausal women with an adnexal mass (including ovarian, para-ovarian and tubal) who underwent a standardized ultrasound examination before surgery were included. Prospectively and retrospectively collected data and the ultrasound appearances of the tumors were described using the terms and definitions of the IOTA group, tested in accordance with the ADNEX model, and compared with the final histological diagnosis. Of the 107 menopausal patients recruited between 2011 and 2016, 14 were excluded (incomplete inclusion criteria). Thus, 93 patients constituted a cohort in which 89 had benign cysts (83 ovarian and 6 tubal or para-ovarian cysts), 1 had a borderline tumor and 3 had invasive ovarian cancers (1 at stage I, 1 at an advanced stage and 1 metastatic tumor in the ovary). The overall prevalence of malignancy was 4.3%. Every benign ovarian cyst was classified as probably benign by the IOTA score, which also showed high specificity: every lesion it classified as probably malignant was proved malignant by histological examination. The limitation of this score was the high rate of unclassified or undetermined cysts. However, the malignancy risks calculated by the ADNEX model identified all malignancies. Thus, the combination of the two methods of analysis showed sensitivity and specificity rates of 100% and 98%, respectively. Evaluation of malignancy risks by these 2 tests yielded a negative predictive value of 100% (there were no false negatives) and a positive predictive value of 80%. On the basis of our findings, the IOTA classification and the ADNEX multimodal
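The four accuracy measures reported above all come from the 2 x 2 table of test result against histology. A minimal sketch with hypothetical counts (not the study's); the class name `Confusion` is illustrative. Note that with zero false negatives the negative predictive value is 100% whatever the other counts, as long as there is at least one true negative, which is how a small cohort can report NPV = 100%.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2 x 2 table of test result versus reference (histological) diagnosis."""
    tp: int  # malignant, test positive
    fn: int  # malignant, test negative
    fp: int  # benign, test positive
    tn: int  # benign, test negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        """Positive predictive value: share of positive tests that are malignant."""
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        """Negative predictive value: share of negative tests that are benign."""
        return self.tn / (self.tn + self.fn)


# Hypothetical counts, chosen only to illustrate the formulas.
c = Confusion(tp=9, fn=1, fp=3, tn=87)
print(f"sens={c.sensitivity():.2f} spec={c.specificity():.2f} "
      f"ppv={c.ppv():.2f} npv={c.npv():.2f}")
```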

  16. Keeping Your Audience in Mind: Applying Audience Analysis to the Design of Interactive Score Reports

    Science.gov (United States)

    Zapata-Rivera, Juan Diego; Katz, Irvin R.

    2014-01-01

    Score reports have one or more intended audiences: the people who use the reports to make decisions about test takers, including teachers, administrators, parents and test takers. Attention to audience when designing a score report supports assessment validity by increasing the likelihood that score users will interpret and use assessment results…

  17. An artificial intelligence system for reliability studies

    International Nuclear Information System (INIS)

    Llory, M.; Ancelin, C.; Bannelier, M.; Bouhadana, H.; Bouissou, M.; Lucas, J.Y.; Magne, L.; Villate, N.

    1990-01-01

    The EDF (French Electricity Company) software developed for computer aided reliability studies is considered. Such software tools were applied in the study of the safety requirements of the Paluel nuclear power plant. The reliability models, based on IF-THEN type rules, and the generation of models by the expert system are described. The models are then processed applying algorithm structures [fr

  18. Scaled CMOS Technology Reliability Users Guide

    Science.gov (United States)

    White, Mark

    2010-01-01

    The desire to assess the reliability of emerging scaled microelectronics technologies through faster reliability trials and more accurate acceleration models is the precursor to further research and experimentation in this relevant field. The effect of semiconductor scaling on microelectronics product reliability is an important consideration for the high-reliability application user. From the perspective of a customer or user, who in many cases must deal with very limited, if any, manufacturer's reliability data to assess the product for a highly reliable application, product-level testing is critical in the characterization and reliability assessment of advanced nanometer semiconductor scaling effects on microelectronics reliability. A methodology for accomplishing this and techniques for deriving the expected product-level reliability of commercial memory products are provided. Competing mechanism theory and the multiple failure mechanism model are applied to the experimental results of scaled SDRAM products. Accelerated stress testing at multiple conditions is applied at the product level of several scaled memory products to assess the performance degradation and product reliability. Acceleration models are derived for each case. For several scaled SDRAM products, retention time degradation is studied and two distinct soft error populations are observed with each technology generation: early breakdown, characterized by randomly distributed weak bits with Weibull slope β = 1, and a main population breakdown with an increasing failure rate. Retention time soft error rates are calculated and a multiple failure mechanism acceleration model with parameters is derived for each technology. Defect densities are calculated and reflect a decreasing trend in the percentage of random defective bits for each successive product generation. A normalized soft error failure rate of the memory data retention time in FIT/Gb and FIT/cm² for several scaled SDRAM generations is
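For reference, the Weibull hazard (instantaneous failure rate) is h(t) = (β/η)(t/η)^(β−1), so a slope of β = 1 gives a constant failure rate, consistent with randomly distributed weak bits, while β > 1 gives a failure rate that increases with time. A minimal sketch; the characteristic life η and the β = 2 comparison are hypothetical values, not parameters fitted in the guide.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull hazard rate h(t) = (beta / eta) * (t / eta) ** (beta - 1), for t > 0."""
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta and eta must all be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


eta = 1000.0  # hypothetical characteristic life, arbitrary time units
for t in (10.0, 100.0, 1000.0):
    # beta = 1: constant rate 1/eta; beta = 2: rate grows linearly with t
    print(t, weibull_hazard(t, 1.0, eta), weibull_hazard(t, 2.0, eta))
```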

  19. Validity and Reliability of the Persian Version of the Dysphagia Handicap Index (DHI).

    Science.gov (United States)

    Asadollahpour, Faezeh; Baghban, Kowsar; Asadi, Mozhgan

    2015-05-01

    The Dysphagia Handicap Index (DHI) is one of the instruments used for measuring a dysphagic patient's self-assessment. In some ways, it reflects the patient's quality of life. Although it has been recognized and widely applied in English-speaking populations, it has not been used in its present form in Persian-speaking countries. The purpose of this study was to adapt a Persian version of the DHI and to evaluate its validity, consistency, and reliability in the Persian population with oropharyngeal dysphagia. The stages of cross-cultural adaptation consisted of translation, synthesis, back translation, review by an expert committee, and final proofreading. The generated Persian DHI was administered to 85 patients with oropharyngeal dysphagia and 89 control subjects in Zahedan between May 2013 and August 2013. The patients and control subjects answered the same questionnaire 2 weeks later to verify the test-retest reliability. Internal consistency and test-retest reliability were evaluated. The results of the patients and the control group were compared. The Persian DHI showed good internal consistency (Cronbach's alpha coefficients ranged from 0.82 to 0.94). Good test-retest reliability was also found for the total scores of the Persian DHI (r=0.89). There was a significant difference between the DHI scores of the control group and those of the oropharyngeal dysphagia group (P < 0.001). The Persian version of the DHI achieved face and translation validity. This study demonstrated that the Persian DHI is a valid tool for self-assessment of the handicapping effects of dysphagia on the physical, functional, and emotional aspects of patient life and can be a useful tool for screening and treatment planning for Persian-speaking dysphagic patients, regardless of the cause or the severity of the dysphagia.

  20. Validity and Reliability of the Persian Version of the Dysphagia Handicap Index (DHI)

    Directory of Open Access Journals (Sweden)

    Asadollahpour, Faezeh

    2015-05-01

    Introduction: The Dysphagia Handicap Index (DHI) is one of the instruments used for measuring a dysphagic patient’s self-assessment. In some ways, it reflects the patient’s quality of life. Although it has been recognized and widely applied in English-speaking populations, it has not been used in its present form in Persian-speaking countries. The purpose of this study was to adapt a Persian version of the DHI and to evaluate its validity, consistency, and reliability in the Persian population with oropharyngeal dysphagia. Materials and Methods: The stages of cross-cultural adaptation consisted of translation, synthesis, back translation, review by an expert committee, and final proofreading. The generated Persian DHI was administered to 85 patients with oropharyngeal dysphagia and 89 control subjects in Zahedan between May 2013 and August 2013. The patients and control subjects answered the same questionnaire 2 weeks later to verify the test-retest reliability. Internal consistency and test-retest reliability were evaluated. The results of the patients and the control group were compared. Results: The Persian DHI showed good internal consistency (Cronbach’s alpha coefficients ranged from 0.82 to 0.94). Good test-retest reliability was also found for the total scores of the Persian DHI (r=0.89). There was a significant difference between the DHI scores of the control group and those of the oropharyngeal dysphagia group (P < 0.001). Conclusion: The Persian version of the DHI achieved face and translation validity. This study demonstrated that the Persian DHI is a valid tool for self-assessment of the handicapping effects of dysphagia on the physical, functional, and emotional aspects of patient life and can be a useful tool for screening and treatment planning for Persian-speaking dysphagic patients, regardless of the cause or the severity of the dysphagia.

  1. Is the renal excretion of orally applied diatrizoate (Gastrografin®) a reliable marker of gastrointestinal perforation or dehiscence of a gastrointestinal anastomosis?

    International Nuclear Information System (INIS)

    Born, M.; Axmann, C.; Kader, R.; Falkenhausen, M. von; Manka, C.; Willinek, W.A.; Schild, H.

    2004-01-01

    Purpose: Renal excretion of orally or rectally applied Gastrografin is reported to be a reliable indicator of a perforation or a postoperative anastomotic dehiscence of the GI tract. The study was conducted to determine whether increased attenuation of the urine, measured by CT after oral or rectal application of Gastrografin, can give reliable evidence of any leakage from the gastrointestinal tract. Materials and Methods: Urine samples of 33 patients who underwent a Gastrografin-enhanced fluoroscopic examination of the esophagus or the GI tract for different clinical reasons were examined by CT. The samples had been taken immediately before and 60 to 90 minutes after application of 100 ml Gastrografin. The results were compared with those of 5 healthy volunteers, who provided urine samples before and 30, 60, 90, and 120 minutes after drinking 100 ml of Gastrografin. Results: Maximal attenuation of the volunteers' urine samples was achieved 60 to 90 minutes after Gastrografin application, with a mean of 50 Hounsfield units (HU), SD=17 HU. The urine of three patients with radiologically proven fistula or dehiscence of a GI-tract anastomosis showed no relevant increase in attenuation. Three other patients without any clinical or radiological evidence of an anastomotic leak had a substantial increase in urine attenuation (87, 110, and 290 HU, respectively). Conclusion: CT measurement of urine attenuation, as evidence of renal excretion of orally or rectally applied Gastrografin, is not reliable for the detection of leaks from the GI tract. (orig.)

  2. Cross-cultural adaptation and validation of the Italian version of the Kerlan-Jobe Orthopaedic Clinic Shoulder and Elbow score.

    Science.gov (United States)

    Merolla, Giovanni; Corona, Katia; Zanoli, Gustavo; Cerciello, Simone; Giannotti, Stefano; Porcellini, Giuseppe

    2017-12-01

    The Kerlan-Jobe Orthopaedic Clinic (KJOC) Shoulder and Elbow score is a reliable and sensitive tool to measure the performance of overhead athletes. The purpose of this study was to carry out a cross-cultural adaptation and validation of the KJOC questionnaire in Italian and to assess its reliability, validity, and responsiveness. Ninety professional athletes with a painful shoulder were included in this study and were assigned to the "injury group" (n = 32) or the "overuse group" (n = 58); 65 were managed conservatively and 25 were treated by arthroscopic surgery. To assess the reliability of the KJOC score, patients were asked to fill in the questionnaire at baseline and after 2 weeks. To test the construct validity, KJOC scores were compared to those obtained with the Italian version of the Disabilities of the Arm, Shoulder, and Hand (DASH) scale, and with the DASH sports/performing arts module. To test KJOC score responsiveness, the follow-up KJOC scores of the participants treated conservatively were compared to those of the patients treated by arthroscopic surgery. Statistical analysis demonstrated that the KJOC questionnaire is reliable in terms of the single items and the overall score (ICC 0.95-0.99); that it has high construct validity (rs = -0.697; p differences in shoulder function (p < 0.0001). The Italian version of the KJOC Shoulder and Elbow score performed in a similar way to the English version and demonstrated good validity, reliability, and responsiveness after conservative and surgical treatment. II.

  3. Applying ethnic-specific bone mineral density T-scores to Chinese women in the USA.

    Science.gov (United States)

    Lo, J C; Kim, S; Chandra, M; Ettinger, B

    2016-12-01

    Caucasian reference data are used to classify bone mineral density in US women of all races. However, use of Chinese American reference data yields lower osteoporosis prevalence in Chinese women. The reduction in osteoporosis labeling may be relevant for younger Chinese women at low fracture risk. Caucasian reference data are used for osteoporosis classification in US postmenopausal women regardless of race, including Asians who tend to have lower bone mineral density (BMD) than women of white race. This study examines BMD classification by ethnic T-scores for Chinese women. Using BMD data in a Northern California healthcare population, Chinese women aged 50-79 years were compared to age-matched white women (1:5 ratio), with femoral neck (FN), total hip (TH), and lumbar spine (LS) T-scores calculated using Caucasian versus Chinese American reference data. Comparing 4039 Chinese and 20,195 white women (44.8 % age 50-59 years, 37.5 % age 60-69 years, 17.7 % age 70-79 years), Chinese women had lower BMD T-scores at the FN, TH, and LS (median T-score 0.29-0.72 units lower across age groups, p age 50-64 years and 43.2 to 21.0 % for age 65-79 years). Use of Chinese American BMD reference data yields higher (ethnic) T-scores by 0.4-0.5 units, with a large proportion of Chinese women reclassified from osteoporosis to osteopenia. The reduction in osteoporosis labeling with ethnic T-scores may be relevant for younger Chinese women at low fracture risk.
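A T-score expresses measured BMD in standard deviations from a young-adult reference mean, T = (BMD − ref_mean) / ref_sd, so choosing a reference population with a lower mean raises every T-score and can move a woman across the conventional WHO cut-offs (T ≤ −2.5 osteoporosis, −2.5 < T < −1.0 osteopenia). A minimal sketch; the function names, the BMD value, and both sets of reference values are hypothetical, not the Caucasian or Chinese American reference data used in the study.

```python
def t_score(bmd: float, ref_mean: float, ref_sd: float) -> float:
    """T-score: distance of measured BMD from a young-adult reference mean, in SDs."""
    if ref_sd <= 0:
        raise ValueError("reference SD must be positive")
    return (bmd - ref_mean) / ref_sd


def who_category(t: float) -> str:
    """Conventional WHO densitometric categories based on the T-score."""
    if t <= -2.5:
        return "osteoporosis"
    if t < -1.0:
        return "osteopenia"
    return "normal"


bmd = 0.60  # hypothetical femoral neck BMD, g/cm^2
ref_a = (0.86, 0.10)  # hypothetical reference population A: mean, SD
ref_b = (0.81, 0.10)  # hypothetical reference population B with a lower mean

for name, (mean, sd) in (("A", ref_a), ("B", ref_b)):
    t = t_score(bmd, mean, sd)
    print(name, round(t, 2), who_category(t))
```

Here the same BMD is labelled osteoporosis against reference A and osteopenia against reference B, the kind of reclassification the study describes.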

  4. The test-retest reliability of the latent construct of executive function depends on whether tasks are represented as formative or reflective indicators.

    Science.gov (United States)

    Willoughby, Michael T; Kuhn, Laura J; Blair, Clancy B; Samek, Anya; List, John A

    2017-10-01

    This study investigates the test-retest reliability of a battery of executive function (EF) tasks, with a specific interest in testing whether the method used to create a battery-wide score would result in differences in the apparent test-retest reliability of children's performance. A total of 188 4-year-olds completed a battery of computerized EF tasks twice across a period of approximately two weeks. Two different approaches were used to create a score that indexed children's overall performance on the battery, i.e., (1) the mean score of all completed tasks and (2) a factor score estimate which used confirmatory factor analysis (CFA). Pearson and intra-class correlations were used to investigate the test-retest reliability of individual EF tasks, as well as of an overall battery score. Consistent with previous studies, the test-retest reliability of individual tasks was modest (rs ≈ .60). The test-retest reliability of the overall battery scores differed depending on the scoring approach (r_mean = .72; r_factor_score = .99). It is concluded that children's performance on individual EF tasks exhibits modest levels of test-retest reliability. This underscores the importance of administering multiple tasks and aggregating performance across these tasks in order to improve precision of measurement. However, the specific strategy that is used has a large impact on the apparent test-retest reliability of the overall score. These results replicate our earlier findings and provide additional cautionary evidence against the routine use of factor analytic approaches for representing individual performance across a battery of EF tasks.
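Scoring approach (1), the mean of completed tasks, and a Pearson test-retest correlation can be sketched in a few lines. This is a minimal illustration with made-up task scores; the helper names are hypothetical, missing tasks are represented as `None` by assumption, and the CFA factor-score approach (2) needs a structural-equation modelling package and is not reproduced here.

```python
from math import sqrt
from typing import Optional, Sequence


def battery_mean(task_scores: Sequence[Optional[float]]) -> float:
    """Approach (1): mean over completed tasks; None marks a task not completed."""
    done = [s for s in task_scores if s is not None]
    if not done:
        raise ValueError("no completed tasks")
    return sum(done) / len(done)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores: 4 children x 3 tasks at each session (illustrative only).
session1 = [[0.6, 0.7, 0.5], [0.8, 0.9, 0.7], [0.4, 0.5, 0.6], [0.9, 0.8, 0.9]]
session2 = [[0.5, 0.8, 0.6], [0.7, 0.9, None], [0.5, 0.4, 0.5], [0.8, 0.9, 0.9]]

t1 = [battery_mean(child) for child in session1]
t2 = [battery_mean(child) for child in session2]
print("test-retest r (battery mean):", round(pearson_r(t1, t2), 3))
```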

  5. Large Sample Confidence Intervals for Item Response Theory Reliability Coefficients

    Science.gov (United States)

    Andersson, Björn; Xin, Tao

    2018-01-01

    In applications of item response theory (IRT), an estimate of the reliability of the ability estimates or sum scores is often reported. However, analytical expressions for the standard errors of the estimators of the reliability coefficients are not available in the literature and therefore the variability associated with the estimated reliability…

  6. Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores

    Directory of Open Access Journals (Sweden)

    Maréchal Eric

    2008-08-01

    constant rate, corresponding to the information hazard rate, and that pairwise sequence alignment scores should follow a Gumbel distribution, whose parameters could find some theoretical rationale. In particular, one parameter corresponds to the information hazard rate. Conclusion The extreme value distribution of alignment scores, assessed from high-scoring segment pairs following the Karlin-Altschul model, can also be deduced from Reliability Theory applied to molecular sequences. It reflects the redundancy of information between homologous sequences under functional conservative pressure. This model also provides a link between concepts of biological sequence analysis and of systems biology.

  7. [Definition of the Diagnosis Osteomyelitis-Osteomyelitis Diagnosis Score (ODS)].

    Science.gov (United States)

    Schmidt, H G K; Tiemann, A H; Braunschweig, R; Diefenbeck, M; Bühler, M; Abitzsch, D; Haustedt, N; Walter, G; Schoop, R; Heppert, V; Hofmann, G O; Glombitza, M; Grimme, C; Gerlach, U-J; Flesch, I

    2011-08-01

    The disease "osteomyelitis" is characterised by different symptoms and parameters. Decisive roles in the development of the disease are played by the causative bacteria, the route of infection and the individual defense mechanisms of the host. The diagnosis is based on different symptoms and findings from the clinical history, clinical symptoms, laboratory results, diagnostic imaging, and microbiological and histopathological analyses. While different osteomyelitis classifications have been published, there is, to the best of our knowledge, no score that indicates how certain the diagnosis "osteomyelitis" is in general. For any scientific study of a disease a valid definition is essential. We have developed a special osteomyelitis diagnosis score for the reliable classification of clinical, laboratory and technical findings. The score is based on five diagnostic procedures: 1) clinical history and risk factors, 2) clinical examination and laboratory results, 3) diagnostic imaging (ultrasound, radiology, CT, MRI, nuclear medicine and hybrid methods), 4) microbiology, and 5) histopathology. Each diagnostic procedure is related to many individual findings, which are weighted by a score system in order to achieve a relevant value for each assessment. If the sum of the five diagnostic criteria is 18 or more points, the diagnosis of osteomyelitis can be viewed as "safe" (diagnosis class A). Between 8 and 17 points the diagnosis is "probable" (diagnosis class B). Fewer than 8 points means that the diagnosis is "possible, but unlikely" (diagnosis class C). Since each parameter can score six points at a maximum, a reliable diagnosis can only be achieved if at least 3 parameters are scored with 6 points. The osteomyelitis diagnosis score should help to avoid the false description of a clinical presentation as "osteomyelitis". A safe diagnosis is essential for the aetiology, treatment and outcome studies of osteomyelitis. © Georg Thieme Verlag KG Stuttgart · New York.
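The class boundaries above translate directly into a decision rule. A minimal sketch, assuming each of the five diagnostic procedures contributes a sub-score of 0 to 6 points (our reading of the abstract); how individual findings are weighted within each procedure is not given here, and the function name `ods_class` is hypothetical.

```python
from typing import Sequence

# Order of the five diagnostic procedures named in the abstract.
PROCEDURES = (
    "clinical history and risk factors",
    "clinical examination and laboratory results",
    "diagnostic imaging",
    "microbiology",
    "histopathology",
)


def ods_class(points: Sequence[int]) -> str:
    """Classify an Osteomyelitis Diagnosis Score from five per-procedure sub-scores.

    Assumption: each sub-score lies between 0 and 6 points.
    """
    if len(points) != len(PROCEDURES):
        raise ValueError("expected one sub-score per diagnostic procedure")
    if any(not 0 <= p <= 6 for p in points):
        raise ValueError("each sub-score must lie between 0 and 6")
    total = sum(points)
    if total >= 18:
        return "A (safe)"
    if total >= 8:
        return "B (probable)"
    return "C (possible, but unlikely)"


print(ods_class([6, 6, 6, 0, 0]))  # 18 points
print(ods_class([4, 4, 0, 0, 0]))  # 8 points
print(ods_class([3, 4, 0, 0, 0]))  # 7 points
```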

  8. An Objective Fluctuation Score for Parkinson's Disease

    Science.gov (United States)

    Horne, Malcolm K.; McGregor, Sarah; Bergquist, Filip

    2015-01-01

    Introduction Establishing the presence and severity of fluctuations is important in managing Parkinson’s Disease, yet there is no reliable, objective means of doing this. In this study we have evaluated a Fluctuation Score derived from variations in dyskinesia and bradykinesia scores produced by an accelerometry-based system. Methods The Fluctuation Score was produced by summing the interquartile ranges of the bradykinesia scores and dyskinesia scores produced every 2 minutes between 0900 and 1800 for at least 6 days by the accelerometry-based system and expressing it as an algorithm. Results This Score could distinguish between fluctuating and non-fluctuating patients with high sensitivity and selectivity and was significantly lower following activation of deep brain stimulators. The scores following deep brain stimulation lay in a band just above the score separating fluctuators from non-fluctuators, suggesting a range representing adequate motor control. When compared with control subjects, the scores of newly diagnosed patients show a loss of fluctuation with onset of PD. The score was calculated in subjects whose duration of disease was known, and this showed that newly diagnosed patients soon develop higher scores, which either fall under or within the range representing adequate motor control or instead go on to develop more severe fluctuations. Conclusion The Fluctuation Score described here promises to be a useful tool for identifying patients whose fluctuations are progressing and may require therapeutic changes. It also shows promise as a useful research tool. Further studies are required to more accurately identify therapeutic targets and ranges. PMID:25928634
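The core of the Fluctuation Score, as described above, is the interquartile range of each symptom score summed over the recording period. A minimal sketch, assuming Python's `statistics.quantiles` with the inclusive method as the quartile definition; the abstract does not state the quartile method or the further transformation ("expressing it as an algorithm"), so this returns only the raw sum of the two IQRs, and the function names and sample scores are hypothetical.

```python
from statistics import quantiles
from typing import Sequence


def iqr(values: Sequence[float]) -> float:
    """Interquartile range (Q3 - Q1) using the inclusive quartile method (assumption)."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def raw_fluctuation_score(bradykinesia: Sequence[float],
                          dyskinesia: Sequence[float]) -> float:
    """Sum of the IQRs of the 2-minute bradykinesia and dyskinesia scores."""
    return iqr(bradykinesia) + iqr(dyskinesia)


# Made-up 2-minute scores, far fewer than a real multi-day recording.
bks = [1, 2, 3, 4, 5]
dks = [10, 20, 30, 40, 50]
print(raw_fluctuation_score(bks, dks))
```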

  9. Development and Validation of MRI Sacroiliac Joint Scoring Methods for the Semiaxial Scan Plane Corresponding to the Berlin and SPARCC MRI Scoring Methods, and of a New Global MRI Sacroiliac Joint Method

    DEFF Research Database (Denmark)

    Hededal, Pernille; Østergaard, Mikkel; Sørensen, Inge Juul

    2018-01-01

    OBJECTIVE: To develop semiaxial magnetic resonance imaging (MRI) scoring methods for assessment of sacroiliac joint (SIJ) bone marrow edema (BME) in patients with axial spondyloarthritis, and to compare the reliability with equivalent semicoronal scoring methods. METHODS: Two semiaxial SIJ MRI sc...

  10. Validity and Reliability of the Abbreviated Barratt Impulsiveness Scale in Spanish (BIS-15S)*

    Science.gov (United States)

    Orozco-Cabal, Luis; Rodríguez, Maritza; Herin, David V.; Gempeler, Juanita; Uribe, Miguel

    2010-01-01

    Objective This study determined the validity and reliability of a new, abbreviated version of the Spanish Barratt Impulsiveness Scale (BIS-15S) in Colombian subjects. Method The BIS-15S was tested in non-clinical (n=283) and clinical (n=164) native Spanish-speakers. Intra-scale reliability was calculated using Cronbach’s α, and test-retest reliability was measured with Pearson correlations. Psychometric properties were determined using standard statistics. A factor analysis was performed to determine BIS-15S factor structure. Results 447 subjects participated in the study. Clinical subjects were older and more educated compared to non-clinical subjects. Impulsivity scores were normally distributed in each group. BIS-15S total, motor, non-planning and attention scores were significantly lower in non-clinical vs. clinical subjects. Subjects with substance-related disorders had the highest BIS-15S total scores, followed by subjects with bipolar disorders and bulimia nervosa/binge eating. Internal consistency was 0.793 and test-retest reliability was 0.80. Factor analysis confirmed a three-factor structure (attention, motor, non-planning) accounting for 47.87% of the total variance in BIS-15S total scores. Conclusions The BIS-15S is a valid and reliable self-report measure of impulsivity in this population. Further research is needed to determine additional components of impulsivity not investigated by this measure. PMID:21152412
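
Several studies in this listing report internal consistency as Cronbach's α. It compares the summed item variances with the variance of the total score: α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₜ), where k is the number of items. This is a minimal Python sketch, assuming a complete respondents-by-items matrix with no missing answers. `cronbach_alpha` is a hypothetical helper, not the software used in any of these studies.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer every item")
    items = list(zip(*responses))  # transpose to item columns
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 3 respondents x 2 items.
print(round(cronbach_alpha([[1, 2], [2, 3], [3, 3]]), 3))  # 0.857
```

Both variances use the same (n − 1) denominator, so the ratio, and therefore α, does not depend on whether sample or population variance is chosen.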

  11. Assessing reliability in energy supply systems

    International Nuclear Information System (INIS)

    McCarthy, Ryan W.; Ogden, Joan M.; Sperling, Daniel

    2007-01-01

    Reliability has always been a concern in the energy sector, but concerns are escalating as energy demand increases and the political stability of many energy supply regions becomes more questionable. But how does one define and measure reliability? We introduce a method to assess reliability in energy supply systems in terms of adequacy and security. It derives from reliability assessment frameworks developed for the electricity sector, which are extended to include qualitative considerations and to be applicable to new energy systems by incorporating decision-making processes based on expert opinion and multi-attribute utility theory. The method presented here is flexible and can be applied to any energy system. To illustrate its use, we apply the method to two hydrogen pathways: (1) centralized steam reforming of imported liquefied natural gas with pipeline distribution of hydrogen, and (2) on-site electrolysis of water using renewable electricity produced independently of the electricity grid.

  12. Reliability and Validity of 3 Methods of Assessing Orthopedic Resident Skill in Shoulder Surgery.

    Science.gov (United States)

    Bernard, Johnathan A; Dattilo, Jonathan R; Srikumaran, Uma; Zikria, Bashir A; Jain, Amit; LaPorte, Dawn M

    Traditional measures for evaluating resident surgical technical skills (e.g., case logs) assess operative volume but not the level of surgical proficiency. Our goal was to compare the reliability and validity of 3 tools for measuring surgical skill among orthopedic residents when performing 3 open surgical approaches to the shoulder. A total of 23 residents at different stages of their surgical training were tested for technical skill pertaining to 3 shoulder surgical approaches using the following measures: Objective Structured Assessment of Technical Skills (OSATS) checklists, the Global Rating Scale (GRS), and a final pass/fail assessment determined by 3 upper extremity surgeons. Adverse events were recorded. The Cronbach α coefficient was used to assess reliability of the OSATS checklists and GRS scores. Interrater reliability was calculated with intraclass correlation coefficients. Correlations among OSATS checklist scores, GRS scores, and pass/fail assessment were calculated with Spearman ρ. Validity of OSATS checklists was determined using analysis of variance with postgraduate year (PGY) as a between-subjects factor. Significance was set at p shoulder approaches. Checklist scores showed superior interrater reliability compared with GRS and subjective pass/fail measurements. GRS scores were positively correlated across training years. The incidence of adverse events was significantly higher among PGY-1 and PGY-2 residents compared with more experienced residents. OSATS checklists are a valid and reliable assessment of technical skills across 3 surgical shoulder approaches. However, checklist scores do not measure quality of technique. Documenting adverse events is necessary to assess quality of technique and ultimate pass/fail status. Multiple methods of assessing surgical skill should be considered when evaluating orthopedic resident surgical performance. Copyright © 2016 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.

  13. Reliability Estimation Based Upon Test Plan Results

    National Research Council Canada - National Science Library

    Read, Robert

    1997-01-01

    The report contains a brief summary of aspects of the Maximus reliability point and interval estimation technique as it has been applied to the reliability of a device whose surveillance tests contain...

  14. Reliability and Validity of the New Tanaka B Intelligence Scale Scores: A Group Intelligence Test

    OpenAIRE

    Uno, Yota; Mizukami, Hitomi; Ando, Masahiko; Yukihiro, Ryoji; Iwasaki, Yoko; Ozaki, Norio

    2014-01-01

    OBJECTIVE: The present study evaluated the reliability and concurrent validity of the new Tanaka B Intelligence Scale, which is an intelligence test that can be administered on groups within a short period of time. METHODS: The new Tanaka B Intelligence Scale and Wechsler Intelligence Scale for Children-Third Edition were administered to 81 subjects (mean age ± SD 15.2 ± 0.7 years) residing in a juvenile detention home; reliability was assessed using Cronbach's alpha coefficient, and concurre...

  15. Comparison of scoring approaches for the NEI VFQ-25 in low vision.

    Science.gov (United States)

    Dougherty, Bradley E; Bullimore, Mark A

    2010-08-01

    The aim of this study was to evaluate different approaches to scoring the National Eye Institute Visual Functioning Questionnaire-25 (NEI VFQ-25) in patients with low vision, including scoring by the standard method, by Rasch analysis, and by use of an algorithm created by Massof to approximate the Rasch person measure. Subscale validity and use of a 7-item short form instrument proposed by Ryan et al. were also investigated. NEI VFQ-25 data from 50 patients with low vision were analyzed using the standard method of summing Likert-type scores and calculating an overall average, Rasch analysis using Winsteps software, and the Massof algorithm in Excel. Correlations between scores were calculated. Rasch person separation reliability and other indicators were calculated to determine the validity of the subscales and of the 7-item instrument. Scores calculated using all three methods were highly correlated, but evidence of floor and ceiling effects was found with the standard scoring method. None of the subscales investigated proved valid. The 7-item instrument showed acceptable person separation reliability and good targeting and item performance. Although standard scores and Rasch scores are highly correlated, Rasch analysis has the advantages of eliminating floor and ceiling effects and producing interval-scaled data. The Massof algorithm for approximation of the Rasch person measure performed well in this group of low-vision patients. The validity of the VFQ-25 subscales should be reconsidered.

  16. A survey of NASA and military standards on fault tolerance and reliability applied to robotics

    Science.gov (United States)

    Cavallaro, Joseph R.; Walker, Ian D.

    1994-01-01

    There is currently increasing interest and activity in the area of reliability and fault tolerance for robotics. This paper discusses the application of standards in robot reliability and surveys the literature of relevant existing standards. A bibliography of relevant military and NASA standards for reliability and fault tolerance is included.

  17. Histologic scoring indices for evaluation of disease activity in Crohn's disease.

    Science.gov (United States)

    Novak, Gregor; Parker, Claire E; Pai, Rish K; MacDonald, John K; Feagan, Brian G; Sandborn, William J; D'Haens, Geert; Jairath, Vipul; Khanna, Reena

    2017-07-21

    Histologic assessment of mucosal disease activity has been increasingly used in clinical trials of treatment for Crohn's disease. However, the operating properties of the currently existing histologic scoring indices remain unclear. A systematic review was undertaken to evaluate the development and operating characteristics of available histologic disease activity indices in Crohn's disease. Electronic searches of MEDLINE, EMBASE, PubMed, and the Cochrane Library (CENTRAL) databases from inception to 20 July 2016 were supplemented by manual reviews of bibliographies and abstracts submitted to major gastroenterology meetings (Digestive Disease Week, United European Gastroenterology Week, European Crohn's and Colitis Organisation). Any study design (e.g. randomised controlled trial, cohort study, case series) that evaluated a histologic disease activity index in patients with Crohn's disease was considered for inclusion. Study participants included adult patients (> 16 years) diagnosed with Crohn's disease using conventional clinical, radiographic or endoscopic criteria. Two authors independently reviewed the titles and abstracts of the studies identified from the literature search. The full texts of potentially relevant citations were reviewed for inclusion, and the study investigators were contacted as needed for clarification. Any disagreements regarding study eligibility were resolved by discussion and consensus with a third author. Two authors independently extracted and recorded data using a standard form. The following data were recorded from each eligible study: number of patients enrolled; number of patients per treatment arm; patient characteristics: age and gender distribution; description of histologic disease activity index utilized; and outcomes such as content validity, construct validity, criterion validity, responsiveness, intra-rater reliability, inter-rater reliability, and feasibility. 
Sixteen reports of 14 studies describing 14 different numerical

  18. Reliability of FAMACHA© chart for the evaluation of anaemia in ...

    African Journals Online (AJOL)

    The reliability of FAMACHA© chart for identifying anaemic goats was compared with Packed Cell Volume (PCV). The colour of the lower eyelids was graded with FAMACHA© chart based on FAMACHA© scores (FS) of 1-5. The animals were scored from severely anaemic (white or FS 5) through moderately anaemic (pink or ...

  19. Reliability measures in item response theory: manifest versus latent correlation functions.

    Science.gov (United States)

    Milanzi, Elasma; Molenberghs, Geert; Alonso, Ariel; Verbeke, Geert; De Boeck, Paul

    2015-02-01

    For item response theory (IRT) models, which belong to the class of generalized linear or non-linear mixed models, reliability at the scale of observed scores (i.e., manifest correlation) is more difficult to calculate than latent correlation based reliability, but usually of greater scientific interest. This is not least because it cannot be calculated explicitly when the logit link is used in conjunction with normal random effects. As such, approximations such as Fisher's information coefficient, Cronbach's α, or the latent correlation are calculated, allegedly because it is easy to do so. Cronbach's α has well-known and serious drawbacks, Fisher's information is not meaningful under certain circumstances, and there is an important but often overlooked difference between latent and manifest correlations. Here, manifest correlation refers to correlation between observed scores, while latent correlation refers to correlation between scores at the latent (e.g., logit or probit) scale. Thus, using one in place of the other can lead to erroneous conclusions. Taylor series based reliability measures, which are based on manifest correlation functions, are derived and a careful comparison of reliability measures based on latent correlations, Fisher's information, and exact reliability is carried out. The latent correlations are virtually always considerably higher than their manifest counterparts, Fisher's information measure shows no coherent behaviour (it is even negative in some cases), while the newly introduced Taylor series based approximations reflect the exact reliability very closely. Comparisons among the various types of correlations, for various IRT models, are made using algebraic expressions, Monte Carlo simulations, and data analysis. Given the light computational burden and the performance of Taylor series based reliability measures, their use is recommended. © 2014 The British Psychological Society.

  20. Acute imaging does not improve ASTRAL score's accuracy despite having a prognostic value.

    Science.gov (United States)

    Ntaios, George; Papavasileiou, Vasileios; Faouzi, Mohamed; Vanacker, Peter; Wintermark, Max; Michel, Patrik

    2014-10-01

    The ASTRAL score was recently shown to reliably predict three-month functional outcome in patients with acute ischemic stroke. The study aims to investigate whether information from multimodal imaging increases ASTRAL score's accuracy. All patients registered in the ASTRAL registry until March 2011 were included. In multivariate logistic-regression analyses, we added covariates derived from parenchymal, vascular, and perfusion imaging to the 6-parameter model of the ASTRAL score. If a specific imaging covariate remained an independent predictor of three-month modified Rankin score>2, the area-under-the-curve (AUC) of this new model was calculated and compared with ASTRAL score's AUC. We also performed similar logistic regression analyses in arbitrarily chosen patient subgroups. When added to the ASTRAL score, the following covariates on admission computed tomography/magnetic resonance imaging-based multimodal imaging were not significant predictors of outcome: any stroke-related acute lesion, any nonstroke-related lesions, chronic/subacute stroke, leukoaraiosis, significant arterial pathology in ischemic territory on computed tomography angiography/magnetic resonance angiography/Doppler, significant intracranial arterial pathology in ischemic territory, and focal hypoperfusion on perfusion-computed tomography. The Alberta Stroke Program Early CT score on plain imaging and any significant extracranial arterial pathology on computed tomography angiography/magnetic resonance angiography/Doppler were independent predictors of outcome (odds ratio: 0·93, 95% CI: 0·87-0·99 and odds ratio: 1·49, 95% CI: 1·08-2·05, respectively) but did not increase ASTRAL score's AUC (0·849 vs. 0·850, and 0·8563 vs. 0·8564, respectively). In exploratory analyses in subgroups of different prognosis, age or stroke severity, no covariate was found to increase ASTRAL score's AUC, either. The addition of information derived from multimodal imaging does not increase ASTRAL score
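
The comparison above rests on the area under the ROC curve (AUC) of each logistic model. One standard way to compute an AUC from predicted probabilities is the pairwise-concordance (Mann-Whitney) form: the share of (outcome, no-outcome) patient pairs in which the patient with the outcome received the higher predicted risk, with ties counted as one half. The sketch assumes binary labels (1 = modified Rankin score > 2) and hypothetical data. It does not reproduce the study's statistical test for comparing two AUCs.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the probability that a positive case outranks a negative one."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0  # concordant pair
            elif p == q:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks and outcomes (1 = poor outcome).
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Seen this way, an added covariate that shifts predicted risks without changing how patients are ranked leaves the AUC unchanged. This is consistent with the near-identical AUCs reported above.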

  1. Development of a valid and reliable test to assess trauma radiograph interpretation performance

    International Nuclear Information System (INIS)

    Neep, M.J.; Steffens, T.; Riley, V.; Eastgate, P.; McPhail, S.M.

    2017-01-01

    Objectives: The purpose of this investigation was to develop and examine the preliminary validity and reliability among radiographers of a test to assess trauma radiograph interpretation performance suitable for use among health professionals. Methods: Stage 1 examined 14,159 consecutive appendicular and axial examinations from a hospital emergency department over a 12-month period to quantify a typical anatomical region case-mix of trauma radiographs. A sample of radiographic cases representative of affected anatomical regions was then developed into the Image Interpretation Test (IIT). Stage 2 involved prospective investigations of the IIT's reliability (inter-rater, intra-rater, internal consistency) and validity (concurrent) among 41 radiographers. Results: The IIT included 60 cases. The median (interquartile range) clinical experience of participants was 5 (2–10) years. Case scores were internally consistent (Cronbach's alpha = 0.90). Favourable inter-rater reliability (kappa > 0.70 for 58/60 cases, intra-class correlation coefficient (ICC) > 0.99 for total score) and intra-rater reliability (kappa > 0.90 for 60/60 cases, ICC > 0.99 for total score) were observed. There was a positive association between radiographers' confidence in image interpretation and IIT score (coefficient = 1.52, r-squared = 0.60, p < 0.001). Conclusions: The IIT developed during this investigation included a selection of radiographic cases consistent with anatomical regions represented in an adult trauma case-mix. This study has also provided foundational preliminary evidence to support the reliability and validity of the IIT among radiographers. The findings suggest that it is possible to assess image interpretation performance of adult trauma radiographs with this test. - Highlights: • Development of an Image Interpretation Test (IIT). • Cases consistent with anatomical regions represented in a typical adult trauma case-mix. • Development of a

  2. Validation and reliability of the Turkish Utian Quality-of-Life Scale in postmenopausal women.

    Science.gov (United States)

    Abay, Halime; Kaplan, Sena

    2016-04-01

    There are a limited number of menopause-specific quality-of-life scales for the Turkish population. This study was conducted to evaluate the validity and reliability of the Turkish Utian Quality-of-Life Scale in postmenopausal women. The study group comprised 250 postmenopausal women who presented to a training and research hospital's menopause clinic in Turkey. A survey form and the Turkish Utian Quality-of-Life Scale were used to collect data, and the Turkish version of the Short Form-36 was used to evaluate reliability with an equivalent form. Language-validity, content-validity, and construct-validity methods were used to assess the validity of the scale, and Cronbach's α coefficient calculation and the equivalent-form reliability methods were used to assess the reliability of the scale. The Turkish Utian Quality-of-Life Scale was determined to be a valid and reliable instrument for measuring the quality of life of postmenopausal women. Confirmatory factor analysis demonstrates that the instrument fits well with 23 items and a four-factor model. The Cronbach's α coefficients for the quality-of-life domains were as follows: 0.88 overall, 0.79 health, 0.78 emotional, 0.76 sexual, and 0.75 occupational. Reliability of the instrument was confirmed through significant correlations between scores on the Turkish version of the Utian Quality-of-Life Scale and the Turkish version of the Short Form-36 (r = 0.745, P measuring quality of life during menopause.

  3. Direct concurrent comparison of multiple pediatric acute asthma scoring instruments.

    Science.gov (United States)

    Johnson, Michael D; Nkoy, Flory L; Sheng, Xiaoming; Greene, Tom; Stone, Bryan L; Garvin, Jennifer

    2017-09-01

    Appropriate delivery of Emergency Department (ED) treatment to children with acute asthma requires clinician assessment of acute asthma severity. Various clinical scoring instruments exist to standardize assessment of acute asthma severity in the ED, but their selection remains arbitrary due to few published direct comparisons of their properties. Our objective was to test the feasibility of directly comparing properties of multiple scoring instruments in a pediatric ED. Using a novel approach supported by a composite data collection form, clinicians categorized elements of five scoring instruments before and after initial treatment for 48 patients 2-18 years of age with acute asthma seen at the ED of a tertiary care pediatric hospital from August to December 2014. Scoring instruments were compared for inter-rater reliability between clinician types and their ability to predict hospitalization. Inter-rater reliability between clinician types was not different between instruments at any point and was lower (weighted kappa range 0.21-0.55) than values reported elsewhere. Predictive ability of most instruments for hospitalization was higher after treatment than before treatment (p < 0.05) and may vary between instruments after treatment (p = 0.054). We demonstrate the feasibility of comparing multiple clinical scoring instruments simultaneously in ED clinical practice. Scoring instruments had higher predictive ability for hospitalization after treatment than before treatment and may differ in their predictive ability after initial treatment. Definitive conclusions about the best instrument or meaningful comparison between instruments will require a study with a larger sample size.

  4. Item Analysis to Improve Reliability for an Internal Medicine Undergraduate OSCE

    Science.gov (United States)

    Auewarakul, Chirayu; Downing, Steven M.; Praditsuwan, Rungnirand; Jaturatamrong, Uapong

    2005-01-01

    Utilization of objective structured clinical examinations (OSCEs) for final assessment of medical students in Internal Medicine requires a representative sample of OSCE stations. The reliability and generalizability of OSCE scores provide validity evidence for OSCE scores and support their contribution to the final clinical grade of medical…

  5. A wireless-sensor scoring and training system for combative sports

    Science.gov (United States)

    Partridge, Kane; Hayes, Jason P.; James, Daniel A.; Hill, Craig; Gin, Gareth; Hahn, Allan

    2005-02-01

    Although historically among the most popular of sports, today, combative sports are often viewed as an expression of our savage past. Of primary concern are the long term effects of participating in these sports on the health of participants. The scoring of such sports has also been the subject of much debate, with a panel of judges making decisions about very quick events involving large sums of prize money. This paper describes an electronic system for use primarily in the sport of boxing, though it is suitable for martial arts such as karate and taekwondo. The technology is based on a previously described sensor platform and integrates a network of sensors on the athlete's head, body and hands. Using a Bluetooth network, physical contacts are monitored in near real-time or post event on a remote computer to determine legal hits and hence derivative measures like scoring and final outcomes. It is hoped that this system can be applied to reduce the need for full contact contests as well as provide a more reliable method of determining the outcome of a bout. Other benefits presented here include the ability to analyse an athlete's performance post match or training session, such as assessing the efficacy of training drills and effects of fatigue.

  6. NASA reliability preferred practices for design and test

    Science.gov (United States)

    1991-01-01

    Given here is a manual that was produced to communicate within the aerospace community design practices that have contributed to NASA mission success. The information represents the best technical advice that NASA has to offer on reliability design and test practices. Topics covered include reliability practices, including design criteria, test procedures, and analytical techniques that have been applied to previous space flight programs; and reliability guidelines, including techniques currently applied to space flight projects, where sufficient information exists to certify that the technique will contribute to mission success.

  7. Development and reliability of the rating of compensatory movements in upper limb prosthesis wearers during work-related tasks.

    Science.gov (United States)

    van der Laan, Tallie M J; Postema, Sietke G; Reneman, Michiel F; Bongers, Raoul M; van der Sluis, Corry K

    2018-02-10

    Reliability study. Quantifying compensatory movements during work-related tasks may help to prevent musculoskeletal complaints in individuals with upper limb absence. (1) To develop a qualitative scoring system for rating compensatory shoulder and trunk movements in upper limb prosthesis wearers during the performance of functional capacity evaluation tests adjusted for use by 1-handed individuals (functional capacity evaluation-one handed [FCE-OH]); (2) to examine the interrater and intrarater reliability of the scoring system; and (3) to assess its feasibility. Movement patterns of 12 videotaped upper limb prosthesis wearers and 20 controls were analyzed. Compensatory movements were defined for each FCE-OH test, and a scoring system was developed, pilot tested, and adjusted. During reliability testing, 18 raters (12 FCE experts and 6 physiotherapists/gait analysts) scored videotapes of upper limb prosthesis wearers performing 4 FCE-OH tests 2 times (2 weeks apart). Agreement was expressed in % and kappa value. Feasibility (focus areas "acceptability," "demand," and "implementation") was determined by using a questionnaire. After 2 rounds of pilot testing and adjusting, reliability of a third version was tested. The interrater reliability for the first and second rating sessions was κ = 0.54 (confidence interval [CI]: 0.52-0.57) and κ = 0.64 (CI: 0.61-0.66), respectively. The intrarater reliability was κ = 0.77 (CI: 0.72-0.82). The feasibility was good but could be improved by a training program. It seems possible to identify compensatory movements in upper limb prosthesis wearers during the performance of FCE-OH tests reliably by observation using the developed observational scoring system. Interrater reliability was satisfactory in most instances; intrarater reliability was good. Feasibility was established. Copyright © 2018 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.
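
Agreement in the study above is reported as κ. As an illustration only, the two-rater form (Cohen's κ) corrects the observed agreement p_o for the agreement p_e expected by chance from each rater's category frequencies: κ = (p_o − p_e) / (1 − p_e). The abstract does not say which kappa variant was used with 18 raters, so this sketch does not reproduce the study's analysis. `cohens_kappa` and the example ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Observed agreement: share of items given the same category.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of four videotaped movements.
a = ["compensation", "compensation", "none", "none"]
b = ["compensation", "none", "none", "none"]
print(cohens_kappa(a, b))  # 0.5
```

Here the raters agree on 3 of 4 items (p_o = 0.75), but half of that agreement is expected by chance (p_e = 0.5), which is why κ is 0.5 rather than 0.75.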

  8. Validation and reliability of a Behcet’s Syndrome Activity Scale in Korea

    Science.gov (United States)

    Choi, Hyo Jin; Seo, Mi Ryoung; Ryu, Hee Jung; Baek, Han Joo

    2016-01-01

    Background/Aims: We prepared a cross-cultural adaptation of the Behcet’s Syndrome Activity Scale (BSAS) and evaluated its reliability and validity in Korea. Methods: Fifty patients with Behcet’s disease (BD) who attended the Rheumatology Clinic of Gachon University Gil Medical Center were included in this study. The first BSAS questionnaire was administered at each clinic visit, and the second questionnaire was completed at home within 24 hours of the visit. A Behcet’s Disease Current Activity Form (BDCAF) and a Behcet’s Disease Quality of Life (BDQOL) form were also given to patients. The test-retest reliability was analyzed by intraclass correlation coefficients (ICC). To assess the validity, the total BSAS score was compared with the BDCAF score, the patient/physician global assessment, and the BDQOL by Spearman rank correlation. Results: Twelve males and 38 females were enrolled. The mean age was 48.5 years and the mean disease duration was 6.7 years. Thirty-eight patients (76.0%) returned the questionnaire by mail. For the test-retest reliability, the two assessments were significantly correlated on all 10 items of the BSAS questionnaire (p < 0.05) and the total BSAS score (ICC, 0.925; p < 0.001). The total BSAS score was statistically correlated with the BDQOL, BDCAF, and patient/physician global assessment (p < 0.01). Conclusions: The Korean version of BSAS is a reliable and valid instrument to measure BD activity. PMID:26767871

  9. Examination of the reliability and validity of the Mindful Eating Questionnaire in pregnant women.

    Science.gov (United States)

    Apolzan, John W; Myers, Candice A; Cowley, Amanda D; Brady, Heather; Hsia, Daniel S; Stewart, Tiffany M; Redman, Leanne M; Martin, Corby K

    2016-05-01

    Mindfulness is theorized to affect the eating behavior and weight of pregnant women, yet no measure has been validated during pregnancy. This study qualitatively and quantitatively evaluated the reliability and validity of the Mindful Eating Questionnaire (MEQ) in overweight and obese pregnant women. Participants completed focus groups and cognitive interviews. The MEQ was administered twice to measure test-retest reliability. The Eating Inventory (EI) and Mindful Attention Awareness Scale (MAAS) were administered to assess convergent validity, and the Neighborhood Environment Walkability Scale (NEWS) assessed discriminant validity. Participants were at 20 ± 8 weeks' gestation (mean ± SD) and 30 ± 2 years old, and 55% were obese. The MEQ total score had good test-retest reliability (r = .85). The total score internal consistency reliability was poor (Cronbach's α = .56). The external cues subscale (ECS) was not internally consistent (α = .31). Other subscales ranged from α = .59-.68. When the ECS was excluded, the MEQ total score internal consistency was acceptable (α = .62). Convergent validity was supported by the MEQ total score (with and without ECS) correlating significantly with the MAAS and the EI disinhibition and hunger subscales. Discriminant validity of the MEQ was supported by the MEQ and NEWS total scores and subscales not being significantly correlated. The quantitative results were supported by the qualitative context and content analysis. With the exception of the ECS, the MEQ's reliability and validity were supported in pregnant women, and most of the subscales were more robust in pregnant women than in the original sample of healthy adults. The MEQ's use with overweight and obese pregnant women is supported. Copyright © 2016 Elsevier Ltd. All rights reserved.

  10. Development and validation of the Pediatric Anesthesia Behavior score--an objective measure of behavior during induction of anesthesia.

    Science.gov (United States)

    Beringer, Richard M; Greenwood, Rosemary; Kilpatrick, Nicky

    2014-02-01

    Measuring perioperative behavior changes requires validated objective rating scales. We developed a simple score for children's behavior during induction of anesthesia (Pediatric Anesthesia Behavior score) and assessed its reliability, concurrent validity, and predictive validity. Data were collected as part of a wider observational study of perioperative behavior changes in children undergoing general anesthesia for elective dental extractions. One hundred and two healthy children aged 2-12 years were recruited. Previously validated behavioral scales were used as follows: the modified Yale Preoperative Anxiety Scale (m-YPAS); the induction compliance checklist (ICC); the Pediatric Anesthesia Emergence Delirium scale (PAED); and the Post-Hospitalization Behavior Questionnaire (PHBQ). The Pediatric Anesthesia Behavior (PAB) score was independently measured by two investigators to allow assessment of interobserver reliability. Concurrent validity was assessed by examining the correlation between the PAB score, the m-YPAS, and the ICC. Predictive validity was assessed by examining the association between the PAB score, the PAED scale, and the PHBQ. The PAB score correlated strongly with both the m-YPAS (P risk of developing postoperative behavioral disturbance. This study provides evidence for its reliability and validity. © 2013 John Wiley & Sons Ltd.

  11. The STAR score: a method for auditing clinical records

    Science.gov (United States)

    Tuffaha, H

    2012-01-01

    INTRODUCTION Adequate medical note keeping is critical in delivering high quality healthcare. However, there are few robust tools available for the auditing of notes. The aim of this paper was to describe the design, validation and implementation of a novel scoring tool to objectively assess surgical notes. METHODS An initial ‘path finding’ study was performed to evaluate the quality of note keeping using the CRABEL scoring tool. The findings prompted the development of the Surgical Tool for Auditing Records (STAR) as an alternative. STAR was validated using inter-rater reliability analysis. An audit cycle of surgical notes using STAR was performed. The results were analysed and a structured form for the completion of surgical notes was introduced to see if the quality improved in the next audit cycle using STAR. An education exercise was conducted and all participants said the exercise would change their practice, with 25% implementing major changes. RESULTS Statistical analysis of STAR showed that it is reliable (Cronbach’s α = 0.959). On completing the audit cycle, there was an overall increase in the STAR score from 83.344% to 97.675% (p<0.001) with significant improvements in the documentation of the initial clerking from 59.0% to 96.5% (p<0.001) and subsequent entries from 78.4% to 96.1% (p<0.001). CONCLUSIONS The authors believe in the value of STAR as an effective, reliable and reproducible tool. Coupled with the application of structured forms to note keeping, it can significantly improve the quality of surgical documentation and can be implemented universally. PMID:22613300

  12. The STAR score: a method for auditing clinical records.

    Science.gov (United States)

    Tuffaha, H; Amer, T; Jayia, P; Bicknell, C; Rajaretnam, N; Ziprin, P

    2012-05-01

    Adequate medical note keeping is critical in delivering high quality healthcare. However, there are few robust tools available for the auditing of notes. The aim of this paper was to describe the design, validation and implementation of a novel scoring tool to objectively assess surgical notes. An initial 'path finding' study was performed to evaluate the quality of note keeping using the CRABEL scoring tool. The findings prompted the development of the Surgical Tool for Auditing Records (STAR) as an alternative. STAR was validated using inter-rater reliability analysis. An audit cycle of surgical notes using STAR was performed. The results were analysed and a structured form for the completion of surgical notes was introduced to see if the quality improved in the next audit cycle using STAR. An education exercise was conducted and all participants said the exercise would change their practice, with 25% implementing major changes. Statistical analysis of STAR showed that it is reliable (Cronbach's α = 0.959). On completing the audit cycle, there was an overall increase in the STAR score from 83.344% to 97.675% (p < 0.001) with significant improvements in the documentation of the initial clerking from 59.0% to 96.5% (p < 0.001) and subsequent entries from 78.4% to 96.1% (p < 0.001). The authors believe in the value of STAR as an effective, reliable and reproducible tool. Coupled with the application of structured forms to note keeping, it can significantly improve the quality of surgical documentation and can be implemented universally.

  13. Approach to reliability assessment

    International Nuclear Information System (INIS)

    Green, A.E.; Bourne, A.J.

    1975-01-01

    Experience has shown that reliability assessments can play an important role in the early design and subsequent operation of technological systems where reliability is at a premium. The approaches to and techniques for such assessments, which are outlined in the paper, have been successfully applied in a variety of applications ranging from individual items of equipment to large and complex systems. The general approach involves the logical and systematic establishment of the purpose, performance requirements and reliability criteria of systems. This is followed by an appraisal of likely system achievement based on an understanding of different types of variational behavior. A fundamental reliability model emerges from the correlation between the appropriate Q and H functions for performance requirement and achievement. This model may cover the complete spectrum of performance behavior in all the system dimensions.

  14. RENZI SCORE FOR OBSTRUCTED DEFECATION SYNDROME - VALIDATION OF THE PORTUGUESE VERSION ACCORDING TO THE COSMIN CHECKLIST.

    Science.gov (United States)

    Caetano, Ana Celia; Dias, Sara; Santa-Cruz, André; Rolanda, Carla

    2018-01-01

    Recently, the Obstructed Defecation Syndrome score (ODS score) was developed and validated by Renzi to assess clinical staging and to allow evaluation and comparison of the efficacy of treatment of this disorder. Our goal was to validate the Portuguese version of the Renzi ODS score according to the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) checklist. Following guidelines for cross-cultural validation, the Renzi ODS score was translated into Portuguese. A group of patients and healthy controls was then invited to complete the Renzi ODS score at baseline, after 2 weeks and after 3 months. We assessed internal consistency, reliability and measurement error, content and construct validity, responsiveness and interpretability. A total of 113 individuals (77 patients; 36 healthy controls) completed the questionnaire. Seventy and 30 patients repeated the Renzi ODS score after 2 weeks and 3 months, respectively. Factor analysis confirmed the unidimensionality of the scale. A Cronbach's α coefficient of 0.77 supported the homogeneity of the items. A quadratic weighted kappa of 0.89 established test-retest reliability. The smallest detectable change was 2.66 at the individual level and 0.30 at the group level. The Renzi ODS score correlated negatively with the total (-0.32) and physical (-0.43) SF-36 scores. The patient and control groups differed significantly (11 points). The change in Renzi ODS score between baseline and 3 months correlated negatively with clinical evolution (-0.86). ROC analysis showed a minimal important change of 2.00 with an AUC of 0.97. Neither floor nor ceiling effects were observed. This work validated the Portuguese version of the Renzi ODS score. This reliable, responsive and (at the group level) interpretable tool can now be used to evaluate Portuguese ODS patients.
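    The record above summarizes test-retest agreement on an ordinal score with a quadratic weighted kappa. Below is an illustrative Python implementation of Cohen's weighted kappa, κw = 1 − Σ wᵢⱼ oᵢⱼ / Σ wᵢⱼ eᵢⱼ. Its disagreement weights grow with the squared (or, optionally, absolute) distance between categories. The function name and the toy ratings are hypothetical, not data from the study.

```python
def weighted_kappa(r1, r2, categories, weight="quadratic"):
    """Cohen's weighted kappa for two paired lists of ordinal ratings."""
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two ordered categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    # Observed joint proportions o[i][j].
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[index[a]][index[b]] += 1 / n
    # Marginals give the proportions expected by chance: e[i][j] = row[i] * col[j].
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the most distant categories.
        d = abs(i - j) / (k - 1)
        return d * d if weight == "quadratic" else d

    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - observed / expected


# Hypothetical test and retest ratings on a 0-3 ordinal scale.
test = [0, 1, 2, 3, 3, 2, 1, 0]
retest = [0, 1, 2, 3, 2, 2, 1, 1]
print(round(weighted_kappa(test, retest, [0, 1, 2, 3]), 3))  # 0.875
```

    With weight="linear" the same function gives the linearly weighted kappa. With only two categories both weightings reduce to unweighted Cohen's kappa.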

  15. Preliminary validation of 2 magnetic resonance image scoring systems for osteoarthritis of the hip according to the OMERACT filter.

    Science.gov (United States)

    Maksymowych, Walter P; Cibere, Jolanda; Loeuille, Damien; Weber, Ulrich; Zubler, Veronika; Roemer, Frank W; Jaremko, Jacob L; Sayre, Eric C; Lambert, Robert G W

    2014-02-01

    Development of a validated magnetic resonance image (MRI) scoring system is essential in hip OA because radiographs are insensitive to change. We assessed the feasibility and reliability of 2 previously developed scoring methods: (1) the Hip Inflammation MRI Scoring System (HIMRISS) and (2) the Hip Osteoarthritis MRI Scoring System (HOAMS). Six readers (3 radiologists, 3 rheumatologists) participated in 2 reading exercises. In Reading Exercise 1, hip MRIs of 20 subjects were read at a single time point, followed by further standardization of methodology. In Reading Exercise 2, hip MRIs of 18 subjects from a randomized controlled trial, assessed at 2 timepoints, and of 27 subjects from a cross-sectional study were read for HIMRISS and HOAMS bone marrow lesions (BML) and synovitis. Reliability was assessed using intraclass correlation coefficients (ICC) and kappa statistics. Both methods were considered feasible. For Reading 1, HIMRISS ICC were 0.52, 0.61, 0.70, and 0.58 for femoral BML, acetabular BML, effusion, and total scores, respectively; for HOAMS, summed BML and synovitis ICC were 0.52 and 0.46, respectively. For Reading 2, HIMRISS and HOAMS ICC for BML and synovitis-effusion improved substantially. Interobserver reliability for change scores was 0.81 and 0.71 for HIMRISS femoral and HOAMS summed BML, respectively. Responsiveness and discrimination were moderate to high for synovitis-effusion. Significant associations were noted between BML or synovitis scores and Western Ontario and McMaster Universities Osteoarthritis Index pain scores for baseline values (p ≤ 0.001). The BML and synovitis-effusion components of both the HIMRISS and HOAMS scoring systems are feasible and reliable and should be validated further.

  16. Automatic Algorithm for the Determination of the Anderson-Wilkins Acuteness Score in Patients with ST Elevation Myocardial Infarction

    DEFF Research Database (Denmark)

    Fakhri, Yama; Sejersten, Maria; Schoos, Mikkel Malby

    2016-01-01

    using 50 ECGs. Each ECG lead (except aVR) was manually scored according to AW-score by two independent experts (Exp1 and Exp2) and automatically by our designed algorithm (auto-score). An adjudicated manual score (Adj-score) was determined between Exp1 and Exp2. The inter-rater reliabilities (IRRs...

  17. Validity and test–retest reliability of the Persian version of the Montgomery–Asberg Depression Rating Scale

    Science.gov (United States)

    Ahmadpanah, Mohammad; Sheikhbabaei, Meisam; Haghighi, Mohammad; Roham, Fatemeh; Jahangard, Leila; Akhondi, Amineh; Sadeghi Bahmani, Dena; Bajoghli, Hafez; Holsboer-Trachsler, Edith; Brand, Serge

    2016-01-01

    Background and aims The Montgomery–Asberg Depression Rating Scale (MADRS) is an expert’s rating tool to assess the severity and symptoms of depression. The aim of the present two studies was to validate the Persian version of the MADRS and determine its test–retest reliability in patients diagnosed with major depressive disorders (MDD). Methods In study 1, the translated MADRS and the Hamilton Depression Rating Scale (HDRS) were applied to 210 patients diagnosed with MDD and 100 healthy adults. In study 2, 200 patients diagnosed with MDD were assessed with the MADRS in face-to-face interviews. Thereafter, 100 patients were assessed 3–14 days later, again via face-to-face-interviews, while the other 100 patients were assessed 3–14 days later via a telephone interview. Results Study 1: The MADRS and HDRS scores between patients with MDD and healthy controls differed significantly. Agreement between scoring of the MADRS and HDRS was high (r=0.95). Study 2: The intraclass correlation coefficient (test–retest reliability) was r=0.944 for the face-to-face interviews, and r=0.959 for the telephone interviews. Conclusion The present data suggest that the Persian MADRS has high validity and excellent test–retest reliability over a time interval of 3–14 days, irrespective of whether the second assessment was carried out face-to-face or via a telephone interview. PMID:27022265

  18. Reliability and Validity of the “Personal Well-Being Index-Cognitive Disability” on Mentally Retarded Students

    Directory of Open Access Journals (Sweden)

    Alireza Agha Yousefi

    2013-06-01

    Full Text Available Objective: Having a good quality of life has always been desirable for humans, and the concept of a good life and the ways of achieving it have gained importance over the years. Personal wellbeing is the mental component of quality of life. The current study was therefore conducted to assess the reliability and validity of the “Personal Well-Being Index-Cognitive Disability” in mentally retarded students. Method: Two hundred mentally retarded students in the northern districts of Tehran (districts 1, 2 and 3) were selected by systematic random sampling. Data collected with the Personal Well-Being Index-Cognitive Disability were analyzed using Cronbach’s alpha coefficient for internal consistency and linear multivariate regression for construct validity. Results: The results confirmed the reliability and validity of the Personal Well-Being Index-Cognitive Disability in mentally retarded students of exceptional schools. Analysis of the internal consistency of the seven items showed that all items were correlated with the total score and that their mean scores were similar to each other. This indicates that the test items reliably evaluate a common feature. The results also showed that the Personal Well-Being Index-Cognitive Disability had the most extensive coverage of construct validity. Conclusion: The Personal Well-Being Index-Cognitive Disability scale could be applied to measure personal wellbeing in mentally retarded students.

  19. Reliability of concussion history in former professional football players.

    Science.gov (United States)

    Kerr, Zachary Y; Marshall, Stephen W; Guskiewicz, Kevin M

    2012-03-01

    The reliability of athletes' recall and self-report of concussion history has never been quantified. This study examined the reliability of the self-report concussion history measure and explored determinants of recall in the number of self-reported concussions in a group of retired professional football players. In 2001, a short questionnaire was administered to a cohort of former professional football players to ascertain the number of self-reported concussions they sustained during their professional playing careers. In 2010, the same instrument was readministered to a subset (n = 899) of the original cohort to assess reliability. Overall reliability was moderate (weighted Cohen κ = 0.48). The majority (62.1%) reported the same number of concussions in both administrations (2001 and 2010); 31.4% reported more concussions in the second administration. Compared with the "same number reported" group, the "greater number reported" group had more deficits in the second administration in their Short Form 36 physical health (composite score combining physical functioning, role physical, bodily pain, general health) and mental health (e.g., composite score combining vitality, social functioning, role emotional) scales. The self-reported concussion history had moderate reliability in former professional football players, on the basis of two administrations of the same instrument, 9 yr apart. However, changes in health status may be differentially associated with recall of concussions.

  20. Effects of Differential Item Functioning on Examinees' Test Performance and Reliability of Test

    Science.gov (United States)

    Lee, Yi-Hsuan; Zhang, Jinming

    2017-01-01

    Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
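    The reliability definition in this study is the ratio of true-score variance to observed-score variance. It can be illustrated under classical test theory, where observed = true + independent error. The Python sketch below illustrates that ratio only and is hypothetical. It does not model DIF or IRT ability estimation, and its parameter values are arbitrary.

```python
import random
from statistics import pvariance


def theoretical_reliability(true_sd, error_sd):
    """Classical test theory: var(T) / (var(T) + var(E)) for independent T and E."""
    return true_sd**2 / (true_sd**2 + error_sd**2)


def simulated_reliability(n, true_sd, error_sd, seed=0):
    """Estimate var(T) / var(X) from n simulated examinees with X = T + E."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(0.0, true_sd) for _ in range(n)]
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return pvariance(true_scores) / pvariance(observed)


print(theoretical_reliability(1.0, 0.5))        # 0.8
print(simulated_reliability(20_000, 1.0, 0.5))  # close to 0.8
```

    Increasing `error_sd` while holding `true_sd` fixed lowers the ratio. This is the sense in which anything that adds error variance to scores reduces test reliability.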

  1. Automated Scoring of Constructed-Response Science Items: Prospects and Obstacles

    Science.gov (United States)

    Liu, Ou Lydia; Brew, Chris; Blackmore, John; Gerard, Libby; Madhok, Jacquie; Linn, Marcia C.

    2014-01-01

    Content-based automated scoring has been applied in a variety of science domains. However, many prior applications involved simplified scoring rubrics without considering rubrics representing multiple levels of understanding. This study tested a concept-based scoring tool for content-based scoring, c-rater™, for four science items with rubrics…

  2. Reliability of cognitive tests of ELSA-Brasil, the brazilian longitudinal study of adult health

    Science.gov (United States)

    Batista, Juliana Alves; Giatti, Luana; Barreto, Sandhi Maria; Galery, Ana Roscoe Papini; Passos, Valéria Maria de Azeredo

    2013-01-01

    Cognitive function evaluation entails the use of neuropsychological tests, applied exclusively or in sequence. The results of these tests may be influenced by factors related to the environment, the interviewer or the interviewee. OBJECTIVES We examined the test-retest reliability of some tests from the Brazilian version of the Consortium to Establish a Registry for Alzheimer's Disease (CERAD) battery. METHODS The ELSA-Brasil is a multicentre study of civil servants (35-74 years of age) from public institutions across six Brazilian States. The same tests were applied, in a different order, by the same trained and certified interviewer, at an interval of approximately 20 days, to 160 adults (51% men, mean age 52 years). The Intraclass Correlation Coefficient (ICC) was used to assess the reliability of the measures, and a dispersion graph was used to examine the patterns of agreement between them. RESULTS We observed higher retest scores in all tests as well as a shorter completion time for the Trail Making Test B. ICC values for each test were as follows: Word List Learning Test (0.56), Word Recall (0.50), Word Recognition (0.35), Phonemic Verbal Fluency Test (VFT, 0.61), Semantic VFT (0.53) and Trail B (0.91). The Bland-Altman plot showed better agreement for executive function (VFT and Trail B) than for memory tests. CONCLUSIONS Better performance at retest may reflect a learning effect and suggests that retesting should use alternate forms or be performed after longer intervals. In this sample of adults with a high level of schooling, reliability was only moderate for memory tests, whereas the measurement of executive function proved more reliable. PMID:29213860
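    A Bland-Altman analysis like the one mentioned above summarizes test-retest agreement through two quantities: the mean difference (bias) and the 95% limits of agreement, bias ± 1.96 SD of the differences. Below is a minimal Python sketch that assumes paired scores from the two sessions. The function names and the scores are hypothetical. Differences are taken as retest minus test, so a positive bias corresponds to higher retest scores, the pattern the abstract attributes to a possible learning effect.

```python
from statistics import mean, stdev


def bland_altman(test, retest):
    """Return (bias, lower limit, upper limit) of the 95% limits of agreement."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [b - a for a, b in zip(test, retest)]  # retest minus test
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def plot_points(test, retest):
    """(mean of pair, difference) coordinates for a Bland-Altman plot."""
    return [((a + b) / 2, b - a) for a, b in zip(test, retest)]


# Hypothetical word-recall scores at baseline and about 20 days later.
test = [5, 6, 4, 7, 6, 5]
retest = [6, 7, 5, 7, 8, 6]
print(bland_altman(test, retest))
```

    Plotting the pairs returned by `plot_points` against horizontal lines at the bias and the two limits reproduces the usual Bland-Altman figure.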

  3. Reliability of Cognitive Tests of ELSA-Brasil, the Brazilian Longitudinal Study of Adult Health

    Directory of Open Access Journals (Sweden)

    Juliana Alves Batista

    Full Text Available ABSTRACT Cognitive function evaluation entails the use of neuropsychological tests, applied exclusively or in sequence. The results of these tests may be influenced by factors related to the environment, the interviewer or the interviewee. Objectives: We examined the test-retest reliability of some tests from the Brazilian version of the Consortium to Establish a Registry for Alzheimer's Disease (CERAD) battery. Methods: The ELSA-Brasil is a multicentre study of civil servants (35-74 years of age) from public institutions across six Brazilian States. The same tests were applied, in a different order, by the same trained and certified interviewer, at an interval of approximately 20 days, to 160 adults (51% men, mean age 52 years). The Intraclass Correlation Coefficient (ICC) was used to assess the reliability of the measures, and a dispersion graph was used to examine the patterns of agreement between them. Results: We observed higher retest scores in all tests as well as a shorter completion time for the Trail Making Test B. ICC values for each test were as follows: Word List Learning Test (0.56), Word Recall (0.50), Word Recognition (0.35), Phonemic Verbal Fluency Test (VFT, 0.61), Semantic VFT (0.53) and Trail B (0.91). The Bland-Altman plot showed better agreement for executive function (VFT and Trail B) than for memory tests. Conclusions: Better performance at retest may reflect a learning effect and suggests that retesting should use alternate forms or be performed after longer intervals. In this sample of adults with a high level of schooling, reliability was only moderate for memory tests, whereas the measurement of executive function proved more reliable.

  4. Validity and reliability of tests determining performance-related components of wheelchair basketball

    NARCIS (Netherlands)

    De Groot, Sonja; Balvers, Inge J. M.; Kouwenhoven, Sanne M.; Janssen, Thomas W. J.

    2012-01-01

    The purpose of this study was to investigate the reliability and validity of wheelchair basketball field tests. Nineteen wheelchair basketball players performed 10 test items twice to determine the reliability. The validity of the tests was assessed by relating the scores to the players'

  5. Validity and reliability of tests determining performance-related components of wheelchair basketball

    NARCIS (Netherlands)

    de Groot, Sonja; Balvers, Inge J.M.; Kouwenhoven, Sanne M.; Janssen, Thomas W.J.

    The purpose of this study was to investigate the reliability and validity of wheelchair basketball field tests. Nineteen wheelchair basketball players performed 10 test items twice to determine the reliability. The validity of the tests was assessed by relating the scores to the players'

  6. Reliability and validity of migraine disability assessment questionnaire-Thai version (Thai-MIDAS).

    Science.gov (United States)

    Seethong, Piman; Nimmannit, Akarin; Chaisewikul, Rungsan; Prayoonwiwat, Naraporn; Chotinaiwattarakul, Wattanachai

    2013-02-01

    To assess the validity and test-retest reliability of a Thai translation of the Migraine Disability Assessment (MIDAS) Questionnaire in Thai patients with migraine. Migraineurs from the Headache Clinic in Siriraj Hospital were recruited and asked to complete a 13-week diary and to answer the Thai-MIDAS once. Some participants were asked to complete the Thai-MIDAS a second time within the next 2 weeks to assess test-retest reliability. Ninety-three patients completed the 13-week diaries. Ages ranged from 18 to 58 years (mean 37.69 ± 9.60 years). All 5 items and the total score of the Thai-MIDAS were moderately correlated with data from the 13-week diary (Spearman's correlation coefficient = 0.32-0.62). The test-retest reliability of the Thai-MIDAS total score in 30 patients demonstrated a high degree of reliability by intraclass correlation (ICC = 0.76, 95% CI 0.49-0.88). The present study shows that the Thai-MIDAS has satisfactory validity and reliability in comparison with the original English MIDAS version.
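    The study reported agreement between the Thai-MIDAS and the headache diary as Spearman's rank correlation. Spearman's ρ can be computed as the Pearson correlation of the two variables' ranks, with tied values given their average rank. Below is a hypothetical Python illustration, not the authors' analysis code. The function names and data are invented.

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical MIDAS totals and diary-derived disability days for six patients.
midas = [4, 12, 12, 25, 30, 8]
diary_days = [3, 10, 14, 20, 35, 5]
print(round(spearman_rho(midas, diary_days), 3))  # 0.986
```

    Because only ranks enter the calculation, ρ captures any monotonic association between questionnaire and diary and is insensitive to skewed day counts.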

  7. Reliability of the Alzheimer's disease assessment scale (ADAS-Cog) in longitudinal studies.

    Science.gov (United States)

    Khan, Anzalee; Yavorsky, Christian; DiClemente, Guillermo; Opler, Mark; Liechti, Stacy; Rothman, Brian; Jovic, Sofija

    2013-11-01

    Considering the scarcity of longitudinal assessments of reliability, there is a need for a more precise understanding of cognitive decline in Alzheimer's Disease (AD). The primary goal was to assess longitudinal changes in the inter-rater reliability, test-retest reliability and internal consistency of ADAS-Cog scores. A total of 2,618 AD subjects were enrolled in seven randomized, double-blind, placebo-controlled, multicenter trials from 1986 to 2009. Reliability, internal consistency and cross-sectional analyses of the ADAS-Cog and MMSE across seven visits were examined. Intraclass correlations (ICC) for the ADAS-Cog were moderate to high, supporting its reliability. Absolute-agreement ICCs ranged from 0.392 (Visit 7) to 0.806 (Visit 2) and showed a progressive decrease across time. Item analysis revealed a decrease in item correlations, with the lowest correlations at Visit 7 for Commands (ICC=0.148), Comprehension (ICC=0.092) and Spoken Language (ICC=0.044). Suitable assessment of AD treatments is maintained through accurate measurement of clinically significant outcomes. Targeted rater education on ADAS-Cog items over time can improve raters' ability to administer and score the scale.

  8. Distribution system reliability evaluation using credibility theory | Xu ...

    African Journals Online (AJOL)

    In this paper, a hybrid algorithm based on fuzzy simulation and Failure Mode and Effect Analysis (FMEA) is applied to determine fuzzy reliability indices of a distribution system. This approach can obtain the fuzzy expected values of the reliability indices and their variances, as well as the credibilities of reliability indices meeting specified ...

  9. MODIFIED ALVARADO SCORING IN ACUTE APPENDICITIS

    Directory of Open Access Journals (Sweden)

    Varadarajan Sujath

    2016-12-01

    Full Text Available BACKGROUND Acute appendicitis is one of the most common surgical emergencies, with a lifetime presentation of approximately 1 in 7. Its incidence is 1.5-1.9/1000 in males and females. Surgery for acute appendicitis is based on history, clinical examination and laboratory investigations (e.g. WBC count). Imaging techniques add very little to the diagnosis of appendicitis. A negative appendicectomy rate of 20-40% has been reported in the literature. Diagnosis is particularly difficult in very young patients and in females of reproductive age. The diagnostic accuracy in assessing acute appendicitis has not improved in spite of rapid advances in management. MATERIALS AND METHODS The modified Alvarado score was applied and assessed for its accuracy in the preoperative diagnosis of acute appendicitis in 50 patients. The aim of our study was to understand the various presentations of acute appendicitis, including the age and gender incidence, to apply the modified Alvarado scoring system in our hospital setup and to assess the efficacy of the score. RESULTS Our study shows that the most commonly affected age group is the third decade, with a male preponderance. On application of the Alvarado score, nausea and vomiting were present in 50% and anorexia in 30%, and leucocytosis was found in 75% of cases. Sensitivity and specificity were 65% and 40% respectively, with a positive predictive value of 85% and a negative predictive value of 15%. CONCLUSION This study showed that clinical scoring like the Alvarado score can be a cheap and quick tool to apply in emergency departments to rule out acute appendicitis. The implementation of the modified Alvarado score is simple and cost effective.
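    The sensitivity, specificity and predictive values reported above all derive from a 2 × 2 table of score result against final diagnosis. Below is a hypothetical Python helper with made-up counts, not the study's data. Unlike sensitivity and specificity, PPV and NPV also depend on how common appendicitis is in the sample, so they do not transfer directly between settings.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures from a 2x2 table (score positive/negative vs. disease present/absent)."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients the score flags
        "specificity": tn / (tn + fp),  # disease-free patients the score clears
        "ppv": tp / (tp + fp),          # flagged patients who truly have the disease
        "npv": tn / (tn + fn),          # cleared patients who truly do not
    }


# Hypothetical counts for 50 patients, not the study's data.
print(diagnostic_metrics(tp=26, fp=4, fn=14, tn=6))
```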

  10. Interrater and Test-Retest Reliability and Minimal Detectable Change of the Balance Evaluation Systems Test (BESTest) and Subsystems With Community-Dwelling Older Adults.

    Science.gov (United States)

    Wang-Hsu, Elizabeth; Smith, Susan S

    2017-01-10

    Falls are a common cause of injuries and hospital admissions in older adults. Balance limitation is a potentially modifiable factor contributing to falls. The Balance Evaluation Systems Test (BESTest), a clinical balance measure, categorizes balance into 6 underlying subsystems. Each of the subsystems is scored individually and summed to obtain a total score. The reliability of the BESTest and its individual subsystems has been reported in patients with various neurological disorders and cancer survivors. However, the reliability and minimal detectable change (MDC) of the BESTest with community-dwelling older adults have not been reported. The purposes of our study were to (1) determine the interrater and test-retest reliability of the BESTest total and subsystem scores; and (2) estimate the MDC of the BESTest and its individual subsystem scores with community-dwelling older adults. We used a prospective cohort methodological design. Community-dwelling older adults (N = 70; aged 70-94 years; mean = 85.0 [5.5] years) were recruited from a senior independent living community. Trained testers (N = 3) administered the BESTest. All participants were tested with the BESTest by the same tester initially and then retested 7 to 14 days later. With 32 of the participants, a second tester concurrently scored the retest for interrater reliability. Testers were blinded to each other's scores. Intraclass correlation coefficients [ICC(2,1)] were used to determine the interrater and test-retest reliability. Test-retest reliability was also analyzed using method error and the associated coefficients of variation (CVME). MDC was calculated using standard error of measurement. Interrater reliability (N = 32) of the BESTest total score was ICC(2, 1) = 0.97 (95% confidence interval [CI], 0.94-0.99). The ICCs for the individual subsystem scores ranged from 0.85 to 0.94. Test-retest reliability (N = 70) of the BESTest total score was ICC(2,1) = 0.93 (95% CI, 0.89-0.96). ICCs for the
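    The BESTest study reports ICC(2,1) (two-way random effects, absolute agreement, single measurement) and a minimal detectable change derived from the standard error of measurement. The Python sketch below computes ICC(2,1) from the two-way ANOVA mean squares and then MDC95 = 1.96 × √2 × SEM. The abstract does not say which SEM formula was used. The sketch assumes SEM = SD × √(1 − ICC), with SD taken over all scores. The function names and the example ratings are hypothetical.

```python
import math
from statistics import mean, stdev


def icc_2_1(data):
    """ICC(2,1): rows are subjects, columns are raters or sessions."""
    n, k = len(data), len(data[0])
    scores = [x for row in data for x in row]
    grand = mean(scores)
    row_means = [mean(row) for row in data]
    col_means = [mean([row[j] for row in data]) for j in range(k)]
    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for x in scores)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)             # between subjects
    msc = ss_cols / (k - 1)             # between raters/sessions
    mse = ss_err / ((n - 1) * (k - 1))  # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def mdc95(data, icc):
    """MDC95 = 1.96 * sqrt(2) * SEM, assuming SEM = SD * sqrt(1 - ICC)."""
    sd = stdev([x for row in data for x in row])
    sem = sd * math.sqrt(1 - icc)
    return 1.96 * math.sqrt(2) * sem


# Hypothetical scores: 6 subjects (rows) rated by 4 raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(ratings)
print(round(icc, 2))  # 0.29
print(round(mdc95(ratings, icc), 2))
```

    In the example the raters agree on the ordering of subjects, but their systematic level differences are large. Because ICC(2,1) counts rater bias as disagreement, the result is low. A consistency-type ICC would not penalize those level differences.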

  11. Reliability and Accuracy of Cross-sectional Radiographic Assessment of Severe Knee Osteoarthritis: Role of Training and Experience.

    Science.gov (United States)

    Klara, Kristina; Collins, Jamie E; Gurary, Ellen; Elman, Scott A; Stenquist, Derek S; Losina, Elena; Katz, Jeffrey N

    2016-07-01

    To determine the reliability of radiographic assessment of knee osteoarthritis (OA) by nonclinician readers compared to an experienced radiologist. The radiologist trained 3 nonclinicians to evaluate radiographic characteristics of knee OA. The radiologist and nonclinicians read preoperative films of 36 patients prior to total knee replacement. Intrareader and interreader reliability were measured using the weighted κ statistic and the intraclass correlation coefficient (ICC). Intrareader reliability among nonclinicians (κ) ranged from 0.40 to 1.0 for individual radiographic features and from 0.72 to 1.0 for Kellgren-Lawrence (KL) grade. ICC ranged from 0.89 to 0.98 for the Osteoarthritis Research Society International (OARSI) summary score. Interreader agreement among nonclinicians ranged from κ of 0.45 to 0.94 for individual features, and from 0.66 to 0.97 for KL grade. ICC ranged from 0.87 to 0.96 for the OARSI Summary Score. Interreader reliability between nonclinicians and the radiologist ranged from κ of 0.56 to 0.85 for KL grade. ICC ranged from 0.79 to 0.88 for the OARSI Summary Score. Intrareader and interreader agreement was variable for individual radiograph features but substantial for summary KL grade and OARSI Summary Score. Investigators face tradeoffs between cost and reader experience. These data suggest that in settings where costs are constrained, trained nonclinicians may be suitable readers of radiographic knee OA, particularly if a summary score (KL grade or OARSI Score) is used to determine radiographic severity.

  12. Reliability of "Google" for obtaining medical information

    Directory of Open Access Journals (Sweden)

    Mihir Kothari

    2015-01-01

    Full Text Available The Internet is used by many patients to obtain relevant medical information. We assessed the impact of a "Google" search on the knowledge of parents whose children had squint. In 21 consecutive patients, the "Google" search improved the mean score of correct answers from 47% to 62%. We found that "Google" search was a useful and reliable source of information for patients with regard to the etiopathogenesis of the disease and the problems it causes. The internet-based information, however, was incomplete and unreliable with regard to treatment of the disease.

  13. Divorce and Child Behavior Problems: Applying Latent Change Score Models to Life Event Data

    Science.gov (United States)

    Malone, Patrick S.; Lansford, Jennifer E.; Castellino, Domini R.; Berlin, Lisa J.; Dodge, Kenneth A.; Bates, John E.; Pettit, Gregory S.

    2009-01-01

    Effects of parents' divorce on children's adjustment have been studied extensively. This article applies new advances in trajectory modeling to the problem of disentangling the effects of divorce on children's adjustment from related factors such as the child's age at the time of divorce and the child's gender. Latent change score models were used to examine trajectories of externalizing behavior problems in relation to children's experience of their parents' divorce. Participants included 356 boys and girls whose biological parents were married at kindergarten entry. The children were assessed annually through Grade 9. Mothers reported whether they had divorced or separated in each 12-month period, and teachers reported children's externalizing behavior problems each year. Girls' externalizing behavior problem trajectories were not affected by experiencing their parents' divorce, regardless of the timing of the divorce. In contrast, boys who were in elementary school when their parents divorced showed an increase in externalizing behavior problems in the year of the divorce. This increase persisted in the years following the divorce. Boys who were in middle school when their parents divorced showed an increase in externalizing behavior problems in the year of the divorce followed by a decrease to below baseline levels in the year after the divorce. This decrease persisted in the following years. PMID:20209039

  14. Reliability of four experimental mechanical pain tests in children

    Directory of Open Access Journals (Sweden)

    Soee AL

    2013-02-01

    Full Text Available Ann-Britt L Soee,1 Lise L Thomsen,2 Birte Tornoe,1,3 Liselotte Skov1; 1Department of Pediatrics, Children’s Headache Clinic, Copenhagen University Hospital Herlev, Copenhagen, Denmark; 2Department of Neuropediatrics, Juliane Marie Centre, Copenhagen University Hospital Rigshospitalet, København Ø, Denmark; 3Department of Physiotherapy, Medical Department O, Copenhagen University Hospital Herlev, Herlev, Denmark. Purpose: In order to study pain in children, it is necessary to determine whether pain measurement tools used in adults are reliable in children. The aim of this study was to explore the intrasession reliability of pressure pain thresholds (PPT) in healthy children. A further aim was to study the intersession reliability of the following four tests: (1) Total Tenderness Score; (2) PPT; (3) Visual Analog Scale score at suprapressure pain threshold; and (4) area under the curve (stimulus–response functions for pressure versus pain). Participants and methods: Twenty-five healthy school children, 8–14 years of age, participated. Test 2, PPT, was repeated three times at 2-minute intervals on the same day to estimate PPT intrasession reliability using Cronbach’s alpha. Tests 1–4 were repeated after a median of 21 days (interquartile range 10.5–22 days), and Pearson’s correlation coefficient was used to describe the intersession reliability. Results: The PPT test was precise and reliable (Cronbach’s alpha ≥ 0.92). All tests showed good to excellent correlation between days (intersession r = 0.66–0.81). There was no indication of significant systematic differences between days in any of the four tests. Conclusion: All tests seemed to be reliable measurements for pain evaluation in healthy children aged 8–14 years. Given the small sample size, this conclusion needs to be confirmed in future studies. Keywords: repeatability, intraindividual reliability, pressure pain threshold, pain measurement, algometer
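
    As a rough illustration of the intrasession statistic, the sketch below computes Cronbach's alpha for a children x repetitions table, treating each of the three PPT repetitions as an "item". It uses the standard textbook formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of totals); it is not the authors' code, and the function name and example values are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a subjects x repetitions table.

    rows: one list per subject (e.g. per child), one value per repetition.
    """
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need >= 2 repetitions and equal-length rows")
    # Sample variance of each repetition (column) across subjects
    item_vars = [variance(col) for col in zip(*rows)]
    # Sample variance of each subject's summed score
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical PPT values (kPa): 4 children x 3 repetitions
    ppt = [[310, 325, 318], [220, 230, 226], [405, 398, 410], [275, 290, 284]]
    print(round(cronbach_alpha(ppt), 3))
```

    For the intersession analysis, Pearson's r between day-1 and day-2 scores is available as `statistics.correlation` in Python 3.10 and later.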

  15. Quality of Life among Persons with HIV/AIDS in Iran: Internal Reliability and Validity of an International Instrument and Associated Factors

    Directory of Open Access Journals (Sweden)

    Pedram Razavi

    2012-01-01

    Full Text Available The purpose of this cross-sectional study of 191 patients with HIV/AIDS was to prepare the first Persian translation of the complete WHOQOL-HIV instrument, evaluate its reliability and validity, and apply it to determine quality of life and its associated factors in Tehran, Iran. Student's t-test was used to compare quality of life between groups. Mean Cronbach’s α values of the facets in all six domains of the instrument were above 0.6, indicating good reliability. Corrected item-total correlation coefficients had a lower limit above 0.5 in all facets except for the association between the energy and fatigue facet and the physical domain. Compared with younger participants, patients older than 35 years had significantly lower scores in overall quality of life (P = 0.003), social relationships (P = 0.021), and spirituality/religion/personal beliefs (P = 0.024). Unemployed patients had significantly lower scores in overall quality of life (P = 0.01), level of independence (P = 0.004), and environment (P = 0.001) compared with employed participants. This study demonstrated that the standard, complete WHOQOL-HIV 120 instrument translated into Farsi and evaluated among Iranian participants provides a reliable and valid basis for future research on quality of life for HIV and other patients in Iran.

  16. Validation of a single summary score for the Prolapse/Incontinence Sexual Questionnaire-IUGA revised (PISQ-IR).

    Science.gov (United States)

    Constantine, Melissa L; Pauls, Rachel N; Rogers, Rebecca R; Rockwood, Todd H

    2017-12-01

    The Prolapse/Incontinence Sexual Questionnaire-International Urogynecology Association (IUGA) Revised (PISQ-IR) measures sexual function in women with pelvic floor disorders (PFDs) yet is unwieldy, with six individual subscale scores for sexually active women and four for women who are not. We hypothesized that a valid and responsive summary score could be created for the PISQ-IR. Item response data from participating women who completed a revised version of the PISQ-IR at three clinical sites were used to generate item weights using magnitude estimation (ME) and Q-sort (Q) approaches. Item weights were applied to data from the original PISQ-IR validation to generate summary scores. Correlation and factor analysis methods were used to evaluate validity and responsiveness of summary scores. Weighted and nonweighted summary scores for the sexually active PISQ-IR demonstrated good criterion validity with condition-specific measures: Incontinence Severity Index = 0.12, 0.11, 0.11; Pelvic Floor Distress Inventory-20 = 0.39, 0.39, 0.12; Epidemiology of Prolapse and Incontinence Questionnaire-Q35 = 0.26, 0.25, 0.40; Female Sexual Functioning Index subscale total score = 0.72, 0.75, 0.72 for nonweighted, ME, and Q summary scores, respectively. Responsiveness evaluation showed that weighted and nonweighted summary scores detected moderate effect sizes (Cohen's d > 0.5). Weighted items for women who were not sexually active (NSA) demonstrated significant floor effects and did not meet criterion validity. A PISQ-IR summary score for use with sexually active women, nonweighted or calculated with ME or Q item weights, is a valid and reliable measure for clinical use. The summary scores provide value for assessing clinical treatment of pelvic floor disorders.

  17. Inter- and intra-observer reliability of masking in plantar pressure measurement analysis.

    Science.gov (United States)

    Deschamps, K; Birch, I; Mc Innes, J; Desloovere, K; Matricali, G A

    2009-10-01

    Plantar pressure measurement is an important tool in gait analysis. Manual placement of small masks (masking) is increasingly used to calculate plantar pressure characteristics. Little is known concerning the reliability of manual masking. The aim of this study was to determine the reliability of masking on 2D plantar pressure footprints, in a population with forefoot deformity (i.e. hallux valgus). Using a random repeated-measure design, four observers identified the third metatarsal head on a peak-pressure barefoot footprint, using a small mask. Subsequently, the location of all five metatarsal heads was identified, using the same size of masks and the same protocol. The 2D positional variation of the masks and the peak pressure (PP) and pressure time integral (PTI) values of each mask were calculated. For single-masking the lowest inter-observer reliability was found for the distal-proximal direction, causing a clear, adverse impact on the reliability of the pressure characteristics (PP and PTI). In the medial-lateral direction the inter-observer reliability could be scored as high. Intra-observer reliability was better and could be scored as high or good for both directions, with a correlated improved reliability of the pressure characteristics. Reliability of multi-masking showed a similar pattern, but overall values tended to be lower. Therefore, small sized masking in order to define pressure characteristics in the forefoot should be done with care.

  18. Advances in reliability and system engineering

    CERN Document Server

    Davim, J

    2017-01-01

    This book presents original studies describing the latest research and developments in the area of reliability and systems engineering. It helps the reader identify gaps in current knowledge and presents fruitful areas for further research in the field. Among others, this book covers reliability measures, reliability assessment of multi-state systems, optimization of multi-state systems, continuous multi-state systems, new computational techniques applied to multi-state systems, and probabilistic and non-probabilistic safety assessment.

  19. Reliability assessment of a peer evaluation instrument in a team-based learning course

    Directory of Open Access Journals (Sweden)

    Wahawisan J

    2016-03-01

    Full Text Available Objective: To evaluate the reliability of a peer evaluation instrument in a longitudinal team-based learning setting. Methods: Student pharmacists were instructed to evaluate the contributions of their peers. Evaluations were analyzed for the variance of the scores by identifying low, medium, and high scores. Agreement between performance ratings within each group of students was assessed via the intraclass correlation coefficient (ICC). Results: We found little variation in the standard deviation (SD) based on the score means among the high, medium, and low scores within each group. The lack of variation in SD between groups suggests that the peer evaluation instrument produces precise results. The ICC showed strong concordance among raters. Conclusions: Findings suggest that our student peer evaluation instrument provides a reliable method for peer assessment in team-based learning settings.

  20. Translation and validation of the Danish version of the postoperative quality of recovery score QoR-15

    DEFF Research Database (Denmark)

    Kleif, J; Edwards, H M; Sort, R

    2015-01-01

    .12 to -0.43, P ... half reliability was 0.90 and 0.88. Test-retest reliability was 0.99 (95% CI: 0... BACKGROUND: Patient-perceived quality of recovery is an important outcome after surgery and should be measured in clinical trials. Quality of recovery after surgery and general anaesthesia can be measured by the QoR-15. A high score indicates good recovery, and the score ranges from 0 to 150...

  1. The revised FLACC score: Reliability and validation for pain assessment in children with cerebral palsy

    DEFF Research Database (Denmark)

    Pedersen, Line Kjeldgaard; Rahbek, Ole; Nikolajsen, Lone

    2015-01-01

    Abstract Background and aims: Pain in children with cerebral palsy (CP) is difficult to assess and is therefore not sufficiently recognized and treated. Children with severe cognitive impairments have an increased risk of neglected postoperative, procedural and chronic pain, resulting in decreased... quality of life. The r-FLACC (revised Face, Legs, Activity, Cry and Consolability) pain score is an internationally acclaimed tool for assessing pain in children with CP because of its ease of use and its use of core pain behaviours. In addition, the r-FLACC pain score may be superior to other pain... of the r-FLACC pain score for use in Danish children with CP. Methods: Twenty-seven children aged 3–15 years with CP were included after orthopaedic surgery. Two methods for assessment of postoperative pain were used. Pain intensity was assessed by r-FLACC, with a 2 min standardized video recording...

  2. Interobserver reliability when using the Van Herick method to measure anterior chamber depth

    Directory of Open Access Journals (Sweden)

    Ahmed Javed

    2017-01-01

    Conclusion: The Van Herick score has good interobserver reliability for Grades 1 and 4; however, Grades 2 and 3 require further tests such as gonioscopy or optical coherence tomography. Temporal and nasal scores demonstrated good agreement; therefore, if the nasal score cannot be measured because of nasal bridge size, the temporal score can be used as an approximation.

  3. Turkish Version of Kolcaba's Immobilization Comfort Questionnaire: A Validity and Reliability Study.

    Science.gov (United States)

    Tosun, Betül; Aslan, Özlem; Tunay, Servet; Akyüz, Aygül; Özkan, Hüseyin; Bek, Doğan; Açıksöz, Semra

    2015-12-01

    The purpose of this study was to determine the validity and reliability of the Turkish version of the Immobilization Comfort Questionnaire (ICQ). The sample used in this methodological study consisted of 121 patients undergoing lower extremity arthroscopy in a training and research hospital. The validity study of the questionnaire assessed language validity, structural validity and criterion validity. Structural validity was evaluated via exploratory factor analysis. Criterion validity was evaluated by assessing the correlation between the visual analog scale (VAS) scores (i.e., the comfort and pain VAS scores) and the ICQ scores using Spearman's correlation test. The Kaiser-Meyer-Olkin coefficient and Bartlett's test of sphericity were used to determine the suitability of the data for factor analysis. Internal consistency was evaluated to determine reliability. The data were analyzed with SPSS version 15.00 for Windows. Descriptive statistics were presented as frequencies, percentages, means and standard deviations. A p value ≤ .05 was considered statistically significant. A moderate positive correlation was found between the ICQ scores and the VAS comfort scores; a moderate negative correlation was found between the ICQ and the VAS pain measures in the criterion validity analysis. Cronbach α values of .75 and .82 were found for the first and second measurements, respectively. The findings of this study reveal that the ICQ is a valid and reliable tool for assessing the comfort of patients in Turkey who are immobilized because of lower extremity orthopedic problems. Copyright © 2015. Published by Elsevier B.V.

  4. Analysis of the reliability and validity of the Turkish version of the intermittent and constant osteoarthritis pain questionnaire.

    Science.gov (United States)

    Erel, Suat; Şimşek, İbrahim Engin; Özkan, Hüseyin

    2015-01-01

    The aim of this study was to analyze the validity and reliability of the Turkish version (ICOAP-TR) of the intermittent and constant osteoarthritis pain (ICOAP) questionnaire in patients with knee osteoarthritis (OA). Thirty-eight volunteer patients diagnosed with knee OA answered the questionnaire twice with an interval of 2-4 days. The reliability of the measurement was assessed using Cronbach's alpha coefficient and intraclass correlation (ICC) for test-retest reliability. Criterion validity was tested against the Western Ontario and McMaster Universities Arthritis Index (WOMAC) pain score and visual analog scale (VAS) designed to assess the perceived discomfort rated by the patient. Test-retest reliability was found to be ICC=0.942 for total score, 0.902 for constant pain subscale, and 0.945 for intermittent pain subscale. Internal consistency was tested using Cronbach's alpha and was found to be 0.970 for total score, 0.948 for constant pain subscale, and 0.972 for intermittent pain subscale. For criterion validity, the correlation between the total score of ICOAP-TR and WOMAC pain subscale was r=0.779 (p<0.05), and correlation between total score of ICOAP-TR and VAS was r=0.570 (p<0.05). The ICOAP-TR is a reliable and valid instrument to be used with patients with knee OA.

  5. Peer-review for selection of oral presentations for conferences: Are we reliable?

    Science.gov (United States)

    Deveugele, Myriam; Silverman, Jonathan

    2017-11-01

    Although peer review for journal submissions, grant applications and conference submissions has been called 'a cornerstone of science', and even 'the gold standard for evaluating scientific merit', publications on this topic remain scarce. Research that has investigated peer review reveals several issues and criticisms concerning bias, poor-quality review, unreliability and inefficiency. The most important weakness of the peer review process is the inconsistency between reviewers, leading to inadequate inter-rater reliability. We aimed to report the reliability of ratings for a large international conference and to suggest possible solutions to overcome the problem. In 2016, during the International Conference on Communication in Healthcare, organized by EACH: International Association for Communication in Healthcare, a calibration exercise was proposed and feedback was reported back to the participants of the exercise. Most abstracts, as well as most peer reviewers, receive and give scores around the median. Contrary to the general assumption that there are high and low scorers, in this group only 3 peer reviewers could be identified with a high mean score, while 7 had a low mean score. Only 2 reviewers gave only high ratings (4 and 5). Of the eight abstracts included in this exercise, only one abstract received a high mean score and one a low mean score. Nevertheless, both of these abstracts received both low and high scores; all other abstracts received all possible scores. Peer review of conference submissions is, in accordance with the literature, unreliable. New and creative methods will be needed to give conference participants what they really deserve: a more reliable selection of the best abstracts.
More raters per abstract improve the inter-rater reliability; training of reviewers could be helpful; providing feedback to reviewers can lead to less inter-rater disagreement; fostering negative peer review (rejecting the inappropriate submissions) rather than a
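
    The point that more raters per abstract improve inter-rater reliability can be quantified with the Spearman-Brown prophecy formula, r_k = k*r / (1 + (k - 1)*r), under the assumption that raters are interchangeable (parallel). The sketch below is illustrative only: the helper names are hypothetical, and the single-rater reliability in the example is an assumed value, not one reported by the study.

```python
import math


def spearman_brown(r_single, k):
    """Predicted reliability of the mean of k parallel raters."""
    return k * r_single / (1 + (k - 1) * r_single)


def raters_needed(r_single, r_target):
    """Smallest number of raters whose averaged score reaches r_target."""
    if not (0 < r_single < 1 and 0 < r_target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    # Spearman-Brown solved for k
    k = r_target * (1 - r_single) / (r_single * (1 - r_target))
    # Small tolerance so floating-point noise (e.g. 4.000000000000001) does not add a rater
    return max(1, math.ceil(k - 1e-9))


if __name__ == "__main__":
    r = 0.3  # assumed single-rater reliability (hypothetical)
    for k in (1, 2, 4, 6):
        print(k, round(spearman_brown(r, k), 2))
    print(raters_needed(r, 0.7))
```

    With the assumed r = 0.3, six raters per abstract would be needed for the averaged score to reach 0.7.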

  6. Reliability and concurrent validity of the Dutch hip and knee replacement expectations surveys.

    Science.gov (United States)

    van den Akker-Scheek, Inge; van Raay, Jos J A M; Reininga, Inge H F; Bulstra, Sjoerd K; Zijlstra, Wiebren; Stevens, Martin

    2010-10-19

    Preoperative expectations of outcome of total hip and knee arthroplasty are important determinants of patients' satisfaction and functional outcome. The aims of the study were (1) to translate the Hospital for Special Surgery Hip Replacement Expectations Survey and Knee Replacement Expectations Survey into Dutch and (2) to study test-retest reliability and concurrent validity. Patients scheduled for total hip (N = 112) or knee replacement (N = 101) were sent the Dutch Expectations Surveys twice with a 2-week interval to determine test-retest reliability. To determine concurrent validity, the Expectation WOMAC was sent. The results for the Dutch Hip Replacement Expectations Survey revealed good test-retest reliability (ICC 0.87), no bias and good internal consistency (alpha 0.86) (N = 72). The correlation between the Hip Expectations Score and the Expectation WOMAC score was 0.59 (N = 86). The results for the Dutch Knee Replacement Expectations Survey revealed good test-retest reliability (ICC 0.79), no bias and good internal consistency (alpha 0.91) (N = 46). The correlation with the Expectation WOMAC score was 0.52 (N = 57). Both Dutch Expectations Surveys are reliable instruments to determine patients' expectations before total hip or knee arthroplasty. As for concurrent validity, the correlation between both surveys and the Expectation WOMAC was moderate, confirming that the same construct was measured. However, patients scored systematically lower on the Expectation WOMAC than on the Dutch Expectations Surveys. Research on patients' expectations before total hip and knee replacement has only been performed in a limited number of countries. With the Dutch Expectations Surveys it is now possible to determine patients' expectations in another culture and healthcare setting.

  7. Evaluation of airway protection: Quantitative timing measures versus penetration/aspiration score.

    Science.gov (United States)

    Kendall, Katherine A

    2017-10-01

    Quantitative measures of swallowing function may improve the reliability and accuracy of modified barium swallow (MBS) study interpretation. Quantitative study analysis has not been widely adopted, however, owing to concerns about the time required to make measures and a lack of research demonstrating impact on MBS interpretation. This study compares the accuracy of the penetration/aspiration (PEN/ASP) scale (an observational visual-perceptual assessment tool) with quantitative measures of airway closure timing relative to the arrival of the bolus at the upper esophageal sphincter in identifying a failure of airway protection during deglutition. Retrospective review of clinical swallowing data from a university-based outpatient clinic. Swallowing data from 426 patients were reviewed. Patients with normal PEN/ASP scores were identified, and the results of quantitative airway closure timing measures for three liquid bolus sizes were evaluated. The incidence of significant airway closure delay with and without a normal PEN/ASP score was determined. Inter-rater reliability for the quantitative measures was calculated. In patients with a normal PEN/ASP score, 33% demonstrated a delay in airway closure on at least one swallow during the MBS study. There was no correlation between PEN/ASP score and airway closure delay. Inter-rater reliability for the quantitative measure of airway closure timing was nearly perfect (intraclass correlation coefficient = 0.973). The use of quantitative measures of swallowing function, in conjunction with traditional visual-perceptual methods of MBS study interpretation, improves the identification of airway closure delay, and hence potential aspiration risk, even when no penetration or aspiration is apparent on the MBS study. Level of Evidence: 4. Laryngoscope, 127:2314-2318, 2017. © 2017 The American Laryngological, Rhinological and Otological Society, Inc.

  8. End-stage dementia spark of life: reliability and validity of the "GATOS" questionnaire.

    Science.gov (United States)

    Tsoucalas, Gregory; Bourelia, Stamati; Kalogirou, Vaso; Giatsiou, Styliani; Mavrogiannaki, Eirini; Gatos, Georgios; Galanos, Antonis; Repana, Olga; Iliadou, Eleni; Antoniou, Antonis; Sgantzos, Markos; Gatos, Konstantinos

    2015-01-01

    Floor effects are present in most dementia assessment tools as dementia progresses, and the in-depth assessment of patients considered to be in a more or less vegetative state is questionable. To develop a questionnaire (the "Gatos Clinical Test-GCT") for the assessment of end-stage demented patients. Five hundred patients with dementia of various causes and an MMSE score between 0 and 2 were enrolled in the study. The GCT consists of 14 closed-type questions rated on a Likert scale. The total score is used to evaluate the patient's dementia. Various aspects of validity and reliability (including face, content and structural validity as well as test-retest reliability) were examined. Three subscales, "Autonomy/Alertness", "Gnosias" and "Somatokinetic function", were defined, with Cronbach's α equal to 0.851, 0.756 and 0.598, respectively. The GCT subscales and total score were significantly higher in patients with an MMSE score of 1 or 2 than in those with an MMSE score of 0 (p ...). The "GATOS" questionnaire is a valid and reliable test for patients with severe dementia, aiming at identification of those patients who could sustain some quality of life. It is a relatively short and easy-to-administer tool. As dementia prevalence is expected to rise further worldwide, we believe that the GCT could offer valuable services to health professionals, caregivers and patients.

  9. Review of Industrial Applications of Structural Reliability Theory

    DEFF Research Database (Denmark)

    Thoft-Christensen, Palle

    For the last two decades we have seen an increasing interest in applying structural reliability theory to many different industries. However, the number of real practical applications is much smaller than what one would expect.

  10. A Reliable and Valid Weighted Scoring Instrument for Use in Grading APA-Style Empirical Research Report

    Science.gov (United States)

    Greenberg, Kathleen Puglisi

    2012-01-01

    The scoring instrument described in this article is based on a deconstruction of the seven sections of an American Psychological Association (APA)-style empirical research report into a set of learning outcomes divided into content-, expression-, and format-related categories. A double-weighting scheme used to score the report yields a final grade…

  11. Interrater reliability of the Melbourne Assessment of Unilateral Upper Limb Function for children with hemiplegic cerebral palsy.

    LENUS (Irish Health Repository)

    Spirtos, Michelle

    2012-02-01

    OBJECTIVE: We examined the interrater reliability of the Melbourne Assessment of Unilateral Upper Limb Function. METHOD: Three occupational therapists independently scored 34 videotaped assessments of children with hemiplegic cerebral palsy aged 6 yr, 1 mo, to 14 yr, 5 mo. Intraclass correlation coefficients (ICCs) with 95% confidence intervals were calculated for total scores, category scores, and item scores. RESULTS: The correlation between raters' total scores was high (ICC = .961). The highest correlation for test components between raters was found for fluency (ICC = .902), followed by range of movement (ICC = .866), and the lowest correlation was found for quality of movement (ICC = .683). The ICCs for individual test item scores varied and ranged from .368 to .899. CONCLUSION: This study demonstrated high interrater reliability for total scores, with scoring of some individual components and items requiring further consideration from both a clinical and a research perspective.
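
    The abstract does not state which ICC form was used. Assuming a two-way random-effects, absolute-agreement, single-rater model (ICC(2,1) in Shrout and Fleiss's notation), the sketch below computes it from the ANOVA mean squares of a subjects x raters table. The function name is hypothetical, and other ICC forms would give different values on the same data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: one list per subject (e.g. per videotape), one score per rater.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(r) != k for r in ratings):
        raise ValueError("need >= 2 subjects, >= 2 raters, equal-length rows")
    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(c) / n for c in zip(*ratings)]
    # Partition the total sum of squares into subjects, raters and residual
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


if __name__ == "__main__":
    # Six targets rated by four judges (Shrout and Fleiss, 1979)
    data = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
            [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
    print(round(icc_2_1(data), 2))
```

    On the six-target, four-judge example from Shrout and Fleiss (1979), this returns about 0.29. The low value reflects large systematic differences between judges, which the absolute-agreement form penalizes.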

  12. Applying Computerized-Scoring Models of Written Biological Explanations across Courses and Colleges: Prospects and Limitations

    Science.gov (United States)

    Ha, Minsu; Nehm, Ross H.; Urban-Lurain, Mark; Merrill, John E.

    2011-01-01

    Our study explored the prospects and limitations of using machine-learning software to score introductory biology students' written explanations of evolutionary change. We investigated three research questions: 1) Do scoring models built using student responses at one university function effectively at another university? 2) How many human-scored…

  13. The inter-rater reliability and prognostic value of coma scales in Nepali children with acute encephalitis syndrome.

    Science.gov (United States)

    Ray, Stephen; Rayamajhi, Ajit; Bonnett, Laura J; Solomon, Tom; Kneen, Rachel; Griffiths, Michael J

    2018-02-01

    Background Acute encephalitis syndrome (AES) is a common cause of coma in Nepali children. The Glasgow coma scale (GCS) is used to assess the level of coma in these patients and predict outcome. Alternative coma scales may have better inter-rater reliability and prognostic value in encephalitis in Nepali children, but this has not been studied. The Adelaide coma scale (ACS), Blantyre coma scale (BCS) and the Alert, Verbal, Pain, Unresponsive scale (AVPU) are alternatives to the GCS which can be used. Methods Children aged 1-14 years who presented to Kanti Children's Hospital, Kathmandu with AES between September 2010 and November 2011 were recruited. All four coma scales (GCS, ACS, BCS and AVPU) were applied on admission, 48 h later and on discharge. Inter-rater reliability (unweighted kappa) was measured for each. Correlation and agreement between total coma score and outcome (Liverpool outcome score) were measured by Spearman's rank correlation and Bland-Altman plot. The prognostic value of coma scales alone and in combination with physiological variables was investigated in a subgroup (n = 22). A multivariable logistic regression model was fitted by backward stepwise selection. Results Fifty children were recruited. Inter-rater reliability using the various scales was fair to moderate. However, the scales poorly predicted clinical outcome. Combining the scales with physiological parameters such as systolic blood pressure improved outcome prediction. Conclusion This is the first study to compare four coma scales in Nepali children with AES. The scales exhibited fair to moderate inter-rater reliability. However, the study is inadequately powered to answer the question of the relationship between coma scales and outcome. Further larger studies are required.
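
    Unweighted kappa for two raters compares observed agreement with the agreement expected by chance from each rater's marginal category frequencies. On the commonly used Landis and Koch benchmarks, 0.21–0.40 is "fair" and 0.41–0.60 "moderate". The sketch below is the standard two-rater (Cohen) form; the function name and the AVPU ratings in the example are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters assigning categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Proportion of subjects on which the two raters agree exactly
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if p_expected == 1:
        raise ValueError("kappa undefined: both raters used one identical category")
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical AVPU ratings of six children by two observers
    obs1 = ["A", "A", "V", "P", "U", "A"]
    obs2 = ["A", "V", "V", "P", "U", "A"]
    print(round(cohen_kappa(obs1, obs2), 2))
```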

  14. Oswestry Disability Index scoring made easy.

    Science.gov (United States)

    Mehra, A; Baker, D; Disney, S; Pynsent, P B

    2008-09-01

    Low back pain affects up to 80% of the population at some time during their active life. Questionnaires are available to help measure pain and disability. The Oswestry Disability Index (ODI) is the most commonly used outcome measure for low back pain. The aim of this study was to see whether training in completing the ODI forms improved scoring accuracy. The last 100 ODI forms completed in a hospital's spinal clinic were reviewed retrospectively and errors in the scoring were identified. Staff members involved in scoring the questionnaire were made aware of the errors and the correct method of scoring was explained. A chart was created with all possible scores to aid the staff with scoring. A prospective audit of 50 questionnaires was subsequently performed. The retrospective study showed that 33 of the 100 forms had been incorrectly scored. All questionnaires in which one or more sections had not been completed by the patient were incorrectly scored. A scoring chart was developed and staff training was implemented. This reduced the error rate to 14% in the prospective audit. Clinicians applying outcome measures should read the appropriate literature to ensure they understand the scoring system. Staff must then be given adequate training in the application of the questionnaires.
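
    The audit found that every form with one or more unanswered sections had been scored incorrectly. The commonly published ODI scoring rule handles this by dividing by the maximum possible score of the sections actually answered rather than by 50. The sketch below assumes ten sections, each scored 0–5, with `None` marking a skipped section. The function name is hypothetical, and local scoring guidance should be checked for any limit on how many sections may be missing.

```python
def odi_percent(section_scores):
    """Oswestry Disability Index (%) from ten section scores (0-5, None = skipped)."""
    if len(section_scores) != 10:
        raise ValueError("expected 10 sections")
    answered = [s for s in section_scores if s is not None]
    if not answered:
        raise ValueError("no sections answered")
    if any(not 0 <= s <= 5 for s in answered):
        raise ValueError("section scores must be between 0 and 5")
    # Denominator is the maximum possible for the answered sections only
    return 100 * sum(answered) / (5 * len(answered))


if __name__ == "__main__":
    complete = [2] * 10
    one_skipped = [2] * 9 + [None]
    print(odi_percent(complete), odi_percent(one_skipped))
```

    For example, nine sections each scored 2 with one skipped gives 18 / 45 x 100 = 40%, whereas wrongly dividing by 50 would give 36%.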

  15. Content and Construct Validity, Reliability, and Responsiveness of the Rheumatoid Arthritis Flare Questionnaire

    DEFF Research Database (Denmark)

    Bartlett, Susan J; Barbic, Skye P; Bykerk, Vivian P

    2017-01-01

    -FQ), and the voting results at OMERACT 2016. METHODS: Classic and modern psychometric methods were used to assess reliability, validity, sensitivity, factor structure, scoring, and thresholds. Interviews with patients and clinicians also assessed content validity, utility, and meaningfulness of RA-FQ scores. RESULTS: People with RA in observational trials in Canada (n = 896) and France (n = 138), and an RCT in the Netherlands (n = 178) completed 5 items (11-point numerical rating scale) representing RA Flare core domains. There was moderate to high evidence of reliability, content and construct validity... to identify and measure RA flares. Its review through OMERACT Filter 2.0 shows evidence of reliability, content and construct validity, and responsiveness. These properties merit its further validation as an outcome for clinical trials...

  16. Differential reliability : probabilistic engineering applied to wood members in bending-tension

    Science.gov (United States)

    Stanley K. Suddarth; Frank E. Woeste; William L. Galligan

    1978-01-01

    Reliability analysis is a mathematical technique for appraising the design and materials of engineered structures to provide a quantitative estimate of probability of failure. Two or more cases which are similar in all respects but one may be analyzed by this method; the contrast between the probabilities of failure for these cases allows strong analytical focus on the...
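
    As a minimal sketch of the kind of estimate described, assume that failure occurs when a normally distributed load effect S exceeds an independent, normally distributed resistance R. The distributions, function names and parameter values are illustrative assumptions, not the paper's wood-member model. Under these assumptions, the probability of failure has the closed form Pf = Φ(-β), with β = (μR - μS) / sqrt(σR² + σS²), and can be checked by Monte Carlo simulation.

```python
import random
from statistics import NormalDist


def pf_closed_form(mu_r, sd_r, mu_s, sd_s):
    """P(R < S) for independent normal resistance R and load effect S."""
    beta = (mu_r - mu_s) / (sd_r ** 2 + sd_s ** 2) ** 0.5  # reliability index
    return NormalDist().cdf(-beta)


def pf_monte_carlo(mu_r, sd_r, mu_s, sd_s, n=200_000, seed=1):
    """Monte Carlo estimate of the same failure probability."""
    rng = random.Random(seed)
    # Count samples in which the sampled load effect exceeds the sampled resistance
    failures = sum(
        rng.gauss(mu_r, sd_r) < rng.gauss(mu_s, sd_s) for _ in range(n)
    )
    return failures / n


if __name__ == "__main__":
    # Two hypothetical cases differing only in resistance variability
    for sd_r in (1.5, 2.0):
        print(sd_r, pf_closed_form(10, sd_r, 6, 1.0), pf_monte_carlo(10, sd_r, 6, 1.0))
```

    Contrasting two cases that differ in a single respect, here raising the resistance standard deviation from 1.5 to 2.0 with everything else fixed, moves Pf from roughly 0.013 to roughly 0.037. This mirrors the differential comparison the abstract describes.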

  17. Assessment of the reliability of ultrasonic inspection methods

    International Nuclear Information System (INIS)

    Haines, N.F.; Langston, D.B.; Green, A.J.; Wilson, R.

    1982-01-01

    The reliability of NDT techniques has remained an open question for many years. A reliable technique may be defined as one that, when rigorously applied by a number of inspection teams, consistently finds and then correctly sizes all defects of concern. In this paper we report an assessment of the reliability of defect detection by manual ultrasonic methods applied to the inspection of thick section pressure vessel weldments. Initially we consider the available data relating to the inherent physical capabilities of ultrasonic techniques to detect cracks in weldments and then, independently, we assess the likely variability in team-to-team performance when several teams are asked to follow the same specified test procedure. The two aspects of 'capability' and 'variability' are brought together to provide quantitative estimates of the overall reliability of ultrasonic inspection of thick section pressure vessel weldments based on currently existing data. The final section of the paper considers current research programmes on reliability and presents a view on how these will help to further improve NDT reliability. (author)

  18. The Introduction of Adult Appendicitis Score Reduced Negative Appendectomy Rate.

    Science.gov (United States)

    Sammalkorpi, H E; Mentula, P; Savolainen, H; Leppäniemi, A

    2017-09-01

    Implementation of a clinical risk score into the diagnostics of acute appendicitis may provide an accurate diagnosis with selective use of imaging studies. The aim of this study was to prospectively validate a recently described diagnostic scoring system, the Adult Appendicitis Score, and to evaluate its effects on the negative appendectomy rate. The Adult Appendicitis Score stratifies patients into three groups: high, intermediate, and low risk of appendicitis. The score was implemented in the diagnostics of adult patients suspected of acute appendicitis in two university hospitals. We analyzed the effects of the Adult Appendicitis Score on diagnostic accuracy, imaging studies, and treatment. The study population was compared with a reference population of 829 patients suspected of acute appendicitis who were originally enrolled in the study that constructed the Adult Appendicitis Score. This study enrolled 908 patients, of whom 432 (48%) had appendicitis. The score stratified 49% of all appendicitis patients into the high-risk group with a specificity of 93.3%. In the low-risk group, the prevalence of appendicitis was 7%. The histologically confirmed negative appendectomy rate decreased from 18.2% to 8.7% (p …). The Adult Appendicitis Score is a reliable tool for stratifying patients to selective imaging, which results in a low negative appendectomy rate.

  19. Reliability measures for indexed semi-Markov chains applied to wind energy production

    International Nuclear Information System (INIS)

    D'Amico, Guglielmo; Petroni, Filippo; Prattico, Flavio

    2015-01-01

    The computation of the dependability measures is a crucial point in many engineering problems as well as in the planning and development of a wind farm. In this paper we address the issue of energy production by wind turbines by using an indexed semi-Markov chain as a model of wind speed. We present the mathematical model, the data and technical characteristics of a commercial wind turbine (Aircon HAWT-10kW). We show how to compute some of the main dependability measures such as reliability, availability and maintainability functions. We compare the results of the model with real energy production obtained from data available in the Lastem station (Italy) and sampled every 10 min. - Highlights: • Semi-Markov models. • Time series generation of wind speed. • Computation of availability, reliability and maintainability.

  20. Validation of the Foot and Ankle Outcome Score in adult acquired flatfoot deformity.

    Science.gov (United States)

    Mani, Sriniwasan B; Brown, Haydée C; Nair, Pallavi; Chen, Lan; Do, Huong T; Lyman, Stephen; Deland, Jonathan T; Ellis, Scott J

    2013-08-01

    The American Orthopaedic Foot and Ankle Society (AOFAS) Ankle-Hindfoot Score has been under recent scrutiny. The Foot and Ankle Outcome Score (FAOS) is an alternative subjective survey, assessing outcomes in 5 subscales. It has been validated for lateral ankle instability and hallux valgus patients. The aim of our study was to validate the FAOS for assessing outcomes in flexible adult acquired flatfoot deformity (AAFD). Patients from the authors' institution diagnosed with flexible AAFD from 2006 to 2011 were eligible for the study. In all, 126 patients who completed the FAOS and the Short-Form 12 (SF-12) on the same visit were included in the construct validity component. Correlation was deemed moderate if the Spearman's correlation coefficient was .4 to .7. Content validity was assessed in 63 patients by a questionnaire that asked patients to rate the relevance of each FAOS question, with a score of 2 or greater considered acceptable. Reliability was measured using intraclass correlation coefficients (ICCs) in 41 patients who completed a second FAOS survey. In 49 patients, preoperative and postoperative FAOS scores were compared to determine responsiveness. All of the FAOS subscales demonstrated moderate correlation with 2 physical-health-related SF-12 domains. Mental-health-related domains showed poor correlation. Content validity was high for the Quality of Life (QoL; mean 2.26) and Sports/Recreation subscales (mean 2.12). All subscales exhibited very good test-retest reliability, with ICCs of .7 and above. Symptoms, QoL, pain, and activities of daily living (ADLs) were responsive to change in postoperative patients (P …). This study validated the FAOS for AAFD with acceptable construct and content validity, reliability, and responsiveness. Given its previous validation for patients with ankle instability and hallux valgus, the additional findings in this study support its use as an alternative to less reliable outcome surveys. Level II, prospective comparative study.

  1. Cumulative trauma disorders in the upper extremities: reliability of the postural and repetitive risk-factors index.

    Science.gov (United States)

    James, C P; Harburn, K L; Kramer, J F

    1997-08-01

    This study addresses test-retest reliability of the Postural and Repetitive Risk-Factors Index (PRRI) for work-related upper body injuries. This assessment was developed by the present authors. A repeated measures design was used to assess the test-retest reliability of a videotaped work-site assessment of subjects' movements. Ten heavy users of video display terminals (VDTs) from a local banking industry participated in the study. The 10 subjects' movements were videotaped for 2 hours on each of 2 separate days, while working on-site at their VDTs. The videotaped assessment, which utilized known postural risk factors for developing musculoskeletal disorders, pain, and discomfort in heavy VDT users (i.e., repetitiveness, awkward and static postures, and contraction time), was called the PRRI. The videotaped movement assessments were subsequently analyzed in 15-minute sessions (five sessions per 2-hour videotape, which produced a total of 10 sessions over the 2 testing days), and each session was chosen randomly from the videotape. The subjects' movements were given a postural risk score according to the criteria in the PRRI. Each subject was therefore tested a total of 10 times (i.e., 10 sessions) over two days. The maximum PRRI score for both sides of the body was 216 points. Reliability coefficients (RCs) for the PRRI scores were calculated, and the reliability of any one session met the minimum criterion for excellent reliability, which was .75. A two-way analysis of variance (ANOVA) confirmed that there was no statistically significant difference between sessions (p < .05). Calculations using the standard error of measurement (SEM) indicated that an individual tested once, on one day, with a PRRI score of 25 required a change of at least 8 points in order to be confident that a true change in score had occurred. The significant results from the reliability tests indicated that the PRRI was a reliable measurement tool that could be used by occupational health

  2. Effects of measurement method and transcript availability on inexperienced raters' stuttering frequency scores.

    Science.gov (United States)

    Chakraborty, Nalanda; Logan, Kenneth J

    To examine the effects of measurement method and transcript availability on the accuracy, reliability, and efficiency of inexperienced raters' stuttering frequency measurements. Forty-four adults, all inexperienced at evaluating stuttered speech, underwent 20 min of preliminary training in stuttering measurement and then analyzed a series of sentences, with and without access to transcripts of sentence stimuli, using either a syllable-based analysis (SBA) or an utterance-based analysis (UBA). Participants' analyses were compared between groups and to a composite analysis from two experienced evaluators. Stuttering frequency scores from the SBA and UBA groups differed significantly from the experienced evaluators' scores; however, UBA scores were significantly closer to the experienced evaluators' scores and were completed significantly faster than the SBA scores. Transcript availability facilitated scoring accuracy and efficiency in both groups. The internal reliability of stuttering frequency scores was acceptable for the SBA and UBA groups; however, the SBA group demonstrated only modest point-by-point agreement with ratings from the experienced evaluators. Given its accuracy and efficiency advantages over syllable-based analysis, utterance-based fluency analysis appears to be an appropriate context for introducing stuttering frequency measurement to raters who have limited experience in stuttering measurement. To address accuracy gaps between experienced and inexperienced raters, however, use of either analysis must be supplemented with training activities that expose inexperienced raters to the decision-making processes used by experienced raters when identifying stuttered syllables. Copyright © 2018 Elsevier Inc. All rights reserved.

  3. The use of reliability analysis techniques applied to nuclear power station emergency core cooling systems

    International Nuclear Information System (INIS)

    Danielsen, A.; Snaith, E.R.

    1975-01-01

    A reliability investigation carried out by the Safety and Reliability Services of the UKAEA, and the SSEB, of the essential system/reactor coolant system for a large nuclear power station is described. In AGR type reactors, after all reactor shutdown conditions, it is necessary to restore forced gas circulation and sufficient boiler feed to maintain the heat removal capacity of the boilers. The coolant requirements are provided by several independent mechanical systems of primary coolant fans, feedwater pumps, and valves integrated with electrical power sources, switchgear, and automatic control equipment. Reliability is treated as one aspect of system performance and quantified in terms of failure to meet a specific objective. Based on the reliability performance of the constituent components the optimum system configuration is determined together with the preferred plant operating procedures and maintenance requirements. (author)

  4. The reliability and validity of fatigue measures during short-duration maximal-intensity intermittent cycling.

    Science.gov (United States)

    Glaister, Mark; Stone, Michael H; Stewart, Andrew M; Hughes, Michael; Moir, Gavin L

    2004-08-01

    The purpose of the present study was to assess the reliability and validity of fatigue measures, as derived from 4 separate formulae, during tests of repeat sprint ability. On separate days over a 3-week period, 2 groups of 7 recreationally active men completed 6 trials of 1 of 2 maximal (20 x 5 seconds) intermittent cycling tests with contrasting recovery periods (10 or 30 seconds). All trials were conducted on a friction-braked cycle ergometer, and fatigue scores were derived from measures of mean power output for each sprint. Apart from formula 1, which calculated fatigue from the percentage difference in mean power output between the first and last sprint, all remaining formulae produced fatigue scores that showed a reasonably good level of test-retest reliability in both intermittent test protocols (intraclass correlation range: 0.78-0.86; 95% likely range of true values: 0.54-0.97). Although between-protocol differences in the magnitude of the fatigue scores suggested good construct validity, within-protocol differences highlighted limitations with each formula. Overall, the results support the use of the percentage decrement score as the most valid and reliable measure of fatigue during brief maximal intermittent work.
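The abstract favours the percentage decrement score but does not state its formula. A common definition, assumed here, is S_dec = 100 × (1 − Σ P_i / (P_best × n)), where P_i is the mean power of sprint i and n is the number of sprints. The sketch contrasts it with a first-versus-last percentage drop, standing in for formula 1; both function names and the power values are hypothetical.

```python
from typing import Sequence


def percentage_decrement(power: Sequence[float]) -> float:
    """Percentage decrement score: 100 * (1 - total / (best sprint * number of sprints))."""
    ideal = max(power) * len(power)  # total if every sprint matched the best one
    return 100.0 * (1.0 - sum(power) / ideal)


def first_last_drop(power: Sequence[float]) -> float:
    """Percentage drop in mean power from the first to the last sprint."""
    return 100.0 * (power[0] - power[-1]) / power[0]


# Hypothetical mean power outputs (W) for a short series of sprints.
sprints = [900.0, 880.0, 850.0, 820.0, 800.0]
print(f"S_dec = {percentage_decrement(sprints):.1f}%, "
      f"first-last drop = {first_last_drop(sprints):.1f}%")
```

Because S_dec uses every sprint, a single unusually high or low final sprint moves it less than the first-versus-last drop, which depends on only two values.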

  5. Optimization of Reliability Centered Maintenance Based on Maintenance Costs and Reliability with Consideration of Location of Components

    Directory of Open Access Journals (Sweden)

    Mahdi Karbasian

    2011-03-01

    Full Text Available The reliability of systems such as electrical and electronic circuits, power generation/distribution networks and mechanical systems, in which the failure of a single component may cause the whole system to fail, is critically important, as is the reliability of cellular manufacturing systems whose machines are connected in series. So far, approaches for improving the reliability of these systems have mainly been based on enhancing the inherent reliability of individual system components or on increasing system reliability through maintenance strategies. Some sources study only the influence of the location of system components on reliability, and other approaches appear to have been rarely applied. In this paper, a multi-criteria model is proposed to balance a system's reliability, location costs, and maintenance costs. Finally, a numerical example is presented and solved with the Lingo software.
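For a series arrangement, and assuming independent component failures, system reliability is the product of the component reliabilities, so the system can never be more reliable than its weakest component. This minimal sketch does not reproduce the paper's multi-criteria Lingo model; the function name and the reliabilities are illustrative.

```python
import math
from typing import Sequence


def series_reliability(component_reliabilities: Sequence[float]) -> float:
    """Series system: it works only if every component works (independence assumed)."""
    return math.prod(component_reliabilities)


# Hypothetical reliabilities of four machines in a series production cell.
machines = [0.99, 0.98, 0.95, 0.99]
print(f"System reliability: {series_reliability(machines):.4f}")
```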

  6. Reproducibility of the modified Star Excursion Balance Test composite and specific reach direction scores.

    Science.gov (United States)

    van Lieshout, Remko; Reijneveld, Elja A E; van den Berg, Sandra M; Haerkens, Gijs M; Koenders, Niek H; de Leeuw, Arina J; van Oorsouw, Roel G; Paap, Davy; Scheffer, Else; Weterings, Stijn; Stukstette, Mirelle J

    2016-06-01

    The mSEBT is a screening tool used to evaluate dynamic balance. Most research investigating its measurement properties has focused on intrarater reliability and was done in small samples. To determine whether the mSEBT is useful for discriminating dynamic balance between persons and for evaluating changes in dynamic balance, more research into intra- and interrater reliability and smallest detectable change (synonymous with minimal detectable change) is needed. To estimate intra- and interrater reliability and smallest detectable change of the mSEBT in adults at risk for ankle sprain. Cross-sectional, test-retest design. Fifty-five healthy young adults who play sports and are at risk for ankle sprain took part (mean ± SD age, 24.0 ± 2.9 years). Each participant performed three test sessions within one hour and was rated by two physical therapists (session 1, rater 1; session 2, rater 2; session 3, rater 1). Participants and raters were blinded to previous measurements. Normalized composite and reach direction scores for the right and left leg were collected. Analysis of variance was used to calculate intraclass correlation coefficient values for intra- and interrater reliability. Smallest detectable change values were calculated based on the standard error of measurement. Intra- and interrater reliability for both legs was good to excellent (intraclass correlation coefficient ranging from 0.87 to 0.94). The intrarater smallest detectable change for the composite score was 7.2% for the right leg and 6.2% for the left. The interrater smallest detectable change for the composite score was 6.9% for the right leg and 5.0% for the left. The mSEBT is a reliable measurement instrument to discriminate dynamic balance between persons. Most smallest detectable change values of the mSEBT appear to be large. More research is needed to investigate whether the mSEBT is usable for evaluative purposes. Level 2.
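Smallest detectable change is often derived from the SEM as SDC = 1.96 × √2 × SEM, with SEM = SD × √(1 − ICC). The abstract does not say which SEM formula was used, so both expressions here are assumptions, as are the function names and the example SD and ICC.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement, SD * sqrt(1 - ICC) (one common formulation)."""
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_change(sem: float, z: float = 1.96) -> float:
    """SDC at ~95% confidence: z * sqrt(2) * SEM."""
    return z * math.sqrt(2.0) * sem


# Hypothetical between-subject SD of a normalized composite score and its ICC.
sem = sem_from_icc(sd=8.0, icc=0.90)
print(f"SEM = {sem:.2f}, SDC95 = {smallest_detectable_change(sem):.2f}")
```

The √2 factor reflects that a change score involves two measurements, each carrying its own error.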

  7. Body Movement Music Score – Introduction of a newly developed model for the analysis and description of body qualities, movement and music in music therapy

    Directory of Open Access Journals (Sweden)

    Hanna Agnieszka Skrzypek

    2017-01-01

    Full Text Available Background: In music therapy, there is a range of music therapy concepts that, in addition to music, describe and analyse the body and movement. A model that equally examines the body, movement and music has not been developed. The Body Movement Music Score (BMMS) is a newly developed and evaluated music therapy model for analysing body qualities, movement, playing style of musical instruments and music and to describe body behaviour and body expression, movement behaviour and movement expression, playing behaviour and musical expression in music therapy treatment. The basis for the development of the Body Movement Music Score was the evaluation of the analytical movement model Emotorics-Emotive Body Movement Mind Paradigm (Emotorics-EBMMP) by Yona Shahar Levy for the analysis and description of the emotive-motor behaviour and movement expression of schizophrenic patients in music therapy treatment. Participants and procedure: The application of the Body Movement Music Score is presented in a videotaped example from the music therapy treatment of one schizophrenic patient. Results: The results of applying the Body Movement Music Score are presented in the form of Body Qualities I Analysis, Body Qualities II Analysis, Movement Analysis, Playing Style Analysis and Music Analysis Profiles. Conclusions: The Body Movement Music Score has been developed and evaluated for the music therapy treatment of schizophrenic patients. For the development of the model, a proof of reliability is necessary to verify the reliability and limitations of the model in practice and show that the Body Movement Music Score could be used for both practical and clinical work, for documentation purposes and to impact research in music therapy.

  8. Validity and Reliability of Asbestos Knowledge and Awareness Questionnaire for Environmental Asbestos Exposure in Rural Areas

    Directory of Open Access Journals (Sweden)

    Selma Metintaş

    2017-04-01

    Full Text Available Objective: There is no treatment for asbestos-related diseases, but they can be prevented. One of the first interventions is to improve people's knowledge in order to protect them from asbestos and asbestos-related diseases. The present study was conducted to develop a questionnaire for measuring the level of knowledge and awareness of asbestos and to assess its validity and reliability in a rural population that is environmentally exposed to asbestos. Methods: An interviewer-administered questionnaire that included 37 items was applied to a convenience sample of adults who attended a tertiary teaching hospital in Eskişehir, where asbestos exposure is widespread in rural areas. After assessment of the validity and reliability of the results, the questionnaire was refined to 19 items and one subscale. Results: A total of 760 participants were included in this study. The mean age of participants was 53.2±15.1 years and 51.6% of them were male. The discrimination and difficulty indices of the asbestos knowledge and awareness questionnaire ranged between 20.0–60.5% and 0.39–0.98, respectively. Cronbach’s alpha coefficient was 0.951 for all items. The median (min–max) and mean (SD) scores of the study population were 30 (19–56) and 33.9 (11.9), respectively. The score increased correspondingly with greater knowledge levels. Conclusion: This questionnaire is a practical and easy tool to apply, with acceptable reliability and validity, in high-risk adults in rural areas with environmental asbestos exposure.
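Cronbach's alpha, as reported above for the questionnaire, compares the sum of item variances with the variance of respondents' total scores: α = k/(k − 1) × (1 − Σσ²_i / σ²_X) for k items. A minimal sketch with a made-up answer matrix; `cronbach_alpha` is a hypothetical helper, not code from the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; `items` holds one score list per item, respondents in the same order."""
    k = len(items)
    item_var_sum = sum(variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # total score of each respondent
    return k / (k - 1) * (1.0 - item_var_sum / variance(totals))


# Hypothetical 0/1 answers of five respondents to three knowledge items.
answers = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
]
print(f"alpha = {cronbach_alpha(answers):.3f}")
```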

  9. Reliability Generalization: An Examination of the Positive Affect and Negative Affect Schedule

    Science.gov (United States)

    Leue, Anja; Lange, Sebastian

    2011-01-01

    The assessment of positive affect (PA) and negative affect (NA) by means of the Positive Affect and Negative Affect Schedule has gained remarkable popularity in the social sciences. Using a meta-analytic tool--namely, reliability generalization (RG)--population reliability scores of both scales have been investigated on the basis of a random…

  10. Reliability of the Balance Evaluation Systems Test (BESTest) and BESTest sections for adults with hemiparesis

    Science.gov (United States)

    Rodrigues, Letícia C.; Marques, Aline P.; Barros, Paula B.; Michaelsen, Stella M.

    2014-01-01

    BACKGROUND: The Balance Evaluation Systems Test (BESTest) was recently created to allow the development of treatments according to the specific balance system affected in each patient. The Brazilian version of the BESTest has not been specifically tested after stroke. OBJECTIVE: To evaluate the intra- and inter-rater reliability and concurrent and convergent validity of the total score of the BESTest and BESTest sections for adults with hemiparesis after stroke. METHOD: The study included 16 subjects (61.1±7.5 years) with chronic hemiparesis (54.5±43.5 months after stroke). The BESTest was administered by two raters in the same week, and one of the raters repeated the test after a one-week interval. The intraclass correlation coefficient (ICC) was calculated to assess intra- and interrater reliability. Concurrent validity with the Berg Balance Scale (BBS) and convergent validity with the Activities-specific Balance Confidence scale (ABC-Brazil) were assessed using Pearson's correlation coefficient. RESULTS: Both the BESTest total score (ICC=0.98) and the BESTest sections (ICC between 0.85 and 0.96) have excellent intrarater reliability. Interrater reliability for the total score was excellent (ICC=0.93) and, for the sections, it ranged between 0.71 and 0.94. The correlation coefficients between the BESTest and the BBS and the ABC-Brazil were 0.78 and 0.59, respectively. CONCLUSIONS: The Brazilian version of the BESTest demonstrated adequate reliability when measured by sections and could identify which balance system was affected in patients after stroke. Concurrent validity was excellent with the BBS total score and good to excellent with the sections. The total scores, but not the sections, present adequate convergent validity with the ABC-Brazil. However, other psychometric properties should be further investigated. PMID:25003281

  11. The reliability and validity of the Turkish version of Fullerton Advanced Balance (FAB-T) scale.

    Science.gov (United States)

    Iyigun, Gozde; Kirmizigil, Berkiye; Angin, Ender; Oksuz, Sevim; Can, Filiz; Eker, Levent; Rose, Debra J

    2018-06-04

    The aim of this study was to evaluate the reliability and validity of the Turkish version of the FAB (FAB-T) scale in older Turkish adults. The reliability and validity of the scale were tested on 200 community-dwelling older adults. The FAB-T scale was scored by different physiotherapists on different days to evaluate inter-rater and intra-rater reliability. The Berg Balance Scale (BBS) was used for the evaluation of convergent validity, and the content validity of the FAB-T scale was investigated. The FAB-T scale showed very high inter- and intra-rater reliability. For inter-rater agreement, ICC values for the individual test items and the total score were 0.92 (95% CI 0.90-0.94) and 0.96 (95% CI 0.95-0.97), respectively. For intra-rater agreement, ICC values for the individual test items and the total score were 0.93 (95% CI 0.91-0.95) and 0.96 (95% CI 0.95-0.97), respectively. There was good agreement between the FAB-T and BBS scales. A high correlation was found between the BBS and FAB-T scales [rho = 0.70 (95% CI 0.62-0.76)], indicating good convergent validity. Considering the content validity of the FAB-T scale, no floor (floor score: 0%) or ceiling (ceiling score: 6.5%) effect was detected. The FAB-T scale was successfully translated from the original English version (FAB) and demonstrated strong psychometric features. It was found that the FAB-T scale has very high inter-rater and intra-rater reliability. Considering convergent validity, the scale correlates highly with the BBS. The FAB-T has no floor or ceiling effect. Copyright © 2018 Elsevier B.V. All rights reserved.

  12. Joint interval reliability for Markov systems with an application in transmission line reliability

    International Nuclear Information System (INIS)

    Csenki, Attila

    2007-01-01

    We consider Markov reliability models whose finite state space is partitioned into the set of up states $U$ and the set of down states $D$. Given a collection of $k$ disjoint time intervals $I_l = [t_l, t_l + x_l]$, $l = 1, \dots, k$, the joint interval reliability is defined as the probability of the system being in $U$ for all time instances in $I_1 \cup \dots \cup I_k$. A closed-form expression is derived here for the joint interval reliability for this class of models. The result is applied to power transmission lines in a two-state fluctuating environment. We use the Linux versions of the free packages Maxima and Scilab in our implementation for symbolic and numerical work, respectively.
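The joint interval reliability can be computed directly in the simplest case: a single unit with one up and one down state, constant failure rate λ and repair rate μ, starting up at time 0. With a single up state, staying up through an interval of length x has probability e^(−λx), and the Markov property lets the gaps between intervals factorise through the transition probability P_UU(t). This sketch is not Csenki's closed-form result: a general U calls for matrix exponentials of generator sub-blocks in place of the scalars used here, and the two-state fluctuating environment of the application is not modelled. Function names and rates are hypothetical.

```python
import math
from typing import Sequence, Tuple


def p_up_to_up(t: float, lam: float, mu: float) -> float:
    """P(up at time t | up at 0) for a two-state failure/repair Markov model."""
    s = lam + mu
    return mu / s + lam / s * math.exp(-s * t)


def joint_interval_reliability(intervals: Sequence[Tuple[float, float]],
                               lam: float, mu: float) -> float:
    """P(unit is up throughout every [t_l, t_l + x_l]), given it is up at time 0.

    `intervals` holds (t_l, x_l) pairs; they are processed in time order.
    """
    prob, clock = 1.0, 0.0
    for t, x in sorted(intervals):
        if t < clock:
            raise ValueError("intervals must be disjoint")
        prob *= p_up_to_up(t - clock, lam, mu)  # up at the start of the interval
        prob *= math.exp(-lam * x)              # no failure during the interval
        clock = t + x                           # known to be up at the interval's end
    return prob


# Hypothetical rates (per hour) and two observation windows.
print(joint_interval_reliability([(10.0, 2.0), (20.0, 3.0)], lam=0.01, mu=0.5))
```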

  13. Reliability of electronic systems

    International Nuclear Information System (INIS)

    Roca, Jose L.

    2001-01-01

    Reliability techniques were developed in response to the needs of diverse engineering disciplines; nevertheless, many believe that a great deal of reliability work was done before the word was used in its current sense. The military, space and nuclear industries were the first to become involved in this topic, but this small yet great revolution, which raised the reliability figures of their products, was not confined to those environments; it has spread to industry as a whole. Four decades ago, the mass production characteristic of modern industry led to a decline in product reliability, partly because of the scale of production itself and partly because of newly discovered industrial techniques that had not yet stabilized. Industry had to change in response to these two new conditions, creating products of medium complexity while ensuring sufficient reliability consistent with production costs and controls. Reliability became an integral part of the manufactured product. In line with this philosophy, the book describes reliability techniques applied to electronic systems and provides a coherent and rigorous framework for these diverse activities, giving the entire subject a unifying scientific basis. It consists of eight chapters plus numerous statistical tables and an extensive annotated bibliography. The chapters cover the following topics: 1- Introduction to Reliability; 2- Basic Mathematical Concepts; 3- Catastrophic Failure Models; 4- Parametric Failure Models; 5- Systems Reliability; 6- Reliability in Design and Project; 7- Reliability Tests; 8- Software Reliability. The book is written in Spanish and could serve a diverse audience as a textbook for courses ranging from academic to industrial. (author)

  14. Long term test-retest reliability of Oswestry Disability Index in male office workers.

    Science.gov (United States)

    Irmak, Rafet; Baltaci, Gul; Ergun, Nevin

    2015-01-01

    The Oswestry Disability Index (ODI) is one of the most common condition-specific outcome measures used in the management of spinal disorders. However, there are insufficient studies on healthy populations and on long-term test-retest reliability. This is important because healthy populations are often used as control groups in low back pain interventions, and knowing the reliability of the controls affects the interpretation of the findings of these studies. The purpose of this study is to determine the long-term test-retest reliability of the ODI in office workers. Participants with no history of chronic low back pain were included in the study. Subjects were assessed with the Turkish-ODI 2.0 (e-forms) on days 1, 2, 4, 8, 15 and 30 to determine the stability of ODI scores over time. The study began with 58 participants (12 female, 46 male); 36 (3 female, 33 male) participated for the full 30 days. Kolmogorov-Smirnov and Friedman tests were used. Test-retest reliability was evaluated using nonparametric statistics. All tests were done using SPSS-11. There was no statistically significant difference among the median scores of each day (χ² = 6.482, p > 0.05). The difference between the median score of each day and that of the 1st day was neither statistically nor clinically significant. The ODI showed long-term test-retest reliability in healthy subjects over a 1-month interval.

  15. Integrating reliability analysis and design

    International Nuclear Information System (INIS)

    Rasmuson, D.M.

    1980-10-01

    This report describes the Interactive Reliability Analysis Project and demonstrates the advantages of using computer-aided design systems (CADS) in reliability analysis. Common cause failure problems require presentations of systems, analysis of fault trees, and evaluation of solutions to these. Results have to be communicated between the reliability analyst and the system designer. Using a computer-aided design system saves time and money in the analysis of design. Computer-aided design systems lend themselves to cable routing, valve and switch lists, pipe routing, and other component studies. At EG and G Idaho, Inc., the Applicon CADS is being applied to the study of water reactor safety systems

  16. How do cognitively impaired elderly patients define "testament": reliability and validity of the testament definition scale.

    Science.gov (United States)

    Heinik, J; Werner, P; Lin, R

    1999-01-01

    The testament definition scale (TDS) is a specifically designed six-item scale aimed at measuring the respondent's capacity to define "testament." We assessed the reliability and validity of this new short scale in 31 community-dwelling cognitively impaired elderly patients. Interrater reliability for the six items ranged from .87 to .97. The interrater reliability for the total score was .77. Significant correlations were found between the TDS score and the Mini-Mental State Examination (MMSE) and the Cambridge Cognitive Examination scores (r = .71 and .72 respectively, p = .001). Criterion validity yielded significantly different means for subjects with MMSE scores of 24-30 and 0-23: mean 3.9 and 1.6 respectively (t(20) = 4.7, p = .001). Using a cutoff point of 0-2 vs. 3+, 79% of the subjects were correctly classified as severely cognitively impaired, with only 8.3% false positives, and a positive predictive value of 94%. Thus, TDS was found both reliable and valid. This scale, however, is not synonymous with testamentary capacity. The discussion deals with the methodological limitations of this study, and highlights the practical as well as the theoretical relevance of TDS. Future studies are warranted to elucidate the relationships between TDS and existing legal requirements of testamentary capacity.
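The cutoff-based classification above can be summarised from a 2 × 2 table of predicted versus reference status. This sketch assumes that a TDS score at or below the cutoff (0-2) is read as "severely impaired" and compares that call with reference labels such as an MMSE-based split. The data and the helper `cutoff_metrics` are hypothetical, and the sketch does not reproduce the study's figures.

```python
from typing import Dict, Sequence


def cutoff_metrics(scores: Sequence[int], impaired: Sequence[bool],
                   cutoff: int = 2) -> Dict[str, float]:
    """Treat score <= cutoff as 'impaired' and compare with reference labels."""
    tp = fp = tn = fn = 0
    for score, truth in zip(scores, impaired):
        predicted = score <= cutoff
        if predicted and truth:
            tp += 1
        elif predicted:
            fp += 1  # flagged as impaired, reference says not impaired
        elif truth:
            fn += 1  # missed an impaired subject
        else:
            tn += 1
    n = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / n,
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "false_positive_rate": fp / (fp + tn) if fp + tn else float("nan"),
    }


# Hypothetical TDS totals and reference labels for eight subjects.
tds_scores = [0, 1, 2, 2, 3, 4, 5, 6]
reference_impaired = [True, True, True, False, False, False, True, False]
metrics = cutoff_metrics(tds_scores, reference_impaired)
print(metrics)
```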

  17. Test-Retest Reliability and Minimal Detectable Change of Randomized Dichotic Digits in Learning-Disabled Children: Implications for Dichotic Listening Training.

    Science.gov (United States)

    Mahdavi, Mohammad Ebrahim; Pourbakht, Akram; Parand, Akram; Jalaie, Shohreh

    2018-03-01

    Evaluation of dichotic listening to digits is a common part of many studies for diagnosing and managing auditory processing disorders in children. Previous researchers have verified the test-retest relative reliability of dichotic digits results in normal children and adults. However, detecting intervention-related changes in the ear scores after dichotic listening training requires information regarding the typical trial-to-trial variation of individual ear scores, which is estimated using indices of absolute reliability. Previous studies have not addressed the absolute reliability of dichotic listening results. To compare the results of the Persian randomized dichotic digits test (PRDDT) and its relative and absolute indices of reliability between typically achieving (TA) and learning-disabled (LD) children. A repeated measures observational study. Fifteen LD children aged 7-12 yr were recruited from a previously performed study. The control group consisted of 15 TA schoolchildren aged 8-11 yr. The PRDDT was administered to the children under the free recall condition in two test sessions 7-12 days apart. We compared the average ear scores and ear advantage between TA and LD children. Relative indices of reliability included Pearson's correlation and intraclass correlation (ICC(2,1)) coefficients, and absolute reliability was evaluated by calculating the standard error of measurement (SEM) and minimal detectable change (MDC) using the raw ear scores. The Pearson correlation coefficient indicated that in both groups of children the ear scores of the test and retest sessions were strongly and positively (greater than +0.8) correlated. The ear scores showed excellent ICC coefficients of consistency (0.78-0.82) and fair to excellent ICC coefficients of absolute agreement (0.62-0.74) in TA children, and excellent ICC coefficients of consistency and absolute agreement in LD children (0.76-0.87). 
SEM and SEM% of the ear scores in TA
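The abstract separates relative reliability (ICC) from absolute reliability (SEM, SEM%, MDC). A minimal Python sketch of common formulations follows, assuming SEM = SD·√(1 − ICC), SEM% = 100·SEM / mean and MDC95 = 1.96·√2·SEM. The authors computed SEM "using the raw ear scores", possibly in another way (for example, the SD of test-retest differences divided by √2, also sketched below). All function names and numbers are hypothetical, not taken from the study.

```python
import math
from statistics import stdev


def sem_from_icc(sd: float, icc: float) -> float:
    """SEM = SD * sqrt(1 - ICC), with SD the between-subject SD of the scores."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie in [0, 1] for this formula")
    return sd * math.sqrt(1.0 - icc)


def sem_from_differences(test, retest) -> float:
    """Alternative SEM: SD of the test-retest differences divided by sqrt(2)."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need paired scores for at least two subjects")
    diffs = [a - b for a, b in zip(test, retest)]
    return stdev(diffs) / math.sqrt(2.0)


def sem_percent(sem: float, grand_mean: float) -> float:
    """SEM expressed as a percentage of the mean score (assumed SEM% definition)."""
    return 100.0 * sem / grand_mean


def mdc95(sem: float) -> float:
    """Minimal detectable change at 95% confidence: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical ear-score summary (percent correct), not data from the study.
sem = sem_from_icc(sd=10.0, icc=0.75)
print(f"SEM = {sem:.2f}, SEM% = {sem_percent(sem, 80.0):.1f}, MDC95 = {mdc95(sem):.2f}")
```

An individual child's change in ear score would then need to exceed MDC95 before it is read as a real training effect rather than measurement noise.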

  18. Reliability of the "Ten Test" for assessment of discriminative sensation in hand trauma.

    Science.gov (United States)

    Berger, Michael J; Regan, William R; Seal, Alex; Bristol, Sean G

    2016-10-01

    "Ten Test" (TT) is a bedside measure of discriminative sensation, whereby the magnitude of abnormal sensation to moving light touch is normalized to an area of normal sensation on an 11-point Likert scale (0-10). The purposes of this study were to determine reliability parameters of the TT in a cohort of patients presenting to a hand trauma clinic with subjectively altered sensation post-injury and to compare the reliability of TT to that of the Weinstein Enhanced Sensory Test (WEST). Study participants (n = 29, mean age = 37 ± 12) comprised patients presenting to an outpatient hand trauma clinic with recent hand trauma and self reported abnormal sensation. Participants underwent TT and WEST by two separate raters on the same day. Interrater reliability, response stability and responsiveness of each test were determined by the intraclass correlation coefficient (ICC: 2, 1), standard error of measurement (SEM) with 95% confidence intervals (CI) and minimal detectable difference score, with 95% CI (MDD95), respectively. The TT displayed excellent interrater reliability (ICC = 0.95, 95% CI 0.89-0.97) compared to good reliability for WEST (ICC = 0.78, 95% CI 0.58-0.89). The range of true scores expected with 95% confidence based on the SEM (i.e. response stability), was ±1.1 for TT and ±1.1 for WEST. MDD95 scores reflecting test responsiveness were 1.5 and 1.6 for TT and WEST, respectively. The TT displayed excellent reliability parameters in this patient population. Reliability parameters were stronger for TT compared to WEST. These results provide support for the use of TT as a component of the sensory exam in hand trauma. Copyright © 2016 British Association of Plastic, Reconstructive and Aesthetic Surgeons. Published by Elsevier Ltd. All rights reserved.

  19. Reliability analysis of reactor systems by applying probability method; Analiza pouzdanosti reaktorskih sistema primenom metoda verovatnoce

    Energy Technology Data Exchange (ETDEWEB)

    Milivojevic, S [Institute of Nuclear Sciences Boris Kidric, Vinca, Beograd (Serbia and Montenegro)]

    1974-12-15

    The probability method chosen for analysing reactor system reliability is considered realistic, since it is based on verified experimental data; it is, in fact, a statistical method. The method takes into account the probability distributions of the permitted levels of the relevant parameters and their particular influence on the reliability of the system as a whole. The proposed method is rather general and was applied to the problem of thermal safety analysis of a reactor system. This analysis makes it possible to examine the basic properties of the system under different operating conditions; expressed in the form of probabilities, the results show the reliability of the system as a whole as well as that of each component.
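The abstract does not give the method's equations. As a generic, hypothetical illustration of weighing a parameter's probability distribution against its permitted level, the sketch below uses the textbook stress-strength (load-capacity) interference model with independent normal variables and combines independent components in series. This is a swapped-in standard technique, not the 1974 method, and every number is assumed.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def component_reliability(mu_limit, sd_limit, mu_value, sd_value):
    """P(value < permitted limit) for independent normal variables.

    Stress-strength interference: beta = (mu_limit - mu_value) /
    sqrt(sd_limit**2 + sd_value**2), reliability = Phi(beta).
    """
    spread = math.hypot(sd_limit, sd_value)
    if spread == 0.0:
        raise ValueError("at least one standard deviation must be positive")
    beta = (mu_limit - mu_value) / spread
    return normal_cdf(beta)


def series_reliability(component_reliabilities):
    """The system survives only if every (independent) component survives."""
    return math.prod(component_reliabilities)


# Hypothetical thermal-safety check: permitted vs. operating temperature (K).
r_fuel = component_reliability(mu_limit=1200.0, sd_limit=30.0, mu_value=1100.0, sd_value=40.0)
print(r_fuel, series_reliability([r_fuel, 0.999]))
```

With these assumed numbers the safety margin is two combined standard deviations, so the component reliability is Φ(2) ≈ 0.977.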

  20. Reliability of thermal-hydraulic passive safety systems

    International Nuclear Information System (INIS)

    D'Auria, F.; Araneo, D.; Pierro, F.; Galassi, G.

    2014-01-01

    The reader is introduced to reliability concepts applied to passive systems adopted in nuclear reactors. For classical components and systems, the failure concept is associated with malfunction or breakage of hardware. In the case of passive systems, failure is associated with physical phenomena. A method for studying the reliability of passive systems is discussed and applied. The paper describes the REPAS (Reliability Evaluation of Passive Safety System) methodology developed by the University of Pisa (UNIPI) and presents results from its application. The general objective of the REPAS methodology is to characterize the performance of a passive system in order to increase confidence in its operation, and to compare the performance of active and passive systems as well as that of different passive systems.

  1. Automation of reliability evaluation procedures through CARE - The computer-aided reliability estimation program.

    Science.gov (United States)

    Mathur, F. P.

    1972-01-01

    Description of an on-line interactive computer program called CARE (Computer-Aided Reliability Estimation) which can model self-repair and fault-tolerant organizations and perform certain other functions. Essentially CARE consists of a repository of mathematical equations defining the various basic redundancy schemes. These equations, under program control, are then interrelated to generate the desired mathematical model to fit the architecture of the system under evaluation. The mathematical model is then supplied with ground instances of its variables and is then evaluated to generate values for the reliability-theoretic functions applied to the model.
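CARE is described as composing equations for basic redundancy schemes into a system model. The equations in this sketch are standard textbook expressions assumed for illustration (independent failures, perfect switching unless a voter reliability is given, constant failure rates). They are not taken from CARE's actual repository, and the function names are hypothetical.

```python
import math


def series(rs):
    """All units must work."""
    return math.prod(rs)


def parallel(rs):
    """At least one unit must work (perfect switching assumed)."""
    return 1.0 - math.prod(1.0 - r for r in rs)


def k_of_n(k, n, r):
    """At least k of n identical, independent units must work."""
    return sum(math.comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


def tmr(r, r_voter=1.0):
    """Triple modular redundancy: 2-of-3 majority, in series with the voter."""
    return r_voter * k_of_n(2, 3, r)


def exponential(failure_rate, t):
    """Reliability of a unit with constant failure rate: R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate * t)


# Hypothetical model: a TMR processor stage in series with a duplexed memory.
r_unit = exponential(failure_rate=1e-4, t=1000.0)
print(series([tmr(r_unit, r_voter=0.999), parallel([r_unit, r_unit])]))
```

The composition mirrors the idea in the abstract: small scheme equations are nested to match the architecture, then evaluated with concrete parameter values.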

  2. Validity and Reliability of Visual Analog Scale Foot and Ankle: The Turkish Version.

    Science.gov (United States)

    Gur, Gozde; Turgut, Elif; Dilek, Burcu; Baltaci, Gul; Bek, Nilgun; Yakut, Yavuz

    The present study tested the reliability and validity of the Turkish version of the visual analog scale foot and ankle (VAS-FA) among healthy subjects and patients with foot problems. A total of 128 participants, 65 healthy subjects and 63 patients with foot problems, were evaluated. The VAS-FA was translated into Turkish and administered to the 128 subjects on 2 separate occasions with a 5-day interval. Test-retest reliability and internal consistency were assessed with the intraclass correlation coefficient and Cronbach's α, respectively. Validity was assessed using correlations with the Turkish versions of the Foot Function Index, the Foot and Ankle Outcome Score, and the Short-Form 36-item Health Survey. A statistically significant difference was found between the healthy group and the patient group in the overall score and subscale scores of the VAS-FA (p […] Foot Function Index, Foot and Ankle Outcome Score, and Short-Form 36-item Health Survey scores in both the healthy and patient groups. The Turkish version of the VAS-FA is sensitive enough to distinguish foot and ankle-specific pathologic conditions from asymptomatic conditions. The Turkish version of the VAS-FA is a reliable and valid method and can be used for foot-related problems. Copyright © 2017 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
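Internal consistency here, and in several later entries, is reported as Cronbach's α. A minimal sketch of the standard formula, α = k/(k − 1)·(1 − Σσᵢ²/σ²_total), follows. It assumes a complete item-by-respondent table with no missing answers; the function name and data are invented.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for k items, each a list of scores (one per respondent).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)).
    Population variances are used throughout; the ratio is unchanged with
    sample variances as long as the choice is consistent.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("every item needs the same number (>= 2) of respondents")
    totals = [sum(item[i] for item in items) for i in range(n)]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    item_var = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1.0 - item_var / total_var)


# Invented answers from four respondents to a three-item scale.
items = [[1, 2, 3, 4], [2, 2, 3, 3], [1, 3, 3, 4]]
print(round(cronbach_alpha(items), 3))  # 0.897
```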

  3. The reliability and validity of the Tokyo Autistic Behaviour Scale.

    Science.gov (United States)

    Kurita, H; Miyake, Y

    1990-03-01

    The Tokyo Autistic Behavior Scale (TABS), consisting of 39 items provisionally grouped into four areas (interpersonal-social relationship, language-communication, habit-mannerism, and others), is an instrument used by a child's caretaker to rate the child's autistic behaviors on a 3-point scale. Test-retest reliability was satisfactory (r = .94 for the total score). Among six DSM-III diagnostic groups, infantile autism showed a significantly higher total TABS score than the other five groups, and the taxonomic validity coefficient was .54. The correlation (r) between total scores on the TABS and the Childhood Autism Rating Scale, Tokyo Version, was .59. The area scores showed lower validity than the total score. The TABS appears to be a useful instrument for assessing autistic behavior.

  4. [Reliability and validity of the modified Perceived Health Competence Scale (PHCS) Japanese version].

    Science.gov (United States)

    Togari, Taisuke; Yamazaki, Yoshihiko; Koide, Syotaro; Miyata, Ayako

    2006-01-01

    In community and workplace health plans, the Perceived Health Competence Scale (PHCS) is employed as an index of health competency. The purpose of this research was to examine the reliability and validity of a modified Japanese PHCS. Interviews were sought with 3,000 Japanese individuals randomly selected using a two-step stratified method. Valid PHCS responses were obtained from 1,910 individuals, a 63.7% response rate. Reliability was assessed using Cronbach's alpha coefficient (henceforth, alpha) to evaluate internal consistency, and by item-total correlation and alpha coefficient analyses to assess the effect of removing variables from the model. To examine content validity, we assessed the correlation between the PHCS score and four respondent characteristics: sex, age, the presence of chronic disease, and the existence of chronic disease at age 18. The correlation between the PHCS score and commonly employed healthy lifestyle indices was examined to assess construct validity. A general linear model was used for statistical analysis. The modified Japanese PHCS demonstrated a satisfactory alpha coefficient of 0.869. Moreover, reliability was confirmed by item-total correlation and alpha coefficient analyses after removal of variables from the model. PHCS scores differed between individuals aged 60 years and older and younger individuals. Those with current chronic disease, or who had had a chronic disease at age 18, tended to have lower PHCS scores. After controlling for the presence of current or age-18 chronic disease, age, and sex, significant correlations were seen between PHCS scores and tobacco use, dietary habits, and exercise, but not alcohol use or frequency of medical consultation. This study supports the reliability and validity, and hence the use, of the modified Japanese PHCS. Future longitudinal research is needed to evaluate the predictive power of modified Japanese PHCS scores, to examine

  5. Using perinatal morbidity scoring tools as a primary study outcome.

    Science.gov (United States)

    Hutcheon, Jennifer A; Bodnar, Lisa M; Platt, Robert W

    2017-11-01

    Perinatal morbidity scores are tools that score or weight different adverse events according to their relative severity. They are appealing for maternal-infant health researchers because they provide a way to capture a broad range of adverse events to mother and newborn while recognising that some events are considered more serious than others. However, they have proved difficult to implement as a primary outcome in applied research studies because of challenges in testing whether the scores differ significantly between two or more study groups. We outline these challenges and describe a solution, based on Poisson regression, that allows differences in perinatal morbidity scores to be formally evaluated. The approach is illustrated using an existing maternal-neonatal scoring tool, the Adverse Outcome Index, to evaluate the safety of labour and delivery before and after the closure of obstetrical services in small rural communities. Applying the proposed Poisson regression to the case study showed a protective risk ratio for adverse outcomes following closures, whereas the original analysis had found no difference. This approach opens the door to considerably broader use of perinatal morbidity scoring tools as a primary outcome in applied population and clinical maternal-infant health research studies. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
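The abstract proposes Poisson regression for comparing morbidity scores between groups but gives no formulas. With a single binary group indicator, the Poisson maximum-likelihood rate ratio reduces to the ratio of mean scores, which the hypothetical sketch below computes with a Wald 95% CI on the log scale. It assumes the model-based Poisson variance; because weighted morbidity scores are not true counts, a robust (sandwich) variance may be more appropriate, and the abstract does not say which the authors used. The data are invented.

```python
import math


def morbidity_score_ratio(scores_exposed, scores_reference, z=1.96):
    """Rate ratio of mean morbidity scores with a Wald CI on the log scale.

    For a Poisson model log E[score] = b0 + b1 * group with one binary group
    indicator, the MLE exp(b1) equals mean(exposed) / mean(reference), and the
    model-based variance of b1 is 1/sum(exposed) + 1/sum(reference).
    """
    total_e, total_r = sum(scores_exposed), sum(scores_reference)
    if total_e <= 0 or total_r <= 0:
        raise ValueError("each group needs a positive total score")
    ratio = (total_e / len(scores_exposed)) / (total_r / len(scores_reference))
    se_log = math.sqrt(1.0 / total_e + 1.0 / total_r)
    lower = ratio * math.exp(-z * se_log)
    upper = ratio * math.exp(z * se_log)
    return ratio, (lower, upper)


# Invented, severity-weighted scores per delivery, after vs. before closure.
after = [0, 0, 5, 0, 10, 0, 0, 2]
before = [0, 5, 0, 10, 0, 2, 8, 0]
print(morbidity_score_ratio(after, before))
```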

  6. Intra-rater reliability of the multiple single-leg hop-stabilization test and relationships with age, leg dominance and training.

    Science.gov (United States)

    Sawle, Leanne; Freeman, Jennifer; Marsden, Jonathan

    2017-04-01

    Balance is a complex construct, affected by multiple components such as strength and co-ordination. Whilst assessing an athlete's dynamic balance is an important part of clinical examination, there is no gold standard measure. The multiple single-leg hop-stabilization test is a functional test which may offer a method of evaluating the dynamic attributes of balance, but it needs to show adequate intra-tester reliability. The purpose of this study was to assess the intra-rater reliability of a dynamic balance test, the multiple single-leg hop-stabilization test, on the dominant and non-dominant legs. This was an intra-rater reliability study. Fifteen active participants were tested twice with a 10-minute break between tests. The outcome measure was the multiple single-leg hop-stabilization test score, based on a clinically assessed numerical scoring system. Results were analysed using the intraclass correlation coefficient (ICC(2,1)) and Bland-Altman plots. Regression analyses explored relationships between test scores, leg dominance, age and training (an alpha level of p = 0.05 was selected). ICCs for intra-rater reliability were 0.85 for both the dominant and non-dominant legs (confidence intervals = 0.62-0.95 and 0.61-0.95, respectively). Bland-Altman plots showed scores within two standard deviations. A significant correlation was observed between the dominant and non-dominant leg on balance scores (R² = 0.49, p […] test demonstrated strong intra-tester reliability with active participants. Younger participants who trained more had better balance scores. This test may be a useful measure for evaluating the dynamic attributes of balance. 3.
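The entry summarises test-retest agreement with Bland-Altman plots. The sketch below computes the quantities such a plot is built on: the per-subject mean (x-axis), the difference (y-axis), the mean difference (bias) and the limits of agreement, bias ± multiplier·SD of the differences. The conventional multiplier is 1.96; the abstract speaks of "two standard deviations", so it is kept as a parameter. The function name and scores are invented.

```python
from statistics import fmean, stdev


def bland_altman(first, second, multiplier=1.96):
    """Bias and limits of agreement between two paired sets of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two paired lists of equal length (>= 2)")
    diffs = [a - b for a, b in zip(first, second)]
    means = [(a + b) / 2 for a, b in zip(first, second)]  # x-axis of the plot
    bias = fmean(diffs)                                   # systematic difference
    sd = stdev(diffs)                                     # sample SD of differences
    lower, upper = bias - multiplier * sd, bias + multiplier * sd
    inside = sum(lower <= d <= upper for d in diffs) / len(diffs)
    return {"means": means, "diffs": diffs, "bias": bias,
            "limits": (lower, upper), "fraction_inside": inside}


# Invented hop-test scores for the same athletes in two sessions.
session_1 = [10, 12, 14, 16, 11]
session_2 = [9, 12, 13, 17, 12]
print(bland_altman(session_1, session_2))
```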

  7. The validity and reliability of the Moroccan version of the Revised Fibromyalgia Impact Questionnaire.

    Science.gov (United States)

    Srifi, Najlaa; Bahiri, Rachid; Rostom, Samira; Bendeddouche, Imad; Lazrek, Noufissa; Hajjaj-Hassouni, Najia

    2013-01-01

    The Revised Fibromyalgia Impact Questionnaire (FIQR) is an updated version of the Fibromyalgia Impact Questionnaire (FIQ) that attempts to address the limitations of the original. As no Moroccan version of the FIQR is available, we aimed to investigate the validity and reliability of a Moroccan translation of the FIQR in Moroccan fibromyalgia (FM) patients. After the FIQR was translated into Moroccan, it was administered to 80 patients with FM. All of the patients filled out the questionnaire together with the Arabic version of the Short Form-36 (SF-36). The tender-point count was calculated from tender points identified by thumb palpation. Three days later, at their second visit, the FM patients filled out the Moroccan FIQR again. The test-retest reliability of the Moroccan FIQR questions ranged from 0.72 to 0.87. The test-retest reliability of the total FIQR score was 0.84. Cronbach's alpha was 0.91 for FIQR visit 1 (the first assessment) and 0.92 for FIQR visit 2 (the second assessment), indicating acceptable internal consistency for both assessments. Significant correlations supporting construct validity were obtained between the Moroccan FIQR total and domain scores and the subscales of the SF-36 (FIQR total versus SF-36 physical component score and mental component score were r = -0.69, P […] FIQR showed adequate reliability and validity. This instrument can be used in the clinical evaluation of Moroccan and Arabic-speaking patients with FM.

  8. Interrater reliability of the mind map assessment rubric in a cohort of medical students

    Directory of Open Access Journals (Sweden)

    Zipp Genevieve

    2009-04-01

    Abstract Background Learning strategies are thinking tools that students can use to actively acquire information. Examples of learning strategies include mnemonics, charts, and maps. One strategy that may help students master the tsunami of information presented in medical school is the mind map learning strategy. Currently, there is no valid and reliable rubric to grade mind maps and this may contribute to their underutilization in medicine. Because concept maps and mind maps engage learners similarly at a metacognitive level, a valid and reliable concept map assessment scoring system was adapted to form the mind map assessment rubric (MMAR). The MMAR can assess mind map depth based upon concept-links, cross-links, hierarchies, examples, pictures, and colors. The purpose of this study was to examine interrater reliability of the MMAR. Methods This exploratory study was conducted at a US medical school as part of a larger investigation on learning strategies. Sixty-six (N = 66) first-year medical students were given a 394-word text passage followed by a 30-minute presentation on mind mapping. After the presentation, subjects were again given the text passage and instructed to create mind maps based upon the passage. The mind maps were collected and independently scored using the MMAR by 3 examiners. Interrater reliability was measured using the intraclass correlation coefficient (ICC) statistic. Statistics were calculated using SPSS version 12.0 (Chicago, IL). Results Analysis of the mind maps revealed the following: concept-links ICC = .05 (95% CI, -.42 to .38), cross-links ICC = .58 (95% CI, .37 to .73), hierarchies ICC = .23 (95% CI, -.15 to .50), examples ICC = .53 (95% CI, .29 to .69), pictures ICC = .86 (95% CI, .79 to .91), colors ICC = .73 (95% CI, .59 to .82), and total score ICC = .86 (95% CI, .79 to .91). Conclusion The high ICC value for total mind map score indicates strong MMAR interrater reliability.
Pictures and colors demonstrated moderate

  9. Interrater reliability of the mind map assessment rubric in a cohort of medical students.

    Science.gov (United States)

    D'Antoni, Anthony V; Zipp, Genevieve Pinto; Olson, Valerie G

    2009-04-28

    Learning strategies are thinking tools that students can use to actively acquire information. Examples of learning strategies include mnemonics, charts, and maps. One strategy that may help students master the tsunami of information presented in medical school is the mind map learning strategy. Currently, there is no valid and reliable rubric to grade mind maps and this may contribute to their underutilization in medicine. Because concept maps and mind maps engage learners similarly at a metacognitive level, a valid and reliable concept map assessment scoring system was adapted to form the mind map assessment rubric (MMAR). The MMAR can assess mind map depth based upon concept-links, cross-links, hierarchies, examples, pictures, and colors. The purpose of this study was to examine interrater reliability of the MMAR. This exploratory study was conducted at a US medical school as part of a larger investigation on learning strategies. Sixty-six (N = 66) first-year medical students were given a 394-word text passage followed by a 30-minute presentation on mind mapping. After the presentation, subjects were again given the text passage and instructed to create mind maps based upon the passage. The mind maps were collected and independently scored using the MMAR by 3 examiners. Interrater reliability was measured using the intraclass correlation coefficient (ICC) statistic. Statistics were calculated using SPSS version 12.0 (Chicago, IL). Analysis of the mind maps revealed the following: concept-links ICC = .05 (95% CI, -.42 to .38), cross-links ICC = .58 (95% CI, .37 to .73), hierarchies ICC = .23 (95% CI, -.15 to .50), examples ICC = .53 (95% CI, .29 to .69), pictures ICC = .86 (95% CI, .79 to .91), colors ICC = .73 (95% CI, .59 to .82), and total score ICC = .86 (95% CI, .79 to .91). The high ICC value for total mind map score indicates strong MMAR interrater reliability. Pictures and colors demonstrated moderate to strong interrater reliability. 
We conclude that the

  10. Construct Validity and Reliability of the SARA Gait and Posture Sub-scale in Early Onset Ataxia

    Directory of Open Access Journals (Sweden)

    Tjitske F. Lawerman

    2017-12-01

    Aim: In children, gait and posture assessment provides a crucial marker for the early characterization, surveillance and treatment evaluation of early onset ataxia (EOA). For reliable data entry in studies targeting gait and posture improvement, uniform quantitative biomarkers are necessary. To date, the pediatric test construct of the gait and posture scores of the Scale for Assessment and Rating of Ataxia sub-scale (SARA) has remained unclear. In the present study, we aimed to assess the construct validity and reliability of the pediatric SARA-GAIT/POSTURE sub-scale. Methods: We included 28 EOA patients [15.5 (6–34) years; median (range)]. For inter-observer reliability, we determined the ICC on EOA SARA-GAIT/POSTURE sub-scores by three independent pediatric neurologists. For convergent validity, we associated SARA-GAIT/POSTURE sub-scores with: (1) the Ataxic gait Severity Measurement by Klockgether (ASMK; dynamic balance), (2) the Pediatric Balance Scale (PBS; static balance), (3) the Gross Motor Function Classification Scale, extended and revised version (GMFCS-E&R), (4) SARA-kinetic scores (SARA-KINETIC; kinetic function of the upper and lower limbs), (5) the Archimedes Spiral (AS; kinetic function of the upper limbs), and (6) total SARA scores (SARA-TOTAL; i.e., summed SARA-GAIT/POSTURE, SARA-KINETIC, and SARA-SPEECH sub-scores). For discriminant validity, we investigated whether EOA co-morbidity factors (myopathy and myoclonus) could influence SARA-GAIT/POSTURE sub-scores. Results: The inter-observer agreement (ICC) on EOA SARA-GAIT/POSTURE sub-scores was high (0.97). SARA-GAIT/POSTURE was strongly correlated with the other ataxia and functional scales [ASMK (rs = -0.819; p < 0.001); PBS (rs = -0.943; p < 0.001); GMFCS-E&R (rs = -0.862; p < 0.001); SARA-KINETIC (rs = 0.726; p < 0.001); AS (rs = 0.609; p = 0.002); and SARA-TOTAL (rs = 0.935; p < 0.001)]. Comorbid myopathy influenced SARA-GAIT/POSTURE scores through concurrent muscle weakness, whereas comorbid myoclonus predominantly influenced
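The convergent-validity results are Spearman rank correlations (rs). A self-contained sketch follows: rank each variable, giving tied values their average rank, then take the Pearson correlation of the ranks. Function names and data are hypothetical; Python 3.12+ also offers `statistics.correlation(x, y, method="ranked")`, but the explicit version shows the steps.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average 1-based rank of the tie group
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need paired samples of equal length (>= 2)")
    return pearson(average_ranks(x), average_ranks(y))


# Invented sub-scores for six patients on two scales.
gait_posture = [4, 7, 7, 12, 15, 20]
balance = [52, 45, 46, 30, 22, 10]
print(spearman_rs(gait_posture, balance))
```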

  11. Development of Reliable and Validated Tools to Evaluate Technical Resuscitation Skills in a Pediatric Simulation Setting: Resuscitation and Emergency Simulation Checklist for Assessment in Pediatrics.

    Science.gov (United States)

    Faudeux, Camille; Tran, Antoine; Dupont, Audrey; Desmontils, Jonathan; Montaudié, Isabelle; Bréaud, Jean; Braun, Marc; Fournier, Jean-Paul; Bérard, Etienne; Berlengi, Noémie; Schweitzer, Cyril; Haas, Hervé; Caci, Hervé; Gatin, Amélie; Giovannini-Chami, Lisa

    2017-09-01

    The objective was to develop a reliable and validated tool to evaluate technical resuscitation skills in a pediatric simulation setting. Four Resuscitation and Emergency Simulation Checklist for Assessment in Pediatrics (RESCAPE) evaluation tools were created, following international guidelines: intraosseous needle insertion, bag mask ventilation, endotracheal intubation, and cardiac massage. The binary rating items were evaluated using a modified Delphi methodology. Reliability was assessed by comparing the ratings of 2 observers (1 in real time and 1 after a video-recorded review). The tools were assessed for content, construct, and criterion validity, and for sensitivity to change. Inter-rater reliability, evaluated with Cohen kappa coefficients, was perfect or near-perfect (>0.8) for 92.5% of items, and each Cronbach alpha coefficient was ≥0.91. Principal component analyses showed that all 4 tools were unidimensional. Significant increases in median scores with increasing levels of medical expertise were demonstrated for RESCAPE-intraosseous needle insertion (P = .0002), RESCAPE-bag mask ventilation (P = .0002), RESCAPE-endotracheal intubation (P = .0001), and RESCAPE-cardiac massage (P = .0037). Significantly increased median scores over time were also demonstrated during a simulation-based educational program. RESCAPE tools are reliable and validated tools for the evaluation of technical resuscitation skills in pediatric settings during simulation-based educational programs. They might also be used for medical practice performance evaluations. Copyright © 2017 Elsevier Inc. All rights reserved.
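The entry reports Cohen's kappa for agreement between a real-time and a video-based observer on binary checklist items. A minimal sketch of the standard formula, κ = (pₒ − pₑ)/(1 − pₑ), follows, with chance agreement pₑ taken from each rater's marginal category frequencies. The function name and ratings are invented.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical (e.g. binary) ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Invented checklist item (1 = step performed) scored live and from video.
live = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
video = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1]
print(round(cohen_kappa(live, video), 3))  # 0.583
```

Raw agreement in this example is 80%, but because both observers mark "performed" 60% of the time, 52% agreement is expected by chance, so kappa is only about 0.58.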

  12. Exploration of analysis methods for diagnostic imaging tests: problems with ROC AUC and confidence scores in CT colonography.

    Science.gov (United States)

    Mallett, Susan; Halligan, Steve; Collins, Gary S; Altman, Doug G

    2014-01-01

    Different methods of evaluating diagnostic performance when comparing diagnostic tests may lead to different results. We compared two such approaches, sensitivity and specificity versus the area under the receiver operating characteristic curve (ROC AUC), for the evaluation of CT colonography for the detection of polyps, either with or without computer-assisted detection. In a multireader multicase study of 10 readers and 107 cases, we compared sensitivity and specificity, using radiological reporting of the presence or absence of polyps, with ROC AUC calculated from confidence scores concerning the presence of polyps. Both methods were assessed against a reference standard. Here we focus on five readers, selected to illustrate issues in design and analysis. We compared diagnostic measures within readers, showing that differences in results are due to statistical methods. Reader performance varied widely depending on whether sensitivity and specificity or ROC AUC was used. There were problems using confidence scores: in assigning scores to all cases; in the use of zero scores when no polyps were identified; with the bimodal, non-normal distribution of scores; in fitting ROC curves, owing to extrapolation beyond the study data; and with the undue influence of a few false positive results. Variation due to the use of different ROC methods exceeded differences between test results for ROC AUC. The confidence scores recorded in our study violated many assumptions of ROC AUC methods, rendering these methods inappropriate. The problems we identified will apply to other detection studies using confidence scores. We found sensitivity and specificity to be a more reliable and clinically appropriate method for comparing diagnostic tests.
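A hedged sketch of the two measures being contrasted follows: the nonparametric (Mann-Whitney) ROC AUC from confidence scores, where each diseased/healthy pair scores 1 if the diseased case has the higher confidence and 0.5 for a tie, and sensitivity/specificity from binary reports. It is only one of several ROC methods (the study compared more than one), and the data are invented. It illustrates one issue the authors raise: a zero score for a missed polyp ties with every zero-scored healthy case, and each tie contributes 0.5.

```python
def empirical_auc(scores_diseased, scores_healthy):
    """Nonparametric (Mann-Whitney) ROC AUC; ties count as half a win."""
    if not scores_diseased or not scores_healthy:
        raise ValueError("need at least one case in each group")
    wins = 0.0
    for p in scores_diseased:
        for q in scores_healthy:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_diseased) * len(scores_healthy))


def sensitivity_specificity(reported, truth):
    """Sensitivity and specificity of binary reports (1 = polyp reported)."""
    tp = sum(1 for r, t in zip(reported, truth) if r and t)
    fn = sum(1 for r, t in zip(reported, truth) if not r and t)
    tn = sum(1 for r, t in zip(reported, truth) if not r and not t)
    fp = sum(1 for r, t in zip(reported, truth) if r and not t)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one healthy case")
    return tp / (tp + fn), tn / (tn + fp)


# Invented confidence scores (0-100); missed polyps get 0, tying with healthy cases.
with_polyp = [90, 75, 0, 0]
without_polyp = [0, 0, 0, 20]
print(empirical_auc(with_polyp, without_polyp))
```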

  13. Multidisciplinary System Reliability Analysis

    Science.gov (United States)

    Mahadevan, Sankaran; Han, Song; Chamis, Christos C. (Technical Monitor)

    2001-01-01

    The objective of this study is to develop a new methodology for estimating the reliability of engineering systems that encompass multiple disciplines. The methodology is formulated in the context of the NESSUS probabilistic structural analysis code, developed under the leadership of NASA Glenn Research Center. The NESSUS code has been successfully applied to the reliability estimation of a variety of structural engineering systems. This study examines whether the features of NESSUS could be used to investigate the reliability of systems in other disciplines, such as heat transfer, fluid mechanics and electrical circuits, without considerable programming effort specific to each discipline. To achieve this objective, the mechanical equivalence between system behavior models in different disciplines is investigated. A new methodology is presented for the analysis of heat transfer, fluid flow, and electrical circuit problems using the structural analysis routines within NESSUS, by exploiting the equivalence between the computational quantities in different disciplines. This technique is integrated with the fast probability integration and system reliability techniques within the NESSUS code to compute the system reliability of multidisciplinary systems. Traditional as well as progressive failure analysis methods for system reliability estimation are demonstrated through a numerical example of a heat exchanger system involving failure modes in the structural, heat transfer and fluid flow disciplines.
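NESSUS computes system reliability with fast probability integration. The sketch below swaps in plain Monte Carlo sampling, which is simpler but far slower for small failure probabilities, to show what a series-system failure probability over failure modes from several disciplines means. A mode occurs when its limit-state function g(x) < 0, and the series system fails if any mode occurs. The heat-exchanger variables, limits and distributions are hypothetical.

```python
import random


def series_failure_probability(limit_states, sample_inputs, n_samples=100_000, seed=0):
    """Monte Carlo estimate of P(any g(x) < 0) for a series system.

    limit_states: functions g(x); g(x) < 0 means that failure mode occurs.
    sample_inputs: function(rng) -> one random realisation x of the inputs.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        x = sample_inputs(rng)
        if any(g(x) < 0.0 for g in limit_states):
            failures += 1
    return failures / n_samples


# Hypothetical heat-exchanger inputs (invented normal distributions).
def sample_heat_exchanger(rng):
    return {
        "wall_strength": rng.gauss(300.0, 20.0),  # MPa
        "wall_stress": rng.gauss(200.0, 30.0),    # MPa
        "outlet_temp": rng.gauss(500.0, 15.0),    # K
        "pressure_drop": rng.gauss(40.0, 5.0),    # kPa
    }


modes = [
    lambda x: x["wall_strength"] - x["wall_stress"],  # structural mode
    lambda x: 540.0 - x["outlet_temp"],               # heat-transfer mode
    lambda x: 55.0 - x["pressure_drop"],              # fluid-flow mode
]
print(series_failure_probability(modes, sample_heat_exchanger))
```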

  14. Rating scales for dystonia in cerebral palsy: reliability and validity.

    Science.gov (United States)

    Monbaliu, E; Ortibus, E; Roelens, F; Desloovere, K; Deklerck, J; Prinzie, P; de Cock, P; Feys, H

    2010-06-01

    This study investigated the reliability and validity of the Barry-Albright Dystonia Scale (BADS), the Burke-Fahn-Marsden Movement Scale (BFMMS), and the Unified Dystonia Rating Scale (UDRS) in patients with bilateral dystonic cerebral palsy (CP). Three raters independently scored videotapes of 10 patients (five males, five females; mean age 13 y 3 mo, SD 5 y 2 mo, range 5-22 y). One patient each was classified at levels I-IV in the Gross Motor Function Classification System and six patients were classified at level V. Reliability was measured by (1) intraclass correlation coefficient (ICC) for interrater reliability, (2) standard error of measurement (SEM) and smallest detectable difference (SDD), and (3) Cronbach's alpha for internal consistency. Validity was assessed by Pearson's correlations among the three scales used and by content analysis. Moderate to good interrater reliability was found for total scores of the three scales (ICC: BADS=0.87; BFMMS=0.86; UDRS=0.79). However, many subitems showed low reliability, in particular for the UDRS. SEM and SDD were respectively 6.36% and 17.72% for the BADS, 9.88% and 27.39% for the BFMMS, and 8.89% and 24.63% for the UDRS. High internal consistency was found. Pearson's correlations were high. Content validity showed insufficient accordance with the new CP definition and classification. Our results support the internal consistency and concurrent validity of the scales; however, taking into consideration the limitations in reliability, including the large SDD values and the content validity, further research on methods of assessment of dystonia is warranted.
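The reported SEM and SDD percentages are consistent with the common relation SDD = 1.96·√2·SEM. The abstract does not state the formula, so this is a consistency check rather than a description of the authors' method:

```latex
\mathrm{SDD} = 1.96\sqrt{2}\,\mathrm{SEM} \approx 2.772\,\mathrm{SEM},
\qquad
\frac{17.72}{6.36} \approx 2.786\ (\text{BADS}),\quad
\frac{27.39}{9.88} \approx 2.772\ (\text{BFMMS}),\quad
\frac{24.63}{8.89} \approx 2.771\ (\text{UDRS})
```

The BFMMS and UDRS ratios match to three decimals; the small BADS deviation is not explained by the abstract.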

  15. Derivation of Two Critical Appraisal Scores for Trainees to Evaluate Online Educational Resources: A METRIQ Study

    Directory of Open Access Journals (Sweden)

    Teresa M. Chan

    2016-09-01

    Introduction: Online education resources (OERs), like blogs and podcasts, increasingly augment or replace traditional medical education resources such as textbooks and lectures. Trainees’ ability to evaluate these resources is poor, and few quality assessment aids have been developed to assist them. This study aimed to derive a quality evaluation instrument for this purpose. Methods: We used a three-phase methodology. In Phase 1, a previously derived list of 151 OER quality indicators was reduced to 13 items using data from published consensus-building studies (of medical educators, expert podcasters, and expert bloggers) and subsequent evaluation by our team. In Phase 2, these 13 items were converted to seven-point Likert scales used by trainee raters (n=40) to evaluate 39 OERs. The reliability and usability of these 13 rating items were determined using responses from trainee raters, and the top items were used to create two OER quality evaluation instruments. In Phase 3, these instruments were compared to an external certification process (the ALiEM AIR certification) and the gestalt evaluation of the same 39 blog posts by 20 faculty educators. Results: Two quality-evaluation instruments were derived with fair inter-rater reliability: the METRIQ-8 Score (intraclass correlation coefficient [ICC]=0.30, p<0.001) and the METRIQ-5 Score (ICC=0.22, p<0.001). Both scores, when calculated using the derivation data, correlated with educator gestalt (Pearson’s r=0.35, p=0.03 and r=0.41, p<0.01, respectively) and were related to increased odds of receiving an ALiEM AIR certification (odds ratio=1.28, p=0.03; OR=1.5, p=0.004, respectively). Conclusion: Two novel scoring instruments with adequate psychometric properties were derived to assist trainees in evaluating OER quality and correlated favourably with gestalt ratings of online educational resources by faculty educators. Further testing is needed to ensure these instruments are accurate when applied by

  16. Derivation of Two Critical Appraisal Scores for Trainees to Evaluate Online Educational Resources: A METRIQ Study

    Science.gov (United States)

    Chan, Teresa M.; Thoma, Brent; Krishnan, Keeth; Lin, Michelle; Carpenter, Christopher R.; Astin, Matt; Kulasegaram, Kulamakan

    2016-01-01

    Introduction Online education resources (OERs), like blogs and podcasts, increasingly augment or replace traditional medical education resources such as textbooks and lectures. Trainees’ ability to evaluate these resources is poor, and few quality assessment aids have been developed to assist them. This study aimed to derive a quality evaluation instrument for this purpose. Methods We used a three-phase methodology. In Phase 1, a previously derived list of 151 OER quality indicators was reduced to 13 items using data from published consensus-building studies (of medical educators, expert podcasters, and expert bloggers) and subsequent evaluation by our team. In Phase 2, these 13 items were converted to seven-point Likert scales used by trainee raters (n=40) to evaluate 39 OERs. The reliability and usability of these 13 rating items were determined using responses from trainee raters, and the top items were used to create two OER quality evaluation instruments. In Phase 3, these instruments were compared to an external certification process (the ALiEM AIR certification) and the gestalt evaluation of the same 39 blog posts by 20 faculty educators. Results Two quality-evaluation instruments were derived with fair inter-rater reliability: the METRIQ-8 Score (intraclass correlation coefficient [ICC]=0.30, p<0.001) and the METRIQ-5 Score (ICC=0.22, p<0.001). Both scores, when calculated using the derivation data, correlated with educator gestalt (Pearson’s r=0.35, p=0.03 and r=0.41, p<0.01, respectively) and were related to increased odds of receiving an ALiEM AIR certification (odds ratio=1.28, p=0.03; OR=1.5, p=0.004, respectively). Conclusion Two novel scoring instruments with adequate psychometric properties were derived to assist trainees in evaluating OER quality and correlated favourably with gestalt ratings of online educational resources by faculty educators. Further testing is needed to ensure these instruments are accurate when applied by trainees.

  17. Reliability assessment of Indian Point Unit 3 containment structure

    International Nuclear Information System (INIS)

    Kawakami, J.; Hwang, H.; Chang, M.T.; Reich, M.

    1984-01-01

    In the current design criteria, the load combinations for designing concrete containment structures are specified in deterministic formats. However, by applying the probability-based reliability method developed by BNL to concrete containment structures designed according to these criteria, it is possible to evaluate the reliability levels implied by the current design criteria. For this purpose, the reliability analysis is applied to the Indian Point Unit No. 3 containment. Details of the containment structure, such as its geometry and rebar arrangement, are taken from the working drawings and the final safety analysis reports. Three kinds of loads are considered in the reliability analysis: dead load (D), accidental pressure due to a large LOCA (P), and earthquake ground acceleration (E). A reliability analysis of the containment subjected to all combinations of these loads is performed, and the results are presented in this report.
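    The abstract does not describe the BNL method itself. As a generic illustration of what a "reliability level" means for a structure under load, the textbook first-order reliability index for a safety margin M = R - S (resistance minus load effect) is beta = (mu_R - mu_S) / sqrt(sigma_R^2 + sigma_S^2), with failure probability Phi(-beta). This sketch assumes R and S are independent and normally distributed. It is a simplification and not the load-combination analysis (D, P, E) performed for Indian Point Unit 3. The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF Phi(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def reliability_index(mu_r, sigma_r, mu_s, sigma_s):
    """beta for the margin M = R - S, assuming independent normal R and S."""
    return (mu_r - mu_s) / math.sqrt(sigma_r ** 2 + sigma_s ** 2)


def failure_probability(beta):
    """P(M < 0) = Phi(-beta) under the same normality assumption."""
    return normal_cdf(-beta)
```

    A larger beta means a larger margin relative to the combined scatter of resistance and load, and therefore a smaller failure probability.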

  18. [Reliability and validity of the standardized Mini Mental State Examination in the diagnosis of mild dementia in Turkish population].

    Science.gov (United States)

    Güngen, Can; Ertan, Turan; Eker, Engin; Yaşar, Resmiye; Engin, Funda

    2002-01-01

    This study assessed the reliability and validity of the Mini Mental State Examination in differentiating mild dementia from normal controls in the Turkish population. The Standardized Mini Mental State Examination (SMMSE) and its instructions were translated into Turkish. A total of 212 subjects with a mean age of 77 ± 6 were recruited: 71 were diagnosed with dementia and 141 were evaluated as normal controls. The total scale score was analysed for discriminant validity using Student's t-test. Sensitivity, specificity, positive and negative predictive values, and the kappa score were calculated for every cut-off score between 18 and 29. Using the best cut-off score from this analysis, a kappa value was calculated to compare the dementia diagnoses made by the two investigators. Statistical analysis showed that the Turkish version of the SMMSE has high discriminant validity and interrater reliability in the diagnosis of mild dementia. The cut-off score of 23/24 gave the highest sensitivity (0.91), specificity (0.95), positive and negative predictive values (0.90 and 0.95), and kappa score (0.86). Interrater reliability analysis showed a high correlation (r = 0.99) and kappa value (0.92). These results show that the Turkish version of the SMMSE has high reliability and validity for the diagnosis of mild dementia in the Turkish population.
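    The cut-off analysis above rests on standard 2x2-table statistics. A minimal sketch, assuming that a score at or below the cut-off counts as test-positive (lower MMSE-type scores indicate more impairment, so "23/24" corresponds to `cutoff=23`). It also includes Cohen's kappa for two raters' yes/no diagnoses. All function names and example counts are hypothetical.

```python
def table_at_cutoff(scores, has_dementia, cutoff):
    """Return (tp, fp, fn, tn), treating score <= cutoff as test-positive."""
    tp = fp = fn = tn = 0
    for score, ill in zip(scores, has_dementia):
        positive = score <= cutoff
        if positive and ill:
            tp += 1
        elif positive:
            fp += 1
        elif ill:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def diagnostic_metrics(tp, fp, fn, tn):
    return {
        "sensitivity": tp / (tp + fn),  # cases correctly flagged
        "specificity": tn / (tn + fp),  # controls correctly cleared
        "ppv": tp / (tp + fp),          # P(case | positive result)
        "npv": tn / (tn + fn),          # P(control | negative result)
    }


def cohens_kappa(both_yes, a_only, b_only, both_no):
    """Cohen's kappa for two raters making yes/no diagnoses on the same subjects."""
    n = both_yes + a_only + b_only + both_no
    observed = (both_yes + both_no) / n
    # Agreement expected by chance from each rater's marginal "yes" rate.
    p_a = (both_yes + a_only) / n
    p_b = (both_yes + b_only) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)
```

    Repeating `table_at_cutoff` for each candidate cut-off from 18 to 29 and comparing the resulting metrics mirrors the search for the best cut-off described in the abstract.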

  19. Translation and validation of the Dutch new Knee Society Scoring System ©.

    Science.gov (United States)

    Van Der Straeten, Catherine; Witvrouw, Erik; Willems, Tine; Bellemans, Johan; Victor, Jan

    2013-11-01

    A new version of the Knee Society Knee Scoring System© (KSS) has recently been developed. Before this scale can be used in non-English-speaking populations, it must be translated and validated for each target population. We evaluated the construct and content validity, test-retest reliability, and internal consistency of the Dutch version of the New KSS. A Dutch translation was produced using a forward-backward translation protocol. We tested the construct validity of the Dutch New KSS by comparing it with the Dutch versions of the WOMAC, the Knee Injury and Osteoarthritis Outcome Score (KOOS), and the SF-12 in 137 patients undergoing total knee arthroplasty (TKA). Content validity was assessed by comparing pre- and postoperative scores and by checking for floor and ceiling effects. To evaluate test-retest reliability and internal consistency, 47 patients completed the questionnaire a second time, with a mean interval of 8 days (range, 2-20 days) between tests. Construct validity was demonstrated because the Dutch New KSS correlated well with the Dutch WOMAC (r = -0.751; p …), the Dutch KOOS (r = -0.723; p …), and the Dutch SF-12 (r = 0.569; p …). … The Dutch New KSS is an excellent instrument to evaluate TKA outcome in Dutch-speaking patients.

  20. Test-Retest Reliability of the Short-Form Survivor Unmet Needs Survey.

    Science.gov (United States)

    Taylor, Karen; Bulsara, Max; Monterosso, Leanne

    2018-01-01

    Reliable and valid needs assessment measures are important tools in cancer survivorship care. A new 30-item short-form version of the Survivor Unmet Needs Survey (SF-SUNS) was developed and validated with cancer survivors, including hematology cancer survivors; however, its test-retest reliability has not been established. The objective of this study was to assess the test-retest reliability of the SF-SUNS in a cohort of lymphoma survivors (n = 40) recruited from a large tertiary cancer center in Western Australia. The SF-SUNS was administered at two time points: baseline (time 1) and 5 days later (time 2). Intraclass correlation analyses compared data at time 1 and time 2, and Cronbach's alpha was calculated to assess internal consistency at both time points. The majority of items (23/30, 77%) achieved test-retest reliability scores of 0.45-0.74 (fair to good). Overall internal consistency was high (time 1 = 0.92, time 2 = 0.95), with subscale values of 0.65-0.94 at both time points. The SF-SUNS therefore showed mixed test-retest reliability. Our results indicate that the SF-SUNS is responsive to the changing needs of lymphoma cancer survivors. Routine use of survivorship-specific, needs-based assessments is required in oncology care today. Nurses are well placed to administer these assessments and provide tailored information and resources. Further assessment of test-retest reliability in hematology and other cancer cohorts is warranted.
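    The SF-SUNS study summarized internal consistency with Cronbach's alpha. A minimal sketch of the standard formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), assuming complete responses with no missing items. The function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: one list of k item scores per respondent (no missing items)."""
    k = len(responses[0])
    items = list(zip(*responses))  # regroup scores by item
    sum_item_var = sum(variance(item) for item in items)
    total_var = variance([sum(person) for person in responses])
    return k / (k - 1) * (1 - sum_item_var / total_var)
```

    Alpha rises as items covary more strongly, because the variance of the totals then grows faster than the sum of the individual item variances.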

  1. Comparing continuous and dichotomous scoring of the balanced inventory of desirable responding.

    Science.gov (United States)

    Stöber, Joachim; Dette, Dorothea E; Musch, Jochen

    2002-04-01

    The Balanced Inventory of Desirable Responding (BIDR; Paulhus, 1994) is a widely used instrument to measure the 2 components of social desirability: self-deceptive enhancement and impression management. With respect to scoring of the BIDR, Paulhus (1994) authorized 2 methods, namely continuous scoring (all answers on the continuous answer scale are counted) and dichotomous scoring (only extreme answers are counted). In this article, we report 3 studies with student samples, and continuous and dichotomous scoring of BIDR subscales are compared with respect to reliability, convergent validity, sensitivity to instructional variations, and correlations with personality. Across studies, the scores from continuous scoring (continuous scores) showed higher Cronbach's alphas than those from dichotomous scoring (dichotomous scores). Moreover, continuous scores showed higher convergent correlations with other measures of social desirability and more consistent effects with self-presentation instructions (fake-good vs. fake-bad instructions). Finally, continuous self-deceptive enhancement scores showed higher correlations with those traits of the Five-factor model for which substantial correlations were expected (i.e., Neuroticism, Extraversion, and Conscientiousness). Consequently, these findings indicate that continuous scoring may be preferable to dichotomous scoring when assessing socially desirable responding with the BIDR.
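    The two BIDR scoring rules compared above can be sketched as follows. This assumes a 7-point response scale (1-7), that reverse-keyed items are flipped before scoring, and that dichotomous scoring awards one point for each extreme keyed answer (6 or 7). Which items are reverse-keyed is not given here, so the `reverse_items` argument and all function names are hypothetical.

```python
SCALE_MAX = 7  # assumed 7-point response scale (1..7)


def keyed(responses, reverse_items):
    """Reverse-score negatively keyed items so higher always means more desirable."""
    return [
        SCALE_MAX + 1 - r if i in reverse_items else r
        for i, r in enumerate(responses)
    ]


def continuous_score(responses, reverse_items):
    # Every answer counts with its full (keyed) scale value.
    return sum(keyed(responses, reverse_items))


def dichotomous_score(responses, reverse_items):
    # Only extreme answers (6 or 7 after keying) earn a point.
    return sum(1 for r in keyed(responses, reverse_items) if r >= SCALE_MAX - 1)
```

    Dichotomous scoring discards the difference between, say, a keyed 1 and a keyed 5, which is one plausible reason the continuous scores showed higher internal consistency.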

  2. Adjoint sensitivity analysis of dynamic reliability models based on Markov chains - II: Application to IFMIF reliability assessment

    Energy Technology Data Exchange (ETDEWEB)

    Cacuci, D. G. [Commiss Energy Atom, Direct Energy Nucl, Saclay (France)]; Cacuci, D. G.; Balan, I. [Univ Karlsruhe, Inst Nucl Technol and Reactor Safety, Karlsruhe (Germany)]; Ionescu-Bujor, M. [Forschungszentrum Karlsruhe, Fus Program, D-76021 Karlsruhe (Germany)]

    2008-07-01

    In Part II of this work, the adjoint sensitivity analysis procedure developed in Part I is applied to several dynamic reliability models of increasingly complex systems, culminating with the International Fusion Materials Irradiation Facility (IFMIF) accelerator system. Section II presents the main steps of a procedure for automatically generating Markov chains for reliability analysis: abstraction of the physical system, construction of the Markov chain, and generation and solution of the resulting set of differential equations. All of these steps have been implemented in a stand-alone computer code system called QUEFT/MARKOMAG-S/MCADJSEN. This code system has been applied to the sensitivity analysis of dynamic reliability measures for a paradigm '2-out-of-3' system comprising five components, and to a comprehensive dynamic reliability analysis of the IFMIF accelerator system facilities, covering both the average availability and the system's availability at the final mission time. QUEFT/MARKOMAG-S/MCADJSEN has been used to efficiently compute sensitivities to 186 failure and repair rates characterizing components and subsystems of the first-level fault tree of the IFMIF accelerator system. (authors)
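    To make "sensitivity of a Markov-chain availability measure to a failure rate" concrete, here is a deliberately small sketch. It assumes a 2-out-of-3 system of three identical, independently repaired components, which is simpler than the five-component paradigm system in the paper. It computes steady-state availability from a birth-death chain and estimates dA/d(lambda) by central finite differences. This is a brute-force check, not the authors' adjoint procedure or the QUEFT/MARKOMAG-S/MCADJSEN code. The function names and rates are hypothetical.

```python
def two_of_three_availability(lam, mu):
    """Steady-state availability of a 2-out-of-3 system of identical components.

    Birth-death Markov chain: state k = number of failed components (0..3),
    failures at rate (3 - k) * lam, independent repairs at rate k * mu.
    """
    weights = [1.0]
    for k in range(3):
        # Detailed balance: pi[k+1] * (k+1) * mu == pi[k] * (3-k) * lam
        weights.append(weights[-1] * (3 - k) * lam / ((k + 1) * mu))
    total = sum(weights)
    return (weights[0] + weights[1]) / total  # up while at most one has failed


def central_difference(f, x, rel_step=1e-6):
    """Numerical derivative df/dx (a brute-force check, not the adjoint method)."""
    h = rel_step * x
    return (f(x + h) - f(x - h)) / (2 * h)


lam, mu = 1e-3, 1e-1  # hypothetical failure and repair rates per hour
A = two_of_three_availability(lam, mu)
dA_dlam = central_difference(lambda l: two_of_three_availability(l, mu), lam)
```

    Finite differences need one extra model solution per parameter (two for central differences). That cost is what an adjoint method avoids when, as in the IFMIF study, sensitivities to 186 rates are required.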

  3. Adjoint sensitivity analysis of dynamic reliability models based on Markov chains - II: Application to IFMIF reliability assessment

    International Nuclear Information System (INIS)

    Cacuci, D. G.; Cacuci, D. G.; Balan, I.; Ionescu-Bujor, M.

    2008-01-01

    In Part II of this work, the adjoint sensitivity analysis procedure developed in Part I is applied to several dynamic reliability models of increasingly complex systems, culminating with the International Fusion Materials Irradiation Facility (IFMIF) accelerator system. Section II presents the main steps of a procedure for automatically generating Markov chains for reliability analysis: abstraction of the physical system, construction of the Markov chain, and generation and solution of the resulting set of differential equations. All of these steps have been implemented in a stand-alone computer code system called QUEFT/MARKOMAG-S/MCADJSEN. This code system has been applied to the sensitivity analysis of dynamic reliability measures for a paradigm '2-out-of-3' system comprising five components, and to a comprehensive dynamic reliability analysis of the IFMIF accelerator system facilities, covering both the average availability and the system's availability at the final mission time. QUEFT/MARKOMAG-S/MCADJSEN has been used to efficiently compute sensitivities to 186 failure and repair rates characterizing components and subsystems of the first-level fault tree of the IFMIF accelerator system. (authors)

  4. The modified gait abnormality rating scale in patients with a conversion disorder: a reliability and responsiveness study.

    Science.gov (United States)

    Vandenberg, Justin M; George, Deanna R; O'Leary, Andrea J; Olson, Lindsay C; Strassburg, Kaitlyn R; Hollman, John H

    2015-01-01

    Individuals with conversion disorder have neurologic symptoms for which no underlying organic cause can be identified. The symptoms often manifest as gait disturbances. The modified gait abnormality rating scale (GARS-M) may be useful for quantifying gait abnormalities in these individuals. The purpose of this study was to examine the reliability, responsiveness, and concurrent validity of GARS-M scores in individuals with conversion disorder. Data from 27 individuals who completed a rehabilitation program were included in this study. Pre- and post-intervention videos were obtained and walking speed was measured. Five examiners independently evaluated gait performance according to the GARS-M criteria. Inter- and intrarater reliability of GARS-M scores were estimated with intraclass correlation coefficients (ICCs). Responsiveness was estimated with the minimum detectable change (MDC). Pre- to post-treatment changes in GARS-M scores were analyzed with a dependent t-test. The correlation between GARS-M scores and walking speed was analyzed to assess concurrent validity. GARS-M scores showed good-to-excellent inter-rater (ICC = 0.878) and intrarater (ICC = 0.989) reliability. The MDC was 2 points. Mean GARS-M scores decreased from 7 ± 5 at baseline to 1 ± 2 at discharge (t(26) = 7.411, p …) … conversion disorder. GARS-M scores provide objective measures upon which treatment effects can be assessed. Copyright © 2014 Elsevier B.V. All rights reserved.
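    The pre/post comparison in this study used a dependent (paired) t-test, and the reported t26 implies 27 paired observations. A minimal sketch of the statistic, t = mean(d) / (sd(d) / sqrt(n)) with df = n - 1, computed on per-patient differences d = before - after. The function name is hypothetical, and the p-value lookup is omitted because it needs a t-distribution CDF, which the Python standard library does not provide.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Dependent-samples t statistic and degrees of freedom.

    Positive t means scores fell from before to after (for GARS-M, lower
    scores indicate less gait abnormality).
    """
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```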

  5. Accuracy and reliability of peer assessment of athletic training psychomotor laboratory skills.

    Science.gov (United States)

    Marty, Melissa C; Henning, Jolene M; Willse, John T

    2010-01-01

    Peer assessment is defined as students judging the level or quality of a fellow student's understanding. No researchers have yet demonstrated the accuracy or reliability of peer assessment in athletic training education. This study aimed to determine the accuracy and reliability of peer assessment of athletic training students' psychomotor skills. This cross-sectional study was conducted in an entry-level master's athletic training education program with first-year (n = 5) and second-year (n = 8) students. Participants evaluated 10 videos of a peer performing 3 psychomotor skills (middle deltoid manual muscle test, Faber test, and Slocum drawer test) on 2 separate occasions using a valid assessment tool. The accuracy of each peer-assessment score was examined through percentage-correct scores. We used a generalizability study to determine how reliably athletic training students assessed a peer performing these skills. Decision studies using generalizability theory showed how the peer-assessment scores were affected by the number of participants and the number of occasions. Participants had high percentage-correct scores: 96.84% for the middle deltoid manual muscle test, 94.83% for the Faber test, and 97.13% for the Slocum drawer test. They were not able to reliably assess a peer performing any of the psychomotor skills on only 1 occasion. However, φ exceeded the 0.70 minimal standard when 2 participants assessed the Faber test on 3 occasions (φ = 0.79), when 1 participant assessed the Slocum drawer test on 2 occasions (φ = 0.76), and when 3 participants assessed the middle deltoid manual muscle test on 2 occasions (φ = 0.72). Although students did not detect all errors, they assessed their peers with an average of 96% accuracy. Having only 1 student assess a peer performing certain psychomotor skills was less reliable than having more than 1 student assess those skills on more than 1 occasion. Peer assessment of psychomotor skills
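    The decision (D) studies above project the dependability coefficient φ for different numbers of raters and occasions. A minimal sketch, assuming a fully crossed design with the performances being judged as the object of measurement (p), raters (r) and occasions (o) as facets, and variance components already estimated in a G study. The absolute error variance divides each facet-related component by the number of conditions it is averaged over. The variance components below are hypothetical, not the study's estimates.

```python
def phi_coefficient(var, n_raters, n_occasions):
    """Dependability (phi) for a fully crossed p x r x o design.

    var: variance components from a G study, keyed 'p' (object of
    measurement), 'r' (rater), 'o' (occasion), 'pr', 'po', 'ro' and 'pro,e'.
    """
    abs_error = (
        var["r"] / n_raters
        + var["o"] / n_occasions
        + var["pr"] / n_raters
        + var["po"] / n_occasions
        + (var["ro"] + var["pro,e"]) / (n_raters * n_occasions)
    )
    return var["p"] / (var["p"] + abs_error)


# Hypothetical variance components, for illustration only.
components = {"p": 0.5, "r": 0.05, "o": 0.02, "pr": 0.1,
              "po": 0.08, "ro": 0.01, "pro,e": 0.3}
for n_r in (1, 2, 3):
    for n_o in (1, 2, 3):
        print(n_r, n_o, round(phi_coefficient(components, n_r, n_o), 2))
```

    Adding raters shrinks the rater-related error terms, and adding occasions shrinks the occasion-related ones. This is why different skills reached φ ≥ 0.70 with different rater/occasion combinations.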

  6. Reliability of the Quality of Upper Extremity Skills Test for Children with Cerebral Palsy Aged 2 to 12 Years

    Science.gov (United States)

    Thorley, Megan; Lannin, Natasha; Cusick, Anne; Novak, Iona; Boyd, Roslyn

    2012-01-01

    Aim: To investigate reliability of the Quality of Upper Extremity Skills Test (QUEST) scores for children with cerebral palsy (CP) aged 2-12 years. Method: Thirty-one QUESTs from 24 children with CP were rated once by two raters and twice by one rater. Internal consistency of total scores, inter- and intra-rater reliability findings for total,…

  7. The development of a preliminary ultrasonographic scoring system for features of hand osteoarthritis.

    LENUS (Irish Health Repository)

    Keen, H I

    2008-05-01

    Painful osteoarthritis (OA) of the hand is common and a validated ultrasound (US) scoring system would be valuable for epidemiological and therapeutic outcome studies. US is increasingly used to assess peripheral joints, though most of the US focus in rheumatic diseases has been on rheumatoid arthritis. We aimed to develop a preliminary US hand OA scoring system, initially focusing on relevant pathological features with potentially high reliability.

  8. Validity and reliability of short form-12 questionnaire in Iranian hemodialysis patients

    DEFF Research Database (Denmark)

    Pakpour, Amir H.; Nourozi, Saeedeh; Mølsted, Stig

    2011-01-01

    INTRODUCTION: The aim of the study was to assess the validity and reliability of the SF-12 questionnaire in a sample of Iranian patients undergoing hemodialysis. MATERIALS AND METHODS: One hundred and forty-four hemodialysis patients were included from dialysis centers in Zanjan, Iran, and were … asked to complete the SF-12 and SF-36 questionnaires. An initial test-retest reliability evaluation was performed on a sample of 70 patients from the total group, with a retest interval of 14 days. Reliability was estimated by internal consistency, and validity was assessed using known-group comparisons … and construct validity on the patient group as a whole. A linear regression analysis was used to assess any variation in the physical component summary and mental component summary scores of the SF-36 with the respective component summary scores of the SF-12. In addition, the factor structure …

  9. Test-Retest Reliability and Predictive Validity of the Implicit Association Test in Children

    Science.gov (United States)

    Rae, James R.; Olson, Kristina R.

    2018-01-01

    The Implicit Association Test (IAT) is increasingly used in developmental research despite minimal evidence of whether children's IAT scores are reliable across time or predictive of behavior. When test-retest reliability and predictive validity have been assessed, the results have been mixed, and because these studies have differed on many…

  10. Joint interval reliability for Markov systems with an application in transmission line reliability

    Energy Technology Data Exchange (ETDEWEB)

    Csenki, Attila [School of Computing and Mathematics, University of Bradford, Bradford, West Yorkshire, BD7 1DP (United Kingdom)]. E-mail: a.csenki@bradford.ac.uk

    2007-06-15

    We consider Markov reliability models whose finite state space is partitioned into the set of up states $\mathcal{U}$ and the set of down states $\mathcal{D}$. Given a collection of $k$ disjoint time intervals $I_l = [t_l, t_l + x_l]$, $l = 1, \dots, k$, the joint interval reliability is defined as the probability of the system being in $\mathcal{U}$ for all time instances in $I_1 \cup \dots \cup I_k$. A closed-form expression is derived here for the joint interval reliability of this class of models. The result is applied to power transmission lines in a two-state fluctuating environment. We use the Linux versions of the free packages Maxima and Scilab in our implementation for symbolic and numerical work, respectively.
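    For intuition, joint interval reliability can be worked out by hand in the simplest possible case. This sketch assumes a single two-state component (one up state, one down state) with constant failure rate lam and repair rate mu that starts in the up state at time 0. It does not reproduce the paper's general closed form or its fluctuating-environment model. By the Markov property, the component must be up at each t_l and then survive x_l without failing, while between intervals it may fail and be repaired freely. The function names are hypothetical.

```python
import math


def p_up_to_up(t, lam, mu):
    """P(up at time s + t | up at time s) for a two-state Markov component."""
    return mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * t)


def joint_interval_reliability(intervals, lam, mu):
    """intervals: sorted, disjoint (t_l, x_l) pairs; the component is up at t = 0.

    Staying up throughout [t_l, t_l + x_l] has probability exp(-lam * x_l)
    once the component is up at t_l, because the up set is a single state.
    """
    prob = 1.0
    clock = 0.0  # end of the previous interval, where the component was up
    for t, x in intervals:
        prob *= p_up_to_up(t - clock, lam, mu) * math.exp(-lam * x)
        clock = t + x
    return prob
```

    With a single interval starting at time 0 this reduces to ordinary reliability exp(-lam * x). With widely separated intervals, each later interval contributes roughly the steady-state availability mu / (lam + mu) times exp(-lam * x_l).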

  11. Coronary artery calcification score by multislice computed tomography predicts the outcome of dobutamine cardiovascular magnetic resonance imaging

    International Nuclear Information System (INIS)

    Janssen, Caroline H.C.; Vliegenthart, Rozemarijn; Overbosch, Jelle; Oudkerk, Matthijs; Kuijpers, Dirkjan; Dijkman, Paul R.M. van; Zijlstra, Felix

    2005-01-01

    The aim of this study was to determine whether a coronary artery calcium (CAC) score of less than 11 can reliably rule out myocardial ischemia detected by dobutamine cardiovascular magnetic resonance imaging (CMR) in patients suspected of having myocardial ischemia. Dobutamine CMR was performed and the CAC score was determined in 114 of 136 consecutive patients with clinically suspected myocardial ischemia and an inconclusive diagnosis. The CAC score was obtained by 16-row multidetector computed tomography (MDCT) and calculated according to the Agatston method. The CAC score was correlated with the results of dobutamine CMR, and the positive predictive value (PPV) and negative predictive value (NPV) of the CAC score for dobutamine CMR were calculated. A total of 114 (87%) of the patients were eligible for this study. There was a significant correlation between the CAC score and dobutamine CMR (p<0.001). Patients with a CAC score of less than 11 showed no signs of inducible ischemia during dobutamine CMR. For a CAC score of less than 101, the NPV and PPV of the CAC score for the outcome of dobutamine CMR were 0.96 and 0.29, respectively. In patients with an inconclusive diagnosis of myocardial ischemia, an MDCT CAC score of less than 11 reliably rules out myocardial ischemia detected by dobutamine CMR. (orig.)

  12. Coronary artery calcification score by multislice computed tomography predicts the outcome of dobutamine cardiovascular magnetic resonance imaging.

    Science.gov (United States)

    Janssen, Caroline H C; Kuijpers, Dirkjan; Vliegenthart, Rozemarijn; Overbosch, Jelle; van Dijkman, Paul R M; Zijlstra, Felix; Oudkerk, Matthijs

    2005-06-01

    The aim of this study was to determine whether a coronary artery calcium (CAC) score of less than 11 can reliably rule out myocardial ischemia detected by dobutamine cardiovascular magnetic resonance imaging (CMR) in patients suspected of having myocardial ischemia. Dobutamine CMR was performed and the CAC score was determined in 114 of 136 consecutive patients with clinically suspected myocardial ischemia and an inconclusive diagnosis. The CAC score was obtained by 16-row multidetector computed tomography (MDCT) and calculated according to the Agatston method. The CAC score was correlated with the results of dobutamine CMR, and the positive predictive value (PPV) and negative predictive value (NPV) of the CAC score for dobutamine CMR were calculated. A total of 114 (87%) of the patients were eligible for this study. There was a significant correlation between the CAC score and dobutamine CMR (p<0.001). Patients with a CAC score of less than 11 showed no signs of inducible ischemia during dobutamine CMR. For a CAC score of less than 101, the NPV and PPV of the CAC score for the outcome of dobutamine CMR were 0.96 and 0.29, respectively. In patients with an inconclusive diagnosis of myocardial ischemia, an MDCT CAC score of less than 11 reliably rules out myocardial ischemia detected by dobutamine CMR.

  13. Coronary artery calcification score by multislice computed tomography predicts the outcome of dobutamine cardiovascular magnetic resonance imaging

    Energy Technology Data Exchange (ETDEWEB)

    Janssen, Caroline H.C.; Vliegenthart, Rozemarijn; Overbosch, Jelle; Oudkerk, Matthijs [University Hospital Groningen, Department of Radiology, Groningen (Netherlands); Kuijpers, Dirkjan [University Hospital Groningen, Department of Radiology, Groningen (Netherlands); Bronovo Hospital, Department of Radiology, The Hague (Netherlands); Dijkman, Paul R.M. van [Bronovo Hospital, Department of Cardiology, The Hague (Netherlands); Zijlstra, Felix [University Hospital Groningen, Department of Cardiology, Groningen (Netherlands)

    2005-06-01

    The aim of this study was to determine whether a coronary artery calcium (CAC) score of less than 11 can reliably rule out myocardial ischemia detected by dobutamine cardiovascular magnetic resonance imaging (CMR) in patients suspected of having myocardial ischemia. Dobutamine CMR was performed and the CAC score was determined in 114 of 136 consecutive patients with clinically suspected myocardial ischemia and an inconclusive diagnosis. The CAC score was obtained by 16-row multidetector computed tomography (MDCT) and calculated according to the Agatston method. The CAC score was correlated with the results of dobutamine CMR, and the positive predictive value (PPV) and negative predictive value (NPV) of the CAC score for dobutamine CMR were calculated. A total of 114 (87%) of the patients were eligible for this study. There was a significant correlation between the CAC score and dobutamine CMR (p<0.001). Patients with a CAC score of less than 11 showed no signs of inducible ischemia during dobutamine CMR. For a CAC score of less than 101, the NPV and PPV of the CAC score for the outcome of dobutamine CMR were 0.96 and 0.29, respectively. In patients with an inconclusive diagnosis of myocardial ischemia, an MDCT CAC score of less than 11 reliably rules out myocardial ischemia detected by dobutamine CMR. (orig.)
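    The CAC score in this study was calculated by the Agatston method. A minimal sketch of the commonly cited definition: each calcified lesion on each slice contributes its area (mm²) multiplied by a density factor based on its peak attenuation (1 for 130-199 HU, 2 for 200-299 HU, 3 for 300-399 HU, 4 for 400 HU or more), and the contributions are summed. Scanner-specific details such as slice thickness and minimum lesion area are not handled. The function names and lesion values are hypothetical.

```python
def density_weight(peak_hu):
    """Agatston density factor from a lesion's peak attenuation (HU)."""
    if peak_hu < 130:
        return 0  # below the calcium threshold: not counted
    if peak_hu < 200:
        return 1
    if peak_hu < 300:
        return 2
    if peak_hu < 400:
        return 3
    return 4


def agatston_score(lesions):
    """lesions: iterable of (area_mm2, peak_hu), one entry per lesion per slice."""
    return sum(area * density_weight(hu) for area, hu in lesions)


# Hypothetical lesions; the study's rule-out threshold was a score below 11.
score = agatston_score([(2.0, 150), (3.0, 250), (1.5, 450), (4.0, 100)])
rules_out = score < 11
```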

  14. Reliability and validity of the Children's Fear Survey Schedule-Dental Subscale for Arabic-speaking children: a cross-sectional study.

    Science.gov (United States)

    El-Housseiny, Azza A; Alsadat, Farah A; Alamoudi, Najlaa M; El Derwi, Douaa A; Farsi, Najat M; Attar, Moaz H; Andijani, Basil M

    2016-04-14

    Early recognition of dental fear is essential for the effective delivery of dental care. This study aimed to test the reliability and validity of the Arabic version of the Children's Fear Survey Schedule-Dental Subscale (CFSS-DS). A school-based sample of 1546 children was randomly recruited. Children completed the Arabic version of the CFSS-DS during class time. The scale was tested for internal consistency and test-retest reliability. To test criterion validity, children's behavior during dental examination was assessed using the Frankl scale, and the results were compared with the children's CFSS-DS scores. To test the scale's construct validity, scores on "fear of going to the dentist soon" were correlated with CFSS-DS scores. Factor analysis was also used. The Arabic version of the CFSS-DS showed high reliability regarding both test-retest reliability (intraclass correlation = 0.83, p …) … children with negative behavior had significantly higher fear scores (t = 13.67, p …) … "fear of invasive dental procedures," "fear of less invasive dental procedures," and "fear of strangers." The Arabic version of the CFSS-DS is a reliable and valid measure of dental fear in Arabic-speaking children. Pediatric dentists and researchers may use this validated version of the CFSS-DS to measure dental fear in Arabic-speaking children.

  15. Validity and reliability of the Fels physical activity questionnaire for children.

    Science.gov (United States)

    Treuth, Margarita S; Hou, Ningqi; Young, Deborah R; Maynard, L Michele

    2005-03-01

    The aim was to evaluate the reliability and validity of the Fels physical activity questionnaire (PAQ) for children 7-19 yr of age. A cross-sectional study was conducted among 130 girls and 99 boys in elementary (N=70), middle (N=81), and high (N=78) schools in rural Maryland. Weight and height were measured at the initial school visit. All the children then wore an Actiwatch accelerometer for 6 days. The Fels PAQ for children was administered on two separate occasions to evaluate reliability and was compared with accelerometry data to evaluate validity. Reliability of the Fels PAQ across girls, boys, and the elementary, middle, and high school age groups ranged from r=0.48 to r=0.76. For elementary school children, the validity correlation between the Fels PAQ total score and Actiwatch counts per minute was 0.34 (P=0.004). The correlation coefficients were lower in middle school (r=0.11, P=0.31) and high school (r=0.21, P=0.006) adolescents. The sport index of the Fels PAQ for children had the highest validity in the high school participants (r=0.34, P=0.002). The Fels PAQ for children is moderately reliable for all age groups of children. Validity of the Fels PAQ for children is acceptable for elementary and high school students when the total activity score or the sport index is used. The sport index was similar to the total score for elementary students but was a better measure of physical activity among high school students.
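    Both the test-retest reliability and the questionnaire-versus-accelerometer validity in this study are expressed as correlation coefficients. A minimal sketch of Pearson's r for paired observations (for example, PAQ total scores and Actiwatch counts per minute for the same children). The function name is hypothetical, and significance testing is left out.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```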

  16. Reliability demonstration methodology for products with Gamma Process by optimal accelerated degradation testing

    International Nuclear Information System (INIS)

    Zhang, Chunhua; Lu, Xiang; Tan, Yuanyuan; Wang, Yashun

    2015-01-01

    For products with high reliability and long lifetimes, accelerated degradation testing (ADT) may be adopted during the product development phase to verify, within a feasible test duration, whether the product's reliability meets a predetermined level. Degradation in engineering practice is usually a strictly monotonic process, such as fatigue crack growth, wear, or erosion. However, reliability demonstration by ADT for monotonic degradation processes has not been investigated so far. This paper proposes a reliability demonstration methodology by ADT for such products. We first apply a Gamma process to describe the monotonic degradation. Next, we present a reliability demonstration method that converts the required reliability level into an allowable cumulative degradation in ADT and compares the actual cumulative degradation with this allowable level. Further, we suggest an analytical optimal ADT design method for more efficient reliability demonstration, which minimizes the asymptotic variance of the decision variable subject to constraints on sample size, test duration, test cost, and predetermined decision risks. The method is validated and illustrated with an example of reliability demonstration for an alloy product, and is finally applied to demonstrate the wear reliability of a spherical plain bearing over a long service duration. - Highlights: • We present a reliability demonstration method by ADT for products with a monotonic degradation process, which may be used to verify reliability over a long service life within a feasible test duration. • We suggest an analytical optimal ADT design method for more efficient reliability demonstration, which differs from existing optimal ADT designs for more accurate reliability estimation in its objective function and constraints. • The methods are applied to demonstrate the wear reliability within long service duration of
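    Under a Gamma-process degradation model, reliability at time t is the probability that cumulative degradation has not yet reached the failure threshold. A minimal sketch, assuming a stationary Gamma process X(t) ~ Gamma(shape = a * t, scale = b), so that R(t) = P(X(t) < threshold) is a Gamma CDF. The CDF is evaluated with the standard power series for the regularized lower incomplete gamma function. The parameters a, b and the function names are hypothetical, and the paper's demonstration test and optimal ADT design are not reproduced.

```python
import math


def gamma_cdf(x, shape, scale):
    """Regularized lower incomplete gamma P(shape, x / scale) via its power series."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    # Series: sum over n of z**n / (shape * (shape+1) * ... * (shape+n)).
    while term > total * 1e-15 and n < 10_000:
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape))


def reliability(t, threshold, a, b):
    """R(t) = P(X(t) < threshold) for X(t) ~ Gamma(a * t, b); requires t > 0."""
    return gamma_cdf(threshold, shape=a * t, scale=b)
```

    Because the shape parameter grows linearly with time, R(t) decreases monotonically in t. This monotone link between time and allowable cumulative degradation is what a demonstration test can exploit.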

  17. Parthenium dermatitis severity score to assess clinical severity of disease

    Directory of Open Access Journals (Sweden)

    Kaushal K Verma

    2017-01-01

    Full Text Available Background: Parthenium dermatitis is the most common type of airborne contact dermatitis in India. It is a chronic disease with a remitting and relapsing course and significant morbidity and distress, but there is no scoring system to assess its severity. Aim: To design a scoring system for assessing the clinical severity of Parthenium dermatitis and to use this scoring system in various studies to determine its sensitivity, specificity, and reproducibility. Methods and Results: In our first few studies on Parthenium dermatitis, we designed and used a basic clinical severity scoring system based on itching, morphology of the lesions, and areas involved. In subsequent studies, we modified it into the present scoring system, the Parthenium dermatitis severity score (PDSS). Our studies showed that PDSS was highly sensitive in characterizing disease severity at a given point in time and in determining the efficacy of a prescribed treatment modality, and that it was reliable and reproducible. Conclusion: Thus, PDSS may be used by clinicians to score the clinical severity of Parthenium dermatitis and to monitor the disease's response to therapy.

  18. Reliability analysis of digital based I and C system

    Energy Technology Data Exchange (ETDEWEB)

    Kang, I. S.; Cho, B. S.; Choi, M. J. [KOPEC, Yongin (Korea, Republic of)

    1999-10-01

    Digital technology is rapidly being applied, in Korea and abroad, both to replace analog components in existing plants and to design control and monitoring systems for new nuclear power plants. Despite its many merits, digital technology faces a new problem of reliability assurance, and studies addressing this problem are being actively pursued in other countries. The reliability of the KNGR Engineered Safety Features Component Control System (ESF-CCS), a digital I and C system, was analyzed to verify that it meets the ALWR EPRI-URD requirements for reliability analysis and to eliminate hazards in a design that applies new technology. A qualitative analysis using FMEA and a quantitative analysis using a reliability block diagram were performed. The results of these analyses are presented in this paper.
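    The quantitative part of this analysis used a reliability block diagram. A minimal sketch of how such a diagram is evaluated, assuming statistically independent block failures: blocks in series multiply their reliabilities, and redundant blocks in parallel multiply their unreliabilities. The two-channel layout and the numbers are hypothetical and do not describe the actual ESF-CCS architecture.

```python
import math


def series(*reliabilities):
    """Every block must work: R = R1 * R2 * ... (independent failures assumed)."""
    return math.prod(reliabilities)


def parallel(*reliabilities):
    """At least one block must work: R = 1 - (1 - R1) * (1 - R2) * ..."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


# Hypothetical layout: two redundant processing channels feeding one actuator.
system_r = series(parallel(0.99, 0.99), 0.999)
```

    Nesting the two functions covers any series-parallel diagram. Structures with shared components or common-cause failures need methods beyond this independence-based sketch.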

  19. Travel Time Reliability in Indiana

    OpenAIRE

    Martchouk, Maria; Mannering, Fred L.; Singh, Lakhwinder

    2010-01-01

    Travel time and travel time reliability are important performance measures for assessing traffic conditions and the extent of congestion on a roadway. This study first uses a floating car technique to assess travel time and travel time reliability on a number of Indiana highways. It then describes the use of Bluetooth technology to collect real travel time data on a freeway and applies it to obtain two weeks of data on Interstate 69 in Indianapolis. An autoregressive model, estima...

  20. The Vocal Cord Dysfunction Questionnaire: Validity and Reliability of the Persian Version.

    Science.gov (United States)

    Ghaemi, Hamide; Khoddami, Seyyedeh Maryam; Soleymani, Zahra; Zandieh, Fariborz; Jalaie, Shohreh; Ahanchian, Hamid; Khadivi, Ehsan

    2017-12-25

    The aim of this study was to develop, validate, and assess the reliability of the Persian version of the Vocal Cord Dysfunction Questionnaire (VCDQ-P). The study was a cross-sectional cultural survey. Forty-four patients with vocal fold dysfunction (VFD) and 40 healthy volunteers were recruited. To assess content validity, the prefinal questions were given to 15 experts, who commented on whether each question was essential. Ten patients with VFD rated the importance of the VCDQ-P items to establish face validity. Eighteen of the patients with VFD completed the VCDQ again 1 week later to assess test-retest reliability. To assess absolute reliability, the standard error of measurement and the smallest detected change were calculated. Concurrent validity was assessed by having 34 patients with VFD complete the Persian Chronic Obstructive Pulmonary Disease (COPD) Assessment Test (CAT). Discriminant validity was measured in 34 participants. The VCDQ was further validated by administering the questionnaire to 40 healthy volunteers. Validation of the VCDQ as a treatment outcome tool was conducted in 18 patients with VFD using pre- and posttreatment scores. Internal consistency was confirmed (Cronbach α = 0.78). Test-retest reliability was excellent (intraclass correlation coefficient = 0.97). The standard error of measurement and smallest detected change values were acceptable (0.39 and 1.08, respectively). There was a significant correlation between the VCDQ-P and the CAT total scores (P …). … validity was significantly different. The VCDQ scores of patients with VFD differed significantly before and after treatment (P …). … valid and reliable self-administered questionnaire in the Persian-speaking population. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
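    The abstract reports a standard error of measurement (SEM) and a smallest detected change (SDC) but not the formulas used. A common pair of definitions, assumed here, is SEM = SD × √(1 − ICC) and SDC = 1.96 × √2 × SEM; the reported values (0.39 and 1.08) are arithmetically consistent with the second of these. A minimal Python sketch:

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the score SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_change(sem, z=1.96):
    """Smallest change between two measurements that exceeds measurement
    error at the ~95% level: z * sqrt(2) * SEM."""
    return z * math.sqrt(2.0) * sem


# With the SEM reported in the abstract:
print(round(smallest_detectable_change(0.39), 2))  # 1.08
```

    The √2 factor appears because a change score involves two measurements, each carrying its own error.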

  1. Power system reliability analysis using fault trees

    International Nuclear Information System (INIS)

    Volkanovski, A.; Cepin, M.; Mavko, B.

    2006-01-01

    A power system reliability analysis method is developed from the standpoint of reliable delivery of electrical energy to customers. The method is based on fault tree analysis, which is widely applied in Probabilistic Safety Assessment (PSA), adapted for power system reliability analysis. It is designed so that only the basic reliability parameters of the analysed power system are needed as input for calculating the system's reliability indices. The modelling and analysis were performed on an example power system consisting of eight substations. The results include the reliability level of the current power system configuration, the combinations of component failures that result in failed power delivery to loads, and the importance factors for components and subsystems. (author)
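    Fault tree evaluation of this kind can be illustrated with a toy example. The Python sketch below is not the authors' eight-substation model: it assumes independent basic events that each appear only once in the tree (so AND and OR gates can be evaluated by direct multiplication), uses a hypothetical "busbar or both feeder lines" top event, and computes Birnbaum importance as one common importance factor; the paper may use other measures.

```python
from math import prod


def and_gate(*probs):
    """AND gate: output event occurs only if all (independent) inputs occur."""
    return prod(probs)


def or_gate(*probs):
    """OR gate: output event occurs if any (independent) input occurs."""
    return 1 - prod(1 - p for p in probs)


def load_not_supplied(p):
    """Hypothetical top event: a load loses supply if its busbar fails,
    or if both of its feeder lines fail."""
    return or_gate(p["busbar"], and_gate(p["line_a"], p["line_b"]))


def birnbaum_importance(top_event, probs, component):
    """P(top | component failed) - P(top | component working)."""
    failed = {**probs, component: 1.0}
    working = {**probs, component: 0.0}
    return top_event(failed) - top_event(working)


# Hypothetical failure probabilities (unavailabilities) of the basic events.
probs = {"busbar": 1e-4, "line_a": 1e-2, "line_b": 2e-2}
print(f"P(top) = {load_not_supplied(probs):.3e}")
for name in probs:
    print(name, round(birnbaum_importance(load_not_supplied, probs, name), 6))
```

    Production PSA tools work from minimal cut sets instead, which also handles basic events shared between several branches of the tree.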

  2. TURKISH VERSION QUALITY OF LIFE IN ESSENTIAL TREMOR QUESTIONNAIRE (QUEST): VALIDITY AND RELIABILITY STUDY.

    Science.gov (United States)

    Güler, Sibel; Turan, F Nesrin

    2015-09-30

    Our aim was to translate the Quality of Life in Essential Tremor Questionnaire (QUEST), developed by Troster (2005), and to analyse the validity and reliability of the translated questionnaire. Two hundred and twelve consecutive patients with essential tremor (ET) and forty-three control subjects were included in the study. Permission for the translation and validation of the QUEST scale was obtained. The translation was performed according to the guidelines provided by the publisher. After translation, the final version of the scale was administered to both groups to determine its reliability and validity. The QUEST Physical, Psychosocial, Communication, Hobbies/Leisure and Work/Finance scores were 0.967, 0.968, 0.933, 0.964 and 0.925, respectively. There were good correlations between the QUEST scores, indicating good internal consistency. Additionally, all of the QUEST scores were most strongly related to the right and left arms (p=0.0001), whereas they were only weakly related to the voice, head and right leg (p=0.0001). These findings support the notion that the Turkish version of the Quality of Life in Essential Tremor Questionnaire (QUEST) is a valid and reliable tool for assessing the quality of life of patients with ET.

  3. Accuracy and Efficiency of Recording Pediatric Early Warning Scores Using an Electronic Physiological Surveillance System Compared With Traditional Paper-Based Documentation.

    Science.gov (United States)

    Sefton, Gerri; Lane, Steven; Killen, Roger; Black, Stuart; Lyon, Max; Ampah, Pearl; Sproule, Cathryn; Loren-Gosling, Dominic; Richards, Caitlin; Spinty, Jean; Holloway, Colette; Davies, Coral; Wilson, April; Chean, Chung Shen; Carter, Bernie; Carrol, E D

    2017-05-01

    Pediatric Early Warning Scores are advocated to help health professionals identify early signs of serious illness or deterioration in hospitalized children. Scores are derived from weightings applied to recorded vital signs and clinical observations that reflect deviation from a predetermined "norm." Higher aggregate scores trigger an escalation in care aimed at preventing critical deterioration. Process errors made while recording these data, including plotting or calculation errors, could undermine the reliability of the score. To test this hypothesis, we conducted a controlled study of documentation using five clinical vignettes. We measured the accuracy of vital sign recording, the accuracy of score calculation, and the time taken to complete documentation using a handheld electronic physiological surveillance system, VitalPAC Pediatric, compared with traditional paper-based charts. We explored the user acceptability of both methods using a Web-based survey. Twenty-three staff participated in the controlled study. Compared with paper-based documentation, the electronic physiological surveillance system improved the accuracy of vital sign recording (98.5% versus 85.6%, P < .02) and of Pediatric Early Warning Score calculation (94.6% versus 55.7%, P < .02), and saved time (68 versus 98 seconds, P < .002). Twenty-nine staff completed the Web-based survey. They perceived that the electronic physiological surveillance system offered safety benefits by reducing human error while providing instant visibility of recorded data to the entire clinical team.

  4. Expanding Reliability Generalization Methods with KR-21 Estimates: An RG Study of the Coopersmith Self-Esteem Inventory.

    Science.gov (United States)

    Lane, Ginny G.; White, Amy E.; Henson, Robin K.

    2002-01-01

    Conducted a reliability generalization (RG) study on the Coopersmith Self-Esteem Inventory (CSEI; S. Coopersmith, 1967) to examine the variability of reliability estimates across studies and to identify study characteristics that may predict this variability. Results show that reliability for CSEI scores can vary considerably, especially at the…
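    KR-21, named in the title, needs only the number of items k and the mean and variance of total scores, presumably why it can be estimated from summary statistics published in primary studies. A minimal Python sketch of the standard formula, KR-21 = k/(k − 1) × (1 − M(k − M)/(k·σ²)); the test length and statistics in the usage line are hypothetical, not CSEI data.

```python
def kr21(k, mean, variance):
    """Kuder-Richardson formula 21 for a test of k dichotomously scored items.

    Needs only the number of items and the mean and variance of total scores;
    it assumes all items are equally difficult.
    """
    if k < 2 or variance <= 0:
        raise ValueError("need k >= 2 items and a positive total-score variance")
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))


# Hypothetical summary statistics for a 50-item test.
print(round(kr21(k=50, mean=35.0, variance=64.0), 3))
```

    When item difficulties differ, KR-21 comes out lower than KR-20 on the same data, so it behaves as a conservative estimate.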

  5. Current Human Reliability Analysis Methods Applied to Computerized Procedures

    Energy Technology Data Exchange (ETDEWEB)

    Ronald L. Boring

    2012-06-01

    Computerized procedures (CPs) are an emerging technology within nuclear power plant control rooms. While CPs have been implemented internationally in advanced control rooms, to date no US nuclear power plant has implemented CPs in its main control room (Fink et al., 2009). Yet CPs are a reality of new plant builds and are an area of considerable interest to existing plants, which see advantages in terms of enhanced ease of use and easier records management, since CPs eliminate the need to update hardcopy procedures. The overall intent of this paper is to provide a characterization of human reliability analysis (HRA) issues for computerized procedures. It is beyond the scope of this document to propose a new HRA approach or to recommend specific methods or refinements to those methods. Rather, this paper serves as a review of current HRA as it may be used for the analysis and review of computerized procedures.

  6. Investigation of reliability, validity and norms of the Persian version of the California Critical Thinking Skills Test, Form B (CCTST)

    Directory of Open Access Journals (Sweden)

    Khallli H

    2003-04-01

    Full Text Available Background: To evaluate the effectiveness of current educational programs in helping students acquire problem-solving, decision-making and critical thinking skills, reliable, valid and standardized instruments are needed. Purposes: To investigate the reliability, validity and norms of CCTST Form B. The California Critical Thinking Skills Test contains 34 multiple-choice questions, each with one correct answer, covering five critical thinking (CT) cognitive skill domains. Methods: The translated CCTST Form B was given to 405 BSN nursing students of the Nursing Faculties of Tehran (Tehran, Iran) and Shahid Beheshti Universities, who were selected through random sampling. To determine face and content validity, the test was translated and edited by Persian and English language professors and researchers, and it was also confirmed by the judgment of a panel of medical education experts and psychology professors. CCTST reliability was determined through internal consistency using KR-20. The construct validity of the test was investigated with factor analysis, internal consistency and group differences. Results: The reliability coefficient of the test was 0.62. Factor analysis indicated that the CCTST comprises five factors (elements), namely Analysis, Evaluation, Inference, Inductive and Deductive Reasoning. The internal consistency analysis showed that all subscales had high, positive correlations with the total test score. The group-difference comparison between nursing and philosophy students (n=50) indicated a meaningful difference between their scores (t=-4.95, p=0.0001). The percentile norms showed that the 50th percentile corresponded to a raw score of 11, and the 95th and 5th percentiles to raw scores of 17 and 6, respectively.
Conclusions: The results revealed that the test is sufficiently reliable as a research tool, that all subscales measure a single construct (critical thinking), and that they are able to distinguish the
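    The KR-20 coefficient reported above (0.62) is computed from item-level 0/1 responses. A minimal Python sketch of the usual formula, KR-20 = k/(k − 1) × (1 − Σ p_i q_i / σ²_X), where p_i is the proportion answering item i correctly and q_i = 1 − p_i; the data set is a toy example, not the study's.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 for a persons-by-items matrix of 0/1 scores.

    Uses population (divide-by-n) variances throughout, matching p * (1 - p).
    """
    n_people = len(responses)
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people  # proportion correct on item j
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_var)


# Toy data: 4 examinees x 3 items (1 = correct).
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(data))  # 0.75
```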

  7. Extended score interval in the assessment of basic surgical skills.

    Science.gov (United States)

    Acosta, Stefan; Sevonius, Dan; Beckman, Anders

    2015-01-01

    The Basic Surgical Skills course uses an assessment score interval of 0-3. An extended score interval, 1-6, was proposed by the Swedish steering committee of the course. The aim of this study was to analyze trainee scores in the current 0-3 scored version compared with a proposed 1-6 scored version. Sixteen participants, seven females and nine males, were evaluated with the current and proposed assessment forms by instructors, observers, and the learners themselves during the first and second day. In each assessment form, 17 tasks were assessed. The inter-rater reliability between the current and the proposed score sheets was evaluated with the intraclass correlation (ICC) with 95% confidence intervals (CI). In the current assessment form, the instructors gave the highest score in 31% of 'knot tying' assessments at the last time point and in 62% of 'bowel anastomosis side to side' assessments. No ceiling effects were found in the proposed assessment form. The overall ICC between the current and proposed score sheets after assessment by the instructors increased from 0.38 (95% CI 0.77-0.78) on Day 1 to 0.83 (95% CI 0.51-0.94) on Day 2. A clear ceiling effect of scores was demonstrated in the current assessment form, which calls its validity into question. The proposed score sheet provides more accurate scores and seems to be a better feedback instrument for learning technical surgical skills in the Basic Surgical Skills course.

  8. Reliability and validity of television food advertising questionnaire in Malaysia.

    Science.gov (United States)

    Zalma, Abdul Razak; Safiah, Md Yusof; Ajau, Danis; Khairil Anuar, Md Isa

    2015-09-01

    Interventions to counter the influence of television food advertising amongst children are important, so a reliable and valid instrument to assess their effect is needed. The objective of this study was to determine the reliability and validity of such a questionnaire. The questionnaire was administered twice to 32 primary schoolchildren aged 10-11 years in Selangor, Malaysia, with an interval of 2 weeks between administrations. The test-retest method was used to examine the reliability of the questionnaire. Intra-rater reliability was determined by the kappa coefficient and internal consistency by Cronbach's alpha coefficient. Construct validity was evaluated using factor analysis. The test-retest correlation showed moderate-to-high reliability for all scores (r = 0.40, p = 0.02 to r = 0.95, p = 0.00), with one exception, consumption of fast foods (r = 0.24, p = 0.20). The kappa coefficient showed acceptable-to-strong intra-rater reliability (K = 0.40-0.92), except for two items under knowledge on television food advertising (K = 0.26 and K = 0.21) and one item under preference for healthier foods (K = 0.33). Cronbach's alpha coefficient indicated acceptable internal consistency for all scores (0.45-0.60). After two items under Consumption of Commonly Advertised Food were deleted, the remaining items showed moderate-to-high loadings (0.52, 0.84, 0.42 and 0.42), and the scree plot showed only one factor. The Kaiser-Meyer-Olkin measure was 0.60, indicating that the sample was adequate for factor analysis. The questionnaire on television food advertising is reliable and valid for assessing the effect of media literacy education about television food advertising on schoolchildren. © The Author (2013). Published by Oxford University Press. All rights reserved.
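    Cronbach's alpha, used here for internal consistency, compares the sum of the item variances with the variance of the total score. A minimal Python sketch of the standard formula, α = k/(k − 1) × (1 − Σ σ²_i / σ²_X), applied to made-up Likert-type answers rather than the study's questionnaire data:

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)),
    using population variances consistently.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical answers of 5 children to 3 Likert-type items (1-5).
answers = [[4, 5, 4], [2, 3, 2], [3, 3, 4], [5, 4, 5], [1, 2, 2]]
print(round(cronbach_alpha(answers), 3))
```

    For items scored 0/1, this formula gives the same value as KR-20.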

  9. Manual and automatic locomotion scoring systems in dairy cows: A review

    NARCIS (Netherlands)

    Schlageter-Tello, A.; Bokkers, E.A.M.; Groot Koerkamp, P.W.G.; Hertem, van T.; Viazzi, S.; Romanini Bites, E.; Halachmi, I.; Bahr, C.; Berckmans, D.; Lokhorst, K.

    2014-01-01

    The objective of this review was to describe, compare and evaluate agreement, reliability, and validity of manual and automatic locomotion scoring systems (MLSSs and ALSSs, respectively) used in dairy cattle lameness research. There are many different types of MLSSs and ALSSs. Twenty-five MLSSs were

  10. [Evaluation on the validity and reliability of the Diabetes Self-management Knowledge, Attitude, and Behavior Assessment Scale (DSKAB)].

    Science.gov (United States)

    Liu, Xiaoli; Dai, Long; Chen, Bo; Feng, Nongping; Wu, Qianhui; Lin, Yonghai; Zhang, Lan; Tan, Dong; Zhang, Jinhua; Tu, Huijuan; Li, Changfeng; Wang, Wenjuan

    2016-01-01

    To evaluate the validity and reliability of the Diabetes Self-management Knowledge, Attitude, and Behavior Assessment Scale (DSKAB). We selected 460 patients with diabetes in the community and used the scale, which had been developed through two rounds of the Delphi method and a pilot study. Investigators surveyed the patients face to face. By drawing lots, we randomly selected 25 community patients with diabetes for a repeat survey one week later. The validity analyses included face validity, content validity, construct validity and discriminant validity. The reliability analyses included Cronbach's α coefficient, θ coefficient, Ω coefficient, split-half reliability and test-retest reliability. A total of 460 questionnaires were distributed, 442 were returned and 432 were valid. The total score of the scale was 254.59 ± 28.90, and the scores of the knowledge, attitude and behavior sub-scales were 82.44 ± 11.24, 63.53 ± 5.77 and 108.61 ± 17.55, respectively. The scale had excellent face validity and content validity. The correlation coefficients between the three sub-scales and the total scale ranged from 0.71 to 0.91 (P …). … validity. The scores of the high and low groups in the three sub-scales were: knowledge (91.12 ± 3.62) and (69.96 ± 11.20), attitude (68.75 ± 4.51) and (58.79 ± 4.87), behavior (129.38 ± 8.53) and (89.65 ± 11.34). The mean scores of all three sub-scales differed clearly between the high-score and low-score groups (t = -19.45, -16.24 and -30.29, respectively; P …). … validity. The Cronbach's α coefficients of the scale and the three sub-scales ranged from 0.79 to 0.93, the θ coefficients from 0.86 to 0.95, the Ω coefficients from 0.90 to 0.98, and split-half reliability from 0.89 to 0.95. Test-retest reliability was 0.51 for the scale and ranged from 0.46 to 0.52 for the three sub-scales (P …). … validity and reliability of the Diabetes Self-management Knowledge, Attitude, and Behavior Assessment Scale are excellent, which makes it a suitable instrument to evaluate self-management for patients

  11. Development and inter-rater reliability of a standardized verbal instruction manual for the Chinese Geriatric Depression Scale-short form.

    Science.gov (United States)

    Wong, M T P; Ho, T P; Ho, M Y; Yu, C S; Wong, Y H; Lee, S Y

    2002-05-01

    The Geriatric Depression Scale (GDS) is a common screening tool for elderly depression in Hong Kong. This study aimed at (1) developing a standardized manual for the verbal administration and scoring of the GDS-SF, and (2) comparing the inter-rater reliability between the standardized and non-standardized verbal administration of GDS-SF. Two studies were reported. In Study 1, the process of developing the manual was described. In Study 2, we compared the inter-rater reliabilities of GDS-SF scores using the standardized verbal instructions and the traditional non-standardized administration. Results of Study 2 indicated that the standardized procedure in verbal administration and scoring improved the inter-rater reliabilities of GDS-SF. Copyright 2002 John Wiley & Sons, Ltd.

  12. Reliability of third molar development for age estimation in Gujarati population: A comparative study.

    Science.gov (United States)

    Gandhi, Neha; Jain, Sandeep; Kumar, Manish; Rupakar, Pratik; Choyal, Kanaram; Prajapati, Seema

    2015-01-01

    Age assessment may be a crucial step in postmortem profiling leading to confirmative identification. In children, Demirjian's method, based on eight developmental stages, was developed to determine maturity scores as a function of age and polynomial functions to determine age as a function of score. The aim of this study was to evaluate the reliability of age estimation from the developmental stages of the third molar, using Demirjian's eight-teeth method with the French maturity scores and an Indian-specific formula, on orthopantomograms assessed by the Demirjian method. Dental panoramic tomograms from 30 subjects, each of known chronological age and sex, were collected and evaluated according to Demirjian's criteria. Age calculations were performed using Demirjian's formula and the Indian formula. Statistical analysis used the chi-square test and ANOVA, and the P values obtained were statistically significant. There was an average underestimation of age with both the Indian and Demirjian's formulas. The mean absolute error was lower with the Indian formula; hence, it can be applied for age estimation in the present Gujarati population. Females also reached dental maturity ahead of males, so dental development is completed earlier in females. Greater accuracy can be obtained if population-specific formulas that account for ethnic and environmental variation are derived by regression analysis.
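    The abstract distinguishes an average underestimation (the sign of the error) from the mean absolute error (its size). A minimal Python sketch of both measures; the ages and the two sets of estimates below are hypothetical and do not come from the Gujarati sample or either published formula.

```python
def mean_error(estimated, actual):
    """Mean signed error (estimate - true); negative means average underestimation."""
    if len(estimated) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(e - a for e, a in zip(estimated, actual)) / len(actual)


def mean_absolute_error(estimated, actual):
    """Average size of the error, regardless of direction."""
    if len(estimated) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)


# Hypothetical chronological ages and estimates from two formulas (years).
actual = [15.0, 17.5, 19.0, 21.0]
formula_a = [14.0, 16.0, 18.5, 19.0]
formula_b = [14.5, 17.0, 18.0, 20.5]
for name, est in [("formula A", formula_a), ("formula B", formula_b)]:
    print(name, mean_error(est, actual), mean_absolute_error(est, actual))
```

    Both hypothetical formulas underestimate on average, but formula B has the smaller mean absolute error, the kind of comparison the abstract reports.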

  13. Modified Ashworth scale and spasm frequency score in spinal cord injury

    DEFF Research Database (Denmark)

    Baunsgaard, C. B.; Nissen, U. V.; Christensen, K. B.

    2016-01-01

    STUDY DESIGN: Intra- and inter-rater reliability study. OBJECTIVES: To assess intra- and inter-rater reliability of the Modified Ashworth Scale (MAS) and Spasm Frequency Score (SFS) in the lower extremities in a population of persons with spinal cord injury, as well as correlations between the two scales. SETTING: Clinic for Spinal Cord Injuries, Rigshospitalet, Hornbaek, Denmark. METHODS: Thirty-one persons participated in the study and were tested four times in total with MAS and SFS by three experienced raters. Cohen's kappa (κ), simple and quadratic weighted (nominal and ordinal scale level … .94 and inter-rater weighted κ = 0.93. Correlation between MAS and SFS showed non-significant correlation coefficients from -0.11 to 0.90. CONCLUSION: Reliability of MAS is highly affected by the weighting scheme. With weighted κ it was overall reliable, and with simple κ overall unreliable. Repeated tests should …
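    The conclusion that MAS reliability depends on the weighting scheme follows from how kappa is computed: simple κ counts every disagreement equally, while weighted κ penalizes near-misses on an ordinal scale less. A minimal Python sketch of Cohen's kappa with optional linear or quadratic disagreement weights; the ratings in the usage example are hypothetical, and only the ordered MAS grades (0, 1, 1+, 2, 3, 4) come from the scale itself.

```python
def cohen_kappa(rater1, rater2, categories, weights=None):
    """Cohen's kappa for two raters scoring the same subjects.

    categories: ordered list of possible ratings (order matters for weighting).
    weights: None for simple (unweighted) kappa, or "linear" / "quadratic"
    for weighted kappa on an ordinal scale.
    """
    if weights not in (None, "linear", "quadratic"):
        raise ValueError("weights must be None, 'linear' or 'quadratic'")
    if len(rater1) != len(rater2) or not rater1 or len(categories) < 2:
        raise ValueError("need equal-length non-empty ratings and >= 2 categories")
    k = len(categories)
    pos = {c: i for i, c in enumerate(categories)}
    n = len(rater1)
    # observed[i][j]: share of subjects rated i by rater 1 and j by rater 2.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        observed[pos[a]][pos[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # 0 on the diagonal; 1 off the diagonal for simple kappa, otherwise
        # scaled by distance between categories (maximum distance -> 1).
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    obs_dis = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("kappa undefined: no expected disagreement")
    return 1 - obs_dis / exp_dis


# Hypothetical ratings by two raters on the ordered MAS grades.
mas = ["0", "1", "1+", "2", "3", "4"]
r1 = ["0", "1", "1+", "2", "2", "3"]
r2 = ["0", "1+", "1+", "2", "3", "3"]
for w in (None, "linear", "quadratic"):
    print(w, round(cohen_kappa(r1, r2, mas, w), 3))
```

    Because every disagreement in this toy example is only one grade apart, the weighted versions come out higher than simple kappa.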

  14. Intra- and interobserver reliability of gray scale/dynamic range evaluation of ultrasonography using a standardized phantom

    International Nuclear Information System (INIS)

    Lee, Song; Choi, Joon Il; Park, Michael Yong; Yeo, Dong Myung; Byun, Jae Young; Jung, Seung Eun; Rha, Sung Eun; Oh, Soon Nam; Lee, Young Joon

    2014-01-01

    To evaluate intra- and interobserver reliability of the gray scale/dynamic range of the phantom image evaluation of ultrasonography using a standardized phantom, and to assess the effect of interactive education on the reliability. Three radiologists (a resident and two board-certified radiologists with 2 and 7 years of experience in evaluating ultrasound phantom images) performed the gray scale/dynamic range test for an ultrasound machine using a standardized phantom. They scored the number of visible cylindrical structures of varying degrees of brightness and made a pass or fail decision. First, with limited knowledge of phantom images, they scored 49 phantom images from a 2010 survey twice. The radiologists then underwent two hours of interactive education on the phantom images and scored another 91 phantom images from a 2011 survey twice. Intra- and interobserver reliability before and after the interactive education session were analyzed using kappa (κ) statistics. Before education, the κ values for intraobserver reliability for the radiologist with 7 years of experience, the radiologist with 2 years of experience and the resident were 0.386, 0.469 and 0.465, respectively. After education, the κ values improved (0.823, 0.611 and 0.711, respectively). Interobserver reliability was also better after education for the 3 participants (κ = 0.067, 0.002 and 0.547 before education; 0.635, 0.667 and 0.616 after education, respectively). The intra- and interobserver reliability of the gray scale/dynamic range was fair to substantial. Interactive education can improve reliability. For more reliable results, double-checking of phantom images by multiple reviewers is recommended.

  15. Acute imaging does not improve ASTRAL score's accuracy despite having a prognostic value.

    OpenAIRE

    Ntaios, G.; Papavasileiou, V.; Faouzi, M.; Vanacker, P.; Wintermark, M.; Michel, P.

    2014-01-01

    BACKGROUND: The ASTRAL score was recently shown to reliably predict three-month functional outcome in patients with acute ischemic stroke. AIM: The study aims to investigate whether information from multimodal imaging increases the ASTRAL score's accuracy. METHODS: All patients registered in the ASTRAL registry until March 2011 were included. In multivariate logistic-regression analyses, we added covariates derived from parenchymal, vascular, and perfusion imaging to the 6-parameter model o...

  16. Validity and Reliability of the Upper Extremity Work Demands Scale.

    Science.gov (United States)

    Jacobs, Nora W; Berduszek, Redmar J; Dijkstra, Pieter U; van der Sluis, Corry K

    2017-12-01

    Purpose To evaluate the validity and reliability of the upper extremity work demands (UEWD) scale. Methods Participants with different levels of physical work demands, based on the Dictionary of Occupational Titles categories, were included. A historical database of 74 workers was added for factor analysis. Criterion validity was evaluated by comparing observed and self-reported UEWD scores. To assess structural validity, a factor analysis was executed. For reliability, the difference between two self-reported UEWD scores, the smallest detectable change (SDC), test-retest reliability and internal consistency were determined. Results Fifty-four participants were observed at work and 51 of them filled in the UEWD twice with a mean interval of 16.6 days (SD 3.3, range = 10-25 days). Criterion validity of the UEWD scale was moderate (r = .44, p = .001). Factor analysis revealed that 'force and posture' and 'repetition' subscales could be distinguished, with Cronbach's alpha of .79 and .84, respectively. Reliability was good; there was no significant difference between repeated measurements. An SDC of 5.0 was found. Test-retest reliability was good (intraclass correlation coefficient for agreement = .84) and all item-total correlations were >.30. There were two pairs of highly related items. Conclusion Reliability of the UEWD scale was good, but criterion validity was moderate. Based on current results, a modified UEWD scale (2 items removed, 1 item reworded, divided into 2 subscales) was proposed. Since observation appeared to be an inappropriate gold standard, we advise investigating other types of validity, such as construct validity, in further research.
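    The abstract reports an "intraclass correlation coefficient for agreement" without naming the model. One common choice, assumed here, is the two-way, single-measurement, absolute-agreement form (McGraw and Wong's ICC(A,1), equivalent to Shrout and Fleiss's ICC(2,1)), computed from ANOVA mean squares. A minimal Python sketch with hypothetical test-retest data:

```python
def icc_agreement(data):
    """Two-way ICC for absolute agreement, single measurement (ICC(A,1)).

    data: one row per subject, one column per measurement occasion or rater.
    """
    n, k = len(data), len(data[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 measurements")
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_err = sum((data[i][j] - row_means[i] - col_means[j] + grand) ** 2
                 for i in range(n) for j in range(k))
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# Hypothetical scores of 5 workers on two administrations.
scores = [[20, 22], [35, 33], [28, 30], [41, 44], [15, 16]]
print(round(icc_agreement(scores), 3))
```

    The agreement form penalizes systematic shifts between occasions: scores of [[1, 2], [2, 3], [3, 4]] give ICC(A,1) = 2/3, even though a consistency-type ICC would be 1.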

  17. Beyond Statistics: The Economic Content of Risk Scores

    Science.gov (United States)

    Einav, Liran; Finkelstein, Amy; Kluender, Raymond

    2016-01-01

    “Big data” and statistical techniques to score potential transactions have transformed insurance and credit markets. In this paper, we observe that these widely-used statistical scores summarize a much richer heterogeneity, and may be endogenous to the context in which they get applied. We demonstrate this point empirically using data from Medicare Part D, showing that risk scores confound underlying health and endogenous spending response to insurance. We then illustrate theoretically that when individuals have heterogeneous behavioral responses to contracts, strategic incentives for cream skimming can still exist, even in the presence of “perfect” risk scoring under a given contract. PMID:27429712

  18. Student-Centered Reliability, Concurrent Validity and Instructional Sensitivity in Scoring of Students' Concept Maps in a University Science Laboratory

    Science.gov (United States)

    Kaya, Osman Nafiz; Kilic, Ziya

    2004-01-01

    The student-centered approach to scoring concept maps consisted of three elements, namely a symbol system, an individual portfolio and a scoring scheme. We scored student-constructed concept maps on five criteria: validity of concepts, adequacy of propositions, significance of cross-links, relevancy of examples, and interconnectedness. With…

  19. Reliability, Validity, and Minimal Detectable Change of Balance Evaluation Systems Test and Its Short Versions in Older Cancer Survivors: A Pilot Study.

    Science.gov (United States)

    Huang, Min H; Miller, Kara; Smith, Kristin; Fredrickson, Kayle; Shilling, Tracy

    2016-01-01

    .86-2.47 points), and MDC (2.39-6.86 points). The Bland-Altman plot revealed no systematic errors. The scores of BESTest, Mini-BEST, and Brief-BEST were correlated significantly with those of the ABC Scale (P …). … test-retest reliability, and excellent concurrent validity with the ABC Scale for community-dwelling cancer survivors aged 55 years and older who had completed cancer treatments for at least 3 months. Future studies are necessary to determine the predictive values for determining fall risks using balance assessment tools in older cancer survivors. Clinicians can utilize the BESTest and its short versions to evaluate balance problems in community-dwelling older cancer survivors and apply the established MDC to assess intervention outcomes.

  20. The OMERACT-RAMRIS Rheumatoid Arthritis Magnetic Resonance Imaging Joint Space Narrowing Score

    DEFF Research Database (Denmark)

    Møller Døhn, Uffe; Conaghan, Philip G; Eshed, Iris

    2014-01-01

    To test the intrareader and interreader reliability of assessment of joint space narrowing (JSN) in rheumatoid arthritis (RA) wrist and metacarpophalangeal (MCP) joints on magnetic resonance imaging (MRI) and computed tomography (CT) using the newly proposed OMERACT-RAMRIS JSN scoring method...