WorldWideScience

Sample records for validity interrater reliability

  1. Quantitative measurement of hypertrophic scar: interrater reliability and concurrent validity.

    Science.gov (United States)

    Nedelec, Bernadette; Correa, José A; Rachelska, Grazyna; Armour, Alexis; LaSalle, Léo

    2008-01-01

    Research into the pathophysiology and treatment of hypertrophic scar (HSc) remains limited by the heterogeneity of scar and the imprecision with which its severity is measured. The objective of this study was to test the interrater reliability and concurrent validity of the Cutometer measurement of elasticity, the Mexameter measurement of erythema and pigmentation, and total thickness measure of the DermaScan C relative to the modified Vancouver Scar Scale (mVSS) in patient-matched normal skin, normal scar, and HSc. Three independent investigators evaluated 128 sites (severe HSc, moderate or mild HSc, donor site, and normal skin) on 32 burn survivors using all of the above measurement tools. The intraclass correlation coefficient, which was used to measure interrater reliability, reflects the inherent amount of error in the measure and is considered acceptable when it is >0.75. Interrater reliability of the totals of the height, pliability, and vascularity subscales of the mVSS fell below the acceptable limit (approximately 0.50). The individual subscales of the mVSS fell well below the acceptable level. The Cutometer measurement of elasticity was acceptable (>0.89) for each study site with the exception of severe scar. Mexameter and DermaScan C reliability measurements were acceptable for all sites (>0.82). Concurrent validity correlations with the mVSS were significant except for the comparison of the mVSS pliability subscale and the Cutometer maximum deformation measure in severe scar. In conclusion, the Mexameter and DermaScan C measurements of scar color and thickness of all sites, as well as the Cutometer measurement of elasticity in all but the most severe scars, show high interrater reliability. Their significant concurrent validity with the mVSS confirms that these tools are measuring the same traits as the mVSS, and in a more objective way.
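The abstract above judges interrater reliability by an intraclass correlation coefficient with an acceptance threshold of 0.75, but it does not say which ICC form was used. The following is a minimal sketch that assumes the two-way random-effects, absolute-agreement, single-rater form, usually written ICC(2,1). The function name `icc_2_1` and the ratings matrix are illustrative assumptions, not material from the study.

```python
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` holds one row per rated site and one column per rater.
    The choice of this ICC form is an assumption; the abstract does not name it.
    """
    n = len(scores)      # number of rated sites (rows)
    k = len(scores[0])   # number of raters (columns)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-site mean square
    msc = ss_cols / (k - 1)                # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Illustrative ratings only: 6 sites scored by 4 raters.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(ratings)
print(f"ICC(2,1) = {icc:.2f}; above the 0.75 threshold: {icc > 0.75}")
```

Because this form penalizes systematic differences between raters, a tool can rank sites consistently and still fall below the threshold when one rater scores uniformly higher than the others.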

  2. The PRECIS-2 tool has good interrater reliability and modest discriminant validity.

    Science.gov (United States)

    Loudon, Kirsty; Zwarenstein, Merrick; Sullivan, Frank M; Donnan, Peter T; Gágyor, Ildikó; Hobbelen, Hans J S M; Althabe, Fernando; Krishnan, Jerry A; Treweek, Shaun

    2017-08-01

    PRagmatic Explanatory Continuum Indicator Summary (PRECIS)-2 is a tool that could improve design insight for trialists. Our aim was to validate the PRECIS-2 tool, unlike its predecessor, testing the discriminant validity and interrater reliability. Over 80 international trialists, methodologists, clinicians, and policymakers created PRECIS-2 helping to ensure face validity and content validity. The interrater reliability of PRECIS-2 was measured using 19 experienced trialists who used PRECIS-2 to score a diverse sample of 15 randomized controlled trial protocols. Discriminant validity was tested with two raters to independently determine if the trial protocols were more pragmatic or more explanatory, with scores from the 19 raters for the 15 trials as predictors of pragmatism. Interrater reliability was generally good, with seven of nine domains having an intraclass correlation coefficient over 0.65. Flexibility (adherence) and recruitment had wide confidence intervals, but raters found these difficult to rate and wanted more information. Each of the nine PRECIS-2 domains could be used to differentiate between trials taking more pragmatic or more explanatory approaches with better than chance discrimination for all domains. We have assessed the validity and reliability of PRECIS-2. An elaboration study and web site provide guidance to help future users of the tool which is continuing to be tested by trial teams, systematic reviewers, and funders. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. Clinical Functional Capacity Testing in Patients With Facioscapulohumeral Muscular Dystrophy: Construct Validity and Interrater Reliability of Antigravity Tests

    NARCIS (Netherlands)

    Rijken, N.H.M.; Engelen, B.G.M. van; Weerdesteyn, V.G.M.; Geurts, A.C.H.

    2015-01-01

    OBJECTIVE: To evaluate the construct validity and interrater reliability of 4 simple antigravity tests in a small group of patients with facioscapulohumeral muscular dystrophy (FSHD). DESIGN: Case-control study. SETTING: University medical center. PARTICIPANTS: Patients with various severity levels

  4. A pediatric FOUR score coma scale: interrater reliability and predictive validity.

    Science.gov (United States)

    Czaikowski, Brianna L; Liang, Hong; Stewart, C Todd

    2014-04-01

    The Full Outline of UnResponsiveness (FOUR) Score is a coma scale that consists of four components (eye and motor response, brainstem reflexes, and respiration). It was originally validated among the adult population and recently in a pediatric population. To enhance clinical assessment of pediatric intensive care unit patients, including those intubated and/or sedated, at our children's hospital, we modified the FOUR Score Scale for this population. This modified scale would provide many of the same advantages as the original, such as interrater reliability, simplicity, and elimination of the verbal component that is not compatible with the Glasgow Coma Scale (GCS), creating a more valuable neurological assessment tool for the nursing community. Our goal was to potentially provide greater information than the formally used GCS when assessing critically ill, neurologically impaired patients, including those sedated and/or intubated. Experienced pediatric intensive care unit nurses were trained as "expert raters." Two different nurses assessed each subject using the Pediatric FOUR Score Scale (PFSS), GCS, and Richmond Agitation Sedation Scale at three different time points. Data were compared with the Pediatric Cerebral Performance Category (PCPC) assessed by another nurse. Our hypothesis was that the PFSS and PCPC should highly correlate and the GCS and PCPC should correlate lower. Study results show that the PFSS is excellent for interrater reliability for trained nurse-rater pairs and prediction of poor outcome and in-hospital mortality, under various situations, but there were no statistically significant differences between the PFSS and the GCS. However, the PFSS does have the potential to provide greater neurological assessment in the intubated and/or sedated patient based on the outcomes of our study.

  5. Validation and inter-rater reliability of a three item falls risk screening tool

    Directory of Open Access Journals (Sweden)

    Catherine Maree Said

    2017-11-01

    Background: Falls screening tools are routinely used in hospital settings and the psychometric properties of tools should be examined in the setting in which they are used. The aim of this study was to explore the concurrent and predictive validity of the Austin Health Falls Risk Screening Tool (AHFRST), compared with The Northern Hospital Modified St Thomas’s Risk Assessment Tool (TNH-STRATIFY), and the inter-rater reliability of the AHFRST. Methods: A research physiotherapist used the AHFRST and TNH-STRATIFY to classify 130 participants admitted to Austin Health (five acute wards, n = 115; two subacute wards, n = 15; median length of stay 6 days, IQR 3–12) as ‘High’ or ‘Low’ falls risk. The AHFRST was also completed by nursing staff on patient admission. Falls data were collected from the hospital incident reporting system. Results: Six falls occurred during the study period (fall rate of 4.6 falls per 1000 bed days). There was substantial agreement between the AHFRST and the TNH-STRATIFY (Kappa = 0.68, 95% CI 0.52–0.78). Both tools had poor predictive validity, with low specificity (AHFRST 46.0%, 95% CI 37.0–55.1; TNH-STRATIFY 34.7%, 95% CI 26.4–43.7) and positive predictive values (AHFRST 5.6%, 95% CI 1.6–13.8; TNH-STRATIFY 6.9%, 95% CI 2.6–14.4). The AHFRST showed moderate inter-rater reliability (Kappa = 0.54, 95% CI = 0.36–0.67, p < 0.001), although 18 patients did not have the AHFRST completed by nursing staff. Conclusions: There was an acceptable level of agreement between the 3 item AHFRST classification of falls risk and the longer, 9 item TNH-STRATIFY classification. However, both tools demonstrated limited predictive validity in the Austin Health population. The results highlight the importance of evaluating the validity of falls screening tools, and the clinical utility of these tools should be reconsidered.
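The specificity and positive predictive value figures in this record come from a 2×2 table of screening result against observed falls, which the abstract does not report. The sketch below shows the arithmetic only. The counts are hypothetical, chosen so that the percentages match the reported ones for 130 participants, and they assume that the six falls involved six different patients, which the abstract does not state.

```python
def specificity(tn: int, fp: int) -> float:
    """Share of non-fallers correctly classified as 'Low' risk."""
    return tn / (tn + fp)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Share of 'High' risk classifications that were followed by a fall."""
    return tp / (tp + fp)


# HYPOTHETICAL 2x2 counts (not reported in the abstract), back-calculated to
# be consistent with the published percentages for 130 participants.
tables = {
    "AHFRST": {"tp": 4, "fn": 2, "fp": 67, "tn": 57},
    "TNH-STRATIFY": {"tp": 6, "fn": 0, "fp": 81, "tn": 43},
}

for tool, t in tables.items():
    spec = 100 * specificity(t["tn"], t["fp"])
    ppv = 100 * positive_predictive_value(t["tp"], t["fp"])
    print(f"{tool}: specificity {spec:.1f}%, PPV {ppv:.1f}%")
```

With so few falls, the positive predictive value is dominated by the number of false positives, which is why both tools score below 10% even though most fallers would be flagged under these assumed counts.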

  6. Nurses assessing pain with the Nociception Coma Scale: interrater reliability and validity

    NARCIS (Netherlands)

    Vink, Peter; Eskes, Anne Maria; Lindeboom, Robert; van den Munckhof, Pepijn; Vermeulen, Hester

    2014-01-01

    The Nociception Coma Scale (NCS) is a pain observation tool, developed for patients with disorders of consciousness (DOC) due to acquired brain injury (ABI). The aim of this study was to assess the interrater reliability of the NCS and NCS-R among nurses for the assessment of pain in ABI patients

  7. Clinical Functional Capacity Testing in Patients With Facioscapulohumeral Muscular Dystrophy: Construct Validity and Interrater Reliability of Antigravity Tests.

    Science.gov (United States)

    Rijken, Noortje H; van Engelen, Baziel G; Weerdesteyn, Vivian; Geurts, Alexander C

    2015-12-01

    OBJECTIVE: To evaluate the construct validity and interrater reliability of 4 simple antigravity tests in a small group of patients with facioscapulohumeral muscular dystrophy (FSHD). DESIGN: Case-control study. SETTING: University medical center. PARTICIPANTS: Patients with various severity levels of FSHD (n=9) and healthy control subjects (n=10) were included (N=19). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: A 4-point ordinal scale was designed to grade performance on the following 4 antigravity tests: sit to stance, stance to sit, step up, and step down. In addition, the 6-minute walk test, 10-m walking test, Berg Balance Scale, and timed Up and Go test were administered as conventional tests. Construct validity was determined by linear regression analysis using the Clinical Severity Score (CSS) as the dependent variable. Interrater agreement was tested using a κ analysis. RESULTS: Patients with FSHD performed worse on all 4 antigravity tests compared with the controls. Stronger correlations were found within than between test categories (antigravity vs conventional). The antigravity tests revealed the highest explained variance with regard to the CSS (R²=.86, P=.014). Interrater agreement was generally good. CONCLUSIONS: The results of this exploratory study support the construct validity and interrater reliability of the proposed antigravity tests for the assessment of functional capacity in patients with FSHD taking into account the use of compensatory strategies. Future research should further validate these results in a larger sample of patients with FSHD. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.

  8. Hypsarrhythmia assessment exhibits poor interrater reliability: a threat to clinical trial validity.

    Science.gov (United States)

    Hussain, Shaun A; Kwong, Grace; Millichap, John J; Mytinger, John R; Ryan, Nicole; Matsumoto, Joyce H; Wu, Joyce Y; Lerner, Jason T; Sankar, Raman

    2015-01-01

    Hypsarrhythmia is the classic interictal electroencephalographic pattern associated with infantile spasms, and characterized by high voltage, disorganization, and multifocal independent epileptiform discharges. Given this seemingly simple definition, one might expect excellent interrater reliability (IRR) in the identification of this pattern. Alternatively, it may be argued that assessments of voltage and disorganization are fairly subjective, and thus quite challenging in borderline cases. We sought to test the IRR of hypsarrhythmia assessment in a systematic fashion. Six blinded pediatric electroencephalographers from four centers reviewed 22 electroencephalography (EEG) samples from patients with infantile spasms. Each sample was 5 min in duration and included only wakefulness. Raters determined if each EEG was abnormal and if hypsarrhythmia was present/absent, and characterized relevant features: voltage, organization, epileptiform discharges, slowing, interictal attenuations, symmetry, and synchrony. In addition, raters indicated their level of confidence for each assessment. Multirater kappa statistics (κ) were calculated for the assessment of hypsarrhythmia and each feature. Although IRR was favorable in determining whether a study was normal or abnormal (κ=0.89), reliability was unfavorable for assessment of hypsarrhythmia (κ=0.40), modified hypsarrhythmia (κ=0.47), high voltage (κ=0.37), disorganization (κ=0.22), multifocal epileptiform discharges (κ=0.68), interictal voltage attenuations (κ=0.21), slowing (κ=0.20), asymmetry (κ=0.26), and asynchrony (κ=0.08). Despite generally unsatisfactory interrater agreement, raters consistently reported high confidence in assessments. This study contradicts the view that hypsarrhythmia assessment is straightforward. Even small variability in the identification of hypsarrhythmia has potentially deleterious consequences for clinical care, as its presence or absence impacts decisions to pursue high-risk and
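The record above reports multirater kappa statistics for six raters without naming the estimator. A common choice for a fixed number of raters per sample is Fleiss' kappa, and the sketch below assumes that form. The function name `fleiss_kappa` and the count table are hypothetical illustrations, not data from the study.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a fixed number of raters per sample.

    counts[i][j] is the number of raters who assigned sample i to category j.
    Every row must sum to the same number of raters.
    """
    n_samples = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    total = n_samples * n_raters

    # Overall share of all ratings that fell in each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Observed agreement per sample: share of rater pairs that agree.
    p_obs = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    mean_obs = sum(p_obs) / n_samples
    chance = sum(p * p for p in p_cat)
    return (mean_obs - chance) / (1 - chance)


# Hypothetical example: 5 EEG samples, 6 raters, categories = [present, absent].
eeg_counts = [[6, 0], [3, 3], [5, 1], [2, 4], [0, 6]]
print(f"Fleiss' kappa = {fleiss_kappa(eeg_counts):.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so a feature that almost every rater marks the same way on almost every sample can still yield a low value.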

  9. Validity and inter-rater reliability of medio-lateral knee motion observed during a single-limb mini squat

    DEFF Research Database (Denmark)

    Ageberg, Eva; Bennell, Kim L; Hunt, Michael A

    2010-01-01

    Muscle function may influence the risk of knee injury and outcomes following injury. Clinical tests, such as a single-limb mini squat, resemble conditions of daily life and are easy to administer. Fewer squats per 30 seconds indicate poorer function. However, the quality of movement, such as the medio-lateral knee motion, may also be important. The aim was to validate an observational clinical test of assessing the medio-lateral knee motion, using a three-dimensional (3-D) motion analysis system. In addition, the inter-rater reliability was evaluated....

  10. Inter-Rater Reliability and Validity of the Australian Football League’s Kicking and Handball Tests

    Science.gov (United States)

    Cripps, Ashley J.; Hopper, Luke S.; Joyce, Christopher

    2015-01-01

    Talent identification tests used at the Australian Football League’s National Draft Combine assess the capacities of athletes to compete at a professional level. Tests created for the National Draft Combine are also commonly used for talent identification and athlete development in development pathways. The skills tests created by the Australian Football League required players to either handball (striking the ball with the hand) or kick to a series of 6 randomly generated targets. Assessors subjectively rate each skill execution, giving a 0-5 score for each disposal. This study aimed to investigate the inter-rater reliability and validity of the skills tests at an adolescent sub-elite level. Male Australian footballers were recruited from sub-elite adolescent teams (n = 121, age = 15.7 ± 0.3 years, height = 1.77 ± 0.07 m, mass = 69.17 ± 8.08 kg). The coaches (n = 7) of each team were also recruited. Inter-rater reliability was assessed using intraclass correlations (ICC) and Limits of Agreement statistics. Both the kicking (ICC = 0.96, p ...... handball tests (ICC = 0.89, p ...... handball test. Key points: The skill tests created by the AFL demonstrated acceptable levels of relative and absolute inter-rater reliability. Both the AFL’s skills tests are able to differentiate between athletes’ dominant and non-dominant limbs. However, only the kicking test could consistently differentiate between score outcomes over a range of Australian Football specific disposal distances. Both tests demonstrated poor concurrent validity, with no correlation found between coaches’ perceptions of technical skills and actual skill outcomes measured. PMID:26336356

  11. Face validity and inter-rater reliability of the Danish version of the modified-Yale Preoperative Anxiety Scale

    DEFF Research Database (Denmark)

    Skovby, Pernille; Rask, Charlotte Ulrikka; Dall, Rolf

    2014-01-01

    m-YPAS to Danish cultural and linguistic conditions and to test face validity and inter-reliability in a clinical setting. Materials and methods: The translation was performed in accordance with WHO guidelines. Face validity as well as linguistic difficulties of the Danish version was tested and solved in a focus...... of the m-YPAS as suitable and relevant, i.e. the face validity satisfactory. Inter-rater reliability analysis revealed that inter-observer agreement at induction 1 was good to very good (κw: 0.63–0.98) and at induction 2, the agreement was good to very good (κw: 0.72–0.96). ICC for the overall weighted...... anxiety score was 0.92 at induction 1 and 0.92 at induction 2. Conclusion: Standardized and validated assessment tools are needed to evaluate interventions aiming to reduce preoperative anxiety in children. The Danish m-YPAS had a satisfactory face validity and inter-reliability, based on a minor empirical...

  12. High inter-rater reliability, agreement, and convergent validity of Constant score in patients with clavicle fractures

    DEFF Research Database (Denmark)

    Ban, Ilija; Troelsen, Anders; Kristensen, Morten Tange

    2016-01-01

    BACKGROUND: The Constant score (CS) has been the primary endpoint in most studies on clavicle fractures. However, the CS was not developed to assess patients with clavicle fractures. Our aim was to examine inter-rater reliability and agreement of the CS in patients with clavicle fractures...... standardized CS assessment at a mean of 6.8 weeks (SD, 1.0 weeks) after injury. Reliability and agreement of the CS were determined by 2 raters. The intraclass correlation coefficient (ICC2,1), standard error of measurement, minimal detectable change, Cronbach α coefficient, and Pearson correlation coefficient...... were estimated. RESULTS: Inter-rater reliability of the total CS was excellent (intraclass correlation coefficient, 0.94; 95% confidence interval, 0.88-0.97), with no systematic difference between the 2 raters (P = .75). The standard error of measurement (measurement error at the group level) was 4...

  13. BurnCase 3D software validation study: Burn size measurement accuracy and inter-rater reliability.

    Science.gov (United States)

    Parvizi, Daryousch; Giretzlehner, Michael; Wurzer, Paul; Klein, Limor Dinur; Shoham, Yaron; Bohanon, Fredrick J; Haller, Herbert L; Tuca, Alexandru; Branski, Ludwik K; Lumenta, David B; Herndon, David N; Kamolz, Lars-P

    2016-03-01

    The aim of this study was to compare the accuracy of burn size estimation using the computer-assisted software BurnCase 3D (RISC Software GmbH, Hagenberg, Austria) with that using a 2D scan, considered to be the actual burn size. Thirty artificial burn areas were preplanned and prepared on three mannequins (one child, one female, and one male). Five trained physicians (raters) were asked to assess the size of all wound areas using BurnCase 3D software. The results were then compared with the real wound areas, as determined by 2D planimetry imaging. To examine inter-rater reliability, we performed an intraclass correlation analysis with a 95% confidence interval. The mean wound area estimations of the five raters using BurnCase 3D were in total 20.7±0.9% for the child, 27.2±1.5% for the female and 16.5±0.1% for the male mannequin. Our analysis showed relative overestimations of 0.4%, 2.8% and 1.5% for the child, female and male mannequins respectively, compared to the 2D scan. The intraclass correlation between the single raters for mean percentage of the artificial burn areas was 98.6%. A high intraclass correlation was also evident between the single raters and the 2D scan. BurnCase 3D is a valid and reliable tool for the determination of total body surface area burned in standard models. Further clinical studies including different pediatric and overweight adult mannequins are warranted. Copyright © 2016 Elsevier Ltd and ISBI. All rights reserved.

  14. Validity and inter-rater reliability of medio-lateral knee motion observed during a single-limb mini squat

    Directory of Open Access Journals (Sweden)

    Simic Milena

    2010-11-01

    Background: Muscle function may influence the risk of knee injury and outcomes following injury. Clinical tests, such as a single-limb mini squat, resemble conditions of daily life and are easy to administer. Fewer squats per 30 seconds indicate poorer function. However, the quality of movement, such as the medio-lateral knee motion, may also be important. The aim was to validate an observational clinical test of assessing the medio-lateral knee motion, using a three-dimensional (3-D) motion analysis system. In addition, the inter-rater reliability was evaluated. Methods: Twenty-five (17 women) non-injured participants (mean age 25.6 years, range 18-37) were included. Visual analysis of the medio-lateral knee motion, scored as knee-over-foot or knee-medial-to-foot by two raters, and 3-D kinematic data were collected simultaneously during a single-limb mini squat. Frontal plane 2-D peak tibial, thigh, and knee varus-valgus angles, and 3-D peak hip internal-external rotation, and knee varus-valgus angles were calculated. Results: Ten subjects were scored as having a knee-medial-to-foot position and 15 subjects a knee-over-foot position assessed by visual inspection. In 2-D, the peak tibial angle (mean 89.0 (SE 0.7) vs mean 86.3 (SE 0.4) degrees, p = 0.001) and peak thigh angle (mean 77.4 (SE 1.0) vs mean 81.2 (SE 0.5) degrees, p = 0.001) with respect to the horizontal, indicated that the knee was more medially placed than the ankle and thigh, respectively. Thus, the knee was in more valgus (mean 11.6 (SE 1.5) vs 5.0 (SE 0.8) degrees, p ...... 0.90 and 96 between raters. Conclusions: Medio-lateral motion of the knee can reliably be assessed during a single-leg mini-squat. The test is valid in 2-D, while the actual movement, in 3-D, is mainly exhibited as increased internal hip rotation. The single-limb mini squat is feasible and easy to administer in the clinical setting and in research to address lower extremity movement quality.

  15. Validity and Interrater Reliability of the Visual Quarter-Waste Method for Assessing Food Waste in Middle School and High School Cafeteria Settings.

    Science.gov (United States)

    Getts, Katherine M; Quinn, Emilee L; Johnson, Donna B; Otten, Jennifer J

    2017-11-01

    Measuring food waste (ie, plate waste) in school cafeterias is an important tool to evaluate the effectiveness of school nutrition policies and interventions aimed at increasing consumption of healthier meals. Visual assessment methods are frequently applied in plate waste studies because they are more convenient than weighing. The visual quarter-waste method has become a common tool in studies of school meal waste and consumption, but previous studies of its validity and reliability have used correlation coefficients, which measure association but not necessarily agreement. The aims of this study were to determine, using a statistic measuring interrater agreement, whether the visual quarter-waste method is valid and reliable for assessing food waste in a school cafeteria setting when compared with the gold standard of weighed plate waste. To evaluate validity, researchers used the visual quarter-waste method and weighed food waste from 748 trays at four middle schools and five high schools in one school district in Washington State during May 2014. To assess interrater reliability, researcher pairs independently assessed 59 of the same trays using the visual quarter-waste method. Both validity and reliability were assessed using a weighted κ coefficient. For validity, as compared with the measured weight, 45% of foods assessed using the visual quarter-waste method were in almost perfect agreement, 42% of foods were in substantial agreement, 10% were in moderate agreement, and 3% were in slight agreement. For interrater reliability between pairs of visual assessors, 46% of foods were in perfect agreement, 31% were in almost perfect agreement, 15% were in substantial agreement, and 8% were in moderate agreement. These results suggest that the visual quarter-waste method is a valid and reliable tool for measuring plate waste in school cafeteria settings. Copyright © 2017 Academy of Nutrition and Dietetics. Published by Elsevier Inc. All rights reserved.

  16. Concurrent validity and interrater reliability of a new smartphone application to assess 3D active cervical range of motion in patients with neck pain.

    Science.gov (United States)

    Stenneberg, Martijn S; Busstra, Harm; Eskes, Michel; van Trijffel, Emiel; Cattrysse, Erik; Scholten-Peeters, Gwendolijne G M; de Bie, Rob A

    2018-04-01

    There is a lack of valid, reliable, and feasible instruments for measuring planar active cervical range of motion (aCROM) and associated 3D coupling motions in patients with neck pain. Smartphones have advanced sensors and appear to be suitable for these measurements. To estimate the concurrent validity and interrater reliability of a new iPhone application for assessing planar aCROM and associated 3D coupling motions in patients with neck pain, using an electromagnetic tracking device as a reference test. Cross-sectional study. Two samples of neck pain patients were recruited; 30 patients for the validity study and 26 patients for the reliability study. Validity was estimated using intraclass correlation coefficients (ICCs), and by calculating 95% limits of agreement (LoA). To estimate interrater reliability, ICCs were calculated. Cervical 3D coupling motions were analyzed by calculating the cross-correlation coefficients and ratio between the main motions and coupled motions for both instruments. ICCs for concurrent validity and interrater reliability ranged from 0.90 to 0.99. The width of the 95% LoA ranged from about 5° for right lateral bending to 11° for total rotation. No significant differences were found between both devices for associated coupling motion analysis. The iPhone application appears to be a useful discriminative tool for the measurement of planar aCROM and associated coupling motions in patients with neck pain. It fulfills the need for a valid, reliable, and feasible instrument in clinical practice and research. Therapists and researchers should consider measurement error when interpreting scores. Copyright © 2017 Elsevier Ltd. All rights reserved.
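The 95% limits of agreement mentioned in this record are conventionally computed with the Bland-Altman method: the mean of the paired differences plus or minus 1.96 standard deviations of those differences. The sketch below assumes that convention. The function name and the paired range-of-motion readings are hypothetical and are not taken from the study.

```python
import statistics
from typing import Sequence, Tuple


def limits_of_agreement(
    a: Sequence[float], b: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)      # systematic difference between devices
    spread = statistics.stdev(diffs)   # sample SD of the paired differences
    half_width = 1.96 * spread
    return bias, bias - half_width, bias + half_width


# Hypothetical paired readings in degrees: phone app vs reference device.
phone = [62.0, 55.5, 70.0, 48.0, 66.5]
reference = [60.5, 57.0, 69.0, 50.0, 65.0]
bias, lower, upper = limits_of_agreement(phone, reference)
print(f"bias {bias:.2f} deg, 95% LoA {lower:.2f} to {upper:.2f} deg, "
      f"width {upper - lower:.2f} deg")
```

The width of the interval, upper limit minus lower limit, is the quantity the abstract reports as ranging from about 5° to 11°.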

  17. Reevaluating Interrater Reliability in Offender Risk Assessment

    NARCIS (Netherlands)

    van der Knaap, L.M.; Leenarts, L.E.W.; Born, M.P.; Oosterveld, P.

    2012-01-01

    Offender risk and needs assessment, one of the pillars of the risk-need-responsivity model of offender rehabilitation, usually depends on raters assessing offender risk and needs. The few available studies of interrater reliability in offender risk assessment are, however, limited in the

  18. Reevaluating Interrater Reliability in Offender Risk Assessment

    Science.gov (United States)

    van der Knaap, Leontien M.; Leenarts, Laura E. W.; Born, Marise Ph.; Oosterveld, Paul

    2012-01-01

    Offender risk and needs assessment, one of the pillars of the risk-need-responsivity model of offender rehabilitation, usually depends on raters assessing offender risk and needs. The few available studies of interrater reliability in offender risk assessment are, however, limited in the generalizability of their results. The present study…

  19. Manual muscle testing and hand-held dynamometry in people with inflammatory myopathy: An intra- and interrater reliability and validity study.

    Science.gov (United States)

    Baschung Pfister, Pierrette; de Bruin, Eling D; Sterkele, Iris; Maurer, Britta; de Bie, Rob A; Knols, Ruud H

    2018-01-01

    Manual muscle testing (MMT) and hand-held dynamometry (HHD) are commonly used in people with inflammatory myopathy (IM), but their clinimetric properties have not yet been sufficiently studied. To evaluate the reliability and validity of MMT and HHD, maximum isometric strength was measured in eight muscle groups across three measurement events. To evaluate reliability of HHD, intra-class correlation coefficients (ICC), the standard error of measurements (SEM) and smallest detectable changes (SDC) were calculated. To measure reliability of MMT, linear Cohen's kappa was computed for single muscle groups and ICC for total score. Additionally, correlations between MMT8 and HHD were evaluated with Spearman correlation coefficients. Fifty people with myositis (56±14 years, 76% female) were included in the study. Intra- and interrater reliability of HHD yielded excellent ICCs (0.75-0.97) for all muscle groups, except for interrater reliability of ankle extension (0.61). The corresponding SEMs% ranged from 8 to 28% and the SDCs% from 23 to 65%. MMT8 total score revealed excellent intra- and interrater reliability (ICC>0.9). Intrarater reliability of single muscle groups was substantial for shoulder and hip abduction, elbow and neck flexion, and hip extension (0.64-0.69); moderate for wrist (0.53) and knee extension (0.49) and fair for ankle extension (0.35). Interrater reliability was moderate for neck flexion (0.54) and hip abduction (0.44); fair for shoulder abduction, elbow flexion, wrist and ankle extension (0.20-0.33); and slight for knee extension (0.08). Correlations between the two tests were low for wrist, knee, ankle, and hip extension; moderate for elbow flexion, neck flexion and hip abduction; and good for shoulder abduction. In conclusion, the MMT8 total score is a reliable assessment to consider general muscle weakness in people with myositis but not for single muscle groups. In contrast, our results confirm that HHD can be recommended to evaluate strength of

  20. Interrater and test-retest reliability and validity of the Norwegian version of the BESTest and mini-BESTest in people with increased risk of falling.

    Science.gov (United States)

    Hamre, Charlotta; Botolfsen, Pernille; Tangen, Gro Gujord; Helbostad, Jorunn L

    2017-04-20

    The Balance Evaluation Systems Test (BESTest) was developed to assess underlying systems for balance control in order to be able to individually tailor rehabilitation interventions to people with balance disorders. A short form, the Mini-BESTest, was developed as a screening test. The study aimed to assess interrater and test-retest reliability of the Norwegian version of the BESTest and the Mini-BESTest in community-dwelling people with increased risk of falling and to assess concurrent validity with the Fall Efficacy Scale-International (FES-I), and it was an observational study with a cross-sectional design. Forty-two persons with increased risk of falling (elderly over 65 years of age, persons with a history of stroke or Multiple Sclerosis) were assessed twice by two raters. Relative reliability was analysed with Intraclass Correlation Coefficient (ICC), and absolute reliability with standard error of measurement (SEM) and smallest detectable change (SDC). Concurrent validity was assessed against the FES-I using Spearman's rho. The BESTest showed very good interrater reliability (ICC = 0.98, SEM = 1.79, SDC95 = 5.0) and test-retest reliability (rater A/rater B: ICC = 0.89/0.89, SEM = 3.9/4.3, SDC95 = 10.8/11.8). The Mini-BESTest also showed very good interrater reliability (ICC = 0.95, SEM = 1.19, SDC95 = 3.3) and test-retest reliability (rater A/rater B: ICC = 0.85/0.84, SEM = 1.8/1.9, SDC95 = 4.9/5.2). The correlations were moderate between the FES-I and both the BESTest and the Mini-BESTest (Spearman's rho -0.51 and -0.50, p ...... test-retest reliability when assessed in a heterogeneous sample of people with increased risk of falling. The concurrent validity measured against the FES-I showed moderate correlation. The results are comparable with earlier studies and indicate that the Norwegian versions can be used in daily clinic and in research.
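The SEM and SDC95 values in this record are linked by simple arithmetic. The abstract does not state its formulas, so the sketch below assumes the conventional ones: SEM = SD × √(1 − ICC) and SDC95 = 1.96 × √2 × SEM. The function names are illustrative, and the SEM inputs are the interrater values quoted in the abstract.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from the sample SD and the ICC (assumed formula)."""
    return sd * math.sqrt(1.0 - icc)


def sdc95(sem: float) -> float:
    """Smallest detectable change at 95% confidence (assumed formula).

    The sqrt(2) term reflects that a change score involves two measurements,
    each carrying its own measurement error.
    """
    return 1.96 * math.sqrt(2.0) * sem


# SEM values as quoted in the abstract (interrater reliability).
for label, sem in [("BESTest", 1.79), ("Mini-BESTest", 1.19)]:
    print(f"{label}: SEM = {sem}, SDC95 = {sdc95(sem):.1f}")
```

Under these assumed formulas the interrater SEM values reproduce the reported SDC95 figures of 5.0 and 3.3, which suggests, without confirming, that the authors used the same convention.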

  1. Content Validity and Inter-Rater Reliability of the Halliwick-Concept-Based Instrument "Swimming with Independent Measure"

    Science.gov (United States)

    Srsen, Katja Groleger; Vidmar, Gaj; Pikl, Masa; Vrecar, Irena; Burja, Cirila; Krusec, Klavdija

    2012-01-01

    The Halliwick concept is widely used in different settings to promote joyful movement in water and swimming. To assess the swimming skills and progression of an individual swimmer, a valid and reliable measure should be used. The Halliwick-concept-based Swimming with Independent Measure (SWIM) was introduced for this purpose. We aimed to determine…

  2. Content Validity Index and Intra- and Inter-Rater Reliability of a New Muscle Strength/Endurance Test Battery for Swedish Soldiers.

    Directory of Open Access Journals (Sweden)

    Helena Larsson

    The objective of this study was to examine the content validity of commonly used muscle performance tests in military personnel and to investigate the reliability of a proposed test battery. For the content validity investigation, thirty selected tests were those described in the literature and/or commonly used in the Nordic and North Atlantic Treaty Organization (NATO) countries. Nine selected experts rated, on a four-point Likert scale, the relevance of these tests in relation to five different work tasks: lifting, carrying equipment on the body or in the hands, climbing, and digging. Thereafter, a content validity index (CVI) was calculated for each work task. The result showed excellent CVI (≥0.78) for sixteen tests, which comprised one or more of the military work tasks. Three of the tests, the functional lower-limb loading test (the Ranger test), dead-lift with kettlebells, and back extension, showed excellent content validity for four of the work tasks. For the development of a new muscle strength/endurance test battery, these three tests were further supplemented with two other tests, namely, the chins and side-bridge test. The inter-rater reliability was high (intraclass correlation coefficient, ICC2,1 0.99) for all five tests. The intra-rater reliability was good to high (ICC3,1 0.82-0.96) with an acceptable standard error of mean (SEM), except for the side-bridge test (SEM%>15). Thus, the final suggested test battery for a valid and reliable evaluation of soldiers' muscle performance comprised the following four tests: the Ranger test, dead-lift with kettlebells, chins, and back extension test. The criterion-related validity of the test battery should be further evaluated for soldiers exposed to varying physical workload.
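
As a rough illustration of the CVI threshold mentioned above: the abstract does not spell out the calculation, so the sketch assumes the common definition in which the CVI is the proportion of experts who rate an item 3 or 4 on the four-point scale. The function name and the example ratings are hypothetical.

```python
def content_validity_index(ratings, relevant_min=3):
    """Proportion of expert ratings at or above `relevant_min`.

    Assumes the common item-level CVI definition on a 4-point scale,
    where ratings of 3 or 4 count as "relevant".
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    relevant = sum(1 for r in ratings if r >= relevant_min)
    return relevant / len(ratings)


# Hypothetical ratings from nine experts for one test and one work task.
example = [4, 4, 3, 3, 4, 3, 4, 2, 1]
print(round(content_validity_index(example), 2))
```

With nine experts, seven "relevant" ratings give 7/9, which rounds to 0.78, so under this assumed definition the ≥0.78 cut-off corresponds to at least seven of nine experts agreeing.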

  3. Intrarater and interrater reliability for measurements in videofluoroscopy of swallowing

    International Nuclear Information System (INIS)

    Baijens, Laura; Barikroo, Ali; Pilz, Walmari

    2013-01-01

    Objective: Intrarater and interrater reliability is crucial to the quality of diagnostic and therapy-effect studies. This paper reports on a systematic review of studies on intrarater and interrater reliability for measurements in videofluoroscopy of swallowing. The aim of this review was to summarize and qualitatively analyze published studies on that topic. Materials and methods: Those published up to March 2013 were found through a comprehensive electronic database search using PubMed, Embase, and The Cochrane Library. Two reviewers independently assessed the studies using strict inclusion criteria. Results: Nineteen studies were included and then qualitatively analyzed. In several of these, methodological problems were found. Moreover, intrarater and interrater reliability varied with the measure applied. A meta-analysis was not carried out as studies were not of sufficient quality to warrant doing so. Conclusion: In order to achieve reliable measurements in videofluoroscopy of swallowing, it is recommended that raters use well-defined guidelines for the levels of ordinal visuoperceptual variables. Furthermore, in order to make the measurements reliable (intrarater and interrater) it is recommended that, following protocolled pre-experimental training, the raters should have maximum consensus about the definition of the measured variables

  4. Intrarater and interrater reliability for measurements in videofluoroscopy of swallowing

    Energy Technology Data Exchange (ETDEWEB)

    Baijens, Laura, E-mail: laura.baijens@mumc.nl [Department of Otorhinolaryngology, Head and Neck Surgery, Maastricht University Medical Center, Maastricht (Netherlands); Barikroo, Ali, E-mail: a.Barikroo@ufl.edu [Swallowing Research Laboratory, Department of Speech, Language and Hearing Sciences, College of Public Health and Health Professions, University of Florida, Gainesville, FL (United States); Pilz, Walmari, E-mail: walmari.pilz@mumc.nl [Department of Otorhinolaryngology, Head and Neck Surgery, Maastricht University Medical Center, Maastricht (Netherlands)

    2013-10-01

    Objective: Intrarater and interrater reliability is crucial to the quality of diagnostic and therapy-effect studies. This paper reports on a systematic review of studies on intrarater and interrater reliability for measurements in videofluoroscopy of swallowing. The aim of this review was to summarize and qualitatively analyze published studies on that topic. Materials and methods: Those published up to March 2013 were found through a comprehensive electronic database search using PubMed, Embase, and The Cochrane Library. Two reviewers independently assessed the studies using strict inclusion criteria. Results: Nineteen studies were included and then qualitatively analyzed. In several of these, methodological problems were found. Moreover, intrarater and interrater reliability varied with the measure applied. A meta-analysis was not carried out as studies were not of sufficient quality to warrant doing so. Conclusion: In order to achieve reliable measurements in videofluoroscopy of swallowing, it is recommended that raters use well-defined guidelines for the levels of ordinal visuoperceptual variables. Furthermore, in order to make the measurements reliable (intrarater and interrater) it is recommended that, following protocolled pre-experimental training, the raters should have maximum consensus about the definition of the measured variables.

  5. Inter-rater and intra-rater reliability of a movement control test in shoulder.

    Science.gov (United States)

    Rajasekar, S; Bangera, Rakshith K; Sekaran, Padmanaban

    2017-07-01

    Movement faults are commonly observed in patients with musculoskeletal pain. The Kinetic Medial Rotation Test (KMRT) is a movement control test used to identify movement faults of the scapula and gleno-humeral joints during arm movement. Objective tests such as the KMRT need to be reliable and valid for the results to be applied across different clinical settings and patient populations. The primary objective of the present study was to determine the intra-rater and inter-rater reliability of KMRT in subjects with and without shoulder pain. Sixty subjects were included in this study based on specific inclusion and exclusion criteria. Two musculoskeletal physiotherapists with different levels of clinical experience performed the tests. The intra-rater reliability was tested in twenty asymptomatic subjects by a single assessor at two-week intervals. An equal number of subjects with and without shoulder pain were tested by both assessors to determine the inter-rater reliability. Both components of the KMRT, the Gleno-Humeral Anterior Translation (GHAT) and the Scapular Forward Tilt (SCFT), were tested. The Kappa values for inter-rater reliability of the GHAT and SCFT were K = 0.68 and K = 0.65, respectively, in subjects with shoulder pain. In asymptomatic subjects, the inter-rater reliability of GHAT was K = 0.61 and SCFT was K = 0.85. Intra-rater reliability ranged from K = 0.66 for GHAT to K = 0.87 for SCFT. Our study found substantial agreement in inter-rater reliability of KMRT in subjects with shoulder pain, whereas substantial to near perfect agreement was found in intra-rater and inter-rater reliability of KMRT in subjects without shoulder pain. Copyright © 2017 Elsevier Ltd. All rights reserved.
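
The verbal labels used above ("substantial", "near perfect") match the widely used Landis and Koch bands for kappa. The abstract does not name the scale, so the cut-offs below are an assumption, and `landis_koch_label` is a hypothetical helper.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to a verbal label, assuming Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa values quoted in the abstract.
for name, k in [("GHAT, pain", 0.68), ("SCFT, pain", 0.65),
                ("GHAT, no pain", 0.61), ("SCFT, no pain", 0.85)]:
    print(f"{name}: K = {k} -> {landis_koch_label(k)}")
```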

  6. Interrater reliability of a Pilates movement-based classification system.

    Science.gov (United States)

    Yu, Kwan Kenny; Tulloch, Evelyn; Hendrick, Paul

    2015-01-01

    To determine the interrater reliability for identification of a specific movement pattern using a Pilates classification system. Videos of 5 subjects performing specific movement tasks were sent to raters trained in the DMA-CP classification system. Ninety-six raters completed the survey. Interrater reliability for the detection of a directional bias was excellent (Pi = 0.92, K(free) = 0.89). Interrater reliability for classifying an individual into a specific subgroup was moderate (Pi = 0.64, K(free) = 0.55); however, raters who had completed levels 1-4 of the DMA-CP training and reported using the assessment daily demonstrated excellent reliability (Pi = 0.89, K(free) = 0.87). The classification system demonstrated almost perfect agreement in determining the existence of a specific movement pattern and, for experienced raters, in classifying into a subgroup. There was a trend for greater reliability associated with increased levels of training and experience of the raters. Copyright © 2014 Elsevier Ltd. All rights reserved.

  7. Construct validity and inter-rater reliability of the Dutch activity measure for post-acute care "6-clicks" basic mobility form to assess the mobility of hospitalized patients.

    Science.gov (United States)

    Geelen, Sven Jacobus Gertruda; Valkenet, Karin; Veenhof, Cindy

    2018-05-12

    To evaluate the construct validity and the inter-rater reliability of the Dutch Activity Measure for Post-Acute Care "6-clicks" Basic Mobility short form measuring the patient's mobility in Dutch hospital care. First, the "6-clicks" was translated by using a forward-backward translation protocol. Next, 64 patients were assessed by the physiotherapist to determine the validity while being admitted to the Internal Medicine wards of a university medical center. Six hypotheses were tested regarding the construct "mobility" which showed that: Better "6-clicks" scores were related to less restrictive pre-admission living situations (p = 0.011), less restrictive discharge locations (p = 0.001), more independence in activities of daily living (p = 0.001) and less physiotherapy visits (p Dutch "6-clicks" shows a good construct validity and moderate-to-excellent inter-rater reliability when used to assess the mobility of hospitalized patients. Implications for Rehabilitation Even though various measurement tools have been developed, it appears the majority of physiotherapists working in a hospital currently do not use these tools as a standard part of their care. The Activity Measure for Post-Acute Care "6-clicks" Basic Mobility is the only tool which is designed to be short, easy to use within usual care and has been validated in the entire hospital population. This study shows that the Dutch version of the Activity Measure for Post-Acute Care "6-clicks" Basic Mobility form is a valid, easy to use, quick tool to assess the basic mobility of Dutch hospitalized patients.

  8. Inter-Rater Reliability of Cyclotorsion Measurements Using Fundus Photography.

    Science.gov (United States)

    Dysli, Muriel; Kanku, Madeleine; Traber, Ghislaine L

    2018-04-01

    The foveo-papillary angle (FPA) on fundus photographs is the accepted standard for the measurement of ocular cyclotorsion. We assessed the inter-rater reliability of this method in healthy subjects and in patients with trochlear nerve palsies. In this methodological study, fundus photographs of healthy subjects and of patients with trochlear nerve palsies were made with a fundus camera (Zeiss Fundus Camera FF 450 plus, Jena, Germany). Three independent observers measured the FPA on the fundus photographs of all subjects in synedra View (synedra View 16, Version 16.0.0.11, Innsbruck, Austria). One hundred and four eyes of 52 subjects (26 healthy controls and 26 patients) were assessed. The mean FPA of the healthy controls was 5.80 degrees (°) [± 0.44 standard error of the mean (SEM)] compared to 11.55° (± 0.80 SEM) for patients with trochlear nerve palsies. The inter-rater reliability of all measured FPAs showed an intraclass correlation coefficient (ICC) of 0.98 (95% CI 0.97 - 0.98). The inter-rater reliability of objective cyclotorsion measurements using fundus photographs was very high. Georg Thieme Verlag KG Stuttgart · New York.

  9. Grant Peer Review: Improving Inter-Rater Reliability with Training.

    Science.gov (United States)

    Sattler, David N; McKnight, Patrick E; Naney, Linda; Mathis, Randy

    2015-01-01

    This study developed and evaluated a brief training program for grant reviewers that aimed to increase inter-rater reliability, rating scale knowledge, and effort to read the grant review criteria. Enhancing reviewer training may improve the reliability and accuracy of research grant proposal scoring and funding recommendations. Seventy-five Public Health professors from U.S. research universities watched the training video we produced and assigned scores to the National Institutes of Health scoring criteria proposal summary descriptions. For both novice and experienced reviewers, the training video increased scoring accuracy (the percentage of scores that reflect the true rating scale values), inter-rater reliability, and the amount of time reading the review criteria compared to the no video condition. The increase in reliability for experienced reviewers is notable because it is commonly assumed that reviewers--especially those with experience--have good understanding of the grant review rating scale. The findings suggest that both experienced and novice reviewers who had not received the type of training developed in this study may not have appropriate understanding of the definitions and meaning for each value of the rating scale and that experienced reviewers may overestimate their knowledge of the rating scale. The results underscore the benefits of and need for specialized peer reviewer training.

  10. Interrater reliability of videotaped observational gait-analysis assessments.

    Science.gov (United States)

    Eastlack, M E; Arvidson, J; Snyder-Mackler, L; Danoff, J V; McGarvey, C L

    1991-06-01

    The purpose of this study was to determine the interrater reliability of videotaped observational gait-analysis (VOGA) assessments. Fifty-four licensed physical therapists with varying amounts of clinical experience served as raters. Three patients with rheumatoid arthritis who demonstrated an abnormal gait pattern served as subjects for the videotape. The raters analyzed each patient's most severely involved knee during the four subphases of stance for the kinematic variables of knee flexion and genu valgum. Raters were asked to determine whether these variables were inadequate, normal, or excessive. The temporospatial variables analyzed throughout the entire gait cycle were cadence, step length, stride length, stance time, and step width. Generalized kappa coefficients ranged from .11 to .52. Intraclass correlation coefficients (2,1) and (3,1) were slightly higher. Our results indicate that physical therapists' VOGA assessments are only slightly to moderately reliable and that improved interrater reliability of the assessments of physical therapists utilizing this technique is needed. Our data suggest that there is a need for greater standardization of gait-analysis training.
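
For readers unfamiliar with the "(2,1)" and "(3,1)" notation, the sketch below computes both single-rater ICC forms from a two-way ANOVA decomposition, following the usual Shrout and Fleiss convention: (2,1) treats raters as random and penalises systematic rater differences, while (3,1) treats raters as fixed and measures consistency. The ratings are illustrative only and are not data from the study; the function name is hypothetical.

```python
def icc_single_rater(ratings):
    """Return (ICC(2,1), ICC(3,1)) for a subjects x raters table.

    `ratings` is a list of rows, one per subject, each holding one
    score per rater, with no missing values.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc21 = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    icc31 = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return icc21, icc31


# Illustrative ratings: 6 subjects scored by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc21, icc31 = icc_single_rater(example)
print(f"ICC(2,1) = {icc21:.2f}, ICC(3,1) = {icc31:.2f}")
```

In this example the raters differ systematically in how high they score, so ICC(3,1) comes out well above ICC(2,1). That gap is the reason the two forms are usually reported separately.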

  11. Test-retest and interrater reliability of the functional lower extremity evaluation.

    Science.gov (United States)

    Haitz, Karyn; Shultz, Rebecca; Hodgins, Melissa; Matheson, Gordon O

    2014-12-01

    Repeated-measures clinical measurement reliability study. To establish the reliability and face validity of the Functional Lower Extremity Evaluation (FLEE). The FLEE is a 45-minute battery of 8 standardized functional performance tests that measures 3 components of lower extremity function: control, power, and endurance. The reliability and normative values for the FLEE in healthy athletes are unknown. A face validity survey for the FLEE was sent to sports medicine personnel to evaluate the level of importance and frequency of clinical usage of each test included in the FLEE. The FLEE was then administered and rated for 40 uninjured athletes. To assess test-retest reliability, each athlete was tested twice, 1 week apart, by the same rater. To assess interrater reliability, 3 raters scored each athlete during 1 of the testing sessions. Intraclass correlation coefficients were used to assess the test-retest and interrater reliability of each of the FLEE tests. In the face validity survey, the FLEE tests were rated as highly important by 58% to 71% of respondents but frequently used by only 26% to 45% of respondents. Interrater reliability intraclass correlation coefficients ranged from 0.83 to 1.00, and test-retest reliability ranged from 0.71 to 0.95. The FLEE tests are considered clinically important for assessing lower extremity function by sports medicine personnel but are underused. The FLEE also is a reliable assessment tool. Future studies are required to determine if use of the FLEE to make return-to-play decisions may reduce reinjury rates.

  12. Interrater reliability of the mind map assessment rubric in a cohort of medical students

    Directory of Open Access Journals (Sweden)

    Zipp Genevieve

    2009-04-01

    Background: Learning strategies are thinking tools that students can use to actively acquire information. Examples of learning strategies include mnemonics, charts, and maps. One strategy that may help students master the tsunami of information presented in medical school is the mind map learning strategy. Currently, there is no valid and reliable rubric to grade mind maps and this may contribute to their underutilization in medicine. Because concept maps and mind maps engage learners similarly at a metacognitive level, a valid and reliable concept map assessment scoring system was adapted to form the mind map assessment rubric (MMAR). The MMAR can assess mind map depth based upon concept-links, cross-links, hierarchies, examples, pictures, and colors. The purpose of this study was to examine interrater reliability of the MMAR. Methods: This exploratory study was conducted at a US medical school as part of a larger investigation on learning strategies. Sixty-six (N = 66) first-year medical students were given a 394-word text passage followed by a 30-minute presentation on mind mapping. After the presentation, subjects were again given the text passage and instructed to create mind maps based upon the passage. The mind maps were collected and independently scored using the MMAR by 3 examiners. Interrater reliability was measured using the intraclass correlation coefficient (ICC) statistic. Statistics were calculated using SPSS version 12.0 (Chicago, IL). Results: Analysis of the mind maps revealed the following: concept-links ICC = .05 (95% CI, -.42 to .38), cross-links ICC = .58 (95% CI, .37 to .73), hierarchies ICC = .23 (95% CI, -.15 to .50), examples ICC = .53 (95% CI, .29 to .69), pictures ICC = .86 (95% CI, .79 to .91), colors ICC = .73 (95% CI, .59 to .82), and total score ICC = .86 (95% CI, .79 to .91). Conclusion: The high ICC value for total mind map score indicates strong MMAR interrater reliability. Pictures and colors demonstrated moderate

  13. Interrater reliability of the mind map assessment rubric in a cohort of medical students.

    Science.gov (United States)

    D'Antoni, Anthony V; Zipp, Genevieve Pinto; Olson, Valerie G

    2009-04-28

    Learning strategies are thinking tools that students can use to actively acquire information. Examples of learning strategies include mnemonics, charts, and maps. One strategy that may help students master the tsunami of information presented in medical school is the mind map learning strategy. Currently, there is no valid and reliable rubric to grade mind maps and this may contribute to their underutilization in medicine. Because concept maps and mind maps engage learners similarly at a metacognitive level, a valid and reliable concept map assessment scoring system was adapted to form the mind map assessment rubric (MMAR). The MMAR can assess mind map depth based upon concept-links, cross-links, hierarchies, examples, pictures, and colors. The purpose of this study was to examine interrater reliability of the MMAR. This exploratory study was conducted at a US medical school as part of a larger investigation on learning strategies. Sixty-six (N = 66) first-year medical students were given a 394-word text passage followed by a 30-minute presentation on mind mapping. After the presentation, subjects were again given the text passage and instructed to create mind maps based upon the passage. The mind maps were collected and independently scored using the MMAR by 3 examiners. Interrater reliability was measured using the intraclass correlation coefficient (ICC) statistic. Statistics were calculated using SPSS version 12.0 (Chicago, IL). Analysis of the mind maps revealed the following: concept-links ICC = .05 (95% CI, -.42 to .38), cross-links ICC = .58 (95% CI, .37 to .73), hierarchies ICC = .23 (95% CI, -.15 to .50), examples ICC = .53 (95% CI, .29 to .69), pictures ICC = .86 (95% CI, .79 to .91), colors ICC = .73 (95% CI, .59 to .82), and total score ICC = .86 (95% CI, .79 to .91). The high ICC value for total mind map score indicates strong MMAR interrater reliability. Pictures and colors demonstrated moderate to strong interrater reliability. 
We conclude that the

  14. Inter-rater and intra-rater reliability of the Bahasa Melayu version of Rose Angina Questionnaire.

    Science.gov (United States)

    Hassan, N B; Choudhury, S R; Naing, L; Conroy, R M; Rahman, A R A

    2007-01-01

    The objective of the study was to translate the Rose Questionnaire (RQ) into a Bahasa Melayu version and adapt it cross-culturally, and to measure its inter-rater and intra-rater reliability. This cross-sectional study was conducted in the respondents' homes or workplaces in Kelantan, Malaysia. One hundred respondents aged 30 and above with different socio-demographic status were interviewed for face validity. For each of inter-rater and intra-rater reliability, a sample of 150 respondents was interviewed. Inter-rater and intra-rater reliabilities were assessed by Cohen's kappa. The overall inter-rater agreement by the five pairs of interviewers at points one and two was 0.86, and intra-rater reliability by the five interviewers on the seven-item questionnaire at points one and two was 0.88, as measured by the kappa coefficient. The translated Malay version of the RQ demonstrated almost perfect inter-rater and intra-rater reliability, and further validation of this translated questionnaire, such as sensitivity and specificity analysis, is highly recommended.

  15. Education Research: Bias and poor interrater reliability in evaluating the neurology clinical skills examination

    Science.gov (United States)

    Schuh, L A.; London, Z; Neel, R; Brock, C; Kissela, B M.; Schultz, L; Gelb, D J.

    2009-01-01

    Objective: The American Board of Psychiatry and Neurology (ABPN) has recently replaced the traditional, centralized oral examination with the locally administered Neurology Clinical Skills Examination (NEX). The ABPN postulated the experience with the NEX would be similar to the Mini-Clinical Evaluation Exercise, a reliable and valid assessment tool. The reliability and validity of the NEX has not been established. Methods: NEX encounters were videotaped at 4 neurology programs. Local faculty and ABPN examiners graded the encounters using 2 different evaluation forms: an ABPN form and one with a contracted rating scale. Some NEX encounters were purposely failed by residents. Cohen’s kappa and intraclass correlation coefficients (ICC) were calculated for local vs ABPN examiners. Results: Ninety-eight videotaped NEX encounters of 32 residents were evaluated by 20 local faculty evaluators and 18 ABPN examiners. The interrater reliability for a determination of pass vs fail for each encounter was poor (kappa 0.32; 95% confidence interval [CI] = 0.11, 0.53). ICC between local faculty and ABPN examiners for each performance rating on the ABPN NEX form was poor to moderate (ICC range 0.14-0.44), and did not improve with the contracted rating form (ICC range 0.09-0.36). ABPN examiners were more likely than local examiners to fail residents. Conclusions: There is poor interrater reliability between local faculty and American Board of Psychiatry and Neurology examiners. A bias was detected for favorable assessment locally, which is concerning for the validity of the examination. Further study is needed to assess whether training can improve interrater reliability and offset bias.

  16. Exploring Differences in Measurement and Reporting of Classroom Observation Inter-Rater Reliability

    Science.gov (United States)

    Wilhelm, Anne Garrison; Gillespie Rouse, Amy; Jones, Francesca

    2018-01-01

    Although inter-rater reliability is an important aspect of using observational instruments, it has received little theoretical attention. In this article, we offer some guidance for practitioners and consumers of classroom observations so that they can make decisions about inter-rater reliability, both for study design and in the reporting of data…

  17. Inter-rater reliability of three standardized functional tests in patients with low back pain

    Science.gov (United States)

    Tidstrand, Johan; Horneij, Eva

    2009-01-01

    Background Of all patients with low back pain, 85% are diagnosed as "non-specific lumbar pain". Lumbar instability has been described as one specific diagnosis which several authors have described as delayed muscular responses, impaired postural control as well as impaired muscular coordination among these patients. This has mostly been measured and evaluated in a laboratory setting. There are few standardized and evaluated functional tests, examining functional muscular coordination which are also applicable in the non-laboratory setting. In ordinary clinical work, tests of functional muscular coordination should be easy to apply. The aim of this present study was to therefore standardize and examine the inter-rater reliability of three functional tests of muscular functional coordination of the lumbar spine in patients with low back pain. Methods Nineteen consecutive individuals, ten men and nine women were included. (Mean age 42 years, SD ± 12 yrs). Two independent examiners assessed three tests: "single limb stance", "sitting on a Bobath ball with one leg lifted" and "unilateral pelvic lift" on the same occasion. The standardization procedure took altered positions of the spine or pelvis and compensatory movements of the free extremities into account. The inter-rater reliability was analyzed by Cohen's kappa coefficient (κ) and by percentage agreement. Results The inter-rater reliability for the right and the left leg respectively was: for the single limb stance very good (κ: 0.88–1.0), for sitting on a Bobath ball good (κ: 0.79) and very good (κ: 0.88) and for the unilateral pelvic lift: good (κ: 0.61) and moderate (κ: 0.47). Conclusion The present study showed good to very good inter-rater reliability for two standardized tests, that is, the single-limb stance and sitting on a Bobath-ball with one leg lifted. Inter-rater reliability for the unilateral pelvic lift test was moderate to good. Validation of the tests in their ability to evaluate lumbar
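
Because the study above reports both Cohen's kappa and percentage agreement, a short sketch may help show how the two differ: kappa discounts the agreement that would be expected by chance from each rater's marginal frequencies. The ratings below are hypothetical and are not the study data.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of cases on which the two raters give the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters: (po - pe) / (1 - pe)."""
    n = len(rater_a)
    po = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the product of the raters' marginal proportions.
    pe = sum(counts_a[c] * counts_b[c] for c in set(rater_a) | set(rater_b)) / n**2
    if pe == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (po - pe) / (1 - pe)


# Hypothetical pass/fail ratings of 20 patients by two examiners.
examiner_1 = ["pass"] * 10 + ["fail"] * 10
examiner_2 = ["pass"] * 8 + ["fail"] * 2 + ["pass"] * 4 + ["fail"] * 6
print(f"agreement = {percent_agreement(examiner_1, examiner_2):.2f}")
print(f"kappa     = {cohens_kappa(examiner_1, examiner_2):.2f}")
```

Here the raters agree on 70% of cases, but half of that agreement is expected by chance, so kappa is considerably lower than the raw percentage.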

  18. Inter-rater reliability of three standardized functional tests in patients with low back pain

    Directory of Open Access Journals (Sweden)

    Tidstrand Johan

    2009-06-01

    Background: Of all patients with low back pain, 85% are diagnosed as "non-specific lumbar pain". Lumbar instability has been described as one specific diagnosis which several authors have described as delayed muscular responses, impaired postural control as well as impaired muscular coordination among these patients. This has mostly been measured and evaluated in a laboratory setting. There are few standardized and evaluated functional tests, examining functional muscular coordination which are also applicable in the non-laboratory setting. In ordinary clinical work, tests of functional muscular coordination should be easy to apply. The aim of this present study was therefore to standardize and examine the inter-rater reliability of three functional tests of muscular functional coordination of the lumbar spine in patients with low back pain. Methods: Nineteen consecutive individuals, ten men and nine women, were included (mean age 42 years, SD ± 12 years). Two independent examiners assessed three tests: "single limb stance", "sitting on a Bobath ball with one leg lifted" and "unilateral pelvic lift" on the same occasion. The standardization procedure took altered positions of the spine or pelvis and compensatory movements of the free extremities into account. The inter-rater reliability was analyzed by Cohen's kappa coefficient (κ) and by percentage agreement. Results: The inter-rater reliability for the right and the left leg respectively was: for the single limb stance very good (κ: 0.88–1.0), for sitting on a Bobath ball good (κ: 0.79) and very good (κ: 0.88) and for the unilateral pelvic lift: good (κ: 0.61) and moderate (κ: 0.47). Conclusion: The present study showed good to very good inter-rater reliability for two standardized tests, that is, the single-limb stance and sitting on a Bobath-ball with one leg lifted. Inter-rater reliability for the unilateral pelvic lift test was moderate to good. Validation of the tests in their

  19. Simulated patient training: Using inter-rater reliability to evaluate simulated patient consistency in nursing education.

    Science.gov (United States)

    MacLean, Sharon; Geddes, Fiona; Kelly, Michelle; Della, Phillip

    2018-03-01

    Simulated patients (SPs) are frequently used for training nursing students in communication skills. An acknowledged benefit of using SPs is the opportunity to provide a standardized approach by which participants can demonstrate and develop communication skills. However, relatively little evidence is available on how best to facilitate and evaluate the reliability and accuracy of SPs' performances. The aim of this study was to investigate the effectiveness of an evidence-based SP training framework to ensure standardization of SPs. The training framework was employed to improve inter-rater reliability of SPs. A quasi-experimental study was employed to assess SP post-training understanding of simulation scenario parameters using inter-rater reliability agreement indices. Two phases of data collection took place. In an initial trial phase, audio-visual (AV) recordings of two undergraduate nursing students completing a simulation scenario were rated by eight SPs using the Interpersonal Communication Assessments Scale (ICAS) and Quality of Discharge Teaching Scale (QDTS). In phase 2, eight SP raters and four nursing faculty raters independently evaluated students' (N=42) communication practices using the QDTS. Intraclass correlation coefficients (ICC) were >0.80 for both stages of the study in clinical communication skills. The results support the premise that if trained appropriately, SPs have a high degree of reliability and validity to both facilitate and evaluate student performance in nurse education. Crown Copyright © 2018. Published by Elsevier Ltd. All rights reserved.

  20. Inter-rater reliability of diagnostic criteria for sacroiliac joint-, disc- and facet joint pain.

    Science.gov (United States)

    van Tilburg, Cornelis W J; Groeneweg, Johannes G; Stronks, Dirk L; Huygen, Frank J P M

    2017-01-01

    Several diagnostic criteria sets are described in the literature to identify low back pain subtypes, but very little is known about the inter-rater reliability of these criteria. We conducted a study to determine the reliability of diagnostic tests that point towards SI joint-, disc- or facet joint pain. Inter-rater reliability study alongside three randomized clinical trials. Multidisciplinary pain center of general hospital. Patients aged 18 or more with medical history and physical examination suggestive of sacroiliac joint-, disc- and facet joint pain on lumbar level. Using the diagnostic criteria most commonly used at present, a physical examination was performed independently by three physicians (two pain physicians and one orthopedic surgeon). Inter-rater reliability (Kappa (κ) measure of agreement) and significance (p) between raters are presented. Strengths of agreement, indicated with κ values above 0.20, are presented in order of agreement. One hundred patients were included. None of the parameters from the physical investigation had κ values of more than 0.21 (fair) in all pairs of raters. Between two raters (C and D), there was an almost perfect agreement on three parameters, more specifically "Abnormal sensory and motor examination, hyperactive or diminished reflexes", "Sitting exam shows no reflex, motor or sensory signs in the legs" and "Straight leg raising (Lasègue) negative between 30 and 70 degrees of flexion". The "Drop test positive" parameters had moderate strength of agreement between raters A and D and fair strength between raters A and B. The "Digital interspinous pressure test positive" had moderate strength of agreement between raters C and D and fair strength of agreement between raters A and B as well as raters B and C. Three other parameters had a fair strength of agreement between two raters; all other parameters had a slight or poor strength of agreement. Inter-rater reliability, confidence intervals and significance of

  1. Inter-rater reliability of kinesthetic measurements with the KINARM robotic exoskeleton.

    Science.gov (United States)

    Semrau, Jennifer A; Herter, Troy M; Scott, Stephen H; Dukelow, Sean P

    2017-05-22

    Kinesthesia (sense of limb movement) has been extremely difficult to measure objectively, especially in individuals who have survived a stroke. The development of valid and reliable measurements for proprioception is important to developing a better understanding of proprioceptive impairments after stroke and their impact on the ability to perform daily activities. We recently developed a robotic task to evaluate kinesthetic deficits after stroke and found that the majority (~60%) of stroke survivors exhibit significant deficits in kinesthesia within the first 10 days post-stroke. Here we aim to determine the inter-rater reliability of this robotic kinesthetic matching task. Twenty-five neurologically intact control subjects and 15 individuals with first-time stroke were evaluated on a robotic kinesthetic matching task (KIN). Subjects sat in a robotic exoskeleton with their arms supported against gravity. In the KIN task, the robot moved the subjects' stroke-affected arm at a preset speed, direction and distance. As soon as subjects felt the robot begin to move their affected arm, they matched the robot movement with the unaffected arm. Subjects were tested in two sessions on the KIN task: initial session and then a second session (within an average of 18.2 ± 13.8 h of the initial session for stroke subjects), which were supervised by different technicians. The task was performed both with and without the use of vision in both sessions. We evaluated intra-class correlations of spatial and temporal parameters derived from the KIN task to determine the reliability of the robotic task. We evaluated 8 spatial and temporal parameters that quantify kinesthetic behavior. We found that the parameters exhibited moderate to high intra-class correlations between the initial and retest conditions (Range, r-value = [0.53-0.97]). The robotic KIN task exhibited good inter-rater reliability. This validates the KIN task as a reliable, objective method for quantifying

  2. "A Comparison of Consensus, Consistency, and Measurement Approaches to Estimating Interrater Reliability"

    OpenAIRE

    Steven E. Stemler

    2004-01-01

    This article argues that the general practice of describing interrater reliability as a single, unified concept is at best imprecise, and at worst potentially misleading. Rather than representing a single concept, different statistical methods for computing interrater reliability can be more accurately classified into one of three categories based upon the underlying goals of analysis. The three general categories introduced and described in this paper are: 1) consensus estimates, 2) cons...

  3. The timed "up and go" test : Reliability and validity in persons with unilateral lower limb amputation

    NARCIS (Netherlands)

    Schoppen, Tanneke; Boonstra, Antje; Groothoff, JW; de Vries, J; Goeken, LNH; Eisma, Willem

    Objective: To determine the interrater and intrarater reliability and the validity of the Timed "up and go" test as a measure for physical mobility in elderly patients with an amputation of the lower extremity. Design: To test interrater reliability, the test was performed for two observers at

  4. Development and interrater reliability testing of a telephone interview training programme for Australian nurse interviewers.

    Science.gov (United States)

    Ahern, Tracey; Gardner, Anne; Gardner, Glenn; Middleton, Sandy; Della, Phillip

    2013-05-01

    The final phase of a three phase study analysing the implementation and impact of the nurse practitioner role in Australia (the Australian Nurse Practitioner Project or AUSPRAC) was undertaken in 2009, requiring nurse telephone interviewers to gather information about health outcomes directly from patients and their treating nurse practitioners. A team of several registered nurses was recruited and trained as telephone interviewers. The aim of this paper is to report on development and evaluation of the training process for telephone interviewers. The training process involved planning the content and methods to be used in the training session; delivering the session; testing skills and understanding of interviewers post-training; collecting and analysing data to determine the degree to which the training process was successful in meeting objectives and post-training follow-up. All aspects of the training process were informed by established educational principles. Interrater reliability between interviewers was high for well-validated sections of the survey instrument resulting in 100% agreement between interviewers. Other sections with unvalidated questions showed lower agreement (between 75% and 90%). Overall the agreement between interviewers was 92%. Each interviewer was also measured against a specifically developed master script or gold standard and for this each interviewer achieved a percentage of correct answers of 94.7% or better. This equated to a Kappa value of 0.92 or better. The telephone interviewer training process was very effective and achieved high interrater reliability. We argue that the high reliability was due to the use of well validated instruments and the carefully planned programme based on established educational principles. There is limited published literature on how to successfully operationalise educational principles and tailor them for specific research studies; this report addresses this knowledge gap. 
    Copyright © 2012 Elsevier

  5. Test–re-test reliability and inter-rater reliability of a digital pelvic inclinometer in young, healthy males and females

    Directory of Open Access Journals (Sweden)

    Chris Beardsley

    2016-03-01

    Objective. The purpose of this study was to investigate the reliability of a digital pelvic inclinometer (DPI) for measuring sagittal plane pelvic tilt in 18 young, healthy males and females. Method. The inter-rater reliability and test–re-test reliabilities of the DPI for measuring pelvic tilt in standing on both the right and left sides of the pelvis were measured by two raters carrying out two rating sessions of the same subjects, three weeks apart. Results. For measuring pelvic tilt, inter-rater reliability was designated as good on both sides (ICC = 0.81–0.88), test–re-test reliability within a single rating session was designated as good on both sides (ICC = 0.88–0.95), and test–re-test reliability between two rating sessions was designated as moderate on the left side (ICC = 0.65) and good on the right side (ICC = 0.85). Conclusion. Inter-rater reliability and test–re-test reliability within a single rating session of the DPI in measuring pelvic tilt were both good, while test–re-test reliability between rating sessions was moderate-to-good. Caution is required regarding the interpretation of the test–re-test reliability within a single rating session, as the raters were not blinded. Further research is required to establish validity.

  6. Test-re-test reliability and inter-rater reliability of a digital pelvic inclinometer in young, healthy males and females.

    Science.gov (United States)

    Beardsley, Chris; Egerton, Tim; Skinner, Brendon

    2016-01-01

    Objective. The purpose of this study was to investigate the reliability of a digital pelvic inclinometer (DPI) for measuring sagittal plane pelvic tilt in 18 young, healthy males and females. Method. The inter-rater reliability and test-re-test reliabilities of the DPI for measuring pelvic tilt in standing on both the right and left sides of the pelvis were measured by two raters carrying out two rating sessions of the same subjects, three weeks apart. Results. For measuring pelvic tilt, inter-rater reliability was designated as good on both sides (ICC = 0.81-0.88), test-re-test reliability within a single rating session was designated as good on both sides (ICC = 0.88-0.95), and test-re-test reliability between two rating sessions was designated as moderate on the left side (ICC = 0.65) and good on the right side (ICC = 0.85). Conclusion. Inter-rater reliability and test-re-test reliability within a single rating session of the DPI in measuring pelvic tilt were both good, while test-re-test reliability between rating sessions was moderate-to-good. Caution is required regarding the interpretation of the test-re-test reliability within a single rating session, as the raters were not blinded. Further research is required to establish validity.
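
Several records in this list report an intraclass correlation coefficient (ICC) without stating, in the abstract, which ICC form was used. As a rough illustration only, the sketch below computes one common form, the two-way random-effects, absolute-agreement, single-rater coefficient usually written ICC(2,1). The choice of this form, the function name `icc_2_1`, and the ratings table are assumptions made for the example and are not taken from any of the studies listed here.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows, one row per subject and one column per rater.
    This is an illustrative sketch, not the method of any study listed above.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical ratings: 6 subjects (rows) scored by 4 raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(ratings), 2))  # prints 0.29
```

In this hypothetical table the raters rank subjects similarly but differ systematically in level, which is why an absolute-agreement coefficient comes out low; a consistency-type ICC on the same data would be higher.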

  7. Interrater and Intrarater Reliability of the Tuck Jump Assessment by Health Professionals of Varied Educational Backgrounds

    Directory of Open Access Journals (Sweden)

    Lisa A. Dudley

    2013-01-01

    Objective. The Tuck Jump Assessment (TJA), a clinical plyometric assessment, identifies 10 jumping and landing technique flaws. The study objective was to investigate TJA interrater and intrarater reliability with raters of different educational and clinical backgrounds. Methods. 40 participants were video recorded performing the TJA using published protocol and instructions. Five raters of varied educational and clinical backgrounds scored the TJA. Each score of the 10 technique flaws was summed for the total TJA score. Approximately one month later, 3 raters scored the videos again. Intraclass correlation coefficients determined interrater (5 and 3 raters for the first and second session, respectively) and intrarater (3 raters) reliability. Results. Interrater reliability with 5 raters was poor (ICC = 0.47; 95% confidence interval (CI) 0.33–0.62). Interrater reliability between 3 raters who completed 2 scoring sessions improved from 0.52 (95% CI 0.35–0.68) for session one to 0.69 (95% CI 0.55–0.81) for session two. Intrarater reliability was poor to moderate, ranging from 0.44 (95% CI 0.22–0.68) to 0.72 (95% CI 0.55–0.84). Conclusion. Published protocol and training of raters were insufficient to allow consistent TJA scoring. There may be a learned effect with the TJA since interrater reliability improved with repetition. TJA instructions and training should be modified and enhanced before clinical implementation.

  8. An examination of the interrater reliability between practitioners and researchers on the static-99.

    Science.gov (United States)

    Quesada, Stephen P; Calkins, Cynthia; Jeglic, Elizabeth L

    2014-11-01

    Many studies have validated the psychometric properties of the Static-99, the most widely used measure of sexual offender recidivism risk. However, much of this research relied on instrument coding completed by well-trained researchers. This study is the first to examine the interrater reliability (IRR) of the Static-99 between practitioners in the field and researchers. Using archival data from a sample of 1,973 formerly incarcerated sex offenders, field raters' scores on the Static-99 were compared with those of researchers. Overall, clinicians and researchers had excellent IRR on Static-99 total scores, with IRR coefficients ranging from "substantial" to "outstanding" for the 10 individual items of the scale. The most common causes of discrepancies were coding manual errors, followed by item subjectivity, inaccurate item scoring, and calculation errors. These results offer important data with regard to the frequency and perceived nature of scoring errors. © The Author(s) 2013.

  9. Intra- and interrater reliability of three different MRI grading and classification systems after acute hamstring injuries.

    Science.gov (United States)

    Wangensteen, Arnlaug; Tol, Johannes L; Roemer, Frank W; Bahr, Roald; Dijkstra, H Paul; Crema, Michel D; Farooq, Abdulaziz; Guermazi, Ali

    2017-04-01

    To assess and compare the intra- and interrater reliability of three different MRI grading and classification systems after acute hamstring injury. Male athletes (n=40) with clinical diagnosis of acute hamstring injury and MRI ≤5 days were selected from a prospective cohort. Two radiologists independently evaluated the MRIs using a standardised scoring form including the modified Peetrons grading system, the Chan acute muscle strain injury classification and the British Athletics Muscle Injury Classification. Intra- and interrater reliability was assessed with linear weighted kappa (κ) or unweighted Cohen's κ and percentage agreement was calculated. We observed 'substantial' to 'almost perfect' intra- (κ range 0.65-1.00) and interrater reliability (κ range 0.77-1.00) with percentage agreement 83-100% and 88-100%, respectively, for severity gradings, overall anatomical sites and overall classifications for the three MRI systems. We observed substantial variability (κ range -0.05 to 1.00) for subcategories within the Chan classification and the British Athletics Muscle Injury Classification; however, the prevalence of positive scorings was low for some subcategories. The modified Peetrons grading system, overall Chan classification and overall British Athletics Muscle Injury Classification demonstrated 'substantial' to 'almost perfect' intra- and interrater reliability when scored by experienced radiologists. The intra- and interrater reliability for the anatomical subcategories within the classifications remains unclear. Copyright © 2017 Elsevier B.V. All rights reserved.
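
The abstract above mentions linear weighted kappa for ordinal gradings. A minimal sketch of that statistic for two raters is given below, assuming grades are coded as integers 0 to k-1 and using disagreement weights proportional to |i - j|. The function name `linear_weighted_kappa` and the example grades are hypothetical and are not data from the study.

```python
def linear_weighted_kappa(rater_a, rater_b, n_categories):
    """Linear weighted kappa for two raters on ordinal grades 0..n_categories-1.

    Computed as 1 - (weighted observed disagreement / weighted expected
    disagreement), with weights |i - j|. Illustrative sketch only.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must grade the same non-empty set of cases")
    n = len(rater_a)
    k = n_categories

    # Contingency table: observed[i][j] counts cases graded i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = sum(
        abs(i - j) * observed[i][j] for i in range(k) for j in range(k)
    )
    weighted_expected = sum(
        abs(i - j) * row_totals[i] * col_totals[j] / n
        for i in range(k)
        for j in range(k)
    )
    if weighted_expected == 0:
        return float("nan")  # both raters used one identical grade throughout
    return 1 - weighted_observed / weighted_expected


# Hypothetical severity grades (0-3) from two raters for 8 scans.
grades_a = [0, 1, 2, 3, 1, 2, 0, 3]
grades_b = [0, 1, 2, 2, 1, 3, 0, 3]
print(round(linear_weighted_kappa(grades_a, grades_b, 4), 2))  # prints 0.8
```

Because near-misses (adjacent grades) are penalised less than distant disagreements, the weighted statistic suits ordinal grading systems, whereas unweighted kappa treats every disagreement alike.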

  10. Interrater and Intrarater Reliability of the Balance Computerized Adaptive Test in Patients With Stroke.

    Science.gov (United States)

    Chiang, Hsin-Yu; Lu, Wen-Shian; Yu, Wan-Hui; Hsueh, I-Ping; Hsieh, Ching-Lin

    2018-04-11

    To examine the interrater and intrarater reliability of the Balance Computerized Adaptive Test (Balance CAT) in patients with chronic stroke having a wide range of balance functions. Repeated assessments design (1wk apart). Seven teaching hospitals. A pooled sample (N=102) including 2 independent groups of outpatients (n=50 for the interrater reliability study; n=52 for the intrarater reliability study) with chronic stroke. Not applicable. Balance CAT. For the interrater reliability study, the values of intraclass correlation coefficient, minimal detectable change (MDC), and percentage of MDC (MDC%) for the Balance CAT were .84, 1.90, and 31.0%, respectively. For the intrarater reliability study, the values of intraclass correlation coefficient, MDC, and MDC% ranged from .89 to .91, from 1.14 to 1.26, and from 17.1% to 18.6%, respectively. The Balance CAT showed sufficient intrarater reliability in patients with chronic stroke having balance functions ranging from sitting with support to independent walking. Although the Balance CAT may have good interrater reliability, we found substantial random measurement error between different raters. Accordingly, if the Balance CAT is used as an outcome measure in clinical or research settings, same raters are suggested over different time points to ensure reliable assessments. Copyright © 2018 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
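
The abstract above reports a minimal detectable change (MDC) alongside the ICC but does not say how it was derived. A widely used convention, assumed here for illustration, takes the standard error of measurement as SEM = SD × √(1 − ICC) and the 95% MDC as 1.96 × √2 × SEM. The function names and the input values below are hypothetical and are not taken from the study.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a sample SD and a reliability ICC."""
    if icc > 1:
        raise ValueError("ICC cannot exceed 1")
    return sd * math.sqrt(1 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * math.sqrt(2) * sem


# Hypothetical inputs: between-subject SD of 1.0 score units, ICC of 0.84.
sem = sem_from_icc(sd=1.0, icc=0.84)
print(round(sem, 2), round(mdc95(sem), 2))  # prints 0.4 1.11
```

Under this convention a lower ICC inflates the SEM and therefore the MDC, which is consistent with the abstract's observation that the interrater condition (ICC .84) carries more random measurement error than the intrarater condition (ICC .89 to .91).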

  11. Inter-rater reliability of measures to characterize the tobacco retail environment in Mexico

    Directory of Open Access Journals (Sweden)

    Marissa G Hall

    2015-11-01

    Objective. To evaluate the inter-rater reliability of a data collection instrument to assess the tobacco retail environment in Mexico, after major marketing regulations were implemented. Materials and methods. In 2013, two data collectors independently evaluated 21 stores in two census tracts, through a data collection instrument that assessed the presence of price promotions, whether single cigarettes were sold, the number of visible advertisements, the presence of signage prohibiting the sale of cigarettes to minors, and characteristics of cigarette pack displays. We evaluated the inter-rater reliability of the collected data, through the calculation of metrics such as intraclass correlation coefficient, percent agreement, Cohen’s kappa and Krippendorff’s alpha. Results. Most measures demonstrated substantial or perfect inter-rater reliability. Conclusions. Our results indicate the potential utility of the data collection instrument for future point-of-sale research.

  12. Orthopaedic nurses' knowledge and interrater reliability of neurovascular assessments with 2-point discrimination test.

    Science.gov (United States)

    Turney, Jennifer; Raley Noble, Deana; Kim, Son Chae

    2013-01-01

    This study was conducted to evaluate the effects of education on knowledge and interrater reliability of neurovascular assessments with the 2-point discrimination (2-PD) test among pediatric orthopaedic nurses. A pre- and posttest study was done among 60 nurses attending 2-hour educational sessions. Neurovascular assessments with the 2-PD test were performed on 64 casted pediatric patients by the nurses and 5 nurse experts before and after the educational sessions. The mean neurovascular assessment knowledge score improved at posteducation compared with preeducation (p < .001). The 2-PD test interrater reliability also improved, from a Cohen's kappa value of 0.24 to 0.48 at posteducation. The 2-hour educational session may be effective in improving nurses' knowledge and the interrater reliability of neurovascular assessment with the 2-PD test.
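
Cohen's kappa, reported in the abstract above and in several other records, corrects the raw proportion of agreement between two raters for the agreement expected by chance. The sketch below shows the standard two-rater calculation, κ = (p_o − p_e) / (1 − p_e). The function name `cohen_kappa` and the example ratings are hypothetical and are not data from any study listed here.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters classifying the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must classify the same non-empty set of cases")
    n = len(rater_a)

    # Observed agreement: proportion of cases given the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal proportions per label.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[c] * counts_b[c] for c in labels) / (n * n)

    if p_expected == 1:
        return float("nan")  # both raters used a single identical label
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical normal (0) / abnormal (1) judgements on 10 patients.
nurse = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
expert = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
print(round(cohen_kappa(nurse, expert), 2))  # prints 0.6
```

In this hypothetical example the raters agree on 80% of cases, yet kappa is 0.6 because half of that agreement would be expected by chance alone. This is why percentage agreement and kappa are often reported side by side.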

  13. Introducing a new definition of a near fall: intra-rater and inter-rater reliability.

    Science.gov (United States)

    Maidan, I; Freedman, T; Tzemah, R; Giladi, N; Mirelman, A; Hausdorff, J M

    2014-01-01

    Near falls (NFs) are more frequent than falls, and may occur before falls, potentially predicting fall risk. As such, identification of a NF is important. We aimed to assess intra- and inter-rater reliability of the traditional definition of a NF and to demonstrate the potential utility of a new definition. To this end, 10 older adults, 10 idiopathic elderly fallers, and 10 patients with Parkinson's disease (PD) walked in an obstacle course while wearing a safety harness. All walks were videotaped. Forty-nine video segments were extracted to create 2 clips each of 8.48 min. Four raters scored each event using the traditional definition and, two weeks later, using the new definition. A fifth rater used only the new definition. Intra-rater reliability was determined using Kappa (K) statistics and inter-rater reliability was determined using ICC. Using the traditional definition, three raters had poor intra-rater reliability (K […] 0.137) and one rater had moderate intra-rater reliability (K=0.624, p […]). […] definition, inter-rater reliability between the four raters was moderate (ICC=0.667, p […]). […] definition showed high intra-rater (K>0.601, p […]) […] definition of NF is required. The results of the present study suggest that the proposed new definition increases intra- and inter-rater reliability, a critical step for using NFs to quantify fall risk. Copyright © 2013 Elsevier B.V. All rights reserved.

  14. Inter-rater reliability of the South African Triage Scale: Assessing two different cadres of health care workers in a real time environment

    Directory of Open Access Journals (Sweden)

    Michèle Twomey

    2011-09-01

    Conclusion: The inter-rater reliability of SATS ratings is excellent within individual HCWs, but significantly lower between different HCWs. This confirms previous reliability studies of the SATS using vignettes and if validated by larger studies would support the feasibility of further implementation of the SATS in primary health care settings across the Western Cape.

  15. Intra- and interrater reliability of three different MRI grading and classification systems after acute hamstring injuries

    International Nuclear Information System (INIS)

    Wangensteen, Arnlaug; Tol, Johannes L.; Roemer, Frank W.; Bahr, Roald; Dijkstra, H. Paul; Crema, Michel D.; Farooq, Abdulaziz; Guermazi, Ali

    2017-01-01

    Highlights: • Three different MRI grading and classification systems for acute hamstring injuries are overall reliable. • Reliability for the subcategories within these MRI grading and classification systems remains, however, unclear. - Abstract: Objective: To assess and compare the intra- and interrater reliability of three different MRI grading and classification systems after acute hamstring injury. Methods: Male athletes (n = 40) with clinical diagnosis of acute hamstring injury and MRI ≤5 days were selected from a prospective cohort. Two radiologists independently evaluated the MRIs using a standardised scoring form including the modified Peetrons grading system, the Chan acute muscle strain injury classification and the British Athletics Muscle Injury Classification. Intra- and interrater reliability was assessed with linear weighted kappa (κ) or unweighted Cohen's κ and percentage agreement was calculated. Results: We observed ‘substantial’ to ‘almost perfect’ intra- (κ range 0.65–1.00) and interrater reliability (κ range 0.77–1.00) with percentage agreement 83–100% and 88–100%, respectively, for severity gradings, overall anatomical sites and overall classifications for the three MRI systems. We observed substantial variability (κ range −0.05 to 1.00) for subcategories within the Chan classification and the British Athletics Muscle Injury Classification; however, the prevalence of positive scorings was low for some subcategories. Conclusions: The modified Peetrons grading system, overall Chan classification and overall British Athletics Muscle Injury Classification demonstrated ‘substantial’ to ‘almost perfect’ intra- and interrater reliability when scored by experienced radiologists. The intra- and interrater reliability for the anatomical subcategories within the classifications remains unclear.

  16. Intra- and interrater reliability of three different MRI grading and classification systems after acute hamstring injuries

    Energy Technology Data Exchange (ETDEWEB)

    Wangensteen, Arnlaug, E-mail: arnlaug.wangensteen@nih.no [Aspetar, Orthopaedic and Sports Medicine Hospital, Doha (Qatar); Oslo Sports Trauma Research Center, Department of Sports Medicine, Norwegian School of Sport Sciences, Oslo (Norway)]; Tol, Johannes L., E-mail: johannes.tol@aspetar.com [Aspetar, Orthopaedic and Sports Medicine Hospital, Doha (Qatar); Amsterdam Center for Evidence Sports Medicine, Academic Medical Center (Netherlands); The Sports Physician Group, OLVG, Amsterdam (Netherlands)]; Roemer, Frank W. [Quantitative Imaging Center, Department of Radiology, Boston University School of Medicine, Boston, MA (United States); Department of Radiology, University of Erlangen-Nuremberg, Erlangen (Germany)]; Bahr, Roald [Aspetar, Orthopaedic and Sports Medicine Hospital, Doha (Qatar); Oslo Sports Trauma Research Center, Department of Sports Medicine, Norwegian School of Sport Sciences, Oslo (Norway)]; Dijkstra, H. Paul [Aspetar, Orthopaedic and Sports Medicine Hospital, Doha (Qatar)]; Crema, Michel D. [Quantitative Imaging Center, Department of Radiology, Boston University School of Medicine, Boston, MA (United States); Department of Radiology, Saint-Antoine Hospital, University Paris VI, Paris (France)]; Farooq, Abdulaziz [Aspetar, Orthopaedic and Sports Medicine Hospital, Doha (Qatar)]; Guermazi, Ali [Quantitative Imaging Center, Department of Radiology, Boston University School of Medicine, Boston, MA (United States)]

    2017-04-15

    Highlights: • Three different MRI grading and classification systems for acute hamstring injuries are overall reliable. • Reliability for the subcategories within these MRI grading and classification systems remains, however, unclear. - Abstract: Objective: To assess and compare the intra- and interrater reliability of three different MRI grading and classification systems after acute hamstring injury. Methods: Male athletes (n = 40) with clinical diagnosis of acute hamstring injury and MRI ≤5 days were selected from a prospective cohort. Two radiologists independently evaluated the MRIs using a standardised scoring form including the modified Peetrons grading system, the Chan acute muscle strain injury classification and the British Athletics Muscle Injury Classification. Intra- and interrater reliability was assessed with linear weighted kappa (κ) or unweighted Cohen's κ and percentage agreement was calculated. Results: We observed ‘substantial’ to ‘almost perfect’ intra- (κ range 0.65–1.00) and interrater reliability (κ range 0.77–1.00) with percentage agreement 83–100% and 88–100%, respectively, for severity gradings, overall anatomical sites and overall classifications for the three MRI systems. We observed substantial variability (κ range −0.05 to 1.00) for subcategories within the Chan classification and the British Athletics Muscle Injury Classification; however, the prevalence of positive scorings was low for some subcategories. Conclusions: The modified Peetrons grading system, overall Chan classification and overall British Athletics Muscle Injury Classification demonstrated ‘substantial’ to ‘almost perfect’ intra- and interrater reliability when scored by experienced radiologists. The intra- and interrater reliability for the anatomical subcategories within the classifications remains unclear.

  17. Inter-rater reliability of shoulder measurements in middle-aged women.

    Science.gov (United States)

    De Groef, A; Van Kampen, M; Vervloesem, N; Clabau, E; Christiaens, M-R; Neven, P; Geraerts, I; Struyf, F; Devoogdt, N

    2017-06-01

    To investigate inter-rater reliability of a set of shoulder measurements including inclinometry [shoulder range of motion (ROM)], acromion-table distance and pectoralis minor muscle length (static scapular positioning), upward rotation with two inclinometers (scapular kinematics) and pain pressure thresholds (muscle tenderness) in middle-aged women. Observational study. Thirty symptom-free middle-aged women (first cohort) were measured by two raters. All measurements with an intraclass correlation coefficient (ICC) below 0.75 were retested after an additional training period in a second cohort of 30 symptom-free middle-aged women. Inter-rater reliability of all variables was measured with the ICC (95% confidence interval) and standard error of measurement (SEM). Acromion-table distance (ICC=0.91, SEM 0.22 to 0.28% of body length), pectoralis minor muscle length (ICC=0.91, SEM 0.16% of body length), pain pressure thresholds (ICC=0.78 to 0.85, SEM 0.39 to 0.70kg) and abduction ROM (ICC=0.77, SEM 5°) showed good to excellent inter-rater reliability in the first cohort. After an additional training period, forward flexion ROM showed good inter-rater reliability (ICC=0.83, SEM 5°), scapular upward rotation in resting position showed moderate reliability (ICC=0.52, SEM 2°), and other scaption angles showed weak reliability (ICC=0.26 to 0.43, SEM 3 to 8°). In a battery of clinical tools to evaluate factors contributing to shoulder pain, static scapular positioning and pressure pain thresholds were found to have good to excellent inter-rater reliability in middle-aged women. Additional training is recommended for measurements with a gravity inclinometer. Copyright © 2016 Chartered Society of Physiotherapy. Published by Elsevier Ltd. All rights reserved.

  18. IRR (Inter-Rater Reliability) of a COP (Classroom Observation Protocol)--A Critical Appraisal

    Science.gov (United States)

    Rui, Ning; Feldman, Jill M.

    2012-01-01

    Notwithstanding broad utility of COPs (classroom observation protocols), there has been limited documentation of the psychometric properties of even the most popular COPs. This study attempted to fill this void by closely examining the item and domain-level IRR (inter-rater reliability) of a COP that was used in a federally funded striving readers…

  19. Ultrasound assessment for grading structural tendon changes in supraspinatus tendinopathy: an inter-rater reliability study

    DEFF Research Database (Denmark)

    Ingwersen, Kim Gordon; Hjarbæk, John; Eshøj, Henrik

    2016-01-01

    Aim To evaluate the inter-rater reliability of measuring structural changes in the tendon of patients, clinically diagnosed with supraspinatus tendinopathy (cases) and healthy participants (controls), on ultrasound (US) images captured by standardised procedures. Methods A total of 40 participant...

  20. The Effect of Instrument-Specific Rater Training on Interrater Reliability and Counseling Skills Performance Differentiation

    Science.gov (United States)

    Meacham, Paul Douglas, Jr.

    2013-01-01

    The purpose of this study was to explore the effect of instrument-specific rater training on interrater reliability (IRR) and counseling skills performance differentiation. Strong IRR is of primary concern to effective program evaluation (McCullough, Kuhn, Andrews, Valen, Hatch, & Osimo, 2003; Schanche, Nielsen, McCullough, Valen, &…

  1. Application of STOPP and START criteria: interrater reliability among pharmacists.

    LENUS (Irish Health Repository)

    Ryan, Cristin

    2009-07-01

    Inappropriate prescribing is a well-documented problem in older people. The new screening tools, STOPP (Screening Tool of Older Peoples' Prescriptions) and START (Screening Tool to Alert doctors to Right Treatment) have been formulated to identify potentially inappropriate medications (PIMs) and potential errors of omissions (PEOs) in older patients. Consistent, reliable application of STOPP and START is essential for the screening tools to be used effectively by pharmacists.

  2. Interrater reliability of the Volume-Viscosity Swallow Test; screening for dysphagia among hospitalized elderly medical patients.

    Science.gov (United States)

    Jørgensen, Lise Walther; Søndergaard, Kasper; Melgaard, Dorte; Warming, Susan

    2017-12-01

    Oropharyngeal dysphagia (OD) is prevalent among medical and geriatric patients admitted due to acute illness and it is associated with malnutrition, increased length of stay and increased mortality. A valid and reliable bedside screening test for patients at risk of OD is essential in order to detect patients in need of further assessment. The Volume-Viscosity Swallow Test (V-VST) has been shown to be a valid screening test for OD in mixed outpatient populations. However, as reliability of the test has yet to be investigated in a population of medical and geriatric patients admitted due to acute illness, we aimed to determine the interrater reliability of the V-VST in this clinical setting. Reporting in this study is in accordance with proposed guidelines for the reporting of reliability and agreement studies (GRRAS). In three Danish hospitals (CRD-BFH, CRD-GH, NDR-H) 11 skilled occupational therapists examined an unselected group of 110 patients admitted to geriatric or medical wards. In an overall agreement phase, raters reached ≥80% agreement before the data collection phase commenced. The V-VST was applied to patients twice within a maximum of one hour by raters who administered the test in an order based on randomization, blinded to each other's results. Agreement, Kappa values, weighted Kappa values and Kappa adjusted for bias and prevalence are reported. The interrater reliability of V-VST as screening test for OD in patients admitted to geriatric or medical wards was substantial with an overall Kappa value of 0.77 (95% CI 0.65-0.89); however, interrater reliability varied among hospitals, ranging from 0.37 (95% CI -0.01 to 0.41) to 0.85 (95% CI 0.75-1.00). Interrater reliability of the accompanying recommendations of volume and viscosity was moderate with a weighted kappa value of 0.55 (95% CI 0.37-0.73) for viscosity and 0.53 (95% CI 0.36-0.7) for volume. The overall prevalence of OD was 34.5%, ranging from 8% to 53.6% across hospitals. The prevalence and bias

  3. Intra and Inter-Rater Reliability of Screening for Movement Impairments: Movement Control Tests from The Foundation Matrix

    Science.gov (United States)

    Mischiati, Carolina R.; Comerford, Mark; Gosford, Emma; Swart, Jacqueline; Ewings, Sean; Botha, Nadine; Stokes, Maria; Mottram, Sarah L.

    2015-01-01

    Pre-season screening is well established within the sporting arena, and aims to enhance performance and reduce injury risk. With the increasing need to identify potential injury with greater accuracy, a new risk assessment process has been produced; The Performance Matrix (battery of movement control tests). As with any new method of objective testing, it is fundamental to establish whether the same results can be reproduced between examiners and by the same examiner on consecutive occasions. This study aimed to determine the intra-rater test re-test and inter-rater reliability of tests from a component of The Performance Matrix, The Foundation Matrix. Twenty participants were screened by two experienced musculoskeletal therapists using nine tests to assess the ability to control movement during specific tasks. Movement evaluation criteria for each test were rated as pass or fail. The therapists observed participants real-time and tests were recorded on video to enable repeated ratings four months later to examine intra-rater reliability (videos rated two weeks apart). Overall test percentage agreement was 87% for inter-rater reliability; 98% Rater 1, 94% Rater 2 for test re-test reliability; and 75% for real-time versus video. Intraclass-correlation coefficients (ICCs) were excellent between raters (0.81) and within raters (Rater 1, 0.96; Rater 2, 0.88) but poor for real-time versus video (0.23). Reliability for individual components of each test was more variable: inter-rater, 68-100%; intra-rater, 88-100% Rater 1, 75-100% Rater 2; and real-time versus video 31-100%. Cohen’s Kappa values for inter-rater reliability were 0.0-1.0; intra-rater 0.6-1.0 for Rater 1; -0.1-1.0 for Rater 2; and -0.1-1 for real-time versus video. It is concluded that both inter and intra-rater reliability of tests in The Foundation Matrix are acceptable when rated by experienced therapists. Recommendations are made for modifying some of the criteria to improve reliability where

  4. Inter-rater reliability of assessment of levator ani muscle strength and attachment to the pubic bone in nulliparous women.

    Science.gov (United States)

    van Delft, K; Schwertner-Tiepelmann, N; Thakar, R; Sultan, A H

    2013-09-01

    The modified Oxford scale (MOS) has been found previously to have poor inter-rater reliability, whereas digital assessment of levator ani muscle (LAM) attachment to the pubic bone has been shown to have acceptable reliability. Our aim was to evaluate inter-rater reliability of the validated MOS and to develop a reliable classification system for digital assessment of LAM attachment, correlating this to findings on transperineal ultrasound (TPUS) examination. Evaluation of the MOS by palpation was performed in nulliparous women by two investigators. LAM attachment was evaluated using digital palpation, for which a novel classification system was developed with four grades based on the position of the attachment and presence of discernible muscle. Findings were compared with those on TPUS examination. Inter-rater reliability was assessed using Cohen's kappa statistic. Twenty-five nulliparous women were examined. There was agreement in MOS scores between the investigators in 64% of women (n = 16), with a kappa of 0.66 (indicating substantial agreement). There was agreement in palpation of LAM attachment using the new grading system in 96% of women (n = 24), with a kappa of 0.90 (indicating almost perfect agreement). TPUS examination did not show LAM avulsion in any woman, with the exception of one with a partial avulsion. In this group of nulliparous patients, there was substantial agreement between the two investigators in evaluation of the MOS and there was good agreement between grades of LAM attachment using the new classification system, which correlated with findings on TPUS examination. It therefore appears that these results are reproducible in nulliparous women and the techniques can be readily learned and reliably incorporated into clinical practice and research after appropriate training. Further research is required to establish clinical utility of the grading system for LAM attachment in postpartum women and in women with symptomatic pelvic organ
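
    The verbal labels attached to the kappa values in the abstract above ("substantial" for 0.66, "almost perfect" for 0.90) match the conventional Landis and Koch bands. The following is a minimal sketch of that mapping; the function name `landis_koch` is hypothetical, and the abstract does not state which benchmark the authors actually applied.

```python
def landis_koch(kappa: float) -> str:
    """Map a kappa value to the conventional Landis and Koch label.

    Assumption: the standard bands (<0 poor, 0-0.20 slight, 0.21-0.40 fair,
    0.41-0.60 moderate, 0.61-0.80 substantial, 0.81-1.00 almost perfect).
    """
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0.0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("unreachable for kappa <= 1")


# Values reported in the abstract above
mos_label = landis_koch(0.66)
lam_label = landis_koch(0.90)
print(mos_label, "|", lam_label)
```

    Such cut-offs are conventions rather than statistical tests, so the labels are best read alongside the confidence intervals where these are reported.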

  5. INTER-RATER RELIABILITY FOR MOVEMENT PATTERN ANALYSIS (MPA): MEASURING PATTERNING OF BEHAVIORS VERSUS DISCRETE BEHAVIOR COUNTS AS INDICATORS OF DECISION-MAKING STYLE

    Directory of Open Access Journals (Sweden)

    Brenda L Connors

    2014-06-01

    The unique yield of collecting observational data on human movement has received increasing attention in a number of domains, including the study of decision-making style. As such, interest has grown in the nuances of core methodological issues, including the best ways of assessing inter-rater reliability. In this paper we focus on one key topic – the distinction between establishing reliability for the patterning of behaviors as opposed to the computation of raw counts – and suggest that reliability for each be compared empirically rather than determined a priori. We illustrate by assessing inter-rater reliability for key outcome measures derived from Movement Pattern Analysis (MPA), an observational methodology that records body movements as indicators of decision-making style with demonstrated predictive validity. While reliability ranged from moderate to good for raw counts of behaviors reflecting each of two Overall Factors generated within MPA (Assertion and Perspective), inter-rater reliability for patterning (proportional indicators of each factor) was significantly higher and excellent (ICC = .89). Furthermore, patterning, as compared to raw counts, provided better prediction of observable decision-making process assessed in the laboratory. These analyses support the utility of using an empirical approach to inform the consideration of measuring discrete behavioral counts versus patterning of behaviors when determining inter-rater reliability of observable behavior. They also speak to the substantial reliability that may be achieved via application of theoretically grounded observational systems such as MPA that reveal thinking and action motivations via visible movement patterns.

  6. Inter-rater reliability of case-note audit: a systematic review.

    Science.gov (United States)

    Lilford, Richard; Edwards, Alex; Girling, Alan; Hofer, Timothy; Di Tanna, Gian Luca; Petty, Jane; Nicholl, Jon

    2007-07-01

    The quality of clinical care is often assessed by retrospective examination of case-notes (charts, medical records). Our objective was to determine the inter-rater reliability of case-note audit. We conducted a systematic review of the inter-rater reliability of case-note audit. Analysis was restricted to 26 papers reporting comparisons of two or three raters making independent judgements about the quality of care. Sixty-six separate comparisons were possible, since some papers reported more than one measurement of reliability. Mean kappa values ranged from 0.32 to 0.70. These may be inflated due to publication bias. Measured reliabilities were found to be higher for case-note reviews based on explicit, as opposed to implicit, criteria and for reviews that focused on outcome (including adverse effects) rather than process errors. We found an association between kappa and the prevalence of errors (poor quality care), suggesting alternatives such as tetrachoric and polychoric correlation coefficients be considered to assess inter-rater reliability. Comparative studies should take into account the relationship between kappa and the prevalence of the events being measured.
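
    The dependence of kappa on prevalence noted in the abstract above can be illustrated with a small worked example. The sketch below uses two hypothetical 2x2 tables (not data from the review) with identical observed agreement of 90% but different prevalence of the "error" rating; the helper name `kappa_2x2` is an assumption made for illustration.

```python
def kappa_2x2(both_yes: int, only_r1: int, only_r2: int, both_no: int):
    """Return (observed agreement, Cohen's kappa) for two raters, binary rating.

    both_yes: both raters flag an error
    only_r1 : only rater 1 flags an error
    only_r2 : only rater 2 flags an error
    both_no : neither rater flags an error
    """
    n = both_yes + only_r1 + only_r2 + both_no
    p_obs = (both_yes + both_no) / n
    p1 = (both_yes + only_r1) / n  # rater 1 "error" rate
    p2 = (both_yes + only_r2) / n  # rater 2 "error" rate
    p_exp = p1 * p2 + (1 - p1) * (1 - p2)  # agreement expected by chance
    return p_obs, (p_obs - p_exp) / (1 - p_exp)


# Hypothetical tables, each with 100 case notes and 90 agreements
balanced = kappa_2x2(45, 5, 5, 45)  # errors flagged in about half the notes
rare = kappa_2x2(5, 5, 5, 85)       # errors flagged in about 10% of notes

print("balanced:", balanced)
print("rare    :", rare)
```

    With these made-up counts the observed agreement is 0.90 in both tables, yet kappa is about 0.80 in the balanced case and about 0.44 when errors are rare, because chance agreement rises as the ratings become more one-sided.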

  7. Feasibility and Inter-Rater Reliability of Physical Performance Measures in Acutely Admitted Older Medical Patients

    DEFF Research Database (Denmark)

    Bodilsen, Ann Christine; Juul-Larsen, Helle Gybel; Petersen, Janne

    2015-01-01

    OBJECTIVE: Physical performance measures can be used to predict functional decline and increased dependency in older persons. However, few studies have assessed the feasibility or reliability of such measures in hospitalized older patients. Here we assessed the feasibility and inter-rater reliability of four simple measures of physical performance in acutely admitted older medical patients. DESIGN: During the first 24 hours of hospitalization, the following were assessed twice by different raters in 52 (≥ 65 years) patients admitted for acute medical illness: isometric hand grip strength, 4......, and 30-s chair stand were 8%, 7%, and 18%, and the SRD95% values were 22%, 17%, and 49%. CONCLUSION: In acutely admitted older medical patients, grip strength, gait speed, and the Cumulated Ambulation Score measurements were feasible and showed high inter-rater reliability when administered by different...

  8. Quality of the Critical Incident Technique in practice: Interrater reliability and users' acceptance under real conditions

    Directory of Open Access Journals (Sweden)

    ANNA KOCH

    2009-03-01

    The Critical Incident Technique (CIT) is a widely used task analysis method in personnel psychology. While studies on psychometric properties of the CIT so far primarily took into account relevance ratings of task-lists or attributes, and hence, only a smaller or adapted part of the CIT, little is known about the psychometric properties of the complete CIT in its most meaningful and fruitful way. Therefore, the aim of the present study was to assess interrater reliability and the participants’ view of the CIT under real conditions and especially to provide data for the key step of the CIT: the classification of behavior descriptions into requirements. Additionally, the cost-benefit-ratio and practicability were rated from the participants’ views as an important indicator for the acceptance of the task analysis approach in practice. Instructors of German Institutions for Statutory Accidents Insurance and Prevention as well as their supervisors took part in a job analysis with the CIT. Moderate interrater reliability for the relevance rating was found while the classification step yielded unexpectedly low coefficients for interrater reliability. The cost-benefit-ratio and practicability of the complete CIT were rated very positively. The results are discussed in relation to determinants that facilitate or impede the application of task analysis procedures.

  9. Blinded evaluation of interrater reliability of an operative competency assessment tool for direct laryngoscopy and rigid bronchoscopy.

    Science.gov (United States)

    Ishman, Stacey L; Benke, James R; Johnson, Kaalan Erik; Zur, Karen B; Jacobs, Ian N; Thorne, Marc C; Brown, David J; Lin, Sandra Y; Bhatti, Nasir; Deutsch, Ellen S

    2012-10-01

    OBJECTIVES To confirm interrater reliability using blinded evaluation of a skills-assessment instrument to assess the surgical performance of resident and fellow trainees performing pediatric direct laryngoscopy and rigid bronchoscopy in simulated models. DESIGN Prospective, paired, blinded observational validation study. SUBJECTS Paired observers from multiple institutions simultaneously evaluated residents and fellows who were performing surgery in an animal laboratory or using high-fidelity manikins. The evaluators had no previous affiliation with the residents and fellows and did not know their year of training. INTERVENTIONS One- and 2-page versions of an objective structured assessment of technical skills (OSATS) assessment instrument composed of global and a task-specific surgical items were used to evaluate surgical performance. RESULTS Fifty-two evaluations were completed by 17 attending evaluators. The instrument agreement for the 2-page assessment was 71.4% when measured as a binary variable (ie, competent vs not competent) (κ = 0.38; P = .08). Evaluation as a continuous variable revealed a 42.9% percentage agreement (κ = 0.18; P = .14). The intraclass correlation was 0.53, considered substantial/good interrater reliability (69% reliable). For the 1-page instrument, agreement was 77.4% when measured as a binary variable (κ = 0.53, P = .0015). Agreement when evaluated as a continuous measure was 71.0% (κ = 0.54, P formative feedback on operational competency.

  10. Inter-rater reliability of healthcare professional skills' portfolio assessments: The Andalusian Agency for Healthcare Quality model

    Directory of Open Access Journals (Sweden)

    Antonio Almuedo-Paz

    2014-07-01

    This study aims to determine the reliability of assessment criteria used for a portfolio at the Andalusian Agency for Healthcare Quality (ACSA). Data: all competence certification processes, regardless of their discipline. Period: 2010-2011. Three types of tests are used: 368 certificates, 17,895 reports and 22,642 clinical practice reports (N = 3,010 candidates). The tests were evaluated in pairs by the ACSA team of raters using two categories: valid and invalid. Results: The percentage agreement in assessments of certificates was 89.9%, while for the reports of clinical practice it was 85.1% and for clinical practice reports it was 81.7%. The inter-rater agreement coefficients (kappa) ranged from 0.468 to 0.711. Discussion: The results of this study show that the inter-rater reliability of assessments varies from fair to good. Compared with other similar studies, the results put the reliability of the model in a comfortable position. Among the improvements incorporated, progressive automation of evaluations must be highlighted.

  11. Interrater reliability of quantitative ultrasound using force feedback among examiners with varied levels of experience

    Directory of Open Access Journals (Sweden)

    Michael O. Harris-Love

    2016-06-01

    Background. Quantitative ultrasound measures are influenced by multiple external factors including examiner scanning force. Force feedback may foster the acquisition of reliable morphometry measures under a variety of scanning conditions. The purpose of this study was to determine the reliability of force-feedback image acquisition and morphometry over a range of examiner-generated forces using a muscle tissue-mimicking ultrasound phantom. Methods. Sixty material thickness measures were acquired from a muscle tissue-mimicking phantom using B-mode ultrasound scanning by six examiners with varied experience levels (i.e., experienced, intermediate, and novice). Estimates of interrater reliability and measurement error with force feedback scanning were determined for the examiners. In addition, criterion-based reliability was determined using material deformation values across a range of examiner scanning forces (1–10 Newtons) via automated and manually acquired image capture methods using force feedback. Results. All examiners demonstrated acceptable interrater reliability (intraclass correlation coefficient, ICC = .98, p .90, p < .001, independent of their level of experience. The measurement error among all examiners was 1.5%–2.9% across all applied stress conditions. Conclusion. Manual image capture with force feedback may aid the reliability of morphometry measures across a range of examiner scanning forces, and allow for consistent performance among examiners with differing levels of experience.

  12. Inter-rater reliability of an observation-based ergonomics assessment checklist for office workers.

    Science.gov (United States)

    Pereira, Michelle Jessica; Straker, Leon Melville; Comans, Tracy Anne; Johnston, Venerina

    2016-12-01

    To establish the inter-rater reliability of an observation-based ergonomics assessment checklist for computer workers. A 37-item (38-item if a laptop was part of the workstation) comprehensive observational ergonomics assessment checklist comparable to government guidelines and up to date with empirical evidence was developed. Two trained practitioners assessed full-time office workers performing their usual computer-based work and evaluated the suitability of workstations used. Practitioners assessed each participant consecutively. The order of assessors was randomised, and the second assessor was blinded to the findings of the first. Unadjusted kappa coefficients between the raters were obtained for the overall checklist and subsections that were formed from question-items relevant to specific workstation equipment. Twenty-seven office workers were recruited. The inter-rater reliability between two trained practitioners achieved moderate to good reliability for all except one checklist component. This checklist has mostly moderate to good reliability between two trained practitioners. Practitioner Summary: This reliable ergonomics assessment checklist for computer workers was designed using accessible government guidelines and supplemented with up-to-date evidence. Employers in Queensland (Australia) can fulfil legislative requirements by using this reliable checklist to identify and subsequently address potential risk factors for work-related injury to provide a safe working environment.

  13. Examination of anomalous self-experience in first-episode psychosis: interrater reliability

    DEFF Research Database (Denmark)

    Møller, Paul; Haug, Elisabeth; Raballo, Andrea

    2011-01-01

    ...) is a phenomenologically inspired checklist, specifically designed to support the comprehensive assessment of these characteristic subjective experiences. Aim: To assess the interrater reliability of the EASE. Sampling and Methods: Twenty-five first-episode psychosis (FEP) patients were interviewed with the EASE... ...-rater correlation above 0.80 (Spearman's rho, p values at an item level were very good in 9 items, good in 20 items, moderate in 11 items and fair in 4 items. Conclusion: The EASE provides a reliable and internally...

  14. Intra and inter-rater reliability study of pelvic floor muscle dynamometric measurements

    Directory of Open Access Journals (Sweden)

    Natalia M. Martinho

    2015-04-01

    OBJECTIVE: The aim of this study was to evaluate the intra and inter-rater reliability of pelvic floor muscle (PFM) dynamometric measurements for maximum and average strengths, as well as endurance. METHOD: A convenience sample of 18 nulliparous women, without any urogynecological complaints, aged between 19 and 31 (mean age of 25.4±3.9) participated in this study. They were evaluated using a pelvic floor dynamometer based on load cell technology. The dynamometric evaluations were repeated in three successive sessions: two on the same day with a rest period of 30 minutes between them, and the third on the following day. All participants were evaluated twice in each session; first by examiner 1 followed by examiner 2. The vaginal dynamometry data were analyzed using three parameters: maximum strength, average strength, and endurance. The Intraclass Correlation Coefficient (ICC) was applied to estimate the PFM dynamometric measurement reliability, considering a good level as being above 0.75. RESULTS: The intra and inter-raters' analyses showed good reliability for maximum strength (ICCintra-rater1=0.96, ICCintra-rater2=0.95, and ICCinter-rater=0.96), average strength (ICCintra-rater1=0.96, ICCintra-rater2=0.94, and ICCinter-rater=0.97), and endurance (ICCintra-rater1=0.88, ICCintra-rater2=0.86, and ICCinter-rater=0.92) dynamometric measurements. CONCLUSIONS: The PFM dynamometric measurements showed good intra- and inter-rater reliability for maximum strength, average strength and endurance, which demonstrates that this is a reliable device that can be used in clinical practice.
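
    The abstract above does not say which form of the ICC was computed. As a hedged illustration only, the sketch below derives two common single-measure forms, ICC(2,1) (two-way random effects, absolute agreement) and ICC(3,1) (two-way mixed effects, consistency), from the mean squares of a subjects-by-raters table. The function name `icc_two_way` and the 6 x 4 example table are assumptions for illustration, not the study's data.

```python
from typing import List, Tuple


def icc_two_way(scores: List[List[float]]) -> Tuple[float, float]:
    """Return (ICC(2,1), ICC(3,1)) for a complete subjects x raters table."""
    n = len(scores)      # number of subjects (rows)
    k = len(scores[0])   # number of raters (columns)
    grand = sum(sum(row) for row in scores) / (n * k)

    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc_2_1 = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    icc_3_1 = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return icc_2_1, icc_3_1


# Illustrative table: 6 subjects rated by 4 raters (hypothetical values)
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
absolute, consistency = icc_two_way(example)
print(round(absolute, 2), round(consistency, 2))
```

    The gap between the two values in this example comes from systematic differences between raters, which lower absolute agreement but not consistency; a threshold such as 0.75 therefore only has meaning once the ICC form is stated.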

  15. Inter-rater reliability of AMSTAR is dependent on the pair of reviewers.

    Science.gov (United States)

    Pieper, Dawid; Jacobs, Anja; Weikert, Beate; Fishta, Alba; Wegewitz, Uta

    2017-07-11

    Inter-rater reliability (IRR) is mainly assessed based on only two reviewers of unknown expertise. The aim of this paper is to examine differences in the IRR of the Assessment of Multiple Systematic Reviews (AMSTAR) and R(evised)-AMSTAR depending on the pair of reviewers. Five reviewers independently applied AMSTAR and R-AMSTAR to 16 systematic reviews (eight Cochrane reviews and eight non-Cochrane reviews) from the field of occupational health. Responses were dichotomized and reliability measures were calculated by applying Holsti's method (r) and Cohen's kappa (κ) to all potential pairs of reviewers. Given that five reviewers participated in the study, there were ten possible pairs of reviewers. Inter-rater reliability varied for AMSTAR between r = 0.82 and r = 0.98 (median r = 0.88) using Holsti's method and κ = 0.41 and κ = 0.69 (median κ = 0.52) using Cohen's kappa and for R-AMSTAR between r = 0.77 and r = 0.89 (median r = 0.82) and κ = 0.32 and κ = 0.67 (median κ = 0.45) depending on the pair of reviewers. The same pair of reviewers yielded the highest IRR for both instruments. Pairwise Cohen's kappa reliability measures showed a moderate correlation between AMSTAR and R-AMSTAR (Spearman's ρ =0.50). The mean inter-rater reliability for AMSTAR was highest for item 1 (κ = 1.00) and item 5 (κ = 0.78), while lowest values were found for items 3, 8, 9 and 11, which showed only fair agreement. Inter-rater reliability varies widely depending on the pair of reviewers. There may be some shortcomings associated with conducting reliability studies with only two reviewers. Further studies should include additional reviewers and should probably also take account of their level of expertise.
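
    The pairwise design described above (five reviewers, hence ten possible pairs, each summarised by Holsti's method and Cohen's kappa) can be sketched as follows. The ratings are hypothetical dichotomized responses invented for illustration, and the helper names are assumptions; with both reviewers rating the same items, Holsti's coefficient 2M/(N1+N2) reduces to simple proportion agreement.

```python
from itertools import combinations
from statistics import median
from typing import Sequence


def holsti(x: Sequence[int], y: Sequence[int]) -> float:
    """Holsti's coefficient: 2 * agreements / (items coded by x + items coded by y)."""
    agreements = sum(a == b for a, b in zip(x, y))
    return 2 * agreements / (len(x) + len(y))


def cohen_kappa(x: Sequence[int], y: Sequence[int]) -> float:
    """Cohen's kappa for two raters who rated the same items."""
    n = len(x)
    p_obs = sum(a == b for a, b in zip(x, y)) / n
    categories = set(x) | set(y)
    p_exp = sum((list(x).count(c) / n) * (list(y).count(c) / n) for c in categories)
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical dichotomized ratings (1 = criterion met) for 8 items
ratings = {
    "R1": [1, 1, 0, 1, 0, 1, 1, 0],
    "R2": [1, 1, 0, 1, 0, 1, 0, 0],
    "R3": [1, 0, 0, 1, 0, 1, 1, 0],
    "R4": [1, 1, 0, 1, 1, 1, 1, 0],
    "R5": [1, 1, 0, 0, 0, 1, 1, 1],
}

pairs = list(combinations(ratings, 2))  # 5 reviewers -> 10 pairs
results = {
    (a, b): (holsti(ratings[a], ratings[b]), cohen_kappa(ratings[a], ratings[b]))
    for a, b in pairs
}

for pair, (r, kappa) in results.items():
    print(pair, round(r, 2), round(kappa, 2))
print("median r:", median(r for r, _ in results.values()))
print("median kappa:", median(k for _, k in results.values()))
```

    Reporting the range and median across all pairs, as the study does, makes visible how much a single two-reviewer estimate depends on which pair happened to be chosen.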

  16. Interrater and intrarater reliability of the Knosp scale for pituitary adenoma grading.

    Science.gov (United States)

    Mooney, Michael A; Hardesty, Douglas A; Sheehy, John P; Bird, Robert; Chapple, Kristina; White, William L; Little, Andrew S

    2017-05-01

    OBJECTIVE The goal of this study was to determine the interrater and intrarater reliability of the Knosp grading scale for predicting pituitary adenoma cavernous sinus (CS) involvement. METHODS Six independent raters (3 neurosurgery residents, 2 pituitary surgeons, and 1 neuroradiologist) participated in the study. Each rater scored 50 unique pituitary MRI scans (with contrast) of biopsy-proven pituitary adenoma. Reliabilities for the full scale were determined 3 ways: 1) using all 50 scans, 2) using scans with midrange scores versus end scores, and 3) using a dichotomized scale that reflects common clinical practice. The performance of resident raters was compared with that of faculty raters to assess the influence of training level on reliability. RESULTS Overall, the interrater reliability of the Knosp scale was "strong" (0.73, 95% CI 0.56-0.84). However, the percent agreement for all 6 reviewers was only 10% (26% for faculty members, 30% for residents). The reliability of the middle scores (i.e., average rated Knosp Grades 1 and 2) was "very weak" (0.18, 95% CI -0.27 to 0.56) and the percent agreement for all reviewers was only 5%. When the scale was dichotomized into tumors unlikely to have intraoperative CS involvement (Grades 0, 1, and 2) and those likely to have CS involvement (Grades 3 and 4), the reliability was "strong" (0.60, 95% CI 0.39-0.75) and the percent agreement for all raters improved to 60%. There was no significant difference in reliability between residents and faculty (residents 0.72, 95% CI 0.55-0.83 vs faculty 0.73, 95% CI 0.56-0.84). Intrarater reliability was moderate to strong and increased with the level of experience. CONCLUSIONS Although these findings suggest that the Knosp grading scale has acceptable interrater reliability overall, it raises important questions about the "very weak" reliability of the scale's middle grades. By dichotomizing the scale into clinically useful groups, the authors were able to address the poor

  17. Interrater Reliability of the Power Mobility Road Test in the Virtual Reality-Based Simulator-2.

    Science.gov (United States)

    Kamaraj, Deepan C; Dicianno, Brad E; Mahajan, Harshal P; Buhari, Alhaji M; Cooper, Rory A

    2016-07-01

    To assess interrater reliability of the Power Mobility Road Test (PMRT) when administered through the Virtual Reality-based SIMulator-version 2 (VRSIM-2). Within-subjects repeated-measures design. Participants interacted with VRSIM-2 through 2 display options (desktop monitor vs immersive virtual reality screens) using 2 control interfaces (roller system vs conventional movement-sensing joystick), providing 4 different driving scenarios (driving conditions 1-4). Participants performed 3 virtual driving sessions for each of the 2 display screens and 1 session through a real-world driving course (driving condition 5). The virtual PMRT was conducted in a simulated indoor office space, and an equivalent course was charted in an open space for the real-world assessment. After every change in driving condition, participants completed a self-reported workload assessment questionnaire, the Task Load Index, developed by the National Aeronautics and Space Administration. A convenience sample of electric-powered wheelchair (EPW) athletes (N=21) recruited at the 31st National Veterans Wheelchair Games. Not applicable. Total composite PMRT score. The PMRT had high interrater reliability (intraclass correlation coefficient [ICC]>.75) between the 2 raters in all 5 driving conditions. Post hoc analyses revealed that the reliability analyses had >80% power to detect high ICCs in driving conditions 1 and 4. The PMRT has high interrater reliability in conditions 1 and 4 and could be used to assess EPW driving performance virtually in VRSIM-2. However, further psychometric assessment is necessary to assess the feasibility of administering the PMRT using the different interfaces of VRSIM-2. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.

  18. Chest computed tomography-based scoring of thoracic sarcoidosis: Inter-rater reliability of CT abnormalities

    Energy Technology Data Exchange (ETDEWEB)

    Heuvel, D.A.V. den; Es, H.W. van; Heesewijk, J.P. van; Spee, M. [St. Antonius Hospital Nieuwegein, Department of Radiology, Nieuwegein (Netherlands); Jong, P.A. de [University Medical Center Utrecht, Department of Radiology, Utrecht (Netherlands); Zanen, P.; Grutters, J.C. [University Medical Center Utrecht, Division Heart and Lungs, Utrecht (Netherlands); St. Antonius Hospital Nieuwegein, Center of Interstitial Lung Diseases, Department of Pulmonology, Nieuwegein (Netherlands)

    2015-09-15

    To determine inter-rater reliability of sarcoidosis-related computed tomography (CT) findings that can be used for scoring of thoracic sarcoidosis. CT images of 51 patients with sarcoidosis were scored by five chest radiologists for various abnormal CT findings (22 in total) encountered in thoracic sarcoidosis. Using intra-class correlation coefficient (ICC) analysis, inter-rater reliability was analysed and reported according to the Guidelines for Reporting Reliability and Agreement Studies (GRRAS) criteria. A pre-specified sub-analysis was performed to investigate the effect of training. Scoring was trained in a distinct set of 15 scans in which all abnormal CT findings were represented. Median age of the 51 patients (36 men, 70 %) was 43 years (range 26 - 64 years). All radiographic stages were present in this group. ICC ranged from 0.91 for honeycombing to 0.11 for nodular margin (sharp versus ill-defined). The ICC was above 0.60 in 13 of the 22 abnormal findings. Sub-analysis for the best-trained observers demonstrated an ICC improvement for all abnormal findings and values above 0.60 for 16 of the 22 abnormalities. In our cohort, reliability between raters was acceptable for 16 thoracic sarcoidosis-related abnormal CT findings. (orig.)

  19. Interrater reliability of Violence Risk Appraisal Guide scores provided in Canadian criminal proceedings.

    Science.gov (United States)

    Edens, John F; Penson, Brittany N; Ruchensky, Jared R; Cox, Jennifer; Smith, Shannon Toney

    2016-12-01

    Published research suggests that most violence risk assessment tools have relatively high levels of interrater reliability, but recent evidence of inconsistent scores among forensic examiners in adversarial settings raises concerns about the "field reliability" of such measures. This study specifically examined the reliability of Violence Risk Appraisal Guide (VRAG) scores in Canadian criminal cases identified in the legal database, LexisNexis. Over 250 reported cases were located that made mention of the VRAG, with 42 of these cases containing 2 or more scores that could be submitted to interrater reliability analyses. Overall, scores were skewed toward higher risk categories. The intraclass correlation (ICC(A,1)) was .66, with pairs of forensic examiners placing defendants into the same VRAG risk "bin" in 68% of the cases. For categorical risk statements (i.e., low, moderate, high), examiners provided converging assessment results in most instances (86%). In terms of potential predictors of rater disagreement, there was no evidence for adversarial allegiance in our sample. Rater disagreement in the scoring of 1 VRAG item (Psychopathy Checklist-Revised; Hare, 2003), however, strongly predicted rater disagreement in the scoring of the VRAG (r = .58). (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  20. Intrarater and interrater reliability of pulse examination in traditional Indian Ayurvedic medicine.

    Science.gov (United States)

    Kurande, Vrinda; Waagepetersen, Rasmus; Toft, Egon; Prasad, Ramjee

    2013-09-01

    In Ayurveda, pulse examination (nadipariksha) is an important tool to assess the status of three doshas: vata, pitta, and kapha. Long historical use has been seen as a documentation of its efficacy; however, there is a lack of a quantitative measure of the reliability of the pulse examination method. The objective of this study was to test the intrarater and interrater reliability of pulse examination in Ayurveda. Fifteen registered Ayurvedic doctors with 3-15 years of experience examined the pulse of 20 healthy volunteers twice, for a total of 600 examinations. The examinations were performed blind and in a random order. Only the current status of dosha-specific methods of pulse examination were considered. Cohen's weighted κ statistic was used as a measure of intrarater and interrater reliability, and a hypothesis of homogeneous diagnosis (random rating) was tested. Following this, we tested whether proportions of ratings were equal between doctors. According to the Landis and Koch scale, the level of reliability ranged from poor to moderate. It was observed that the doctors more frequently diagnosed a combination of two doshas than a single dosha. The κ values were generally larger for experienced doctors (p = 0.04). Experience and proper training have important roles in pulse examination.
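
As an illustration of the statistics named in this abstract, the following is a minimal sketch of Cohen's weighted kappa for two raters, together with the Landis and Koch descriptive labels. It is not the study's analysis: the linear weighting scheme, the three-level ordered scale, and the ratings are all assumptions made for the example, and the function names are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with linear weights for two raters on an ordered scale."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions: obs[i][j] is the share of subjects that
    # rater A put in category i and rater B put in category j.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        obs[index[a]][index[b]] += 1.0 / n

    row = [sum(obs[i]) for i in range(k)]                        # rater A marginals
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]   # rater B marginals

    def weight(i, j):
        # Linear disagreement weight: 0 on the diagonal, 1 for the extremes.
        return abs(i - j) / (k - 1)

    d_obs = sum(weight(i, j) * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - d_obs / d_exp


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch (1977) descriptive label."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical ratings of six subjects on an assumed low/mid/high scale.
scale = ["low", "mid", "high"]
first = ["low", "low", "mid", "mid", "high", "high"]
second = ["low", "mid", "mid", "mid", "high", "mid"]
kappa = weighted_kappa(first, second, scale)
print(round(kappa, 3), landis_koch(kappa))
```

With more than two raters, as in the study above, a pairwise statistic like this would have to be computed per rater pair and then summarised; how the authors did so is not stated in the abstract.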

  1. Interrater reliability assessment using the Test of Gross Motor Development-2.

    Science.gov (United States)

    Barnett, Lisa M; Minto, Christine; Lander, Natalie; Hardy, Louise L

    2014-11-01

    The aim was to examine interrater reliability of the object control subtest from the Test of Gross Motor Development-2 by live observation in a school field setting. Reliability Study--cross sectional. Raters were rated on their ability to agree on (1) the raw total for the six object control skills; (2) each skill performance and (3) the skill components. Agreement for the object control subtest and the individual skills was assessed by an intraclass correlation (ICC) and a kappa statistic assessed for skill component agreement. A total of 37 children (65% girls) aged 4-8 years (M = 6.2, SD = 0.8) were assessed in six skills by two raters; equating to 222 skill tests. Interrater reliability was excellent for the object control subset (ICC = 0.93), and for individual skills, highest for the dribble (ICC = 0.94) followed by strike (ICC = 0.85), overhand throw (ICC = 0.84), underhand roll (ICC = 0.82), kick (ICC = 0.80) and the catch (ICC = 0.71). The strike and the throw had more components with less agreement. Even though the overall subtest score and individual skill agreement was good, some skill components had lower agreement, suggesting these may be more problematic to assess. This may mean some skill components need to be specified differently in order to improve component reliability. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.

  2. Chest computed tomography-based scoring of thoracic sarcoidosis: Inter-rater reliability of CT abnormalities

    International Nuclear Information System (INIS)

    Heuvel, D.A.V. den; Es, H.W. van; Heesewijk, J.P. van; Spee, M.; Jong, P.A. de; Zanen, P.; Grutters, J.C.

    2015-01-01

    To determine inter-rater reliability of sarcoidosis-related computed tomography (CT) findings that can be used for scoring of thoracic sarcoidosis. CT images of 51 patients with sarcoidosis were scored by five chest radiologists for various abnormal CT findings (22 in total) encountered in thoracic sarcoidosis. Using intra-class correlation coefficient (ICC) analysis, inter-rater reliability was analysed and reported according to the Guidelines for Reporting Reliability and Agreement Studies (GRRAS) criteria. A pre-specified sub-analysis was performed to investigate the effect of training. Scoring was trained in a distinct set of 15 scans in which all abnormal CT findings were represented. Median age of the 51 patients (36 men, 70 %) was 43 years (range 26 - 64 years). All radiographic stages were present in this group. ICC ranged from 0.91 for honeycombing to 0.11 for nodular margin (sharp versus ill-defined). The ICC was above 0.60 in 13 of the 22 abnormal findings. Sub-analysis for the best-trained observers demonstrated an ICC improvement for all abnormal findings and values above 0.60 for 16 of the 22 abnormalities. In our cohort, reliability between raters was acceptable for 16 thoracic sarcoidosis-related abnormal CT findings. (orig.)

  3. Inter-Rater Reliability of Provider Interpretations of Irritable Bowel Syndrome Food and Symptom Journals.

    Science.gov (United States)

    Zia, Jasmine; Chung, Chia-Fang; Xu, Kaiyuan; Dong, Yi; Schenk, Jeanette M; Cain, Kevin; Munson, Sean; Heitkemper, Margaret M

    2017-11-04

    There are currently no standardized methods for identifying trigger food(s) from irritable bowel syndrome (IBS) food and symptom journals. The primary aim of this study was to assess the inter-rater reliability of providers' interpretations of IBS journals. A second aim was to describe whether these interpretations varied for each patient. Eight providers reviewed 17 IBS journals and rated how likely key food groups (fermentable oligo-di-monosaccharides and polyols, high-calorie, gluten, caffeine, high-fiber) were to trigger IBS symptoms for each patient. Agreement of trigger food ratings was calculated using Krippendorff's α-reliability estimate. Providers were also asked to write down recommendations they would give to each patient. Estimates of agreement of trigger food likelihood ratings were poor (average α = 0.07). Most providers gave similar trigger food likelihood ratings for over half the food groups. Four providers gave the exact same written recommendation(s) (range 3-7) to over half the patients. Inter-rater reliability of provider interpretations of IBS food and symptom journals was poor. Providers favored certain trigger food likelihood ratings and written recommendations. This supports the need for a more standardized method for interpreting these journals and/or more rigorous techniques to accurately identify personalized IBS food triggers.
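
For readers unfamiliar with the agreement statistic used here, the following is a minimal sketch of Krippendorff's alpha built from a coincidence matrix. It assumes a nominal difference metric and hypothetical ratings; the abstract does not say which metric the authors applied to their likelihood ratings, and the function name is illustrative only.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of tuples, one per rated item, holding each rater's
    value or None where a rating is missing.
    """
    # Coincidence matrix: every ordered pair of values given to the same
    # unit by different raters contributes 1 / (m - 1), where m is the
    # number of ratings that unit received.
    coincidences = Counter()
    for ratings in units:
        values = [v for v in ratings if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit rated once carries no agreement information
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()
    for (a, _b), w in coincidences.items():
        totals[a] += w
    n = sum(totals.values())

    # Observed and expected disagreement, both scaled by n so that the
    # common factor cancels in the ratio.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1.0 - observed / expected


# Hypothetical example: three journals, two providers, one trigger food.
example = [("likely", "likely"), ("likely", "unlikely"), ("unlikely", "unlikely")]
print(round(krippendorff_alpha_nominal(example), 3))
```

An alpha near 0, such as the average of 0.07 reported above, means the observed disagreement is about as large as would be expected by chance.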

  4. Inter-Rater Reliability of Provider Interpretations of Irritable Bowel Syndrome Food and Symptom Journals

    Directory of Open Access Journals (Sweden)

    Jasmine Zia

    2017-11-01

    There are currently no standardized methods for identifying trigger food(s) from irritable bowel syndrome (IBS) food and symptom journals. The primary aim of this study was to assess the inter-rater reliability of providers’ interpretations of IBS journals. A second aim was to describe whether these interpretations varied for each patient. Eight providers reviewed 17 IBS journals and rated how likely key food groups (fermentable oligo-di-monosaccharides and polyols, high-calorie, gluten, caffeine, high-fiber) were to trigger IBS symptoms for each patient. Agreement of trigger food ratings was calculated using Krippendorff’s α-reliability estimate. Providers were also asked to write down recommendations they would give to each patient. Estimates of agreement of trigger food likelihood ratings were poor (average α = 0.07). Most providers gave similar trigger food likelihood ratings for over half the food groups. Four providers gave the exact same written recommendation(s) (range 3–7) to over half the patients. Inter-rater reliability of provider interpretations of IBS food and symptom journals was poor. Providers favored certain trigger food likelihood ratings and written recommendations. This supports the need for a more standardized method for interpreting these journals and/or more rigorous techniques to accurately identify personalized IBS food triggers.

  5. Reliability and Validity of the Activity Participation Assessment for School-age Children in Korea

    Directory of Open Access Journals (Sweden)

    Se-Yun Kim

    2016-12-01

    Conclusion: The APA shows good internal reliability, test–retest reliability, discriminant validity, and construct validity. However, evidence of psychometric properties was limited by a small sample size. Psychometric properties such as interrater reliability as well as concurrent validity and construct validity need to be tested using a larger sample size with representative demographics.

  6. THE INTRA- AND INTER-RATER RELIABILITY OF THE SOCCER INJURY MOVEMENT SCREEN (SIMS).

    Science.gov (United States)

    McCunn, Robert; Aus der Fünten, Karen; Govus, Andrew; Julian, Ross; Schimpchen, Jan; Meyer, Tim

    2017-02-01

    The growing volume of movement screening research reveals a belief among practitioners and researchers alike that movement quality may have an association with injury risk. However, existing movement screening tools have not considered the sport-specific movement and injury patterns relevant to soccer. The present study introduces the Soccer Injury Movement Screen (SIMS), which has been designed specifically for use within soccer. Furthermore, the purpose of the present study was to assess the intra- and inter-rater reliability of the SIMS and determine its suitability for use in further research. The study utilized a test-retest design to discern reliability. Twenty-five (11 males, 14 females) healthy, recreationally active university students (age 25.5 ± 4.0 years, height 171 ± 9 cm, weight 64.7 ± 12.6 kg) agreed to participate. The SIMS contains five sub-tests: the anterior reach, single-leg deadlift, in-line lunge, single-leg hop for distance and tuck jump. Each movement was scored out of 10 points and summed to produce a composite score out of 50. The anterior reach and single-leg hop for distance were scored in real-time while the remaining tests were filmed and scored retrospectively. Three raters conducted the SIMS with each participant on three occasions separated by an average of three and a half days (minimum one day, maximum seven days). Rater 1 re-scored the filmed movements for all participants on all occasions six months later to establish the 'pure' intra-rater (intra-occasion) reliability for those movements. Intraclass correlation coefficient (ICC) values for intra- and inter-rater composite score reliability ranged from 0.66-0.72 and 0.79-0.86 respectively. Weighted kappa values representing the intra- and inter-rater reliability of the individual sub-tests ranged from 0.35-0.91 indicating fair to almost perfect agreement. Establishing the reliability of the SIMS is a prerequisite for further research seeking to investigate

  7. Inter-rater reliability of nursing home quality indicators in the U.S

    Directory of Open Access Journals (Sweden)

    Roy Jason

    2003-11-01

    Background: In the US, Quality Indicators (QI's) profiling and comparing the performance of hospitals, health plans, nursing homes and physicians are routinely published for consumer review. We report the results of the largest study of inter-rater reliability done on nursing home assessments which generate the data used to derive publicly reported nursing home quality indicators. Methods: We sampled nursing homes in 6 states, selecting up to 30 residents per facility who were observed and assessed by research nurses on 100 clinical assessment elements contained in the Minimum Data Set (MDS) and compared these with the most recent assessment in the record done by facility nurses. Kappa statistics were generated for all data items and derived for 22 QI's over the entire sample and for each facility. Finally, facilities with many QI's with poor Kappa levels were compared to those with many QI's with excellent Kappa levels on selected characteristics. Results: A total of 462 facilities in 6 states were approached and 219 agreed to participate, yielding a response rate of 47.4%. A total of 5758 residents were included in the inter-rater reliability analyses, around 27.5 per facility. Patients resembled the traditional nursing home resident, only 43.9% were continent of urine and only 25.2% were rated as likely to be discharged within the next 30 days. Results of resident level comparative analyses reveal high inter-rater reliability levels (most items >.75). Using the research nurses as the "gold standard", we compared composite quality indicators based on their ratings with those based on facility nurses. All but two QI's have adequate Kappa levels and 4 QI's have average Kappa values in excess of .80. We found that 16% of participating facilities performed poorly (Kappa .75 on 12 or more QI's. No facility characteristics were related to reliability of the data on which QI's are based. Conclusion: While a few QI's being used for public reporting

  8. Inter-rater reliability of the evaluation of muscular chains associated with posture alterations in scoliosis

    Directory of Open Access Journals (Sweden)

    Fortin Carole

    2012-05-01

    Background: In the Global postural re-education (GPR) evaluation, posture alterations are associated with anterior or posterior muscular chain impairments. Our goal was to assess the reliability of the GPR muscular chain evaluation. Methods: Design: Inter-rater reliability study. Fifty physical therapists (PTs) and two experts trained in GPR assessed the standing posture from photographs of five youths with idiopathic scoliosis using a posture analysis grid with 23 posture indices (PI). The PTs and experts indicated the muscular chain associated with posture alterations. The PTs were also divided into three groups according to their experience in GPR. Experts’ results (after consensus) were used to verify agreement between PTs and experts for muscular chain and posture assessments. We used Kappa coefficients (K) and the percentage of agreement (%A) to assess inter-rater reliability and intra-class coefficients (ICC) for determining agreement between PTs and experts. Results: For the muscular chain evaluation, reliability was moderate to substantial for 12 PI for the PTs (%A: 56 to 82; K: 0.42 to 0.76) and perfect for 19 PI for the experts. For posture assessment, reliability was moderate to substantial for 12 PI for the PTs (%A > 60%; K: 0.42 to 0.75) and moderate to perfect for 18 PI for the experts (%A: 80 to 100; K: 0.55 to 1.00). The agreement between PTs and experts was good for most muscular chain evaluations (18 PI; ICC: 0.82 to 0.99) and PI (19 PI; ICC: 0.78 to 1.00). Conclusions: The GPR muscular chain evaluation has good reliability for most posture indices. GPR evaluation should help guide physical therapists in targeting affected muscles for treatment of abnormal posture patterns.

  9. Inter-rater reliability of the Greek version of CAARMS among two groups of mental health professionals.

    Science.gov (United States)

    Kollias, C; Kontaxakis, V; Havaki-Kontaxaki, B; Simmons, M B; Stefanis, N; Papageorgiou, C

    2015-01-01

    There is increasing interest within the Greek psychiatric community in the early detection and prevention of psychotic disorders. To support this, there is a need for a valid and reliable tool to identify young people that may be at risk of developing a psychotic disorder. Our team has previously translated the Comprehensive Assessment of At-Risk Mental States (CAARMS). The validity of the CAARMS was ensured by the procedure of translation and the aim of the current study was to estimate the interrater reliability of the CAARMS Greek translation among residents in psychiatry and specialized mental health professionals. 43 mental health workers (27 residents in psychiatry and 16 specialized mental health professionals, i.e. 11 psychiatrists and 5 psychologists) participated in two seminars that covered theoretical information about the ultra high risk concept and training in the CAARMS. During the seminars, 10 vignettes with psychiatric history cases were presented, including healthy, ultra high risk and first episode psychosis. The mean correlated percentage of agreement with the correct answers regarding diagnosis of the presented history cases among all our subjects was 81.42, among specialized mental health professionals 77.88, and among residents 84.46. Intraclass correlation coefficients were 0.994 for specialized mental health professionals and 0.997 for residents. The translated Greek version of CAARMS presents a satisfying interrater reliability when used by both residents and specialized mental health professionals. Residents showed even higher intraclass correlation coefficients and mean correlated percentage of agreement than specialized mental health professionals, which indicates that residents are capable of using the CAARMS in early intervention units.

  10. Interrater Reliability of mHealth App Rating Measures: Analysis of Top Depression and Smoking Cessation Apps.

    Science.gov (United States)

    Powell, Adam C; Torous, John; Chan, Steven; Raynor, Geoffrey Stephen; Shwarts, Erik; Shanahan, Meghan; Landman, Adam B

    2016-02-10

    There are over 165,000 mHealth apps currently available to patients, but few have undergone an external quality review. Furthermore, no standardized review method exists, and little has been done to examine the consistency of the evaluation systems themselves. We sought to determine which measures for evaluating the quality of mHealth apps have the greatest interrater reliability. We identified 22 measures for evaluating the quality of apps from the literature. A panel of 6 reviewers reviewed the top 10 depression apps and 10 smoking cessation apps from the Apple iTunes App Store on these measures. Krippendorff's alpha was calculated for each of the measures and reported by app category and in aggregate. The measure for interactiveness and feedback was found to have the greatest overall interrater reliability (alpha=.69). Presence of password protection (alpha=.65), whether the app was uploaded by a health care agency (alpha=.63), the number of consumer ratings (alpha=.59), and several other measures had moderate interrater reliability (alphas>.5). There was the least agreement over whether apps had errors or performance issues (alpha=.15), stated advertising policies (alpha=.16), and were easy to use (alpha=.18). There were substantial differences in the interrater reliabilities of a number of measures when they were applied to depression versus smoking apps. We found wide variation in the interrater reliability of measures used to evaluate apps, and some measures are more robust across categories of apps than others. The measures with the highest degree of interrater reliability tended to be those that involved the least rater discretion. Clinical quality measures such as effectiveness, ease of use, and performance had relatively poor interrater reliability. Subsequent research is needed to determine consistent means for evaluating the performance of apps. Patients and clinicians should consider conducting their own assessments of apps, in conjunction with

  11. Examining Design and Inter-Rater Reliability of a Rubric Measuring Research Quality across Multiple Disciplines

    Directory of Open Access Journals (Sweden)

    Marilee J. Bresciani

    2009-05-01

    The paper presents a rubric to help evaluate the quality of research projects. The rubric was applied in a competition across a variety of disciplines during a two-day research symposium at one institution in the southwest region of the United States of America. It was collaboratively designed by a faculty committee at the institution and was administered to 204 undergraduate, master, and doctoral oral presentations by approximately 167 different evaluators. No training or norming of the rubric was given to 147 of the evaluators prior to the competition. The findings of the inter-rater reliability analysis reveal substantial agreement among the judges, which contradicts literature describing the fact that formal norming must occur prior to seeing substantial levels of inter-rater reliability. By presenting the rubric along with the methodology used in its design and evaluation, it is hoped that others will find this to be a useful tool for evaluating documents and for teaching research methods.

  12. 4-Meter Gait Speed Test in Chronic Obstructive Pulmonary Disease: INTERRATER RELIABILITY USING A STOPWATCH.

    Science.gov (United States)

    Bisca, Gianna Waldrich; Fava, Lucas Rodrigues; Morita, Andrea Akemi; Machado, Felipe Vilaça Cavallari; Pitta, Fabio; Hernandes, Nidia Aparecida

    2017-12-14

    4-meter gait speed (4MGS) is increasingly used to assess functional performance in patients with chronic obstructive pulmonary disease. However, the current literature lacks information regarding some technical standards for this test. Therefore, the purpose of this study was to compare and to evaluate the interrater reliability between a stopwatch and video recording used as timing systems for the 4MGS in patients with chronic obstructive pulmonary disease, as well as to verify the interrater reliability between 2 observers measuring the 4MGS time using a manual stopwatch. Fifty-one patients performed the 4MGS using 4 different protocols (random order): walking at the usual and maximum speed in a 4-meter course and walking at the same 2 speeds on an 8-m course using a 2-m acceleration zone, a 4-meter timing area, and a 2-m deceleration zone. Gait speed was measured simultaneously using a stopwatch and a video recording. In a subanalysis (n = 24), 2 independent observers timed the 4MGS using a stopwatch. There was no significant difference in comparison between the 2 timing methods (P > .05 for all), and the reliability between video recording and stopwatch was excellent in all 4MGS studied protocols (intraclass correlation coefficient ≥ 0.91). Moreover, when comparing gait speed measured by 2 observers using a stopwatch, no significant difference was found among all proposed protocols (P > .05 for all), and there was also excellent reliability between the 2 independent observers (intraclass correlation coefficient ≥ 0.94). The stopwatch, a low-cost and feasible tool, is reliable as a timing device for the 4MGS in patients with chronic obstructive pulmonary disease.
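
The intraclass correlation coefficient reported in abstracts like this one can be derived from a two-way analysis of variance. The sketch below computes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). That choice is an assumption: the abstract does not state which ICC model the authors used, and the timings are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows, one per subject, each holding one
    measurement per rater (no missing values).
    """
    n = len(scores)        # subjects
    k = len(scores[0])     # raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from the two-way layout without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-subjects mean square
    ms_cols = ss_cols / (k - 1)                 # between-raters mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical 4MGS times in seconds: five patients, two observers.
times = [[4.1, 4.2], [5.0, 4.9], [3.6, 3.7], [6.2, 6.1], [4.8, 4.8]]
print(round(icc_2_1(times), 3))
```

Because the between-raters mean square appears in the denominator, a systematic offset between the two observers lowers this coefficient, which a consistency-type ICC would ignore.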

  13. Inter-rater and intra-rater reliability of a clinical protocol for measuring turnout in collegiate dancers.

    Science.gov (United States)

    Greene, Amanda; Lasner, Andrea; Deu, Rajwinder; Oliphant, Seth; Johnson, Kenneth

    2018-02-02

    Reliable methods of measuring turnout in dancers and comparing active turnout (used in class) with functional (uncompensated) turnout are needed. Authors have suggested measurement techniques but there is no clinically useful, easily reproducible technique with established inter-rater and intra-rater reliability. We adapted a technique based on previous research, which is easily reproducible. We hypothesized excellent inter-rater and intra-rater reliability between experienced physical therapists (PTs) and a briefly trained faculty member from a university's department of dance. Thirty-two participants were recruited from the same dance department. Dancers' active and functional turnout was measured by each rater. We found that our technique for measuring active and functional turnout has excellent inter-rater and intra-rater reliability when performed by two experienced PTs and by one briefly trained university-level dance faculty member. For active turnout, inter-rater reliability was 0.78 among all raters and 0.82 among only the PT raters; intra-rater reliability was 0.82 among all raters and 0.85 among only the PT raters. For functional turnout, inter-rater reliability was 0.86 among all raters and 0.88 among only the PT raters; intra-rater reliability was 0.87 among all raters and 0.88 among only the PT raters. The measurement technique described provides a standardized protocol with excellent inter-rater and intra-rater reliability when performed by experienced PTs or by a briefly trained university-level dance faculty member.

  14. Inter-rater reliability of PATH observations for assessment of ergonomic risk factors in hospital work.

    Science.gov (United States)

    Park, Jung-Keun; Boyer, Jon; Tessler, Jamie; Casey, Jeffrey; Schemm, Linda; Gore, Rebecca; Punnett, Laura

    2009-07-01

    This study examined the inter-rater reliability of expert observations of ergonomic risk factors by four analysts. Ten jobs were observed at a hospital using a newly expanded version of the PATH method (Buchholz et al. 1996), to which selected upper extremity exposures had been added. Two of the four raters simultaneously observed each worker onsite for a total of 443 observation pairs containing 18 categorical exposure items each. For most exposure items, kappa coefficients were 0.4 or higher. For some items, agreement was higher both for the jobs with less rapid hand activity and for the analysts with a higher level of ergonomic job analysis experience. These upper extremity exposures could be characterised reliably with real-time observation, given adequate experience and training of the observers. The revised version of PATH is applicable to the analysis of jobs where upper extremity musculoskeletal strain is of concern.

  15. Intra- and interrater reliability of the 'lumbar-locked thoracic rotation test' in competitive swimmers ages 10 through 18 years.

    Science.gov (United States)

    Feijen, Stef; Kuppens, Kevin; Tate, Angela; Baert, Isabel; Struyf, Thomas; Struyf, Filip

    2018-04-17

    Measuring thoracic spine mobility can be of interest to competitive swimmers as it has been associated with shoulder girdle function and scapular position in subjects with and without shoulder pain. At present, no reliability data of thoracic spine mobility measurements are available in the swimming population. This study aims to evaluate the within-session intra- and interrater reliability of the "lumbar-locked rotation test" for thoracic spine rotation in competitive swimmers aged 10 to 18 years. This reliability study is part of a larger prospective cohort study investigating potential risk factors for the development of shoulder pain in competitive swimmers. Within-session, intra- and inter-rater reliability. Competitive swimming clubs in Belgium. 21 competitive swimmers. Intra- and inter-rater reliability of the lumbar-locked thoracic rotation test. Intraclass correlation coefficients (ICCs) ranged from 0.91 (95% CI 0.78 to 0.96) to 0.96 (0.89-0.98) for intra-rater reliability. ICCs for inter-rater reliability were 0.89 (0.72-0.95) and 0.86 (0.65-0.94) for right and left thoracic rotation, respectively. Results suggest good to excellent reliability of the lumbar-locked thoracic rotation test, indicating this test can be used reliably in clinical practice. Copyright © 2018 Elsevier Ltd. All rights reserved.

  16. Inter-rater reliability of the Sødring Motor Evaluation of Stroke patients (SMES).

    Science.gov (United States)

    Halsaa, K E; Sødring, K M; Bjelland, E; Finsrud, K; Bautz-Holter, E

    1999-12-01

    The Sødring Motor Evaluation of Stroke patients is an instrument for physiotherapists to evaluate motor function and activities in stroke patients. The rating reflects quality as well as quantity of the patient's unassisted performance within three domains: leg, arm and gross function. The inter-rater reliability of the method was studied in a sample of 30 patients admitted to a stroke rehabilitation unit. Three therapists were involved in the study; two therapists assessed the same patient on two consecutive days in a balanced design. Cohen's weighted kappa and McNemar's test of symmetry were used as measures of item reliability, and the intraclass correlation coefficient was used to express the reliability of the sumscores. For 24 out of 32 items the weighted kappa statistic was excellent (0.75-0.98), while 7 items had a kappa statistic within the range 0.53-0.74 (fair to good). The reliability of one item was poor (0.13). The intraclass correlation coefficient for the three sumscores was 0.97, 0.91 and 0.97. We conclude that the Sødring Motor Evaluation of Stroke patients is a reliable measure of motor function in stroke patients undergoing rehabilitation.

  17. Photographic assessment of burn size and depth: reliability and validity

    NARCIS (Netherlands)

    Hop, M.; Moues, C.; Bogomolova, K.; Nieuwenhuis, M.; Oen, I.; Middelkoop, E.; Breederveld, R.; de Baar, M.

    2014-01-01

    Objective: The aim of this study was to examine the reliability and validity of using photographs of burns to assess both burn size and depth. Method: Fifty randomly selected photographs taken on day 0-1 post burn were assessed by seven burn experts and eight referring physicians. Inter-rater

  18. Actors' portrayals of depression to test interrater reliability in clinical trials.

    Science.gov (United States)

    Rosen, Jules; Mulsant, Benoit H; Bruce, Martha L; Mittal, Vikas; Fox, Debra

    2004-10-01

    This study determined if actors could portray depressed patients to establish the interrater reliability of raters using the Hamilton Depression Rating Scale (HDRS). Actors portrayed depressed patients using scripts derived from HDRS assessments obtained at three points during treatment. Four experienced raters blindly viewed videotapes of two patients and two actors. They guessed if each interviewee was a patient or an actor and rated the certainty of their guesses. For each interview, they also rated the realism of the portrayal and completed the HDRS. Experienced raters could not distinguish actors and patients better than chance and were equally certain of their right and wrong guesses. Actors and patients received high scores on the realism of their portrayals. The HDRS scores of the actor-patient pairs were correlated. Actors can effectively portray depressed patients. Future studies will determine if actors can accurately portray patients with anxiety and psychosis.

  19. Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial

    Directory of Open Access Journals (Sweden)

    Kevin A. Hallgren

    2012-02-01

    Many research designs require the assessment of inter-rater reliability (IRR) to demonstrate consistency among observational ratings provided by multiple coders. However, many studies use incorrect statistical procedures, fail to fully report the information necessary to interpret their results, or do not address how IRR affects the power of their subsequent analyses for hypothesis testing. This paper provides an overview of methodological issues related to the assessment of IRR with a focus on study design, selection of appropriate statistics, and the computation, interpretation, and reporting of some commonly-used IRR statistics. Computational examples include SPSS and R syntax for computing Cohen’s kappa and intra-class correlations to assess IRR.
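
The tutorial above gives its examples in SPSS and R. As a language-neutral illustration of the first statistic it names, the following is a minimal Python sketch of unweighted Cohen's kappa for two coders. It is not taken from the paper; the codes and the function name are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Unweighted Cohen's kappa for two coders rating the same items."""
    n = len(coder_a)
    # Observed agreement: share of items given the same code by both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal proportions,
    # summed over categories.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes assigned by two coders to ten observations.
coder_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
coder_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
print(round(cohens_kappa(coder_a, coder_b), 3))
```

In this example the coders agree on 8 of 10 items, yet kappa is noticeably lower than 0.8 because about half of that agreement would be expected by chance given how often each coder uses "yes".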

  20. Analyses of inter-rater reliability between professionals, medical students and trained school children as assessors of basic life support skills.

    Science.gov (United States)

    Beck, Stefanie; Ruhnke, Bjarne; Issleib, Malte; Daubmann, Anne; Harendza, Sigrid; Zöllner, Christian

    2016-10-07

    Training of lay-rescuers is essential to improve survival-rates after cardiac arrest. Multiple campaigns emphasise the importance of basic life support (BLS) training for school children. Trainings require a valid assessment to give feedback to school children and to compare the outcomes of different training formats. Considering these requirements, we developed an assessment of BLS skills using MiniAnne and tested the inter-rater reliability between professionals, medical students and trained school children as assessors. Fifteen professional assessors, 10 medical students and 111 trained school children (peers) assessed 1087 school children at the end of a CPR-training event using the new assessment format. Analyses of inter-rater reliability (intraclass correlation coefficient; ICC) were performed. Overall inter-rater reliability of the summative assessment was high (ICC = 0.84, 95 %-CI: 0.84 to 0.86, n = 889). The number of comparisons between peer-peer assessors (n = 303), peer-professional assessors (n = 339), and peer-student assessors (n = 191) was adequate to demonstrate high inter-rater reliability between peer- and professional-assessors (ICC: 0.76), peer- and student-assessors (ICC: 0.88) and peer- and other peer-assessors (ICC: 0.91). Systematic variation in rating of specific items was observed for three items between professional- and peer-assessors. Using this assessment and integrating peers and medical students as assessors gives the opportunity to assess hands-on skills of school children with high reliability.

  1. Interrater reliability of schizoaffective disorder compared with schizophrenia, bipolar disorder, and unipolar depression - A systematic review and meta-analysis.

    Science.gov (United States)

    Santelmann, Hanno; Franklin, Jeremy; Bußhoff, Jana; Baethge, Christopher

    2016-10-01

    Schizoaffective disorder is a common diagnosis in clinical practice but its nosological status has been subject to debate ever since it was conceptualized. Although it is key that diagnostic reliability is sufficient, schizoaffective disorder has been reported to have low interrater reliability. Evidence based on systematic review and meta-analysis methods, however, is lacking. Using a highly sensitive literature search in Medline, Embase, and PsycInfo we identified studies measuring the interrater reliability of schizoaffective disorder in comparison to schizophrenia, bipolar disorder, and unipolar disorder. Out of 4126 records screened we included 25 studies reporting on 7912 patients diagnosed by different raters. The interrater reliability of schizoaffective disorder was moderate (meta-analytic estimate of Cohen's kappa 0.57 [95% CI: 0.41-0.73]), and substantially lower than that of its main differential diagnoses (difference in kappa between 0.22 and 0.19). Although there was considerable heterogeneity, analyses revealed that the interrater reliability of schizoaffective disorder was consistently lower in the overwhelming majority of studies. The results remained robust in subgroup and sensitivity analyses (e.g., diagnostic manual used) as well as in meta-regressions (e.g., publication year) and analyses of publication bias. Clinically, the results highlight the particular importance of diagnostic re-evaluation in patients diagnosed with schizoaffective disorder. They also quantify a widely held clinical impression of lower interrater reliability and agree with earlier meta-analysis reporting low test-retest reliability. Copyright © 2016. Published by Elsevier B.V.

  2. Inter-Rater Reliability of Neck Reflex Points in Women with Chronic Neck Pain.

    Science.gov (United States)

    Weinschenk, Stefan; Göllner, Richard; Hollmann, Markus W; Hotz, Lorenz; Picardi, Susanne; Hubbert, Katharina; Strowitzki, Thomas; Meuser, Thomas

    2016-01-01

    Neck reflex points (NRP) are tender soft tissue areas of the cervical region that display reflectory changes in response to chronic inflammations of correlated regions in the visceral cranium. Six bilateral areas, NRP C0, C1, C2, C3, C4 and C7, are detectable by palpating the lateral neck. We investigated the inter-rater reliability of NRP to assess their potential clinical relevance. 32 consecutive patients with chronic neck pain were examined for NRP tenderness by an experienced physician and an inexperienced medical student in a blinded design. A detailed description of the palpation technique is included in this section. Absence of pain was defined as pain index (PI) = 0, slight tenderness = 1, and marked pain = 2. Findings were evaluated either by pair-wise Cohen's kappa (κ) or by percentage of agreement (PA). Examiners identified 40% and 41% of positive NRP, respectively (PI > 0, physician: 155, student: 157) with a slight preference for the left side (1.2:1). The number of patients identified with >6 positive NRP by the examiners was similar (13 vs. 12 patients). κ values ranged from 0.52 to 0.95. The overall kappa was κ = 0.80 for the left and κ = 0.74 for the right side. PA varied from 78.1% to 96.9% with strongest agreement at NRP C0, NRP C2, and NRP C7. Inter-rater agreement was independent of patients' age, gender, body mass index and examiner's experience. The high reproducibility suggests the clinical relevance of NRP in women. © 2016 S. Karger GmbH, Freiburg.

  3. Inter-rater and intrarater reliability of the South African Triage Scale in low-resource settings of Haiti and Afghanistan.

    Science.gov (United States)

    Dalwai, Mohammed; Tayler-Smith, Katie; Twomey, Michèle; Nasim, Masood; Popal, Abdul Qayum; Haqdost, Waliul Haq; Gayraud, Olivia; Cheréstal, Sophia; Wallis, Lee; Valles, Pola

    2018-03-16

    The South African Triage Scale (SATS) has demonstrated good validity in the EDs of Médecins Sans Frontières (MSF)-supported sites in Afghanistan and Haiti; however, corresponding reliability in these settings has not yet been reported on. This study set out to assess the inter-rater and intrarater reliability of the SATS in four MSF-supported EDs in Afghanistan and Haiti (two trauma-only EDs and two mixed (including both medical and trauma cases) EDs). Under classroom conditions between December 2013 and February 2014, ED nurses at each site assigned triage ratings to a set of context-specific vignettes (written case reports of ED patients). Inter-rater reliability was assessed by comparing triage ratings among nurses; intrarater reliability was assessed by asking the nurses to retriage 10 random vignettes from the original set and comparing these duplicate ratings. Inter-rater reliability was calculated using the unweighted kappa, linearly weighted kappa and quadratically weighted kappa (QWK) statistics, and the intraclass correlation coefficient (ICC). Intrarater reliability was calculated according to the percentage of exact agreement and the percentage of agreement allowing for one level of discrepancy in triage ratings. The correlation between years of nursing experience and reliability of the SATS was assessed based on comparison of ICCs and the respective 95% CIs. A total of 67 nurses agreed to participate in the study: In Afghanistan there were 19 nurses from Kunduz Trauma Centre and nine from Ahmed Shah Baba; in Haiti, there were 20 nurses from Martissant Emergency Centre and 19 from Tabarre Surgical and Trauma Centre. Inter-rater agreement was moderate across all sites (ICC range: 0.50-0.60; QWK range: 0.50-0.59) apart from the trauma ED in Haiti where it was moderate to substantial (ICC: 0.58; QWK: 0.61). Intrarater agreement was similar across the four sites (68%-74% exact agreement); when allowing for a one-level discrepancy in triage ratings

  4. Inter-rater reliability in the classification of supraspinatus tendon tears using 3D ultrasound – a question of experience?

    Directory of Open Access Journals (Sweden)

    Giorgio Tamborrini

    2016-09-01

    Background: Three-dimensional (3D) ultrasound of the shoulder is characterized by a comparable accuracy to two-dimensional (2D) ultrasound. No studies investigating 2D versus 3D inter-rater reliability in the detection of supraspinatus tendon tears taking into account the level of experience of the raters have been carried out so far. Objectives: The aim of this study was to determine the inter-rater reliability in the analysis of 3D ultrasound image sets of the supraspinatus tendon between sonographers with different levels of experience. Patients and methods: Non-interventional, prospective, observational pilot study of 2309 images of 127 adult patients suffering from unilateral shoulder pain. 3D ultrasound image sets were scored by three raters independently. The intra- and inter-rater reliabilities were calculated. Results: There was an excellent intra-rater reliability of rater A in the overall classification of supraspinatus tendon tears (2D vs 3D κ = 0.892, pairwise reliability 93.81%; 3D scoring round 1 vs 3D scoring round 2 κ = 0.875, pairwise reliability 92.857%). The inter-rater reliability was only moderate compared to rater B on 3D (κ = 0.497, pairwise reliability 70.95%) and fair compared to rater C (κ = 0.238, pairwise reliability 42.38%). Conclusions: The reliability of 3D ultrasound of the supraspinatus tendon depends on the level of experience of the sonographer. Experience in 2D ultrasound does not seem to be sufficient for the analysis of 3D ultrasound imaging sets. Therefore, for a 3D ultrasound analysis new diagnostic criteria have to be established and taught even to experienced 2D sonographers to improve reproducibility.

  5. Effects of questionnaire-based diagnosis and training on inter-rater reliability among practitioners of traditional Chinese medicine.

    Science.gov (United States)

    Mist, Scott; Ritenbaugh, Cheryl; Aickin, Mikel

    2009-07-01

    To investigate whether a training process that focused on a questionnaire-based diagnosis in Traditional Chinese Medicine (TCM), and developing diagnostic consensus, would improve the agreement of TCM diagnoses among 10 TCM practitioners evaluating patients with temporomandibular joint disorder (TMJD). Evaluation of a diagnostic training program at the Department of Family and Community Medicine, University of Arizona, Tucson, Arizona, and the Oregon College of Oriental Medicine, Portland, Oregon. Screened participants for a study of TCM for TMJD. PRACTITIONERS: Ten (10) licensed acupuncturists with a minimum of 5 years licensure and education in Chinese herbs. A training session using a questionnaire-based diagnostic form was conducted, followed by waves of diagnostic sessions. Between sessions, practitioners discussed the results of the previous round of participants with a focus on reducing variability in primary diagnosis and severity rating of each diagnosis: 3 waves of 5 patients were assessed by 4 practitioner pairs for a total of 120 diagnoses. At 18 months, practitioners completed a recalibration exercise with a similar format with a total of 32 diagnoses. These diagnoses were then examined with respect to the rate of agreement among the 10 practitioners using inter-rater correlations and kappas. The inter-rater correlation with respect to the TCM diagnoses among the 10 practitioners increased from 0.112 to 0.618 with training. Statistically significant improvements were found between the baseline and 18-month exercises (p […] reliability of TCM diagnosis may be improved through a training process and a questionnaire-based diagnosis process. The improvements varied by diagnosis, with the greatest congruence among primary and more severe diagnoses. Future TCM studies should consider including calibration training to improve the validity of results.

  6. Corrections for criterion reliability in validity generalization: The consistency of Hermes, the utility of Midas

    Directory of Open Access Journals (Sweden)

    Jesús F. Salgado

    2016-04-01

    There is criticism in the literature about the use of interrater coefficients to correct for criterion reliability in validity generalization (VG) studies and disputing whether .52 is an accurate and non-dubious estimate of interrater reliability of overall job performance (OJP) ratings. We present a second-order meta-analysis of three independent meta-analytic studies of the interrater reliability of job performance ratings and make a number of comments and reflections on LeBreton et al.'s paper. The results of our meta-analysis indicate that the interrater reliability for a single rater is .52 (k = 66, N = 18,582, SD = .105). Our main conclusions are: (a) the value of .52 is an accurate estimate of the interrater reliability of overall job performance for a single rater; (b) it is not reasonable to conclude that past VG studies that used .52 as the criterion reliability value have a less than secure statistical foundation; (c) based on interrater reliability, test-retest reliability, and coefficient alpha, supervisor ratings are a useful and appropriate measure of job performance and can be confidently used as a criterion; (d) validity correction for criterion unreliability has been unanimously recommended by "classical" psychometricians and I/O psychologists as the proper way to estimate predictor validity, and is still recommended at present; (e) the substantive contribution of VG procedures to inform HRM practices in organizations should not be lost in these technical points of debate.
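
    To make the role of the .52 value concrete, the sketch below applies the classical correction for attenuation due to criterion unreliability (observed validity divided by the square root of the criterion reliability) and the Spearman-Brown formula for the reliability of an average over several raters. Both formulas are standard psychometrics rather than code from the paper. The observed validity of .30 and the function names are hypothetical.

```python
import math


def correct_for_criterion_unreliability(observed_validity, criterion_reliability):
    """Classical disattenuation: r_corrected = r_observed / sqrt(r_yy)."""
    if not 0 < criterion_reliability <= 1:
        raise ValueError("criterion reliability must lie in (0, 1]")
    return observed_validity / math.sqrt(criterion_reliability)


def spearman_brown(single_rater_reliability, n_raters):
    """Reliability of the mean rating across n_raters parallel raters."""
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


# Hypothetical observed validity coefficient; .52 is the single-rater
# interrater reliability reported in the abstract.
corrected = correct_for_criterion_unreliability(0.30, 0.52)
two_raters = spearman_brown(0.52, 2)
print(round(corrected, 3), round(two_raters, 3))
```

    The lower the assumed criterion reliability, the larger the corrected validity, which is why the accuracy of the .52 estimate matters for VG results.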

  7. Inter-rater reliability of a modified version of Delitto et al.’s classification-based system for low back pain: a pilot study

    NARCIS (Netherlands)

    Apeldoorn, Adri T.; van Helvoirt, Hans; Ostelo, Raymond W.; Meihuizen, Hanneke; Kamper, Steven J.; van Tulder, Maurits W.; de Vet, Henrica C W

    2016-01-01

    Study design: Observational inter-rater reliability study. Objectives: To examine: (1) the inter-rater reliability of a modified version of Delitto et al.’s classification-based algorithm for patients with low back pain; (2) the influence of different levels of familiarity with the system; and (3)

  8. Intra- and interrater reliability and agreement of the Danish version of the Dynamic Gait Index in older people with balance impairments

    DEFF Research Database (Denmark)

    Jønsson, Line R; Kristensen, Morten; Tibaek, Sigrid

    2011-01-01

    To examine the intrarater and interrater reliability and agreement of the Danish version of the Dynamic Gait Index (DGI) in hospitalized and community-dwelling older people with balance impairments.

  9. Intra-rater and inter-rater reliability of the standardized ultrasound protocol for assessing subacromial structures

    DEFF Research Database (Denmark)

    Hougs Kjær, Birgitte; Ellegaard, Karen; Wieland, Ina

    2017-01-01

    BACKGROUND: US-examinations related to shoulder impingement (SI) often vary due to methodological differences, examiner positions, transducers, and recording parameters. Reliable US protocols for examination of different structures related to shoulder impingement are therefore needed. OBJECTIVES...... of the supraspinatus tendon (SUPRA) and subacromial subdeltoid (SASD) bursa in two imaging positions, and the acromial humeral distance (AHD) in one position. Additionally, agreement on dynamic impingement (DI) examination was performed. The intra- and inter-rater reliability was carried out on the same day...

  10. Inter-rater reliability of postnatal ultrasound interpretation in infants with congenital hydronephrosis.

    Science.gov (United States)

    Vemulakonda, V M; Wilcox, D T; Torok, M R; Hou, A; Campbell, J B; Kempe, A

    2015-09-01

    The most common measurements of hydronephrosis are the anterior-posterior (AP) diameter and the Society for Fetal Urology (SFU) grading systems. To date, the inter-rater reliability (IRR) of these measures has not been compared in the postnatal period. The objectives of this study were to compare the IRR of the AP diameter and the SFU grading system in infants and to determine whether ultrasound findings other than pelvicalyceal dilation are associated with higher SFU grades. Initial postnatal ultrasounds of infants seen from February 1, 2011, to January 31, 2012, with a primary diagnosis of congenital hydronephrosis were included for review. Ultrasound images were de-identified and reviewed by four pediatric urologists. IRR was calculated using the intraclass correlation (ICC) measure. A paired t test was used to compare ICCs. Associations between SFU grade and other ultrasound findings were tested using Chi-square or Fisher's exact tests. A total of 112 kidneys in 56 patients were reviewed. IRR of the SFU grading system was high (right kidney ICC = 0.83, left kidney ICC = 0.85); however, IRR of AP diameter measurement was higher (right kidney ICC = 0.97, left kidney ICC = 0.98; p […] hydronephrosis on bivariable and multivariable analysis. The SFU grading system is associated with excellent IRR, although the AP diameter appears to have higher IRR. Physicians may consider ultrasound findings that are not explicitly included in the SFU system when assigning hydronephrosis grade, which may lead to variability in use of this classification system.

  11. Reliability and Validity of Prototype Diagnosis for Adolescent Psychopathology.

    Science.gov (United States)

    Haggerty, Greg; Zodan, Jennifer; Mehra, Ashwin; Zubair, Ayyan; Ghosh, Krishnendu; Siefert, Caleb J; Sinclair, Samuel J; DeFife, Jared

    2016-04-01

    The current study investigated the interrater reliability and validity of prototype ratings of 5 common adolescent psychiatric disorders: attention-deficit/hyperactivity disorder, conduct disorder, major depressive disorder, generalized anxiety disorder, and posttraumatic stress disorder. One hundred fifty-seven adolescent inpatient participants consented to participate in this study. We compared ratings from 2 inpatient clinicians, blinded to each other's ratings and patient measures, after their separate initial diagnostic interview to assess interrater reliability. Prototype ratings completed by clinicians after their initial diagnostic interview with adolescent inpatients and outpatients were compared with patient-reported behavior problems and parents' report of their child's behavioral problems. Prototype ratings demonstrated good interrater reliability. Clinicians' prototype ratings showed predicted relationships with patient-reported behavior problems and parent-reported behavior problems. Prototype matching seems to be a possible alternative for psychiatric diagnosis. Prototype ratings showed good interrater reliability based on clinicians' unique experiences with the patient (as opposed to video-/audio-recorded material) with no training.

  12. Quality of nursing intensity data: inter-rater reliability of the patient classification after two decades in clinical use.

    Science.gov (United States)

    Liljamo, Pia; Kinnunen, Ulla-Mari; Ohtonen, Pasi; Saranto, Kaija

    2017-09-01

    The aim of this study was to measure the inter-rater reliability of the Oulu Patient Classification and to discuss existing methods of reliability testing. The Oulu Patient Classification, part of the RAFAELA ® System, has been developed to assist nursing managers with the proper allocation of nursing resources. Due to the increased intensity of inpatient care during recent years, there is a need for the reliability testing of the classification, which has been in clinical use for 20 years. Retrospective statistical study. To test inter-rater reliability, a pair of nurses classified the same patients, without knowledge of each other's ratings, as a part of annually conducted standardization. Data on the parallel classifications (n = 19,997) was obtained from inpatient units (n = 32) with different specialties at a university hospital in Finland during 2010-2015. Parallel classification practices were also analysed. The reliability of the overall classification and its subareas were calculated using suitable statistical coefficients. Inter-rater reliability coefficients were a reliable or almost perfect means of considering the nursing intensity category and various practices, but there were detectable differences between subareas. The lowest agreement levels occurred in the subareas 'Planning and Coordination of Nursing Care' and 'Guiding of Care/Continued Care and Emotional Support'. There is a need to develop the descriptions of subareas and to clarify the related concepts. Precise nursing documentation can promote a high level of agreement and reliable results. The traditional overall proportion of agreement does not provide an adequate picture of reliability - weighted kappa coefficients should be used instead. © 2017 John Wiley & Sons Ltd.
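
    The closing recommendation of the abstract above (weighted kappa rather than the overall proportion of agreement) can be illustrated with a small Python sketch. It assumes an ordinal scale with four categories and uses hypothetical parallel classifications. The function name `weighted_kappa` and the data are illustrative and do not come from the study.

```python
def weighted_kappa(rater_a, rater_b, categories, weighting="linear"):
    """Weighted Cohen's kappa for two raters on an ordered category scale."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two ordered categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Observed joint proportions (rows: rater A, columns: rater B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    power = 1 if weighting == "linear" else 2
    weighted_observed = weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight grows with the distance between categories.
            w = (abs(i - j) / (k - 1)) ** power
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j]
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1 - weighted_observed / weighted_expected


# Hypothetical parallel classifications of eight patients into categories 1-4.
nurse_1 = [1, 2, 2, 3, 3, 4, 4, 1]
nurse_2 = [1, 2, 3, 3, 4, 4, 3, 1]
agreement = sum(a == b for a, b in zip(nurse_1, nurse_2)) / len(nurse_1)
linear = weighted_kappa(nurse_1, nurse_2, [1, 2, 3, 4], "linear")
quadratic = weighted_kappa(nurse_1, nurse_2, [1, 2, 3, 4], "quadratic")
print(agreement, round(linear, 2), round(quadratic, 2))
```

    Here the nurses agree exactly on 5 of 8 patients, but every disagreement is only one category apart, so the weighted coefficients (0.70 linear, 0.85 quadratic) give partial credit that the raw proportion of agreement cannot express.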

  13. Interrater reliability of the Melbourne Assessment of Unilateral Upper Limb Function for children with hemiplegic cerebral palsy.

    LENUS (Irish Health Repository)

    Spirtos, Michelle

    2012-02-01

    OBJECTIVE: We examined the interrater reliability of the Melbourne Assessment of Unilateral Upper Limb Function. METHOD: Three occupational therapists independently scored 34 videotaped assessments of children with hemiplegic cerebral palsy aged 6 yr, 1 mo, to 14 yr, 5 mo. Intraclass correlation coefficients (ICCs) at a 95% confidence interval were calculated for total scores, category scores, and item scores. RESULTS: The correlation between raters' total scores was high (ICC = .961). The highest correlation for test components between raters was found for fluency (ICC = .902), followed by range of movement (ICC = .866), and the lowest correlation was found for quality of movement (ICC = .683). The ICCs for individual test item scores varied and ranged from .368 to .899. CONCLUSION: This study demonstrated high interrater reliability for total scores, with scoring of some individual components and items requiring further consideration from both a clinical and a research perspective.
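
    The abstract does not state which ICC form was used. As one possibility, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater) from ANOVA mean squares in plain Python. The choice of form, the function name `icc_2_1`, and the score table are assumptions for illustration, not details of the study.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows, one per subject, each with one score per rater.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 subjects and >= 2 raters, no missing scores")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Sums of squares for subjects (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator


# Illustrative table: six subjects (rows) scored by four raters (columns).
example_scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example_scores)
print(round(icc, 2))
```

    In this table the raters rank subjects similarly but differ systematically in level, which an absolute-agreement ICC penalises. A consistency-type ICC on the same scores would be noticeably higher, so reports should name the form used.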

  14. Inter-rater reliability and stability of diagnoses of autism spectrum disorder in children identified through screening at a very young age

    NARCIS (Netherlands)

    van Daalen, Emma; Kemner, Chantal; Dietz, Claudine; Swinkels, Sophie H. N.; Buitelaar, Jan K.; van Engeland, Herman

    2009-01-01

    To examine the inter-rater reliability and stability of autism spectrum disorder (ASD) diagnoses made at a very early age in children identified through a screening procedure around 14 months of age. In a prospective design, preschoolers were recruited from a screening study for ASD. The inter-rater

  15. Inter-rater reliability and stability of diagnoses of autism spectrum disorder in children identified through screening at a very young age.

    NARCIS (Netherlands)

    Daalen, E. van; Kemner, C.; Dietz, C.; Swinkels, S.H.N.; Buitelaar, J.K.; Engeland, H.M. van

    2009-01-01

    To examine the inter-rater reliability and stability of autism spectrum disorder (ASD) diagnoses made at a very early age in children identified through a screening procedure around 14 months of age. In a prospective design, preschoolers were recruited from a screening study for ASD. The inter-rater

  16. The inter-rater reliability and prognostic value of coma scales in Nepali children with acute encephalitis syndrome.

    Science.gov (United States)

    Ray, Stephen; Rayamajhi, Ajit; Bonnett, Laura J; Solomon, Tom; Kneen, Rachel; Griffiths, Michael J

    2018-02-01

    Background: Acute encephalitis syndrome (AES) is a common cause of coma in Nepali children. The Glasgow coma scale (GCS) is used to assess the level of coma in these patients and predict outcome. Alternative coma scales may have better inter-rater reliability and prognostic value in encephalitis in Nepali children, but this has not been studied. The Adelaide coma scale (ACS), Blantyre coma scale (BCS) and the Alert, Verbal, Pain, Unresponsive scale (AVPU) are alternatives to the GCS which can be used. Methods: Children aged 1-14 years who presented to Kanti Children's Hospital, Kathmandu with AES between September 2010 and November 2011 were recruited. All four coma scales (GCS, ACS, BCS and AVPU) were applied on admission, 48 h later and on discharge. Inter-rater reliability (unweighted kappa) was measured for each. Correlation and agreement between total coma score and outcome (Liverpool outcome score) was measured by Spearman's rank and Bland-Altman plot. The prognostic value of coma scales alone and in combination with physiological variables was investigated in a subgroup (n = 22). A multivariable logistic regression model was fitted by backward stepwise. Results: Fifty children were recruited. Inter-rater reliability using the various scales was fair to moderate. However, the scales poorly predicted clinical outcome. Combining the scales with physiological parameters such as systolic blood pressure improved outcome prediction. Conclusion: This is the first study to compare four coma scales in Nepali children with AES. The scales exhibited fair to moderate inter-rater reliability. However, the study is inadequately powered to answer the question on the relationship between coma scales and outcome. Further larger studies are required.

  17. Development and first validation of a simplified CT-based classification system of soft tissue changes in large-head metal-on-metal total hip replacement: intra- and interrater reliability and association with revision rates in a uniform cohort of 664 arthroplasties

    International Nuclear Information System (INIS)

    Boomsma, Martijn F.; Warringa, Niek; Edens, Mireille A.; Lingen, Christiaan P. van; Ettema, Harmen B.; Verheyen, Cees C.P.M.; Maas, Mario

    2015-01-01

    analysis was performed two-tailed using alpha 5 % as the significance level. In total, 664 scores from 664 MoM hips obtained by two observers were available for analyses. Interobserver reliability for the non-simplified version (I-V) was κw = 0.71 (95 % CI: 0.62-0.79), which indicates good agreement between the two musculoskeletal radiologists. Intra- and interobserver reliability for the simplified version (A-C) were respectively κw 0.78 (95 % CI: 0.68-0.87), and κw = 0.71 (95 % CI: 0.65-0.76). This indicates good agreement within and between the two observers. The simplified A-C version is significantly associated with revision exclusively due to MoM pathology, in both patients with unilateral MoM THA (p < 0.001) and patients with bilateral MoM THA (p < 0.044). The simplified A-C version is associated with several clinical measures. In patients with unilateral MoM THA, with or without contralateral THA, in situ time (p < 0.008), cobalt and chromium (p < 0.001) were statistically significant. In patients with bilateral MoM, cobalt (p < 0.001) and chromium (p < 0.027) were statistically significant. Revision is significantly associated with cup size (p < 0.001), anteversion of the cup (p < 0.004), serum ion levels of cobalt and chromium (p < 0.001) and the adapted classification system (p < 0.001). In univariate logistic regression analysis on revision, cup, anteversion of the cup, cobalt-chromium ion serum levels, and the simplified (A-C) CT category system were statistically significant. The simplified (A-C) CT category system was an independent associate of revision, in several multiple logistic regression models. The presented simplified CT grading system (A-C) in its first clinical validation on 48- and 64-multislice systems is reliable, showing good intra- and interrater reliability and is independently associated with revision surgery. (orig.)

  18. Development and first validation of a simplified CT-based classification system of soft tissue changes in large-head metal-on-metal total hip replacement: intra- and interrater reliability and association with revision rates in a uniform cohort of 664 arthroplasties

    Energy Technology Data Exchange (ETDEWEB)

    Boomsma, Martijn F.; Warringa, Niek [Isala Hospital, Department of Radiology, Zwolle (Netherlands); Edens, Mireille A. [Isala Hospital, Department of Innovation and Science, Zwolle (Netherlands); Lingen, Christiaan P. van; Ettema, Harmen B.; Verheyen, Cees C.P.M. [Isala Hospital, Department of Orthopaedics, Zwolle (Netherlands); Maas, Mario [AMC, Department of Radiology, Amsterdam (Netherlands)

    2015-08-15

    analysis was performed two-tailed using alpha 5 % as the significance level. In total, 664 scores from 664 MoM hips obtained by two observers were available for analyses. Interobserver reliability for the non-simplified version (I-V) was κw = 0.71 (95 % CI: 0.62-0.79), which indicates good agreement between the two musculoskeletal radiologists. Intra- and interobserver reliability for the simplified version (A-C) were respectively κw 0.78 (95 % CI: 0.68-0.87), and κw = 0.71 (95 % CI: 0.65-0.76). This indicates good agreement within and between the two observers. The simplified A-C version is significantly associated with revision exclusively due to MoM pathology, in both patients with unilateral MoM THA (p < 0.001) and patients with bilateral MoM THA (p < 0.044). The simplified A-C version is associated with several clinical measures. In patients with unilateral MoM THA, with or without contralateral THA, in situ time (p < 0.008), cobalt and chromium (p < 0.001) were statistically significant. In patients with bilateral MoM, cobalt (p < 0.001) and chromium (p < 0.027) were statistically significant. Revision is significantly associated with cup size (p < 0.001), anteversion of the cup (p < 0.004), serum ion levels of cobalt and chromium (p < 0.001) and the adapted classification system (p < 0.001). In univariate logistic regression analysis on revision, cup, anteversion of the cup, cobalt-chromium ion serum levels, and the simplified (A-C) CT category system were statistically significant. The simplified (A-C) CT category system was an independent associate of revision, in several multiple logistic regression models. The presented simplified CT grading system (A-C) in its first clinical validation on 48- and 64-multislice systems is reliable, showing good intra- and interrater reliability and is independently associated with revision surgery. (orig.)

  19. Methods to achieve high interrater reliability in data collection from primary care medical records.

    Science.gov (United States)

    Liddy, Clare; Wiens, Miriam; Hogg, William

    2011-01-01

    We assessed interrater reliability (IRR) of chart abstractors within a randomized trial of cardiovascular care in primary care. We report our findings, and outline issues and provide recommendations related to determining sample size, frequency of verification, and minimum thresholds for 2 measures of IRR: the κ statistic and percent agreement. We designed a data quality monitoring procedure having 4 parts: use of standardized protocols and forms, extensive training, continuous monitoring of IRR, and a quality improvement feedback mechanism. Four abstractors checked a 5% sample of charts at 3 time points for a predefined set of indicators of the quality of care. We set our quality threshold for IRR at a κ of 0.75, a percent agreement of 95%, or both. Abstractors reabstracted a sample of charts in 16 of 27 primary care practices, checking a total of 132 charts with 38 indicators per chart. The overall κ across all items was 0.91 (95% confidence interval, 0.90-0.92) and the overall percent agreement was 94.3%, signifying excellent agreement between abstractors. We gave feedback to the abstractors to highlight items that had a κ of less than 0.70 or a percent agreement less than 95%. No practice had to have its charts abstracted again because of poor quality. A 5% sampling of charts for quality control using IRR analysis yielded κ and agreement levels that met or exceeded our quality thresholds. Using 3 time points during the chart audit phase allows for early quality control as well as ongoing quality monitoring. Our results can be used as a guide and benchmark for other medical chart review studies in primary care.
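
    The feedback step described above (highlighting items with a κ below 0.70 or a percent agreement below 95%) could be sketched as follows. The indicator names, the per-item statistics, and both function names are hypothetical. The record does not describe how the authors implemented their monitoring.

```python
def percent_agreement(first_pass, second_pass):
    """Percentage of chart items on which two abstractions match exactly."""
    if len(first_pass) != len(second_pass) or not first_pass:
        raise ValueError("abstractions must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(first_pass, second_pass))
    return 100.0 * matches / len(first_pass)


def items_needing_feedback(item_stats, kappa_floor=0.70, agreement_floor=95.0):
    """Return indicator names whose kappa or percent agreement is too low.

    `item_stats` maps an indicator name to a (kappa, percent agreement) pair.
    The default floors are the feedback cut-offs quoted in the abstract.
    """
    return sorted(
        name
        for name, (kappa, pct) in item_stats.items()
        if kappa < kappa_floor or pct < agreement_floor
    )


# Hypothetical re-abstraction of one indicator across ten charts (1 = present).
original = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
recheck = [1, 0, 1, 1, 0, 1, 1, 1, 0, 0]
pct = percent_agreement(original, recheck)

# Hypothetical per-indicator results from a quality-control round.
stats = {
    "blood_pressure_recorded": (0.93, 97.0),
    "smoking_status": (0.66, 96.0),
    "lipid_test_ordered": (0.81, 90.0),
}
flagged = items_needing_feedback(stats)
print(pct, flagged)
```

    Checking both statistics is useful because percent agreement can look high for rare findings even when kappa is low, and the reverse can also happen.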

  20. Interrater Reliability in Analysis of Laryngoscopic Features for Unilateral Vocal Fold Paresis.

    Science.gov (United States)

    Isseroff, Tova F; Parasher, Arjun K; Richards, Amanda; Sivak, Mark; Woo, Peak

    2016-11-01

    The diagnosis of paresis in patients with vocal fold motion impairment remains a challenge. In particular, laryngoscopy examination may result in significant disagreement in diagnosis among providers. We hypothesize that systematically evaluating for a standard set of clinical parameters will increase the diagnostic concordance among providers. Prospective case series conducted at a tertiary referral laryngology office. Two laryngologists (rater 1) and two trainees (rater 2) rated laryngoscopy findings in 19 patients suspected of paresis. The diagnosis was confirmed with laryngeal electromyogram. A standard set of 27 ratings was used for each examination that included movement, laryngeal configuration, and stroboscopy signs. A kappa coefficient was calculated for agreement in laryngoscopy findings and effectiveness in predicting the laterality of paresis. A substantial agreement (kappa coefficient > 0.61) existed between the raters for vocal fold length, vocal fold thickness, bowing, and reduction in movement. A moderate agreement (kappa coefficient > 0.41) existed between raters for piriform opening and reduced kinesis. The senior author was accurately able to diagnose the side of paresis in 89.5% of cases for a kappa coefficient of 0.78, whereas the trainees correctly predicted the side of paresis in 63.1% for a kappa coefficient of 0.35. The raters agreed on the diagnosis in 73.7% of cases for a kappa coefficient of 0.50. Using a standard set of laryngoscopy findings may improve the provider's ability to identify the laterality of vocal fold paresis and increase interrater reliability compared with other series. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
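
    The "moderate" and "substantial" labels in the abstract above match the commonly cited Landis and Koch bands for kappa. Assuming that convention was intended, a small Python helper can map a coefficient to its label. The function name is hypothetical, and treating values between the published two-decimal bands (for example 0.605) as belonging to the higher band is a choice made here for illustration.

```python
def landis_koch_label(kappa):
    """Map a kappa coefficient to the Landis and Koch descriptive band."""
    if not -1 <= kappa <= 1:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0:
        return "poor"
    # Upper bound of each band, in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Kappa values quoted in the abstract for side-of-paresis prediction
# and for overall diagnostic agreement.
labels = {value: landis_koch_label(value) for value in (0.78, 0.35, 0.50)}
print(labels)
```

    These bands are conventions, not statistical tests, so the confidence interval around each kappa still matters when samples are as small as 19 patients.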

  1. Translation, adaptation and inter-rater reliability of the administration manual for the Fugl-Meyer assessment.

    Science.gov (United States)

    Michaelsen, Stella M; Rocha, André S; Knabben, Rodrigo J; Rodrigues, Luciano P; Fernandes, Claudia G C

    2011-01-01

    Recently, the reliability of the Brazilian version of the Fugl-Meyer Assessment (FMA) was assessed through the scoring given according to observations made by a single evaluator who applied the test. When different raters apply the scale, the reliability may depend on the interpretation given to the assessment sheet. In such cases, a clear administration manual is essential for ensuring homogeneity of application. To translate and adapt the French Canadian version of the FMA administration manual into Brazilian Portuguese and to evaluate the inter-rater reliability when different evaluators apply the FMA on the basis of the information contained in the manual. Eighteen adults (59±10 years) with chronic hemiparesis (38±35 months after a stroke) took part in this study. Eight patients participated in the first part of the study and 10 in the second part. Based on analyzing the results from part 1, an adapted version was developed, in which information and photos were added to illustrate the positions of the patient and evaluator. The inter-rater reliability was assessed using the intraclass correlation coefficient (ICC). The reliability of the FMA based on the adapted version of the manual was excellent for the total motor scores for the upper limbs (ICC=0.98) and lower limbs (ICC=0.90), as well as for movement sense (ICC=0.98) and upper and lower-limb passive range of motion (ICC=0.84 and 0.90, respectively). The reliability was moderate for tactile sensitivity (0.75). The joint pain assessment presented low reliability. The results showed that, except for pain assessment, application of the FMA based on the adapted version of the application manual for Brazilian Portuguese presented adequate inter-rater reliability.

  2. The Berg Balance Scale has high intra- and inter-rater reliability but absolute reliability varies across the scale: a systematic review.

    Science.gov (United States)

    Downs, Stephen; Marquez, Jodie; Chiarelli, Pauline

    2013-06-01

    What is the intra-rater and inter-rater relative reliability of the Berg Balance Scale? What is the absolute reliability of the Berg Balance Scale? Does the absolute reliability of the Berg Balance Scale vary across the scale? Systematic review with meta-analysis of reliability studies. Any clinical population that has undergone assessment with the Berg Balance Scale. Relative intra-rater reliability, relative inter-rater reliability, and absolute reliability. Eleven studies involving 668 participants were included in the review. The relative intrarater reliability of the Berg Balance Scale was high, with a pooled estimate of 0.98 (95% CI 0.97 to 0.99). Relative inter-rater reliability was also high, with a pooled estimate of 0.97 (95% CI 0.96 to 0.98). A ceiling effect of the Berg Balance Scale was evident for some participants. In the analysis of absolute reliability, all of the relevant studies had an average score of 20 or above on the 0 to 56 point Berg Balance Scale. The absolute reliability across this part of the scale, as measured by the minimal detectable change with 95% confidence, varied between 2.8 points and 6.6 points. The Berg Balance Scale has a higher absolute reliability when close to 56 points due to the ceiling effect. We identified no data that estimated the absolute reliability of the Berg Balance Scale among participants with a mean score below 20 out of 56. The Berg Balance Scale has acceptable reliability, although it might not detect modest, clinically important changes in balance in individual subjects. The review was only able to comment on the absolute reliability of the Berg Balance Scale among people with moderately poor to normal balance. Copyright © 2013 Australian Physiotherapy Association. Published by .. All rights reserved.
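
    Absolute reliability expressed as the minimal detectable change with 95% confidence (MDC95) is commonly derived from the standard error of measurement (SEM). The sketch below uses the usual textbook relations SEM = SD × √(1 − reliability) and MDC95 = 1.96 × √2 × SEM; the abstract does not state which formulas the review used, so treat these as assumptions, and the input values as hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM from the between-subject SD and a reliability coefficient (e.g. an ICC)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def minimal_detectable_change_95(sem):
    """MDC95: smallest change exceeding measurement error with 95% confidence."""
    # sqrt(2) accounts for error in both the first and the second measurement.
    return 1.96 * math.sqrt(2.0) * sem


if __name__ == "__main__":
    # Hypothetical inputs: SD of 10 scale points and reliability of 0.96.
    sem = standard_error_of_measurement(10.0, 0.96)
    print(round(sem, 2), round(minimal_detectable_change_95(sem), 2))
```

    A high relative reliability can therefore coexist with an MDC95 of several points when between-subject variability is large.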

  3. The interrater and intrarater reliability of the Philpott-Javer staging system based on level of training.

    Science.gov (United States)

    Parhar, Harman S; Thamboo, Andrew; Habib, Al-Rahim; Chang, Brent; Gan, Eng Cern; Javer, Amin R

    2014-04-01

    The Philpott-Javer postoperative endoscopic mucosal staging system for allergic fungal rhinosinusitis has previously demonstrated acceptable interrater reliability among rhinologists. There are, however, numerous learners involved in patient care at tertiary centers. This study aims to analyze the interrater and intrarater reliability of this system among learners in otolaryngology at different stages in training. A prospective analysis of retrospectively collected endoscopic photographs. A tertiary care teaching hospital (January 2013). Fifty patients undergoing routine follow-up. Three photographs from each of 50 patients undergoing routine postsurgical nasoendoscopy were reviewed. Images were played twice, 1 week apart, in 2 differently randomized cycles and scored according to Philpott-Javer criteria by a rhinologist, a rhinology fellow, a senior otolaryngology resident, a junior otolaryngology resident, and a medical student. Interobserver reliability was assessed using the intraclass correlation coefficient, while intrarater reliability was assessed by Shrout-Fleiss κ values. Agreement between each learner and the rhinologist was also assessed using κ values. The intraclass correlation among the 5 raters was 0.7600 (95% confidence interval, 0.6917-0.8161) for the Philpott-Javer scoring system, suggesting substantial reliability. Intrarater data showed substantial to almost-perfect reliability (κ values between 0.668 and 0.815) among all raters using this system. There was also moderate to substantial agreement between the learners and the rhinologist (κ values between 0.534 and 0.710). Results suggest that the Philpott-Javer staging system has acceptable intrarater and interrater reliability among learners of differing levels of clinical experience and is suitable for evaluating progress following surgery.
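
    An intraclass correlation coefficient across several raters can be computed from two-way ANOVA mean squares. The sketch below implements one common form, the two-way random-effects, absolute-agreement, single-rater ICC, often written ICC(2,1). The abstract does not say which ICC form was used, so this choice is an assumption, and the ratings are illustrative rather than study data.

```python
def icc_2_1(scores):
    """ICC(2,1) for a complete subjects-by-raters table (list of equal-length rows)."""
    n = len(scores)        # subjects (rows)
    k = len(scores[0])     # raters (columns)
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with at least 2 rows and 2 columns")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-subject mean square
    ms_cols = ss_cols / (k - 1)                # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Illustrative table: 6 subjects (rows) scored by 4 raters (columns).
    ratings = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(round(icc_2_1(ratings), 3))
```

    Because this form penalizes systematic differences between raters, a rater who scores consistently higher or lower than the others reduces the coefficient even when the rank ordering of subjects is preserved.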

  4. An Investigation Into The Interrater Reliability Of The Modified Ashworth Scale In The Assessment Of Muscle Spasticity In Hemiplegic Patients

    Directory of Open Access Journals (Sweden)

    N. Nokhostin-Ansari

    2006-06-01

    Background and Aim: Spasticity is a velocity-dependent increase in tonic stretch reflexes (muscle tone) with exaggerated tendon jerks, resulting from hyperexcitability of the stretch reflex. The measurement of spasticity is necessary to determine the effect of treatments. The Modified Ashworth Scale is the most widely used method for assessing muscle spasticity in clinical practice and research. The purpose of this study was to investigate the interrater reliability of the Modified Ashworth Scale in hemiplegic patients. Materials and Methods: Thirty subjects (16 males, 14 females) with a mean age of 59.40 (SD = 14.013) were recruited. Shoulder adductor, elbow flexor, wrist dorsiflexor, hip adductor, knee extensor and ankle plantarflexor on the hemiplegic side were tested by two physiotherapists. Results: In the upper limb, the interrater reliability for shoulder adductor and elbow flexor muscles was fair (0.372 and 0.369, respectively). The reliability for the wrist flexors was good (0.612). The difference in Kappa value for the proximal muscle (shoulder adductor; 0.372) and the distal muscle (wrist flexor; 0.612) was significant (χ²=33.87, df=1, p ... 0.05). The mean values for the upper limb (0.505) and the lower limb (0.516) were not significantly different (χ²=0.1407, df=1, p>0.05). Conclusion: The interrater reliability of the Modified Ashworth Scale was not good. The limb, upper or lower, had no significant effect on the reliability. In the upper limb, the reliability for the proximal and distal muscles was significantly different. However, the difference in the lower limb was not significant. When using the scale, one should consider its limitations.

  5. Environmental education curriculum evaluation questionnaire: A reliability and validity study

    Science.gov (United States)

    Minner, Daphne Diane

    The intention of this research project was to bridge the gap between social science research and application to the environmental domain through the development of a theoretically derived instrument designed to give educators a template by which to evaluate environmental education curricula. The theoretical base for instrument development was provided by several developmental theories such as Piaget's theory of cognitive development, Developmental Systems Theory, Life-span Perspective, as well as curriculum research within the area of environmental education. This theoretical base fueled the generation of a list of components which were then translated into a questionnaire with specific questions relevant to the environmental education domain. The specific research question for this project is: Can a valid assessment instrument based largely on human development and education theory be developed that reliably discriminates high, moderate, and low quality in environmental education curricula? The types of analyses conducted to answer this question were interrater reliability (percent agreement, Cohen's Kappa coefficient, Pearson's Product-Moment correlation coefficient), test-retest reliability (percent agreement, correlation), and criterion-related validity (correlation). Face validity and content validity were also assessed through thorough reviews. Overall results indicate that 29% of the questions on the questionnaire demonstrated a high level of interrater reliability and 43% of the questions demonstrated a moderate level of interrater reliability. Seventy-one percent of the questions demonstrated a high test-retest reliability and 5% a moderate level. Fifty-five percent of the questions on the questionnaire were reliable (high or moderate) both across time and raters. Only eight questions (8%) did not show either interrater or test-retest reliability. 
The global overall rating of high, medium, or low quality was reliable across both coders and time, indicating

  6. The Surgical Safety Checklist and Teamwork Coaching Tools: a study of inter-rater reliability.

    Science.gov (United States)

    Huang, Lyen C; Conley, Dante; Lipsitz, Stu; Wright, Christopher C; Diller, Thomas W; Edmondson, Lizabeth; Berry, William R; Singer, Sara J

    2014-08-01

    To assess the inter-rater reliability (IRR) of two novel observation tools for measuring surgical safety checklist performance and teamwork. Surgical safety checklists can promote adherence to standards of care and improve teamwork in the operating room. Their use has been associated with reductions in mortality and other postoperative complications. However, checklist effectiveness depends on how well they are performed. Authors from the Safe Surgery 2015 initiative developed a pair of novel observation tools through literature review, expert consultation and end-user testing. In one South Carolina hospital participating in the initiative, two observers jointly attended 50 surgical cases and independently rated surgical teams using both tools. We used descriptive statistics to measure checklist performance and teamwork at the hospital. We assessed IRR by measuring percent agreement, Cohen's κ, and weighted κ scores. The overall percent agreement and κ between the two observers were 93% and 0.74 (95% CI 0.66 to 0.79), respectively, for the Checklist Coaching Tool and 86% and 0.84 (95% CI 0.77 to 0.90) for the Surgical Teamwork Tool. Percent agreement for individual sections of both tools was 79% or higher. Additionally, κ scores for six of eight sections on the Checklist Coaching Tool and for two of five domains on the Surgical Teamwork Tool achieved the desired 0.7 threshold. However, teamwork scores were high and variation was limited. There were no significant changes in the percent agreement or κ scores between the first 10 and last 10 cases observed. Both tools demonstrated substantial IRR and required limited training to use. These instruments may be used to observe checklist performance and teamwork in the operating room. However, further refinement and calibration of observer expectations, particularly in rating teamwork, could improve the utility of the tools. Published by the BMJ Publishing Group Limited.
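
    Weighted κ differs from Cohen's κ in giving partial credit when two observers choose nearby points on an ordinal scale. The sketch below uses linear weights, w = 1 − |i − j| / (k − 1). The abstract does not state the weighting scheme, so linear weighting is an assumption here, and the function name and ratings are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Linearly weighted kappa; `categories` lists the ordinal levels from low to high."""
    k = len(categories)
    n = len(rater_a)
    if k < 2 or n == 0 or n != len(rater_b):
        raise ValueError("need two equal-length rating lists and at least 2 categories")
    index = {c: i for i, c in enumerate(categories)}

    # Joint proportions: observed[i][j] = share of cases rated i by A and j by B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    p_o = 0.0  # weighted observed agreement
    p_e = 0.0  # weighted chance agreement
    for i in range(k):
        for j in range(k):
            w = 1.0 - abs(i - j) / (k - 1)
            p_o += w * observed[i][j]
            p_e += w * row[i] * col[j]
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Made-up ratings on a 3-point ordinal scale.
    obs_1 = [1, 1, 2, 2, 3, 3]
    obs_2 = [1, 2, 2, 3, 3, 3]
    print(weighted_kappa(obs_1, obs_2, categories=[1, 2, 3]))
```

    With these made-up ratings every disagreement is off by one level, so the weighted coefficient comes out higher than the unweighted one would.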

  7. Intra- and interrater reliability of the Chicago Classification of achalasia subtypes in pediatric high-resolution esophageal manometry (HRM) recordings.

    Science.gov (United States)

    Singendonk, M M J; Rosen, R; Oors, J; Rommel, N; van Wijk, M P; Benninga, M A; Nurko, S; Omari, T I

    2017-11-01

    Subtyping achalasia by high-resolution manometry (HRM) is clinically relevant as response to therapy and prognosis have been shown to vary accordingly. The aim of this study was to assess inter- and intrarater reliability of diagnosing achalasia and achalasia subtyping in children using the Chicago Classification (CC) V3.0. Six observers analyzed 40 pediatric HRM recordings (22 achalasia and 18 non-achalasia) twice by using dedicated analysis software (ManoView 3.0, Given Imaging, Los Angeles, CA, USA). Integrated relaxation pressure (IRP4s), distal contractile integral (DCI), intrabolus pressurization pattern (IBP), and distal latency (DL) were extracted and analyzed hierarchically. Cohen's κ (2 raters) and Fleiss' κ (>2 raters) and the intraclass correlation coefficient (ICC) were used for categorical and ordinal data, respectively. Based on the results of dedicated analysis software only, intra- and interrater reliability was excellent and moderate (κ=0.89 and κ=0.52, respectively) for differentiating achalasia from non-achalasia. For subtyping achalasia, reliability decreased to substantial and fair (κ=0.72 and κ=0.28, respectively). When observers were allowed to change the software-driven diagnosis according to their own interpretation of the manometric patterns, intra- and interrater reliability increased for diagnosing achalasia (κ=0.98 and κ=0.92, respectively) and for subtyping achalasia (κ=0.79 and κ=0.58, respectively). Intra- and interrater agreement for diagnosing achalasia when using HRM and the CC was very good to excellent when results of automated analysis software were interpreted by experienced observers. More variability was seen when relying solely on the software-driven diagnosis and for subtyping achalasia. Therefore, diagnosing and subtyping achalasia should be performed in pediatric motility centers with significant expertise. © 2017 John Wiley & Sons Ltd.
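
    When more than two observers classify each recording, Fleiss' κ generalizes the chance-corrected agreement idea. The sketch below is a minimal illustration, assuming every subject is rated by the same number of raters; the function name and the count table are hypothetical and are not the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters placing subject i in category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    total = n_subjects * n_raters

    # Overall proportion of all assignments falling in each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Per-subject agreement: share of rater pairs that agree on that subject.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_subj) / n_subjects   # mean observed agreement
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_exp) / (1.0 - p_exp)


if __name__ == "__main__":
    # Made-up table: 4 recordings, 6 raters, 2 categories (achalasia, non-achalasia).
    table = [[6, 0], [0, 6], [5, 1], [1, 5]]
    print(fleiss_kappa(table))
```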

  8. Qualitative soil moisture assessment in semi-arid Africa - the role of experience and training on inter-rater reliability

    Science.gov (United States)

    Rinderer, M.; Komakech, H. C.; Müller, D.; Wiesenberg, G. L. B.; Seibert, J.

    2015-08-01

    Soil and water management is particularly relevant in semi-arid regions to enhance agricultural productivity. During periods of water scarcity, soil moisture differences are important indicators of the soil water deficit and are traditionally used for allocating water resources among farmers of a village community. Here we present a simple, inexpensive soil wetness classification scheme based on qualitative indicators which one can see or touch on the soil surface. It incorporates the local farmers' knowledge on the best soil moisture conditions for seeding and brick making in the semi-arid environment of the study site near Arusha, Tanzania. The scheme was tested twice in 2014 with farmers, students and experts (April: 40 persons, June: 25 persons) for inter-rater reliability, bias of individuals and functional relation between qualitative and quantitative soil moisture values. During the test in April farmers assigned the same wetness class in 46 % of all cases, while students and experts agreed on about 60 % of all cases. Students who had been trained in how to apply the method gained higher inter-rater reliability than their colleagues with only a basic introduction. When repeating the test in June, participants were given improved instructions, organized in small subgroups, which resulted in a higher inter-rater reliability among farmers. In 66 % of all classifications, farmers assigned the same wetness class and the spread of class assignments was smaller. This study demonstrates that a wetness classification scheme based on qualitative indicators is a robust tool and can be applied successfully regardless of experience in crop growing and education level when an in-depth introduction and training is provided. The use of a simple and clear layout of the assessment form is important for reliable wetness class assignments.

  9. Qualitative soil moisture assessment in semi-arid Africa: the role of experience and training on inter-rater reliability

    Science.gov (United States)

    Rinderer, M.; Komakech, H.; Müller, D.; Seibert, J.

    2015-03-01

    Soil and water management is particularly relevant in semi-arid regions to enhance agricultural productivity. During periods of water scarcity soil moisture differences are important indicators of the soil water deficit and are traditionally used for allocating water resources among farmers of a village community. Here we present a simple, inexpensive soil wetness classification scheme based on qualitative indicators which one can see or touch on the soil surface. It incorporates the local farmers' knowledge on the best soil moisture conditions for seeding and brick making in the semi-arid environment of the study site near Arusha, Tanzania. The scheme was tested twice in 2014 with farmers, students and experts (April: 40 persons, June: 25 persons) for inter-rater reliability, bias of individuals and functional relation between qualitative and quantitative soil moisture values. During the test in April farmers assigned the same wetness class in 46% of all cases while students and experts agreed in about 60% of all cases. Students who had been trained in how to apply the method gained higher inter-rater reliability than their colleagues with only a basic introduction. When repeating the test in June, participants were given improved instructions, organized in small sub-groups, which resulted in a higher inter-rater reliability among farmers. In 66% of all classifications farmers assigned the same wetness class and the spread of class assignments was smaller. This study demonstrates that a wetness classification scheme based on qualitative indicators is a robust tool and can be applied successfully regardless of experience in crop growing and education level when an in-depth introduction and training is provided. The use of a simple and clear layout of the assessment form is important for reliable wetness class assignments.

  10. Validity and reliability of balance assessment software using the Nintendo Wii balance board: usability and validation.

    Science.gov (United States)

    Park, Dae-Sung; Lee, GyuChang

    2014-06-10

    A balance test provides important information such as the standard to judge an individual's functional recovery or make the prediction of falls. The development of a tool for a balance test that is inexpensive and widely available is needed, especially in clinical settings. The Wii Balance Board (WBB) is designed to test balance, but there is little software used in balance tests, and there are few studies on reliability and validity. Thus, we developed a balance assessment software using the Nintendo Wii Balance Board, investigated its reliability and validity, and compared it with a laboratory-grade force platform. Twenty healthy adults participated in our study. The participants participated in the test for inter-rater reliability, intra-rater reliability, and concurrent validity. The tests were performed with balance assessment software using the Nintendo Wii balance board and a laboratory-grade force platform. Data such as Center of Pressure (COP) path length and COP velocity were acquired from the assessment systems. The inter-rater reliability, the intra-rater reliability, and concurrent validity were analyzed by an intraclass correlation coefficient (ICC) value and a standard error of measurement (SEM). The inter-rater reliability (ICC: 0.89-0.79, SEM in path length: 7.14-1.90, SEM in velocity: 0.74-0.07), intra-rater reliability (ICC: 0.92-0.70, SEM in path length: 7.59-2.04, SEM in velocity: 0.80-0.07), and concurrent validity (ICC: 0.87-0.73, SEM in path length: 5.94-0.32, SEM in velocity: 0.62-0.08) were high in terms of COP path length and COP velocity. The balance assessment software incorporating the Nintendo Wii balance board was used in our study and was found to be a reliable assessment device. In clinical settings, the device can be remarkably inexpensive, portable, and convenient for the balance assessment.

  11. Interrater reliability of the Saint-Anne Dargassies Scale in assessing the neurological patterns of healthy preterm newborns

    Directory of Open Access Journals (Sweden)

    Carla Ismirna Santos Alves

    Abstract Objectives: to assess the interrater reliability of the Saint-Anne Dargassies Scale in assessing neurological patterns of healthy preterm newborns. Methods: twenty preterm newborns met the inclusion criteria for participation in this prospective study. The neurologic examination was performed using the Saint-Anne Dargassies Scale in newborns showing normal serial cranial ultrasound examinations. In order to test the reliability, the study was structured as follows: group I (rater 1/physiotherapist; rater 2/neonatologist); group II (rater 3/physiotherapist; rater 4/child neurologist); and the gold standard (expert and professor in pediatric neurology). Results: high interrater agreement was observed between groups I - II compared with the gold standard in assessing postural pattern (p<0.01). Regarding the assessment of primitive reflexes, greater agreement was observed in the evaluation of palmar grasp reflex and Moro reflex (p<0.01) for group I compared with the gold standard. An analysis of tone demonstrated heterogeneous agreement, without compromising the reliability of the scale. The probability of equality between measurements of head circumference in the two groups, compared with the gold standard, was observed. Conclusions: the Saint-Anne Dargassies Scale demonstrated high reliability and homogeneity with significant power of reproducibility and may be capable of identifying preterm newborns suspected of having neurological deficits.

  12. Test–re-test reliability and inter-rater reliability of a digital pelvic inclinometer in young, healthy males and females

    OpenAIRE

    Chris Beardsley; Tim Egerton; Brendon Skinner

    2016-01-01

    Objective. The purpose of this study was to investigate the reliability of a digital pelvic inclinometer (DPI) for measuring sagittal plane pelvic tilt in 18 young, healthy males and females. Method. The inter-rater reliability and test–re-test reliabilities of the DPI for measuring pelvic tilt in standing on both the right and left sides of the pelvis were measured by two raters carrying out two rating sessions of the same subjects, three weeks apart. Results. For measuring pel...

  13. Exploration of the (Interrater) Reliability and Latent Factor Structure of the Alcohol Use Disorders Identification Test (AUDIT) and the Drug Use Disorders Identification Test (DUDIT) in a Sample of Dutch Probationers.

    Science.gov (United States)

    Hildebrand, Martin; Noteborn, Mirthe G C

    2015-01-01

    The use of brief, reliable, valid, and practical measures of substance use is critical for conducting individual (risk and need) assessments in probation practice. In this exploratory study, the basic psychometric properties of the Alcohol Use Disorders Identification Test (AUDIT) and the Drug Use Disorders Identification Test (DUDIT) are evaluated. The instruments were administered as an oral interview instead of a self-report questionnaire. The sample comprised 383 offenders (339 men, 44 women). A subset of 56 offenders (49 men, 7 women) participated in the interrater reliability study. Data collection took place between September 2011 and November 2012. Overall, both instruments have acceptable levels of interrater reliability for total scores and acceptable to good interrater reliabilities for most of the individual items. Confirmatory factor analyses (CFA) indicated that the a priori one-, two- and three-factor solutions for the AUDIT did not fit the observed data very well. Principal axis factoring (PAF) supported a two-factor solution for the AUDIT that included a level of alcohol consumption/consequences factor (Factor 1) and a dependence factor (Factor 2), with both factors explaining substantial variance in AUDIT scores. For the DUDIT, CFA and PAF suggest that a one-factor solution is the preferred model (accounting for 62.61% of total variance). The Dutch language versions of the AUDIT and the DUDIT are reliable screening instruments for use with probationers and both instruments can be reliably administered by probation officers in probation practice. However, future research on concurrent and predictive validity is warranted.

  14. Binge Eating Disorder: Reliability and Validity of a New Diagnostic Category.

    Science.gov (United States)

    Brody, Michelle L.; And Others

    1994-01-01

    Examined reliability and validity of binge eating disorder (BED), proposed for inclusion in Diagnostic and Statistical Manual of Mental Disorders (DSM), fourth edition. Interrater reliability of BED diagnosis compared favorably with that of most diagnoses in DSM revised third edition. Study comparing obese individuals with and without BED and…

  15. Reliability and validity of a tool to assess airway management skills in anesthesia trainees

    Directory of Open Access Journals (Sweden)

    Aliya Ahmed

    2016-01-01

    Conclusion: The tool designed to assess bag-mask ventilation and tracheal intubation skills in anesthesia trainees demonstrated excellent inter-rater reliability, fair test-retest reliability, and good construct validity. The authors recommend its use for formative and summative assessment of junior anesthesia trainees.

  16. Reliability and validity of the Wolfram Unified Rating Scale (WURS)

    Directory of Open Access Journals (Sweden)

    Nguyen Chau

    2012-11-01

    Abstract Background Wolfram syndrome (WFS) is a rare, neurodegenerative disease that typically presents with childhood onset insulin dependent diabetes mellitus, followed by optic atrophy, diabetes insipidus, deafness, and neurological and psychiatric dysfunction. There is no cure for the disease, but recent advances in research have improved understanding of the disease course. Measuring disease severity and progression with reliable and validated tools is a prerequisite for clinical trials of any new intervention for neurodegenerative conditions. To this end, we developed the Wolfram Unified Rating Scale (WURS) to measure the severity and individual variability of WFS symptoms. The aim of this study is to develop and test the reliability and validity of the Wolfram Unified Rating Scale (WURS). Methods A rating scale of disease severity in WFS was developed by modifying a standardized assessment for another neurodegenerative condition (Batten disease). WFS experts scored the representativeness of WURS items for the disease. The WURS was administered to 13 individuals with WFS (6-25 years of age). Motor, balance, mood and quality of life were also evaluated with standard instruments. Inter-rater reliability, internal consistency reliability, concurrent, predictive and content validity of the WURS were calculated. Results The WURS had high inter-rater reliability (ICCs>.93), moderate to high internal consistency reliability (Cronbach’s α = 0.78-0.91) and demonstrated good concurrent and predictive validity. There were significant correlations between the WURS Physical Assessment and motor and balance tests (rs>.67, ps>.76, ps=-.86, p=.001). The WURS demonstrated acceptable content validity (Scale-Content Validity Index=0.83). Conclusions These preliminary findings demonstrate that the WURS has acceptable reliability and validity and captures individual differences in disease severity in children and young adults with WFS.
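
    Internal consistency reliability reported as Cronbach's α can be sketched from item and total-score variances using the standard formula α = k / (k − 1) × (1 − Σ item variances / variance of totals). The function name and the scores below are hypothetical and unrelated to the WURS data.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one score list per item, aligned by respondent."""
    k = len(items)
    if k < 2 or any(len(item) != len(items[0]) for item in items):
        raise ValueError("need at least 2 items with the same number of respondents")
    item_variance_sum = sum(statistics.variance(item) for item in items)
    # Total score per respondent is the sum across items.
    totals = [sum(scores) for scores in zip(*items)]
    total_variance = statistics.variance(totals)
    return k / (k - 1) * (1.0 - item_variance_sum / total_variance)


if __name__ == "__main__":
    # Made-up scores: 2 items answered by 4 respondents.
    print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))
```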

  17. Inter-rater reliability of data elements from a prototype of the Paul Coverdell National Acute Stroke Registry

    Directory of Open Access Journals (Sweden)

    Wehner Susan

    2008-06-01

    Abstract Background The Paul Coverdell National Acute Stroke Registry (PCNASR) is a U.S. based national registry designed to monitor and improve the quality of acute stroke care delivered by hospitals. The registry monitors care through specific performance measures, the accuracy of which depends in part on the reliability of the individual data elements used to construct them. This study describes the inter-rater reliability of data elements collected in Michigan's state-based prototype of the PCNASR. Methods Over a 6-month period, 15 hospitals participating in the Michigan PCNASR prototype submitted data on 2566 acute stroke admissions. Trained hospital staff prospectively identified acute stroke admissions, abstracted chart information, and submitted data to the registry. At each hospital 8 randomly selected cases were re-abstracted by an experienced research nurse. Inter-rater reliability was estimated by the kappa statistic for nominal variables, and intraclass correlation coefficient (ICC) for ordinal and continuous variables. Factors that can negatively impact the kappa statistic (i.e., trait prevalence and rater bias) were also evaluated. Results A total of 104 charts were available for re-abstraction. Excellent reliability (kappa or ICC > 0.75) was observed for many registry variables including age, gender, black race, hemorrhagic stroke, discharge medications, and modified Rankin Score. Agreement was at least moderate (i.e., 0.75 > kappa ≥ 0.40) for ischemic stroke, TIA, white race, non-ambulance arrival, hospital transfer and direct admit. However, several variables had poor reliability (kappa < 0.40) ... Conclusion The excellent reliability of many of the data elements supports the use of the PCNASR to monitor and improve care. However, the poor reliability for several variables, particularly time-related events in the emergency department, indicates the need for concerted efforts to improve the quality of data collection. 
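
    The way trait prevalence and rater bias can depress κ is easiest to see on a 2×2 agreement table. The sketch below computes κ together with a prevalence index, a bias index, and the prevalence-adjusted bias-adjusted kappa (PABAK = 2 × observed agreement − 1). The abstract does not say how prevalence and bias were evaluated, so these indices are an assumed, commonly used choice, and both tables are made up.

```python
def agreement_indices(a, b, c, d):
    """Indices for a 2x2 table: a = both yes, b = yes/no, c = no/yes, d = both no."""
    n = a + b + c + d
    if n == 0:
        raise ValueError("table is empty")
    p_o = (a + d) / n
    # Chance agreement from each rater's marginal "yes" and "no" proportions.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    kappa = (p_o - p_e) / (1.0 - p_e) if p_e < 1.0 else float("nan")
    return {
        "observed_agreement": p_o,
        "kappa": kappa,
        "prevalence_index": abs(a - d) / n,  # imbalance between yes-yes and no-no cells
        "bias_index": abs(b - c) / n,        # imbalance between the two disagreement cells
        "pabak": 2.0 * p_o - 1.0,
    }


if __name__ == "__main__":
    balanced = agreement_indices(45, 5, 5, 45)  # trait present in about half the charts
    skewed = agreement_indices(85, 5, 5, 5)     # trait present in most charts
    for name, result in (("balanced", balanced), ("skewed", skewed)):
        print(name, {key: round(value, 3) for key, value in result.items()})
```

    Both made-up tables have the same observed agreement, yet κ is much lower for the skewed table, which is why a low κ on a very common or very rare data element does not by itself show that abstractors disagree often.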
Specific recommendations

  18. The intra- and inter-rater reliability of five clinical muscle performance tests in patients with and without neck pain

    Science.gov (United States)

    2013-01-01

    Background This study investigates the reliability of muscle performance tests using cost- and time-effective methods similar to those used in clinical practice. When conducting reliability studies, great effort goes into standardising test procedures to facilitate a stable outcome. Therefore, several test trials are often performed. However, when muscle performance tests are applied in the clinical setting, clinicians often only conduct a muscle performance test once as repeated testing may produce fatigue and pain, thus variation in test results. We aimed to investigate whether cervical muscle performance tests, which have shown promising psychometric properties, would remain reliable when examined under conditions similar to those of daily clinical practice. Methods The intra-rater (between-day) and inter-rater (within-day) reliability was assessed for five cervical muscle performance tests in patients with (n = 33) and without neck pain (n = 30). The five tests were joint position error, the cranio-cervical flexion test, the neck flexor muscle endurance test performed in supine and in a 45°-upright position and a new neck extensor test. Results Intra-rater reliability ranged from moderate to almost perfect agreement for joint position error (ICC ≥ 0.48-0.82), the cranio-cervical flexion test (ICC ≥ 0.69), the neck flexor muscle endurance test performed in supine (ICC ≥ 0.68) and in a 45°-upright position (ICC ≥ 0.41) with the exception of a new test (neck extensor test), which ranged from slight to moderate agreement (ICC = 0.14-0.41). Likewise, inter-rater reliability ranged from moderate to almost perfect agreement for joint position error (ICC ≥ 0.51-0.75), the cranio-cervical flexion test (ICC ≥ 0.85), the neck flexor muscle endurance test performed in supine (ICC ≥ 0.70) and in a 45°-upright position (ICC ≥ 0.56). However, only slight to fair agreement was found for the neck extensor test (ICC

  19. Inter-rater reliability of h-index scores calculated by Web of Science and Scopus for clinical epidemiology scientists.

    Science.gov (United States)

    Walker, Benjamin; Alavifard, Sepand; Roberts, Surain; Lanes, Andrea; Ramsay, Tim; Boet, Sylvain

    2016-06-01

    We investigated the inter-rater reliability of Web of Science (WoS) and Scopus when calculating the h-index of 25 senior scientists in the Clinical Epidemiology Program of the Ottawa Hospital Research Institute. Bibliometric information and the h-indices for the subjects were computed by four raters using the automatic calculators in WoS and Scopus. Correlation and agreement between ratings were assessed using Spearman's correlation coefficient and a Bland-Altman plot, respectively. Data could not be gathered from Google Scholar due to feasibility constraints. The Spearman's rank correlation between the h-index of scientists calculated with WoS was 0.81 (95% CI 0.72-0.92) and with Scopus was 0.95 (95% CI 0.92-0.99). The Bland-Altman plot showed no significant rater bias in WoS and Scopus; however, the agreement between ratings was higher in Scopus than in WoS. Our results showed a stronger relationship and increased agreement between raters when calculating the h-index of a scientist using Scopus compared to WoS. The higher inter-rater reliability and simple user interface used in Scopus may render it the more effective database when calculating the h-index of senior scientists in epidemiology. © 2016 Health Libraries Group.
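The abstract above does not report how its Bland-Altman analysis was computed. As a minimal sketch of the conventional calculation (mean difference as bias, with limits of agreement at bias ± 1.96 standard deviations of the paired differences), the following Python uses only the standard library. The function name `bland_altman` and the five pairs of h-index values are hypothetical illustrations, not data from the study.

```python
from statistics import mean, stdev


def bland_altman(rater_a, rater_b):
    """Return (bias, lower limit, upper limit) for paired ratings.

    Conventional 95% limits of agreement: bias +/- 1.96 * SD of differences.
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same subjects")
    diffs = [a - b for a, b in zip(rater_a, rater_b)]
    bias = mean(diffs)                # systematic difference between raters
    spread = 1.96 * stdev(diffs)      # uses the sample standard deviation
    return bias, bias - spread, bias + spread


# Hypothetical h-index values for five scientists (not data from the study).
rater_1 = [20, 35, 41, 28, 52]
rater_2 = [21, 33, 41, 30, 50]

bias, lower, upper = bland_altman(rater_1, rater_2)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

Limits of agreement that straddle zero with a bias close to zero are usually read as no systematic rater bias; narrower limits mean closer agreement.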

  20. Computerized back postural assessment in physiotherapy practice: Intra-rater and inter-rater reliability of the MIDAS system.

    Science.gov (United States)

    McAlpine, R T; Bettany-Saltikov, J A; Warren, J G

    2009-01-01

    Assessment of spinal posture during physiotherapy practice is routine, yet few objective measures exist to this end. The Middlesbrough Integrated Digital Assessment System (MIDAS) is a low cost portable system able to record 3D information on posture. The purpose of this study was to assess both the intra-rater and inter-rater reliability of the MIDAS system. Twenty-five healthy subjects were recruited. A repeated measures design was used to record fifteen pre-palpated landmarks on the back of each subject. To limit the sources of variability, the principal researcher palpated the landmarks for each subject. Each of three raters took two measurements on each subject in a standardized upright posture. X (medio-lateral), Y (antero-posterior) and Z (height) landmark positions were recorded via a computer interface. Both intra-rater agreement (mean ICCs - rater 1 r=0.970, rater 2 r=0.965 and rater 3 r=0.965, pMIDAS demonstrated both high inter-rater and intra-rater reliability and provides an objective method for the assessment of posture in physiotherapy practice.

  1. Inter-rater reliability and agreement of the 6-minute walk test in females with hip fractures

    DEFF Research Database (Denmark)

    Overgaard, Jan; Larsen, Camilla Marie; Tange Kristensen, Morten

    physiotherapy students independently examined (randomized order) a convenient sample of 20 participants; their assessments were separated by two days, and testing followed instructions from the American Thoracic Society. Hip pain was assessed with the Verbal Ranking Scale. Participants (all women) with a mean...... (SD) age of 78.1 ± 5.9 years performed the test within a mean of 31.5 ± 5.8 days post-surgery; 10 had a cervical and 10 a trochanteric fracture. Excellent inter-rater reliability; ICC2.1 = 0.92 (95% CI, 0.81 - 0.97) was found, and the standard error of measurement (SEM) and smallest real difference.......6 meters longer, at the second trial (P = 0.002). Participants with moderate hip fracture-related pain walked a shorter distance than those with no or light pain during the first test (P = 0.04), while this was not the case during the second (P = 0.25). Excellent inter-rater reliability was found...

  2. Interrater Reliability of the Categorization of Late Radiographic Changes After Lung Stereotactic Body Radiation Therapy

    Energy Technology Data Exchange (ETDEWEB)

    Faruqi, Salman [Department of Radiation Oncology, Princess Margaret Cancer Centre, Toronto, ON (Canada); Giuliani, Meredith E., E-mail: meredith.giuliani@rmp.uhn.on.ca [Department of Radiation Oncology, Princess Margaret Cancer Centre, Toronto, ON (Canada); Raziee, Hamid; Yap, Mei Ling [Department of Radiation Oncology, Princess Margaret Cancer Centre, Toronto, ON (Canada); Roberts, Heidi [Department of Radiology, University Health Network, Toronto, Ontario (Canada); Le, Lisa W. [Department of Biostatistics, Princess Margaret Cancer Centre, Toronto, Ontario (Canada); Brade, Anthony; Cho, John; Sun, Alexander; Bezjak, Andrea; Hope, Andrew J. [Department of Radiation Oncology, Princess Margaret Cancer Centre, Toronto, ON (Canada)

    2014-08-01

    Purpose: Radiographic changes after lung stereotactic body radiation therapy (SBRT) have been categorized into 4 groups: modified conventional pattern (A), mass-like fibrosis (B), scar-like fibrosis (C), and no evidence of increased density (D). The purpose of this study was to assess the interrater reliability of this categorization system in patients with early-stage non-small cell lung cancer (NSCLC). Methods and Materials: Seventy-seven patients were included in this study, all treated with SBRT for early-stage (T1/2) NSCLC at a single institution, with a minimum follow-up of 6 months. Six experienced clinicians familiar with post-SBRT radiographic changes scored the serial posttreatment CT images independently in a blinded fashion. The proportion of patients categorized as A, B, C, or D at each interval was determined. Krippendorff's alpha (KA), Multirater kappa (M-kappa), and Gwet's AC1 (AC1) scores were used to establish interrater reliability. A leave-one-out analysis was performed to demonstrate the variability among raters. Interrater agreement of the first and last 20 patients scored was calculated to explore whether a training effect existed. Results: The number of ratings ranged from 450 at 6 months to 84 at 48 months of follow-up. The proportion of patients in each category was as follows: A, 45%; B, 16%; C, 13%; and D, 26%. KA and M-kappa ranged from 0.17 to 0.34. The AC1 measure ranged from 0.22 to 0.48. KA increased from 0.24 to 0.36 at 12 months with training. The percent agreement for pattern A peaked at 12 months with a 54% chance of having >50% raters in agreement and decreased over time, whereas that for patterns B and C increased over time to a maximum of 20% and 22%, respectively. Conclusion: This post-SBRT radiographic change categorization system has modest interrater agreement, and there is a suggestion of a training effect. Patterns of fibrosis evolve after SBRT and alternative categorization systems should be evaluated.

  3. Inter-rater reliability of three musculoskeletal physical examination techniques used to assess motion in three planes while standing.

    Science.gov (United States)

    Prather, Heidi; Hunt, Devyani; Steger-May, Karen; Hayes, Marcie Harris; Knaus, Evan; Clohisy, John

    2009-07-01

    The objective of the study was to measure the reliability between examiners of 3 basic maneuvers of the Total Body Functional Profile physical examination test. The hypothesis was that musculoskeletal health care providers of different disciplines could reliably use the 3 basic maneuvers as part of the musculoskeletal physical examination. A prospective observational study was conducted. Twenty-eight adult volunteers were measured on both the left and right side by 2 independent raters on a single occasion. The subjects were recruited through advertisements placed by the orthopedic department at a tertiary university. Twenty-eight volunteers were recruited and completed the study. The volunteers were between 18 and 51 years of age, had no symptoms in the lower extremity or spine, had no previous history of surgery or tumor involving the lower extremity, and no medical conditions that would preclude participation. On a single occasion, 2 examiners per 1 volunteer were blinded to their own and each other's measurements. Each examiner assessed the distance of frontal and sagittal plane lunge and angle of motion for transverse plane testing. Inter-rater agreement is expressed with intraclass correlation coefficients (ICCs) and corresponding 95% confidence intervals (CIs). The difference between raters is reported with 95% CIs. Baseline demographics, University of California Los Angeles (UCLA), and Harris hip questionnaires were completed by all participants. The UCLA and Harris hip scores showed no significant activity restrictions or pain limitations in all participants. The inter-rater reliability for sagittal, frontal, and transverse plane matrix testing was good with ICCs of 0.86 (95% CI 0.77-0.91), 0.90 (95% CI 0.84-0.94), and 0.85 (95% CI 0.75-0.91), respectively. The rater reliability between disciplines for transverse, sagittal, and frontal plane matrix testing was good with ICCs of 0.89 (95% CI 0.80-0.94), 0.88 (95% CI 0.79-0.94), and 0.90 (95% CI 0

  4. [Evaluation of Suicide Risk Levels in Hospitals: Validity and Reliability Tests].

    Science.gov (United States)

    Macagnino, Sandro; Steinert, Tilman; Uhlmann, Carmen

    2018-05-01

    Examination of in-hospital suicide risk levels concerning their validity and their reliability. The internal suicide risk levels were evaluated in a cross-sectional study of 163 inpatients. A reliability check was performed by determining the interrater reliability of senior physician, therapist and the responsible nurse. Within the scope of the validity check, we conducted analyses of criterion validity and construct validity. For the total sample, an "acceptable" to "good" interrater reliability (Kendall's W = .77) of suicide risk levels was obtained. Schizophrenic disorders showed the lowest values; for personality disorders we found the highest level of interrater reliability. When examining the criterion validity, Item-9 of the BDI-II is substantially correlated with our suicide risk levels (ρ m  = .54, p validity check, affective disorders showed the highest correlation (ρ = .77), compatible also with "convergent validity". They differed from schizophrenic disorders, which showed the least concordance (ρ = .43). In-hospital suicide risk levels may represent an important contribution to the assessment of suicidal behavior of inpatients experiencing psychiatric treatment due to their overall good validity and reliability. © Georg Thieme Verlag KG Stuttgart · New York.

  5. Rating scales for dystonia in cerebral palsy: reliability and validity.

    Science.gov (United States)

    Monbaliu, E; Ortibus, E; Roelens, F; Desloovere, K; Deklerck, J; Prinzie, P; de Cock, P; Feys, H

    2010-06-01

    This study investigated the reliability and validity of the Barry-Albright Dystonia Scale (BADS), the Burke-Fahn-Marsden Movement Scale (BFMMS), and the Unified Dystonia Rating Scale (UDRS) in patients with bilateral dystonic cerebral palsy (CP). Three raters independently scored videotapes of 10 patients (five males, five females; mean age 13 y 3 mo, SD 5 y 2 mo, range 5-22 y). One patient each was classified at levels I-IV in the Gross Motor Function Classification System and six patients were classified at level V. Reliability was measured by (1) intraclass correlation coefficient (ICC) for interrater reliability, (2) standard error of measurement (SEM) and smallest detectable difference (SDD), and (3) Cronbach's alpha for internal consistency. Validity was assessed by Pearson's correlations among the three scales used and by content analysis. Moderate to good interrater reliability was found for total scores of the three scales (ICC: BADS=0.87; BFMMS=0.86; UDRS=0.79). However, many subitems showed low reliability, in particular for the UDRS. SEM and SDD were respectively 6.36% and 17.72% for the BADS, 9.88% and 27.39% for the BFMMS, and 8.89% and 24.63% for the UDRS. High internal consistency was found. Pearson's correlations were high. Content validity showed insufficient accordance with the new CP definition and classification. Our results support the internal consistency and concurrent validity of the scales; however, taking into consideration the limitations in reliability, including the large SDD values and the content validity, further research on methods of assessment of dystonia is warranted.
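The abstract above reports SEM and SDD pairs without stating how one was derived from the other. A common convention, assumed here rather than taken from the paper, is SDD = 1.96 × √2 × SEM. The sketch below applies that formula to the three reported SEM values. It reproduces the BFMMS figure after rounding and lands within about 0.1 percentage points of the other two, which would be consistent with the authors working from unrounded SEM values. The function name is hypothetical.

```python
import math


def smallest_detectable_difference(sem, z=1.96):
    """SDD under the conventional formula z * sqrt(2) * SEM.

    The sqrt(2) factor reflects that a change score combines the
    measurement error of two separate measurements.
    """
    return z * math.sqrt(2) * sem


# SEM values (in %) as reported in the abstract above.
reported_sem = {"BADS": 6.36, "BFMMS": 9.88, "UDRS": 8.89}

sdd = {scale: smallest_detectable_difference(sem)
       for scale, sem in reported_sem.items()}

for scale, value in sdd.items():
    print(f"{scale}: SEM={reported_sem[scale]:.2f}%  SDD~{value:.2f}%")
```

Replacing `z=1.96` with `z=1.645` gives the analogous 90% threshold, often labelled MDC90.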

  6. Inter-Rater and Test-Retest (Between-Sessions) Reliability of the 4-Skills Scan for Dutch Elementary School Children

    Science.gov (United States)

    van Kernebeek, Willem G.; de Schipper, Antoine W.; Savelsbergh, Geert J. P.; Toussaint, Huub M.

    2018-01-01

    In The Netherlands, the 4-Skills Scan is an instrument for physical education teachers to assess gross motor skills of elementary school children. Little is known about its reliability. Therefore, in this study the test-retest and inter-rater reliability was determined. Respectively, 624 and 557 Dutch 6- to 12-year-old children were analyzed for…

  7. Intra- and inter-rater reliability of movement and palpation tests in patients with neck pain: A systematic review.

    Science.gov (United States)

    Jonsson, Anders; Rasmussen-Barr, Eva

    2018-03-01

    Neck pain is common and often becomes chronic. Various clinical tests of the cervical spine are used to direct and evaluate treatment. This systematic review aimed to identify studies examining the intra- and/or interrater reliability of tests used in clinical examination of patients with neck pain. A database search up to April 2016 was conducted in PubMed, CINAHL, and AMED. The Quality Appraisal of Reliability Studies Checklist (QAREL) was used to assess risk of bias. Eleven studies were included, comprising tests of active and passive movement and pain evaluating participants with ongoing neck pain. One study was assessed with a low risk of bias, three with medium risk, while the rest were assessed with high risk of bias. The results showed differing reliabilities for the included tests ranging from poor to almost perfect. In conclusion, active movement and pain for pain or mobility overall presented acceptable to very good reliability (Kappa >0.40); while passive intervertebral tests had lower Kappa values, suggesting poor reliability. It may be a coincidence that the studies indicating very good reliability tended to be of higher quality (low to moderate risk of bias), while studies finding poor reliability tended to be of lower quality (high risk of bias). Regardless, the current recommendation from this review would suggest the clinical use of tests with acceptable reliability and avoiding the use of tests that have been shown to not be reliable. Finally, it is critical that all future reliability studies are of higher quality with low risk of bias.

  8. Measurement of the Inter-Rater Reliability Rate Is Mandatory for Improving the Quality of a Medical Database: Experience with the Paulista Lung Cancer Registry.

    Science.gov (United States)

    Lauricella, Leticia L; Costa, Priscila B; Salati, Michele; Pego-Fernandes, Paulo M; Terra, Ricardo M

    2018-06-01

    Database quality measurement should be considered a mandatory step to ensure an adequate level of confidence in data used for research and quality improvement. Several metrics have been described in the literature, but no standardized approach has been established. We aimed to describe a methodological approach applied to measure the quality and inter-rater reliability of a regional multicentric thoracic surgical database (Paulista Lung Cancer Registry). Data from the first 3 years of the Paulista Lung Cancer Registry underwent an audit process with 3 metrics: completeness, consistency, and inter-rater reliability. The first 2 methods were applied to the whole data set, and the last method was calculated using 100 cases randomized for direct auditing. Inter-rater reliability was evaluated using percentage of agreement between the data collector and auditor and through calculation of Cohen's κ and intraclass correlation. The overall completeness per section ranged from 0.88 to 1.00, and the overall consistency was 0.96. Inter-rater reliability showed many variables with high disagreement (>10%). For numerical variables, intraclass correlation was a better metric than inter-rater reliability. Cohen's κ showed that most variables had moderate to substantial agreement. The methodological approach applied to the Paulista Lung Cancer Registry showed that completeness and consistency metrics did not sufficiently reflect the real quality status of a database. The inter-rater reliability associated with κ and intraclass correlation was a better quality metric than completeness and consistency metrics because it could determine the reliability of specific variables used in research or benchmark reports. This report can be a paradigm for future studies of data quality measurement. Copyright © 2018 American College of Surgeons. Published by Elsevier Inc. All rights reserved.

  9. Intra-rater and inter-rater reliability of a medical record abstraction study on transition of care after childhood cancer.

    Directory of Open Access Journals (Sweden)

    Micòl E Gianinazzi

    The abstraction of data from medical records is a widespread practice in epidemiological research. However, studies using this means of data collection rarely report reliability. Within the Transition after Childhood Cancer Study (TaCC), which is based on a medical record abstraction, we conducted a second independent abstraction of data with the aim to assess (a) intra-rater reliability of one rater at two time points; (b) the possible learning effects between these two time points compared to a gold-standard; and (c) inter-rater reliability. Within the TaCC study we conducted a systematic medical record abstraction in the 9 Swiss clinics with pediatric oncology wards. In a second phase we selected a subsample of medical records in 3 clinics to conduct a second independent abstraction. We then assessed intra-rater reliability at two time points, the learning effect over time (comparing each rater at two time-points with a gold-standard) and the inter-rater reliability of a selected number of variables. We calculated percentage agreement and Cohen's kappa. For the assessment of the intra-rater reliability we included 154 records (80 for rater 1; 74 for rater 2). For the inter-rater reliability we could include 70 records. Intra-rater reliability was substantial to excellent (Cohen's kappa 0.6-0.8) with an observed percentage agreement of 75%-95%. In all variables learning effects were observed. Inter-rater reliability was substantial to excellent (Cohen's kappa 0.70-0.83) with high agreement ranging from 86% to 100%. Our study showed that data abstracted from medical records are reliable. Investigating intra-rater and inter-rater reliability can give confidence to draw conclusions from the abstracted data and increase data quality by minimizing systematic errors.
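Percentage agreement and Cohen's kappa, the two statistics named in the abstract above, can be computed for two raters as in the following minimal sketch. The function names and the ten yes/no ratings are hypothetical and are not taken from the TaCC study. Kappa is (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each rater's marginal frequencies.

```python
from collections import Counter


def percent_agreement(r1, r2):
    """Share of items on which both raters chose the same category."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters and nominal categories.

    Undefined (division by zero) when chance agreement equals 1,
    i.e. when both raters use a single category for every item.
    """
    n = len(r1)
    p_o = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: product of the raters' marginal proportions.
    p_e = sum(c1[c] * c2[c] for c in c1.keys() | c2.keys()) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical abstraction of one yes/no variable from ten records.
rater_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes"]
rater_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]

agreement = percent_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
print(f"agreement={agreement:.2f}, kappa={kappa:.2f}")
```

In this illustration the raters agree on 8 of 10 records, yet kappa is noticeably lower than 0.8 because part of that agreement is expected by chance.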

  10. Examining the interrater reliability of the Hare Psychopathy Checklist-Revised across a large sample of trained raters.

    Science.gov (United States)

    Blais, Julie; Forth, Adelle E; Hare, Robert D

    2017-06-01

    The goal of the current study was to assess the interrater reliability of the Psychopathy Checklist-Revised (PCL-R) among a large sample of trained raters (N = 280). All raters completed PCL-R training at some point between 1989 and 2012 and subsequently provided complete coding for the same 6 practice cases. Overall, 3 major conclusions can be drawn from the results: (a) reliability of individual PCL-R items largely fell below any appropriate standards while the estimates for Total PCL-R scores and factor scores were good (but not excellent); (b) the cases representing individuals with high psychopathy scores showed better reliability than did the cases of individuals in the moderate to low PCL-R score range; and (c) there was a high degree of variability among raters; however, rater specific differences had no consistent effect on scoring the PCL-R. Therefore, despite low reliability estimates for individual items, Total scores and factor scores can be reliably scored among trained raters. We temper these conclusions by noting that scoring standardized videotaped case studies does not allow the rater to interact directly with the offender. Real-world PCL-R assessments typically involve a face-to-face interview and much more extensive collateral information. We offer recommendations for new web-based training procedures. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  11. Development and inter-rater reliability of a standardized verbal instruction manual for the Chinese Geriatric Depression Scale-short form.

    Science.gov (United States)

    Wong, M T P; Ho, T P; Ho, M Y; Yu, C S; Wong, Y H; Lee, S Y

    2002-05-01

    The Geriatric Depression Scale (GDS) is a common screening tool for elderly depression in Hong Kong. This study aimed at (1) developing a standardized manual for the verbal administration and scoring of the GDS-SF, and (2) comparing the inter-rater reliability between the standardized and non-standardized verbal administration of GDS-SF. Two studies were reported. In Study 1, the process of developing the manual was described. In Study 2, we compared the inter-rater reliabilities of GDS-SF scores using the standardized verbal instructions and the traditional non-standardized administration. Results of Study 2 indicated that the standardized procedure in verbal administration and scoring improved the inter-rater reliabilities of GDS-SF. Copyright 2002 John Wiley & Sons, Ltd.

  12. Measuring the quality of life in mild to very severe dementia: testing the inter-rater and intra-rater reliability of the German version of the QUALIDEM.

    Science.gov (United States)

    Dichter, Martin Nikolaus; Schwab, Christian G G; Meyer, Gabriele; Bartholomeyczik, Sabine; Dortmann, Olga; Halek, Margareta

    2014-05-01

    Quality of life (Qol) is an increasingly used outcome measure in dementia research. The QUALIDEM is a dementia-specific and proxy-rated Qol instrument. We aimed to determine the inter-rater and intra-rater reliability in residents with dementia in German nursing homes. The QUALIDEM consists of nine subscales that were applied to a sample of 108 people with mild to severe dementia and six consecutive subscales that were applied to a sample of 53 people with very severe dementia. The proxy raters were 49 registered nurses and nursing assistants. Inter-rater and intra-rater reliability scores were calculated on the subscale and item level. None of the QUALIDEM subscales showed strong inter-rater reliability based on the single-measure Intra-Class Correlation Coefficient (ICC) for absolute agreement ≥ 0.70. Based on the average-measure ICC for four raters, eight subscales for people with mild to severe dementia (care relationship, positive affect, negative affect, restless tense behavior, social relations, social isolation, feeling at home and having something to do) and five subscales for very severe dementia (care relationship, negative affect, restless tense behavior, social relations and social isolation) yielded a strong inter-rater agreement (ICC: 0.72-0.86). All of the QUALIDEM subscales, regardless of dementia severity, showed strong intra-rater agreement. The ICC values ranged between 0.70 and 0.79 for people with mild to severe dementia and between 0.75 and 0.87 for people with very severe dementia. This study demonstrated insufficient inter-rater reliability and sufficient intra-rater reliability for all subscales of both versions of the German QUALIDEM. The degree of inter-rater reliability can be improved by collaborative Qol rating by more than one nurse. The development of a measurement manual with accurate item definitions and a standardized education program for proxy raters is recommended.
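The abstract above contrasts single-measure and average-measure ICCs for four raters. The two are linked by the Spearman-Brown relation, ICC_avg = k × ICC_single / (1 + (k - 1) × ICC_single). The sketch below shows how a modest single-rater value can exceed 0.70 once four ratings are averaged. The input of 0.40 is a hypothetical value chosen for illustration, not a figure from the study, and the function name is likewise an assumption.

```python
def average_measure_icc(single_icc, k):
    """Spearman-Brown step-up from a single-measure ICC to the ICC of
    the mean of k raters."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * single_icc / (1 + (k - 1) * single_icc)


# Hypothetical single-measure ICC, averaged over four raters.
single = 0.40
averaged = average_measure_icc(single, k=4)
print(f"single={single:.2f}, average of 4 raters={averaged:.2f}")
```

This is one way to read the abstract's suggestion that collaborative rating by more than one nurse improves inter-rater reliability: averaging across raters reduces the share of rater-specific error in the score.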

  13. Validity and reliability of three definitions of hip osteoarthritis: cross sectional and longitudinal approach

    OpenAIRE

    Reijman, Max; Hazes, Mieke; Pols, Huib; Bernsen, Roos; Koes, Bart; Bierma-Zeinstra, Sita

    2004-01-01

    OBJECTIVES: To compare the reliability and validity in a large open population of three frequently used radiological definitions of hip osteoarthritis (OA): Kellgren and Lawrence grade, minimal joint space (MJS), and Croft grade; and to investigate whether the validity of the three definitions of hip OA is sex dependent. METHODS: Subjects from the Rotterdam study (aged ≥ 55 years, n = 3585) were evaluated. The inter-rater reliability was tested in a random set of 148 x-rays. ...

  14. The interrater reliability of rating non-exercise activity of inpatients with eating disorders using a visual analogue scale.

    Science.gov (United States)

    Mazloum, A; Johnston, M; Lundrigan, M; Birmingham, C L

    2008-12-01

    Non-exercise activity thermogenesis (NEAT) is the energy expended by body movement, other than sleeping, eating or sports-like activities. The obese have been reported to have a lower NEAT (walking, standing, and fidgeting) than controls. We hypothesize that an elevated NEAT could explain why some patients with anorexia nervosa are resistant to weight gain. To evaluate the interrater reliability of a rating of non-exercise activity of inpatients with eating disorders (ED) using a visual analogue scale (VAS). Health care providers were asked to rate the non-exercise activity of inpatients by marking a VAS. Eight patients were individually rated by 10 clinicians. Results were analyzed using the intraclass correlation coefficient (ICC) and Cohen's multi-rater kappa statistic (kappa). The ICC(3,k) was 0.257 (pexercise activity and physiological measurements should be used.

  15. Reliability and Validity of Autism Diagnostic Interview-Revised, Japanese Version

    Science.gov (United States)

    Tsuchiya, Kenji J.; Matsumoto, Kaori; Yagi, Atsuko; Inada, Naoko; Kuroda, Miho; Inokuchi, Eiko; Koyama, Tomonori; Kamio, Yoko; Tsujii, Masatsugu; Sakai, Saeko; Mohri, Ikuko; Taniike, Masako; Iwanaga, Ryoichiro; Ogasahara, Kei; Miyachi, Taishi; Nakajima, Shunji; Tani, Iori; Ohnishi, Masafumi; Inoue, Masahiko; Nomura, Kazuyo; Hagiwara, Taku; Uchiyama, Tokio; Ichikawa, Hironobu; Kobayashi, Shuji; Miyamoto, Ken; Nakamura, Kazuhiko; Suzuki, Katsuaki; Mori, Norio; Takei, Nori

    2013-01-01

    To examine the inter-rater reliability of Autism Diagnostic Interview-Revised, Japanese Version (ADI-R-JV), the authors recruited 51 individuals aged 3-19 years, interviewed by two independent raters. Subsequently, to assess the discriminant and diagnostic validity of ADI-R-JV, the authors investigated 317 individuals aged 2-19 years, who were…

  16. An Assessment of Reliability and Validity of a Rubric for Grading APA-Style Introductions

    Science.gov (United States)

    Stellmack, Mark A.; Konheim-Kalkstein, Yasmine L.; Manor, Julia E.; Massey, Abigail R.; Schmitz, Julie Ann P.

    2009-01-01

    This article describes the empirical evaluation of the reliability and validity of a grading rubric for grading APA-style introductions of undergraduate students. Levels of interrater agreement and intrarater agreement were not extremely high but were similar to values reported in the literature for comparably structured rubrics. Rank-order…

  17. Reliability and validity of food portion size estimation from images using manual flexible digital virtual meshes

    Science.gov (United States)

    The eButton takes frontal images at 4 second intervals throughout the day. A three-dimensional (3D) manually administered wire mesh procedure has been developed to quantify portion sizes from the two-dimensional (2D) images. This paper reports a test of the interrater reliability and validity of use...

  18. Reliability and validity of the de Morton Mobility Index in individuals with sub-acute stroke.

    Science.gov (United States)

    Braun, Tobias; Marks, Detlef; Thiel, Christian; Grüneberg, Christian

    2018-02-04

    To establish the validity and reliability of the de Morton Mobility Index (DEMMI) in patients with sub-acute stroke. This cross-sectional study was performed in a neurological rehabilitation hospital. We assessed unidimensionality, construct validity, internal consistency reliability, inter-rater reliability, minimal detectable change and possible floor and ceiling effects of the DEMMI in adult patients with sub-acute stroke. The study included a total sample of 121 patients with sub-acute stroke. We analysed validity (n = 109) and reliability (n = 51) in two sub-samples. Rasch analysis indicated unidimensionality with an overall fit to the model (chi-square = 12.37, p = 0.577). All hypotheses on construct validity were confirmed. Internal consistency reliability (Cronbach's alpha = 0.94) and inter-rater reliability (intraclass correlation coefficient = 0.95; 95% confidence interval: 0.92-0.97) were excellent. The minimal detectable change with 90% confidence was 13 points. No floor or ceiling effects were evident. These results indicate unidimensionality, sufficient internal consistency reliability, inter-rater reliability, and construct validity of the DEMMI in patients with a sub-acute stroke. Advantages of the DEMMI in clinical application are the short administration time, no need for special equipment and interval level data. The de Morton Mobility Index, therefore, may be a useful performance-based bedside test to measure mobility in individuals with a sub-acute stroke across the whole mobility spectrum. Implications for Rehabilitation The de Morton Mobility Index (DEMMI) is an unidimensional measurement instrument of mobility in individuals with sub-acute stroke. The DEMMI has excellent internal consistency and inter-rater reliability, and sufficient construct validity. The minimal detectable change of the DEMMI with 90% confidence in stroke rehabilitation is 13 points. The lack of any floor or ceiling effects on hospital admission indicates

  19. Palliative sedation: reliability and validity of sedation scales.

    Science.gov (United States)

    Arevalo, Jimmy J; Brinkkemper, Tijn; van der Heide, Agnes; Rietjens, Judith A; Ribbe, Miel; Deliens, Luc; Loer, Stephan A; Zuurmond, Wouter W A; Perez, Roberto S G M

    2012-11-01

    Observer-based sedation scales have been used to provide a measurable estimate of the comfort of nonalert patients in palliative sedation. However, their usefulness and appropriateness in this setting has not been demonstrated. To study the reliability and validity of observer-based sedation scales in palliative sedation. A prospective evaluation of 54 patients under intermittent or continuous sedation with four sedation scales was performed by 52 nurses. Included scales were the Minnesota Sedation Assessment Tool (MSAT), Richmond Agitation-Sedation Scale (RASS), Vancouver Interaction and Calmness Scale (VICS), and a sedation score proposed in the Guideline for Palliative Sedation of the Royal Dutch Medical Association (KNMG). Inter-rater reliability was tested with the intraclass correlation coefficient (ICC) and Cohen's kappa coefficient. Correlations between the scales using Spearman's rho tested concurrent validity. We also examined construct, discriminative, and evaluative validity. In addition, nurses completed a user-friendliness survey. Overall moderate to high inter-rater reliability was found for the VICS interaction subscale (ICC = 0.85), RASS (ICC = 0.73), and KNMG (ICC = 0.71). The largest correlation between scales was found for the RASS and KNMG (rho = 0.836). All scales showed discriminative and evaluative validity, except for the MSAT motor subscale and VICS calmness subscale. Finally, the RASS was less time consuming, clearer, and easier to use than the MSAT and VICS. The RASS and KNMG scales stand as the most reliable and valid among the evaluated scales. In addition, the RASS was less time consuming, clearer, and easier to use than the MSAT and VICS. Further research is needed to evaluate the impact of the scales on better symptom control and patient comfort. Copyright © 2012 U.S. Cancer Pain Relief Committee. Published by Elsevier Inc. All rights reserved.

  20. Method of Quantifying Size of Retinal Hemorrhages in Eyes with Branch Retinal Vein Occlusion Using 14-Square Grid: Interrater and Intrarater Reliability

    Directory of Open Access Journals (Sweden)

    Yuko Takashima

    2016-01-01

    Purpose. To describe a method of quantifying the size of the retinal hemorrhages in branch retinal vein occlusion (BRVO) and to determine the interrater and intrarater reliabilities of these measurements. Methods. Thirty-five fundus photographs from 35 consecutive eyes with BRVO were studied. The fundus images were analyzed with PowerPoint® software, and a grid of 14 squares was laid over the fundus image. Raters were asked to judge the percentage of each of the 14 squares that was covered by the hemorrhages, and the average of the 14 squares was taken to be the relative size of the retinal hemorrhage. Results. Interrater reliability between three raters was higher when a grid with 14 squares was used (intraclass correlation coefficient (ICC), 0.96) than when a box with no grid was used (ICC, 0.78). Intrarater reliability, which was calculated from the retinal hemorrhage area measured on two different days, was also higher (ICC, 0.97) than that with no grid (ICC, 0.86). Interrater reliability for five fundus pictures with poor image quality was also good when a grid with 14 squares was used (ICC, 0.88). Conclusions. Although our method is subjective, excellent interrater and intrarater reliabilities indicate that this method can be adapted for clinical use.
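
    As an illustration of the interrater statistic reported in this record, the following is a minimal sketch of an intraclass correlation coefficient computed from a subjects-by-raters table. The abstract does not state which ICC form was used; the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), is assumed here. The function name `icc_2_1` and the ratings are illustrative only and are not study data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per subject and one column per rater.
    This form is an assumption; the abstract does not name the ICC variant.
    """
    n = len(ratings)       # number of subjects (e.g. fundus images)
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_raters

    # Mean squares
    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Illustrative ratings: 6 subjects (rows) scored by 4 raters (columns)
toy_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(toy_ratings), 2))  # 0.29
```

    In this toy table the raters rank the subjects similarly but differ systematically in level, which is why an absolute-agreement ICC comes out low; a consistency-type ICC would be higher for the same data.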

  1. The inter-rater reliability of the incontinence-associated dermatitis intervention tool-D (IADIT-D) between two independent registered nurses of nursing home residents in long-term care facilities.

    Science.gov (United States)

    Braunschmidt, Brigitte; Müller, Gerhard; Jukic-Puntigam, Margareta; Steininger, Alfred

    2013-01-01

    Incontinence-associated dermatitis (IAD) is the clinical manifestation of moisture-related skin damage (Beeckman, Woodward, & Gray, 2011). Valid assessment instruments are needed for risk assessment and classification of IAD. The aim of this quantitative-descriptive cross-sectional study was to determine the inter-rater reliability of the item scores of the German Incontinence Associated Dermatitis Intervention Tool (IADIT-D) between two independent assessors of nursing home residents (n = 381) in long-term care facilities. The 19 pairs of assessors consisted of registered nurses. The data analysis began with the calculation of the total percentage of agreement. Because this value is not adjusted for chance agreement, kappa coefficients and the AC1 statistic were calculated as well. The total percentage of inter-rater agreement was 84% (n = 319). In a second step of analysis, the calculation across all items yielded high (kappa = .70) and very high (AC1 = .83) agreement levels, respectively. For the risk assessment (kappa = .82; AC1 = .94), the values amounted to very high agreement levels and for the classification (kappa(w) = .70; AC1 = .76) to high agreement levels. The high to very high agreement values of the IADIT-D demonstrate that the items can be regarded as stable with regard to inter-rater reliability for use in long-term care facilities. Further validation studies are needed.
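
    The three agreement measures named in this record (raw percent agreement, Cohen's kappa, and Gwet's AC1) can be sketched for two raters and nominal categories as below. This is a hedged illustration, not the authors' analysis: the function name `agreement_stats` and the example labels are hypothetical, and the weighted kappa reported for the classification item is not covered.

```python
from collections import Counter


def agreement_stats(rater_a, rater_b):
    """Return (percent agreement, Cohen's kappa, Gwet's AC1) for two raters."""
    n = len(rater_a)
    categories = sorted(set(rater_a) | set(rater_b))
    q = len(categories)
    if q < 2:
        raise ValueError("need at least two observed categories")

    # Observed agreement: share of subjects given the same label by both raters
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    count_a, count_b = Counter(rater_a), Counter(rater_b)

    # Cohen's kappa: chance agreement from the product of each rater's marginals
    p_e_kappa = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)

    # Gwet's AC1: chance agreement from the average marginal pi_c of both raters
    pi = {c: (count_a[c] + count_b[c]) / (2 * n) for c in categories}
    p_e_ac1 = sum(pi[c] * (1 - pi[c]) for c in categories) / (q - 1)

    kappa = (p_o - p_e_kappa) / (1 - p_e_kappa)
    ac1 = (p_o - p_e_ac1) / (1 - p_e_ac1)
    return p_o, kappa, ac1


# Hypothetical ratings of 20 residents: 16 rated "risk" by both nurses,
# 1 rated "none" by both, and 3 disagreements
nurse_1 = ["risk"] * 16 + ["risk", "risk", "none", "none"]
nurse_2 = ["risk"] * 16 + ["none", "none", "risk", "none"]
p_o, kappa, ac1 = agreement_stats(nurse_1, nurse_2)
print(round(p_o, 2), round(kappa, 2), round(ac1, 2))
```

    With one category dominating, as in this made-up example, kappa falls well below the raw agreement while AC1 stays closer to it, which is one common reason both statistics are reported side by side.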

  2. Intra- and inter-rater reliability of the Knee Society Knee Score when used by two physiotherapists in patients post total knee arthroplasty

    Directory of Open Access Journals (Sweden)

    S. Gopal

    2010-01-01

    Background and Purpose: It has yet to be shown whether routine physiotherapy plays a role in the rehabilitation of patients post total knee arthroplasty (Rajan et al 2004). Physiotherapists should be using valid outcome measures to provide evidence of the benefit of their intervention. The aim of this study was to establish the intra- and inter-rater reliability of the Knee Society Knee Score, a scoring system developed by Insall et al (1989). The Knee Society Knee Score can be used to assess the integrity of the knee joint of patients undergoing total knee arthroplasty. Since the score involves clinical testing, the intra-rater reliability of the clinician should be established prior to using the scores as data in clinical research. Where multiple clinicians are involved, inter-rater reliability should also be established. Design: This was a correlation study. Subjects: A sample of thirty patients post total knee arthroplasty attending the arthroplasty clinic at Johannesburg Hospital between six weeks and twelve months postoperatively. Method: Recruited patients were evaluated twice with a time interval of one hour between each assessment. Statistical Analysis: The intra- and inter-rater reliability were estimated using the Intraclass Correlation Coefficient (ICC). Results: The intra-rater reliability showed excellent reliability (h = 0.95) for Examiner A and good reliability (h = 0.71) for Examiner B. The inter-rater reliability showed moderate reliability (h = 0.67 during test one and h = 0.66 during test two). Conclusion: The KSKS has good intra-rater reliability when tested within a period of one hour. The KSKS demonstrated moderate agreement for inter-rater reliability.

  3. A Comparison of Three Methods for the Analysis of Skin Flap Viability: Reliability and Validity.

    Science.gov (United States)

    Tim, Carla Roberta; Martignago, Cintia Cristina Santi; da Silva, Viviane Ribeiro; Dos Santos, Estefany Camila Bonfim; Vieira, Fabiana Nascimento; Parizotto, Nivaldo Antonio; Liebano, Richard Eloin

    2018-05-01

    Objective: Technological advances have provided new alternatives to the analysis of skin flap viability in animal models; however, the interrater validity and reliability of these techniques have yet to be analyzed. The present study aimed to evaluate the interrater validity and reliability of three different methods: weight of paper template (WPT), paper template area (PTA), and photographic analysis. Approach: Sixteen male Wistar rats had their cranially based dorsal skin flap elevated. On the seventh postoperative day, the viable tissue area and the necrotic area of the skin flap were recorded using the paper template method and photo image. The evaluation of the percentage of viable tissue was performed using three methods, simultaneously and independently by two raters. The analysis of interrater reliability and viability was performed using the intraclass correlation coefficient and Bland Altman Plot Analysis was used to visualize the presence or absence of systematic bias in the evaluations of data validity. Results: The results showed that interrater reliability for WPT, measurement of PTA, and photographic analysis were 0.995, 0.990, and 0.982, respectively. For data validity, a correlation >0.90 was observed for all comparisons made between the three methods. In addition, Bland Altman Plot Analysis showed agreement between the comparisons of the methods and the presence of systematic bias was not observed. Innovation: Digital methods are an excellent choice for assessing skin flap viability; moreover, they make data use and storage easier. Conclusion: Independently from the method used, the interrater reliability and validity proved to be excellent for the analysis of skin flaps' viability.
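
    The Bland-Altman check for systematic bias mentioned in this record can be sketched as follows. This is a minimal illustration assuming the conventional definition (mean difference as bias, limits of agreement at bias ± 1.96 standard deviations of the differences); the function name `bland_altman` and the paired values are hypothetical and are not the study's measurements.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean of the paired differences; the limits of agreement are
    bias +/- 1.96 sample standard deviations of those differences.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical percentages of viable flap tissue from two methods
template_area = [62.0, 55.5, 71.0, 48.0, 66.5]
photo_analysis = [61.0, 56.0, 70.0, 49.5, 66.0]
bias, lower, upper = bland_altman(template_area, photo_analysis)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

    A bias close to zero with narrow limits suggests the two methods can be used interchangeably; the plot itself simply shows each pair's difference against its mean.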

  4. The impact of revised DSM-5 criteria on the relative distribution and inter-rater reliability of eating disorder diagnoses in a residential treatment setting.

    Science.gov (United States)

    Thomas, Jennifer J; Eddy, Kamryn T; Murray, Helen B; Tromp, Marilou D P; Hartmann, Andrea S; Stone, Melissa T; Levendusky, Philip G; Becker, Anne E

    2015-09-30

    This study evaluated the relative distribution and inter-rater reliability of revised DSM-5 criteria for eating disorders in a residential treatment program. Consecutive adolescent and young adult females (N=150) admitted to a residential eating disorder treatment facility were assigned both DSM-IV and DSM-5 diagnoses by a clinician (n=14) via routine clinical interview and a research assessor (n=4) via structured interview. We compared the frequency of diagnostic assignments under each taxonomy and by type of assessor. We evaluated concordance between clinician and researcher assignment through inter-rater reliability kappa and percent agreement. Significantly fewer patients received either clinician or researcher diagnoses of a residual eating disorder under DSM-5 (clinician-12.0%; researcher-31.3%) versus DSM-IV (clinician-28.7%; researcher-59.3%), with the majority of reassigned DSM-IV residual cases reclassified as DSM-5 anorexia nervosa. Researcher and clinician diagnoses showed moderate inter-rater reliability under DSM-IV (κ=.48) and DSM-5 (κ=.57), though agreement for specific DSM-5 other specified feeding or eating disorder (OSFED) presentations was poor (κ=.05). DSM-5 revisions were associated with significantly less frequent residual eating disorder diagnoses, but not with reduced inter-rater reliability. Findings support specific dimensions of clinical utility for revised DSM-5 criteria for eating disorders. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  5. Can Physicians Identify Inappropriate Nuclear Stress Tests? An Examination of Inter-rater Reliability for the 2009 Appropriate Use Criteria for Radionuclide Imaging

    Science.gov (United States)

    Ye, Siqin; Rabbani, LeRoy E.; Kelly, Christopher R.; Kelly, Maureen R.; Lewis, Matthew; Paz, Yehuda; Peck, Clara L.; Rao, Shaline; Bokhari, Sabahat; Weiner, Shepard D.; Einstein, Andrew J.

    2014-01-01

    Background: We sought to determine inter-rater reliability of the 2009 Appropriate Use Criteria (AUC) for radionuclide imaging (RNI) and whether physicians at various levels of training can effectively identify nuclear stress tests with inappropriate indications. Methods and Results: Four hundred patients were randomly selected from a consecutive cohort of patients undergoing nuclear stress testing at an academic medical center. Raters with different levels of training (including cardiology attending physicians, cardiology fellows, internal medicine hospitalists, and internal medicine interns) classified individual nuclear stress tests using the 2009 AUC. Consensus classification by two cardiologists was considered the operational gold standard, and sensitivity and specificity of individual raters for identifying inappropriate tests were calculated. Inter-rater reliability of the AUC was assessed using Cohen’s kappa statistics for pairs of different raters. The mean age of patients was 61.5 years; 214 (54%) were female. The cardiologists rated 256 (64%) of 400 NSTs as appropriate, 68 (18%) as uncertain, 55 (14%) as inappropriate; 21 (5%) tests were unable to be classified. Inter-rater reliability for non-cardiologist raters was modest (unweighted Cohen’s kappa, 0.51, 95% confidence interval, 0.45 to 0.55). Sensitivity of individual raters for identifying inappropriate tests ranged from 47% to 82%, while specificity ranged from 85% to 97%. Conclusions: Inter-rater reliability for the 2009 AUC for RNI is modest, and there is considerable variation in the ability of raters at different levels of training to identify inappropriate tests. PMID:25563660
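
    The per-rater sensitivity and specificity against a consensus reference, as described in this record, can be sketched as below. This is an illustrative sketch only: the function name `sensitivity_specificity` and the boolean flags are hypothetical, with `True` standing for "classified as inappropriate".

```python
def sensitivity_specificity(rater_flags, reference_flags):
    """Return (sensitivity, specificity) of one rater against a reference.

    Both inputs are equal-length lists of booleans, True meaning the test was
    classified as inappropriate. Assumes the reference contains at least one
    True and one False, otherwise a division by zero occurs.
    """
    pairs = list(zip(rater_flags, reference_flags))
    tp = sum(r and g for r, g in pairs)              # flagged by both
    fn = sum((not r) and g for r, g in pairs)        # missed by the rater
    tn = sum((not r) and (not g) for r, g in pairs)  # cleared by both
    fp = sum(r and (not g) for r, g in pairs)        # flagged only by the rater
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical classifications of 10 stress tests
consensus = [True, True, True, True, False, False, False, False, False, False]
intern = [True, True, True, False, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(intern, consensus)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```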

  6. The Achievement of Therapeutic Objectives Scale: Interrater Reliability and Sensitivity to Change in Short-Term Dynamic Psychotherapy and Cognitive Therapy

    Science.gov (United States)

    Valen, Jakob; Ryum, Truls; Svartberg, Martin; Stiles, Tore C.; McCullough, Leigh

    2011-01-01

    This study examined interrater reliability and sensitivity to change of the Achievement of Therapeutic Objectives Scale (ATOS; McCullough, Larsen, et al., 2003) in short-term dynamic psychotherapy (STDP) and cognitive therapy (CT). The ATOS is a process scale originally developed to assess patients' achievements of treatment objectives in STDP,…

  7. Inter-rater reliability and stability of diagnoses of autism spectrum disorder in children identified through screening at a very young age.

    Science.gov (United States)

    van Daalen, Emma; Kemner, Chantal; Dietz, Claudine; Swinkels, Sophie H N; Buitelaar, Jan K; van Engeland, Herman

    2009-11-01

    To examine the inter-rater reliability and stability of autism spectrum disorder (ASD) diagnoses made at a very early age in children identified through a screening procedure around 14 months of age. In a prospective design, preschoolers were recruited from a screening study for ASD. The inter-rater reliability of the diagnosis of ASD was measured through an independent assessment of a randomly selected subsample of 38 patients by two other psychiatrists. The diagnoses at 23 months and 42 months of 131 patients, based on the clinical assessment and the diagnostic classifications of standardised instruments, were compared to evaluate stability of the diagnosis of ASD. Inter-rater reliability on a diagnosis of ASD versus non-ASD at 23 months was 87% with a weighted kappa of 0.74 (SE 0.11). The stability of the different diagnoses in the autism spectrum was 63% for autistic disorder, 54% for pervasive developmental disorder, not otherwise specified (PDD-NOS), and 91% for the whole category of ASD. Most diagnostic changes at 42 months were within the autism spectrum from autistic disorder to PDD-NOS and were mainly due to diminished symptom severity. Children who moved outside the ASD category at 42 months made significantly larger gains in cognitive and language skills than children with a stable ASD diagnosis. In conclusion, the inter-rater reliability and stability of the diagnoses of ASD established at 23 months in this population-based sample of very young children are good.

  8. Inter-rater reliability of direct observations of the physical and psychosocial working conditions in eldercare

    DEFF Research Database (Denmark)

    Karstad, Kristina; Rugulies, Reiner; Skotte, Jørgen

    2018-01-01

    The aim of the study was to develop and evaluate the reliability of the "Danish observational study of eldercare work and musculoskeletal disorders" (DOSES) observation instrument to assess physical and psychosocial risk factors for musculoskeletal disorders (MSD) in eldercare work. During 1.5 ye...... is appropriate for assessing physical and psychosocial risk factors for MSD among eldercare workers.

  10. Inter-Rater Reliability and Downstream Financial Implications of Electrocardiography Screening in Young Athletes.

    Science.gov (United States)

    Dhutia, Harshil; Malhotra, Aneil; Yeo, Tee Joo; Ster, Irina Chis; Gabus, Vincent; Steriotis, Alexandros; Dores, Helder; Mellor, Greg; García-Corrales, Carmen; Ensam, Bode; Jayalapan, Viknesh; Ezzat, Vivienne Anne; Finocchiaro, Gherardo; Gati, Sabiha; Papadakis, Michael; Tome-Esteban, Maria; Sharma, Sanjay

    2017-08-01

    Preparticipation screening for cardiovascular disease in young athletes with electrocardiography is endorsed by the European Society of Cardiology and several major sporting organizations. One of the concerns of the ECG as a screening test in young athletes relates to the potential for variation in interpretation. We investigated the degree of variation in ECG interpretation in athletes and its financial impact among cardiologists of differing experience. Eight cardiologists (4 with experience in screening athletes) each reported 400 ECGs of consecutively screened young athletes according to the 2010 European Society of Cardiology recommendations, Seattle criteria, and refined criteria. Cohen κ coefficient was used to calculate interobserver reliability. Cardiologists proposed secondary investigations after ECG interpretation, the costs of which were based on the UK National Health Service tariffs. Inexperienced cardiologists were more likely to classify an ECG as abnormal compared with experienced cardiologists (odds ratio, 1.44; 95% confidence interval, 1.03-2.02). Modification of ECG interpretation criteria improved interobserver reliability for categorizing an ECG as abnormal from poor (2010 European Society of Cardiology recommendations; κ=0.15) to moderate (refined criteria; κ=0.41) among inexperienced cardiologists; however, interobserver reliability was moderate for all 3 criteria among experienced cardiologists (κ=0.40-0.53). Inexperienced cardiologists were more likely to refer athletes for further evaluation compared with experienced cardiologists (odds ratio, 4.74; 95% confidence interval, 3.50-6.43) with poorer interobserver reliability (κ=0.22 versus κ=0.47). Interobserver reliability for secondary investigations after ECG interpretation ranged from poor to fair among inexperienced cardiologists (κ=0.15-0.30) and fair to moderate among experienced cardiologists (κ=0.21-0.46). The cost of cardiovascular evaluation per athlete was $175 (95

  11. Towards criterion validity in classroom language analysis: methodological constraints of metadiscourse and inter-rater agreement

    Directory of Open Access Journals (Sweden)

    Douglas Altamiro Consolo

    2001-02-01

    This paper reports on a process to validate a revised version of a system for coding classroom discourse in foreign language lessons, a context in which the dual role of language (as content and means of communication) and the speakers' specific pedagogical aims lead to a certain degree of ambiguity in language analysis. The language used by teachers and students has been extensively studied, and a framework of concepts concerning classroom discourse is well established. Models for coding classroom language need, however, to be revised when they are applied to specific research contexts. The application and revision of an initial framework can lead to the development of earlier models, and to the re-definition of previously established categories of analysis that have to be validated. The procedures followed to validate a coding system are related here as guidelines for conducting research under similar circumstances. The advantages of using instruments that incorporate two types of data, that is, quantitative measures and qualitative information from raters' metadiscourse, are discussed, and it is suggested that such a procedure can contribute to the process of validation itself, towards attaining reliability of research results, as well as indicate some constraints of the adopted research methodology.

  12. Inter-rater reliability of the German version of the Nurses' Global Assessment of Suicide Risk scale.

    Science.gov (United States)

    Kozel, Bernd; Grieser, Manuela; Abderhalden, Christoph; Cutcliffe, John R

    2016-10-01

    In comparison to the general population, the suicide rates of psychiatric inpatient populations in Germany and Switzerland are very high. An important preventive contribution to the lowering of the suicide rates in mental health care is to ensure that the risk of suicide of psychiatric inpatients is assessed as accurately as possible. While risk-assessment instruments can serve an important function in determining such risk, very few have been translated to German. Therefore, in the present study, we reported on the German version of Nurses' Global Assessment of Suicide Risk (NGASR) scale. After translating the original instrument into German and pretesting the German version, we tested the inter-rater reliability of the instrument. Twelve video case studies were evaluated by 13 raters with the NGASR scale in a 'laboratory' trial. In each case, the observer's agreement was calculated for the single items, the overall scale, the risk levels, and the sum scores. The statistical data analysis was conducted with kappa and AC1 statistics for dichotomous (items, scale) scales. A high-to-very high observers' agreement (AC1: 0.62-1.00, kappa: 0.00-1.00) was determined for 16 items of the German version of the NGASR scale. We conclude that the German version of the NGASR scale is a reliable instrument for evaluating risk factors for suicide. A reliable application in clinical practice appears to be enhanced by training in the use of the instrument and the right implementation instructions. © 2016 Australian College of Mental Health Nurses Inc.

  13. Reliability and Validity of the Dyadic Observed Communication Scale (DOCS).

    Science.gov (United States)

    Hadley, Wendy; Stewart, Angela; Hunter, Heather L; Affleck, Katelyn; Donenberg, Geri; Diclemente, Ralph; Brown, Larry K

    2013-02-01

    We evaluated the reliability and validity of the Dyadic Observed Communication Scale (DOCS) coding scheme, which was developed to capture a range of communication components between parents and adolescents. Adolescents and their caregivers were recruited from mental health facilities for participation in a large, multi-site family-based HIV prevention intervention study. Seventy-one dyads were randomly selected from the larger study sample and coded using the DOCS at baseline. Preliminary validity and reliability of the DOCS were examined using various methods, such as comparing results to self-report measures and examining interrater reliability. Results suggest that the DOCS is a reliable and valid measure of observed communication among parent-adolescent dyads that captures both verbal and nonverbal communication behaviors that are typical intervention targets. The DOCS is a viable coding scheme for use by researchers and clinicians examining parent-adolescent communication. Coders can be trained to reliably capture individual and dyadic components of communication for parents and adolescents, and this complex information can be obtained relatively quickly.

  14. The interrater and test-retest reliability of the Home Falls and Accidents Screening Tool (HOME FAST) in Malaysia: Using raters with a range of professional backgrounds.

    Science.gov (United States)

    Romli, Muhammad Hibatullah; Mackenzie, Lynette; Lovarini, Meryl; Tan, Maw Pin; Clemson, Lindy

    2017-06-01

    Falls can be a devastating issue for older people living in the community, including those living in Malaysia. Health professionals and community members have a responsibility to ensure that older people have a safe home environment to reduce the risk of falls. Using a standardised screening tool is beneficial to intervene early with this group. The Home Falls and Accidents Screening Tool (HOME FAST) should be considered for this purpose; however, its use in Malaysia has not been studied. Therefore, the aim of this study was to evaluate the interrater and test-retest reliability of the HOME FAST with multiple professionals in the Malaysian context. A cross-sectional design was used to evaluate interrater reliability where the HOME FAST was used simultaneously in the homes of older people by 2 raters and a prospective design was used to evaluate test-retest reliability with a separate group of older people at different times in their homes. Both studies took place in an urban area of Kuala Lumpur. Professionals from 9 professional backgrounds participated as raters in this study, and a group of 51 community older people were recruited for the interrater reliability study and another group of 30 for the test-retest reliability study. The overall agreement was moderate for interrater reliability and good for test-retest reliability. The HOME FAST was consistently rated by different professionals, and no bias was found among the multiple raters. The HOME FAST can be used with confidence by a variety of professionals across different settings. The HOME FAST can become a universal tool to screen for home hazards related to falls. © 2017 John Wiley & Sons, Ltd.

  15. Rater reliability and concurrent validity of the Keyboard Personal Computer Style instrument (K-PeCS).

    Science.gov (United States)

    Baker, Nancy A; Cook, James R; Redfern, Mark S

    2009-01-01

    This paper describes the inter-rater and intra-rater reliability, and the concurrent validity of an observational instrument, the Keyboard Personal Computer Style instrument (K-PeCS), which assesses stereotypical postures and movements associated with computer keyboard use. Three trained raters independently rated the video clips of 45 computer keyboard users to ascertain inter-rater reliability, and then re-rated a sub-sample of 15 video clips to ascertain intra-rater reliability. Concurrent validity was assessed by comparing the ratings obtained using the K-PeCS to scores developed from a 3D motion analysis system. The overall K-PeCS had excellent reliability [inter-rater: intra-class correlation coefficients (ICC)=.90; intra-rater: ICC=.92]. Most individual items on the K-PeCS had from good to excellent reliability, although six items fell below ICC=.75. Those K-PeCS items that were assessed for concurrent validity compared favorably to the motion analysis data for all but two items. These results suggest that most items on the K-PeCS can be used to reliably document computer keyboarding style.

  16. Inter-rater Reliability for Metrics Scored in a Binary Fashion-Performance Assessment for an Arthroscopic Bankart Repair.

    Science.gov (United States)

    Gallagher, Anthony G; Ryu, Richard K N; Pedowitz, Robert A; Henn, Patrick; Angelo, Richard L

    2018-05-02

    To determine the inter-rater reliability (IRR) of a procedure-specific checklist scored in a binary fashion for the evaluation of surgical skill and whether it meets a minimum level of agreement (≥0.8 between 2 raters) required for high-stakes assessment. In a prospective randomized and blinded fashion, and after detailed assessment training, 10 Arthroscopy Association of North America Master/Associate Master faculty arthroscopic surgeons (in 5 pairs) with an average of 21 years of surgical experience assessed the video-recorded 3-anchor arthroscopic Bankart repair performance of 44 postgraduate year 4 or 5 residents from 21 Accreditation Council for Graduate Medical Education orthopaedic residency training programs from across the United States. No paired scores of resident surgeon performance evaluated by the 5 teams of faculty assessors dropped below the 0.8 IRR level (mean = 0.93; range 0.84-0.99; standard deviation = 0.035). A comparison between the 5 assessor groups with 1-factor analysis of variance showed that there was no significant difference between the groups (P = .205). Pearson's product-moment correlation coefficient revealed a strong and statistically significant negative correlation, that is, -0.856 (P ... fashion meet the need and can show a high (>80%) IRR. Copyright © 2018 Arthroscopy Association of North America. Published by Elsevier Inc. All rights reserved.

  17. Inter-Rater Reliability and Agreement of the 6-Minute Walk Test in Women With Hip Fracture

    DEFF Research Database (Denmark)

    Larsen, Camilla Marie; Overgaard, Jan; Tange Kristensen, Morten

    MWT in individuals with hip fractures. Methods: Two senior physiotherapy students independently examined (randomized order) a convenient sample of 20 participants; their assessments were separated by two days, and testing followed instructions from the American Thoracic Society(1). Hip pain...... was assessed with the Verbal Ranking Scale. Results: Participants (all women) with a mean (SD) age of 78.1 ± 5.9 years performed the test within a mean of 31.5 ± 5.8 days post-surgery; 10 had a cervical and 10 a trochanteric fracture. Excellent inter-rater reliability; ICC2.1 =0.92 (95% CI, 0.81 - 0...... = -0.196, P = 0.41). On the contrary, participants walked a mean of 21.7 ± 22.6 meters longer, at the second trial (P = 0.002). Participants with moderate hip fracture- related pain walked a shorter distance than those with no or light pain during the first test (P = 0.04), while this was not the case...

  18. Assessment of the nursing care product (APROCENF): a reliability and construct validity study

    Directory of Open Access Journals (Sweden)

    Danielle Fabiana Cucolo

    Objectives: to verify the reliability and construct validity estimates of the "Assessment of nursing care product" scale (APROCENF) and its applicability. Methods: this validation study included a sample of 40 (inter-rater reliability) and 172 (construct validity) assessments performed by nurses at the end of the work shift at nine inpatient services of a teaching hospital in the Brazilian Southeast. The data were collected between February and September 2014, with interruptions. Cronbach's alpha and Spearman's correlation coefficients were calculated, as well as the intraclass correlation and the weighted kappa index (inter-rater reliability). Exploratory factor analysis was used with principal component extraction and varimax rotation (construct validity). Results: the internal consistency revealed an alpha coefficient of 0.85, item-item correlation ranging between 0.13 and 0.61 and item-total correlation between 0.43 and 0.69. Inter-rater equivalence was obtained and all items evidenced significant factor loadings. Conclusion: this research evidenced the reliability and construct validity of the scale to assess the nursing care product. Its application in nursing practice permits identifying improvements needed in the production process, contributing to management and care decisions.
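
    The internal-consistency coefficient reported in this record, Cronbach's alpha, can be sketched from a respondents-by-items score table as follows. This is a minimal illustration using the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total scores); the function name `cronbach_alpha` and the scores are hypothetical and are not APROCENF data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per assessment, one column per item)."""
    k = len(scores[0])  # number of items
    # Sample variance of each item across assessments
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the summed scale score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: 4 assessments on a 2-item scale
toy_scores = [
    [2, 3],
    [4, 3],
    [3, 5],
    [5, 5],
]
print(round(cronbach_alpha(toy_scores), 2))  # 0.62
```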

  19. Validity and Reliability of the Clinical Competency Evaluation Instrument for Use among Physiotherapy Students: Pilot study.

    Science.gov (United States)

    Muhamad, Zailani; Ramli, Ayiesah; Amat, Salleh

    2015-05-01

    The aim of this study was to determine the content validity, internal consistency, test-retest reliability and inter-rater reliability of the Clinical Competency Evaluation Instrument (CCEVI) in assessing the clinical performance of physiotherapy students. This study was carried out between June and September 2013 at Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur, Malaysia. A panel of 10 experts was identified to establish content validity by evaluating and rating each of the items used in the CCEVI with regard to their relevance in measuring students' clinical competency. A total of 50 UKM undergraduate physiotherapy students were assessed throughout their clinical placement to determine the construct validity of these items. The instrument's reliability was determined through a cross-sectional study involving a clinical performance assessment of 14 final-year undergraduate physiotherapy students. The content validity index of the entire CCEVI was 0.91, while the proportion of agreement on the content validity indices ranged from 0.83-1.00. The CCEVI construct validity was established with factor loading of ≥0.6, while internal consistency (Cronbach's alpha) overall was 0.97. Test-retest reliability of the CCEVI was confirmed with a Pearson's correlation range of 0.91-0.97 and an intraclass correlation coefficient range of 0.95-0.98. Inter-rater reliability of the CCEVI domains ranged from 0.59 to 0.97 on initial and subsequent assessments. This pilot study confirmed the content validity of the CCEVI. It showed high internal consistency, thereby providing evidence that the CCEVI has moderate to excellent inter-rater reliability. However, additional refinement in the wording of the CCEVI items, particularly in the domains of safety and documentation, is recommended to further improve the validity and reliability of the instrument.

  20. Validity and Reliability of the Arabic Version of the Positive and Negative Syndrome Scale.

    Science.gov (United States)

    Yehya, Arij; Ghuloum, Suhaila; Mahfoud, Ziyad; Opler, Mark; Khan, Anzalee; Hammoudeh, Samer; Abdulhakam, Abdulmoneim; Al-Mujalli, Azza; Hani, Yahya; Elsherbiny, Reem; Al-Amin, Hassen

    The Positive and Negative Syndrome Scale (PANSS) is widely used for patients with schizophrenia. This scale is reliable and valid. The PANSS was translated and validated in several languages. The aim of this study was to translate and validate the PANSS in the Arab population. The PANSS was translated into formal Arabic language using the back-translation method. 101 Arab patients with schizophrenia and 98 Arabs with no diagnosis of any mental disorder were recruited. The Arabic version of the Mini International Neuropsychiatric Interview (MINI-6) was used as a diagnostic tool to confirm the diagnosis of schizophrenia or rule out any diagnosis for the healthy control group. Reliability of the scale was assessed by calculating internal consistency, interrater reliability and test-retest reliability. Construct validity was assessed using the Arabic version of the MINI-6. PANSS total scores were correlated with the Clinical Global Impression-Severity scale. Our findings showed that the internal consistency was good (0.92). Scores on the PANSS of the patients were much higher than those of the healthy controls. The PANSS showed good interrater reliability and test-retest reliability (0.92 and 0.75, respectively). In comparison with the MINI-6, the PANSS showed good sensitivity and specificity, which implies good construct validity of this version. In conclusion, the Arabic version of the PANSS is a reliable and valid instrument for the assessment of patients with schizophrenia in the Arab population. © 2016 S. Karger AG, Basel.

  1. Reliable and valid assessment of Lichtenstein hernia repair skills.

    Science.gov (United States)

    Carlsen, C G; Lindorff-Larsen, K; Funch-Jensen, P; Lund, L; Charles, P; Konge, L

    2014-08-01

    Lichtenstein hernia repair is a common surgical procedure and one of the first procedures performed by a surgical trainee. However, formal assessment tools developed for this procedure are few and sparsely validated. The aim of this study was to determine the reliability and validity of an assessment tool designed to measure surgical skills in Lichtenstein hernia repair. Key issues were identified through a focus group interview. On this basis, an assessment tool with eight items was designed. Ten surgeons and surgical trainees (four experts, three intermediates, and three novices) were video recorded while performing Lichtenstein hernia repair. The videos were blindly and individually assessed by three raters (surgical consultants) using the assessment tool. Based on these assessments, validity and reliability were explored. The internal consistency of the items was high (Cronbach's alpha = 0.97). The inter-rater reliability was very good, with an intra-class correlation coefficient (ICC) of 0.93. Generalizability analysis showed a coefficient above 0.8 even with one rater. The coefficient improved to 0.92 if three raters were used. One-way analysis of variance found a significant difference between the three groups, which indicates construct validity, p [...] fashion with the new procedure-specific assessment tool. We recommend this tool for future assessment of trainees performing Lichtenstein hernia repair to ensure that the objectives of competency-based surgical training are met.

  2. Number of test trials needed for performance stability and interrater reliability of the one leg stand test in patients with a major non-traumatic lower limb amputation

    DEFF Research Database (Denmark)

    Kristensen, Morten Tange; Nielsen, Anni Østergaard; Madsen Topp, Ulla

    2014-01-01

    Balance is beneficial for daily functioning of patients with a lower limb amputation and sometimes assessed by the one-leg stand test (OLST). The aims of the study were to examine (1) the number of trials needed to achieve performance stability, (2) the interrater reliability of the OLST in patients with a major non-traumatic lower limb amputation, and (3) to provide a test procedure....

  3. Reliability and validity of a nutrition and physical activity environmental self-assessment for child care

    Directory of Open Access Journals (Sweden)

    Ammerman Alice S

    2007-07-01

    Background: Few assessment instruments have examined the nutrition and physical activity environments in child care, and none are self-administered. Given the emerging focus on child care settings as a target for intervention, a valid and reliable measure of the nutrition and physical activity environment is needed. Methods: To measure inter-rater reliability, 59 child care center directors and 109 staff completed the self-assessment concurrently, but independently. Three weeks later, a repeat self-assessment was completed by a sub-sample of 38 directors to assess test-retest reliability. To assess criterion validity, a researcher-administered environmental assessment was conducted at 69 centers and was compared to a self-assessment completed by the director. A weighted kappa test statistic and percent agreement were calculated to assess agreement for each question on the self-assessment. Results: For inter-rater reliability, kappa statistics ranged from 0.20 to 1.00 across all questions. Test-retest reliability of the self-assessment yielded kappa statistics that ranged from 0.07 to 1.00. The inter-quartile kappa statistic ranges for inter-rater and test-retest reliability were 0.45 to 0.63 and 0.27 to 0.45, respectively. When percent agreement was calculated, questions ranged from 52.6% to 100% for inter-rater reliability and 34.3% to 100% for test-retest reliability. Kappa statistics for validity ranged from -0.01 to 0.79, with an inter-quartile range of 0.08 to 0.34. Percent agreement for validity ranged from 12.9% to 93.7%. Conclusion: This study provides estimates of criterion validity, inter-rater reliability and test-retest reliability for an environmental nutrition and physical activity self-assessment instrument for child care. Results indicate that the self-assessment is a stable and reasonably accurate instrument for use with child care interventions. We therefore recommend the Nutrition and Physical Activity Self-Assessment for

  4. Inter-rater and test-retest reliability of quality assessments by novice student raters using the Jadad and Newcastle-Ottawa Scales.

    Science.gov (United States)

    Oremus, Mark; Oremus, Carolina; Hall, Geoffrey B C; McKinnon, Margaret C

    2012-01-01

    Quality assessment of included studies is an important component of systematic reviews. The authors investigated inter-rater and test-retest reliability for quality assessments conducted by inexperienced student raters. Student raters received a training session on quality assessment using the Jadad Scale for randomised controlled trials and the Newcastle-Ottawa Scale (NOS) for observational studies. Raters were randomly assigned into five pairs and they each independently rated the quality of 13-20 articles. These articles were drawn from a pool of 78 papers examining cognitive impairment following electroconvulsive therapy to treat major depressive disorder. The articles were randomly distributed to the raters. Two months later, each rater re-assessed the quality of half of their assigned articles. The setting was the McMaster Integrative Neuroscience Discovery and Study Program, and the participants were 10 students taking courses in that program. The authors measured inter-rater reliability using κ and the intraclass correlation coefficient type 2,1 or ICC(2,1). The authors measured test-retest reliability using ICC(2,1). Inter-rater reliability varied by scale question. For the six-item Jadad Scale, question-specific κs ranged from 0.13 (95% CI -0.11 to 0.37) to 0.56 (95% CI 0.29 to 0.83). The ranges were -0.14 (95% CI -0.28 to 0.00) to 0.39 (95% CI -0.02 to 0.81) for the NOS cohort and -0.20 (95% CI -0.49 to 0.09) to 1.00 (95% CI 1.00 to 1.00) for the NOS case-control. For overall scores on the six-item Jadad Scale, ICC(2,1)s for inter-rater and test-retest reliability (accounting for systematic differences between raters) were 0.32 (95% CI 0.08 to 0.52) and 0.55 (95% CI 0.41 to 0.67), respectively. Corresponding ICC(2,1)s for the NOS cohort were -0.19 (95% CI -0.67 to 0.35) and 0.62 (95% CI 0.25 to 0.83), and for the NOS case-control, the ICC(2,1)s were 0.46 (95% CI -0.13 to 0.92) and 0.83 (95% CI 0.48 to 0.95). Inter-rater reliability was generally poor

  5. Intra- and inter-rater reliability of 3D passive intervertebral motion in subjects with nonspecific neck pain assessed by physical therapy students: A pilot study.

    Science.gov (United States)

    Rossettini, Giacomo; Rondoni, Angie; Lovato, Tommaso; Strobe, Marco; Verzè, Elisa; Vicentini, Marco; Testa, Marco

    2016-06-03

    Passive Intervertebral Movements (PIVMs) are commonly used to assess and treat patients with nonspecific neck pain. Only very few studies have investigated 3D movements until now. This study assessed intra- and inter-rater reliability of three-dimensional (3D) cervical PIVMs performed by physical therapy students in patients with nonspecific neck pain. Thirty-one patients, mean age 47.2 ± 7.2 years, were independently evaluated by 2 physical therapy students. The raters (A and B) assessed mobility, end-feel and pain provocation, performing the 3D cervical segmental side-bending test (3D CSSB) bilaterally from levels C2-C3 to C6-C7. Percentage agreement (raw, positive and negative), Cohen's kappa (95% CI), prevalence index and bias index were calculated to estimate intra- and inter-rater reliability. Intra-rater reliability showed kappa values ranging between fair and substantial (k 0.29-0.80) for pain provocation, mobility and end-feel, with percentage agreements between 61% and 90%. Inter-rater reliability presented kappa values ranging between fair and substantial (k 0.22-0.62) for pain provocation, mobility and end-feel, with percentage agreements between 61% and 80%. Intra-rater reliability of 3D PIVMs was superior to inter-rater reliability in patients with nonspecific neck pain. The most repeatable evaluation parameter was pain. However, the overall poor reliability suggests avoiding the use of these techniques alone to examine patients and measure their outcome. Further studies are needed to investigate PIVM reliability in combination with other assessment procedures in symptomatic patients.
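
    The agreement measures listed in this record can all be derived from a single 2x2 table of paired ratings. The sketch below shows one common way to do so, assuming a binary finding (for example, pain provoked or not); the cell counts and the function name are hypothetical, and the prevalence and bias indices follow the usual definitions based on the diagonal and off-diagonal cells.

```python
def two_by_two_agreement(a, b, c, d):
    """Agreement statistics for two raters and a binary finding.

    a: both raters positive      b: rater A positive, rater B negative
    c: A negative, B positive    d: both raters negative
    """
    n = a + b + c + d
    p_obs = (a + d) / n                                   # raw percentage agreement
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2  # chance agreement
    return {
        "raw_agreement": p_obs,
        "positive_agreement": 2 * a / (2 * a + b + c),
        "negative_agreement": 2 * d / (2 * d + b + c),
        "kappa": (p_obs - p_exp) / (1 - p_exp),
        "prevalence_index": abs(a - d) / n,   # imbalance between agreed categories
        "bias_index": abs(b - c) / n,         # imbalance between disagreements
    }


# Hypothetical counts for 50 paired assessments.
stats = two_by_two_agreement(a=20, b=5, c=10, d=15)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

    Reporting the prevalence and bias indices next to kappa helps a reader judge whether a low kappa reflects real disagreement or a skewed distribution of findings.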

  6. Transcultural Adaptation of GRID Hamilton Rating Scale For Depression (GRID-HAMD) to Brazilian Portuguese and Evaluation of the Impact of Training Upon Inter-Rater Reliability.

    Science.gov (United States)

    Henrique-Araújo, Ricardo; Osório, Flávia L; Gonçalves Ribeiro, Mônica; Soares Monteiro, Ivandro; Williams, Janet B W; Kalali, Amir; Alexandre Crippa, José; Oliveira, Irismar Reis De

    2014-07-01

    GRID-HAMD is a semi-structured interview guide developed to overcome flaws in HAM-D, and has been incorporated into an increasing number of studies. The objectives were to carry out the transcultural adaptation of GRID-HAMD into Brazilian Portuguese, evaluate the inter-rater reliability of this instrument and the impact of training upon this measure, and ascertain the raters' opinions of the instrument. The transcultural adaptation was conducted by appropriate methodology. Inter-rater reliability was measured using videos that were evaluated by 85 professionals before and after training in the use of this instrument. The intraclass correlation coefficient (ICC) remained between 0.76 and 0.90 for GRID-HAMD-21 and between 0.72 and 0.91 for GRID-HAMD-17. The training did not have an impact on the ICC, except for a few groups of participants with a lower level of experience. Most of the participants showed high acceptance of GRID-HAMD when compared to other versions of HAM-D. The scale presented adequate inter-rater reliability even before training began. Training did not have an impact on this measure, except for a few groups with less experience. GRID-HAMD received favorable opinions from most of the participants.

  7. Reliability and validity in a nutshell.

    Science.gov (United States)

    Bannigan, Katrina; Watson, Roger

    2009-12-01

    To explore and explain the different concepts of reliability and validity as they are related to measurement instruments in social science and health care. There are different concepts contained in the terms reliability and validity and these are often explained poorly and there is often confusion between them. To develop some clarity about reliability and validity a conceptual framework was built based on the existing literature. The concepts of reliability, validity and utility are explored and explained. Reliability contains the concepts of internal consistency and stability and equivalence. Validity contains the concepts of content, face, criterion, concurrent, predictive, construct, convergent (and divergent), factorial and discriminant. In addition, for clinical practice and research, it is essential to establish the utility of a measurement instrument. To use measurement instruments appropriately in clinical practice, the extent to which they are reliable, valid and usable must be established.

  8. Approaches to describing inter-rater reliability of the overall clinical appearance of febrile infants and toddlers in the emergency department

    Directory of Open Access Journals (Sweden)

    Paul Walsh

    2014-11-01

    Objectives. To measure inter-rater agreement of overall clinical appearance of febrile children aged less than 24 months and to compare methods for doing so. Study Design and Setting. We performed an observational study of inter-rater reliability of the assessment of febrile children in a county hospital emergency department serving a mixed urban and rural population. Two emergency medicine healthcare providers independently evaluated the overall clinical appearance of children less than 24 months of age who had presented for fever. They recorded the initial ‘gestalt’ assessment of whether or not the child was ill appearing or if they were unsure. They then repeated this assessment after examining the child. Each rater was blinded to the other’s assessment. Our primary analysis was graphical. We also calculated Cohen’s κ, Gwet’s agreement coefficient and other measures of agreement and weighted variants of these. We examined the effect of time between exams and patient and provider characteristics on inter-rater agreement. Results. We analyzed 159 of the 173 patients enrolled. Median age was 9.5 months (lower and upper quartiles 4.9–14.6), 99/159 (62%) were boys and 22/159 (14%) were admitted. Overall 118/159 (74%) and 119/159 (75%) were classified as well appearing on initial ‘gestalt’ impression by both examiners. Summary statistics varied from 0.223 for weighted κ to 0.635 for Gwet’s AC2. Inter-rater agreement was affected by the time interval between the evaluations and the age of the child but not by the experience levels of the rater pairs. Classifications of ‘not ill appearing’ were more reliable than others. Conclusion. The inter-rater reliability of emergency providers’ assessment of overall clinical appearance was adequate when described graphically and by Gwet’s AC. Different summary statistics yield different results for the same dataset.
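
    The closing point of this record, that different summary statistics can disagree on the same data, can be illustrated with a small sketch. It compares Cohen's κ with Gwet's unweighted AC1 on made-up paired ratings in which most children are rated well appearing. Note the assumptions: the study reports the weighted AC2 variant, whereas this example uses the simpler AC1, and the data and function name are invented for illustration.

```python
from collections import Counter


def kappa_and_ac1(pairs):
    """Cohen's kappa and Gwet's AC1 for two raters, from (rater_a, rater_b) labels."""
    n = len(pairs)
    categories = sorted({label for pair in pairs for label in pair})
    q = len(categories)
    p_obs = sum(a == b for a, b in pairs) / n

    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)

    # Cohen: chance agreement from the product of each rater's marginals.
    pe_kappa = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    # Gwet: chance agreement from the average marginal of each category.
    avg = {c: (count_a[c] + count_b[c]) / (2 * n) for c in categories}
    pe_ac1 = sum(avg[c] * (1 - avg[c]) for c in categories) / (q - 1)

    kappa = (p_obs - pe_kappa) / (1 - pe_kappa)
    ac1 = (p_obs - pe_ac1) / (1 - pe_ac1)
    return kappa, ac1


# Hypothetical sample of 100 children: raters agree on 92, mostly "well".
toy_pairs = (
    [("well", "well")] * 90
    + [("well", "ill")] * 4
    + [("ill", "well")] * 4
    + [("ill", "ill")] * 2
)
kappa, ac1 = kappa_and_ac1(toy_pairs)
print(f"kappa = {kappa:.2f}, AC1 = {ac1:.2f}")
```

    With one category dominating, the two coefficients estimate chance agreement very differently, which is why the same 92% raw agreement produces a low κ and a high AC1 here.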

  9. Interrater reliability and accuracy of clinicians and trained research assistants performing prospective data collection in emergency department patients with potential acute coronary syndrome.

    Science.gov (United States)

    Cruz, Carlos O; Meshberg, Emily B; Shofer, Frances S; McCusker, Christine M; Chang, Anna Marie; Hollander, Judd E

    2009-07-01

    Clinical research requires high-quality data collection. Data collected at the emergency department evaluation is generally considered more precise than data collected through chart abstraction but is cumbersome and time consuming. We test whether trained research assistants without a medical background can obtain clinical research data as accurately as physicians. We hypothesize that they would be at least as accurate because they would not be distracted by clinical requirements. We conducted a prospective comparative study of 33 trained research assistants and 39 physicians (35 residents) to assess interrater reliability with respect to guideline-recommended clinical research data. Immediately after the research assistant and clinician evaluation, the data were compared by a tiebreaker third person who forced the patient to choose one of the 2 answers as the correct one when responses were discordant. Crude percentage agreement and interrater reliability were assessed (kappa statistic). One hundred forty-three patients were recruited (mean age 50.7 years; 47% female patients). Overall, the median agreement was 81% (interquartile range [IQR] 73% to 92%) and interrater reliability was fair (kappa value 0.36 [IQR 0.26 to 0.52]) but varied across categories of data: cardiac risk factors (median 86% [IQR 81% to 93%]; median 0.69 [IQR 0.62 to 0.83]), other cardiac history (median 93% [IQR 79% to 95%]; median 0.56 [IQR 0.29 to 0.77]), pain location (median 92% [IQR 86% to 94%]; median 0.37 [IQR 0.25 to 0.29]), radiation (median 86% [IQR 85% to 87%]; median 0.37 [IQR 0.26 to 0.42]), quality (median 85% [IQR 75% to 94%]; median 0.29 [IQR 0.23 to 0.40]), and associated symptoms (median 74% [IQR 65% to 78%]; median 0.28 [IQR 0.20 to 0.40]). When discordant information was obtained, the research assistant was more often correct (median 64% [IQR 53% to 72%]). The relatively fair interrater reliability observed in our study is consistent with previous studies evaluating

  10. Interrater reliability of the Volume-Viscosity Swallow Test; screening for dysphagia among hospitalized elderly medical patients

    DEFF Research Database (Denmark)

    Jørgensen, Lise Walther; Søndergaard, Kasper; Melgaard, Dorte

    2017-01-01

    Background: Oropharyngeal dysphagia (OD) is prevalent among medical and geriatric patients admitted due to acute illness and it is associated with malnutrition, increased length of stay and increased mortality. A valid and reliable bedside screening test for patients at risk of OD is essential...... in order to detect patients in need of further assessment. The Volume-Viscosity Swallow Test (V-VST) has been shown to be a valid screening test for OD in mixed outpatient populations. However, as reliability of the test has yet to be investigated in a population of medical and geriatric patients admitted...... skilled occupational therapists examined an unselected group of 110 patients admitted to geriatric or medical wards. In an overall agreement phase raters reached ≥80% agreement before data collection phase was commenced. The V-VST was applied to patients twice within maximum one hour by raters who...

  11. Interrater and Test-Retest Reliability and Minimal Detectable Change of the Balance Evaluation Systems Test (BESTest) and Subsystems With Community-Dwelling Older Adults.

    Science.gov (United States)

    Wang-Hsu, Elizabeth; Smith, Susan S

    2017-01-10

    Falls are a common cause of injuries and hospital admissions in older adults. Balance limitation is a potentially modifiable factor contributing to falls. The Balance Evaluation Systems Test (BESTest), a clinical balance measure, categorizes balance into 6 underlying subsystems. Each of the subsystems is scored individually and summed to obtain a total score. The reliability of the BESTest and its individual subsystems has been reported in patients with various neurological disorders and cancer survivors. However, the reliability and minimal detectable change (MDC) of the BESTest with community-dwelling older adults have not been reported. The purposes of our study were to (1) determine the interrater and test-retest reliability of the BESTest total and subsystem scores; and (2) estimate the MDC of the BESTest and its individual subsystem scores with community-dwelling older adults. We used a prospective cohort methodological design. Community-dwelling older adults (N = 70; aged 70-94 years; mean = 85.0 [5.5] years) were recruited from a senior independent living community. Trained testers (N = 3) administered the BESTest. All participants were tested with the BESTest by the same tester initially and then retested 7 to 14 days later. With 32 of the participants, a second tester concurrently scored the retest for interrater reliability. Testers were blinded to each other's scores. Intraclass correlation coefficients [ICC(2,1)] were used to determine the interrater and test-retest reliability. Test-retest reliability was also analyzed using method error and the associated coefficients of variation (CVME). MDC was calculated using standard error of measurement. Interrater reliability (N = 32) of the BESTest total score was ICC(2, 1) = 0.97 (95% confidence interval [CI], 0.94-0.99). The ICCs for the individual subsystem scores ranged from 0.85 to 0.94. Test-retest reliability (N = 70) of the BESTest total score was ICC(2,1) = 0.93 (95% CI, 0.89-0.96). ICCs for the
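
    As a rough illustration of the quantities named in this record, the sketch below computes ICC(2,1) from two-way ANOVA mean squares and then derives a standard error of measurement (SEM) and a 95% minimal detectable change (MDC). The formulas SEM = SD x sqrt(1 - ICC) and MDC95 = 1.96 x sqrt(2) x SEM are widely used conventions and are assumed here; the abstract does not state which variant the authors applied. The ratings matrix is the well-known textbook example of six targets scored by four judges, not BESTest data, and the SD of 10 points is invented.

```python
from math import sqrt
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: one row per subject, one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    ss_subjects = k * sum((mean(row) - grand) ** 2 for row in ratings)
    ss_raters = n * sum((mean(col) - grand) ** 2 for col in zip(*ratings))
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


def sem_and_mdc95(sd, icc):
    """SEM and 95% MDC under the assumed conventional formulas."""
    sem = sd * sqrt(1 - icc)
    return sem, 1.96 * sqrt(2) * sem


# Textbook example: 6 subjects rated by 4 judges.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example)
sem, mdc95 = sem_and_mdc95(sd=10.0, icc=0.91)  # hypothetical SD and ICC
print(f"ICC(2,1) = {icc:.2f}, SEM = {sem:.2f}, MDC95 = {mdc95:.2f}")
```

    Because the rater mean square enters the denominator, ICC(2,1) penalises systematic differences between raters, which is why it suits absolute-agreement questions such as whether two testers would assign the same BESTest score.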

  12. The reliability and validity of the Turkish version of Fullerton Advanced Balance (FAB-T) scale.

    Science.gov (United States)

    Iyigun, Gozde; Kirmizigil, Berkiye; Angin, Ender; Oksuz, Sevim; Can, Filiz; Eker, Levent; Rose, Debra J

    2018-06-04

    The aim of this study was to evaluate the reliability and validity of the Turkish version of the FAB (FAB-T) scale in older Turkish adults. The reliability and validity of the scale were tested on 200 community-dwelling older adults. The FAB-T scale was scored by different physiotherapists on different days to evaluate inter-rater and intra-rater reliability. The Berg Balance Scale (BBS) was used for the evaluation of convergent validity, and the content validity of the FAB-T scale was investigated. The FAB-T scale showed very high inter- and intra-rater reliability. For inter-rater agreement, ICC values for the individual test items and the total score were 0.92 (95% CI, 0.90-0.94) and 0.96 (95% CI, 0.95-0.97), respectively. For intra-rater agreement, ICC values for the individual test items and the total score were 0.93 (95% CI, 0.91-0.95) and 0.96 (95% CI, 0.95-0.97), respectively. There was good agreement between the FAB-T and BBS scales. A high correlation was found between the BBS and FAB-T scales [rho = 0.70 (95% CI, 0.62-0.76)], indicating good convergent validity. Considering the content validity of the FAB-T scale, no floor (floor score: 0%) or ceiling (ceiling score: 6.5%) effect was detected. The FAB-T scale was successfully translated from the original English version (FAB) and demonstrated strong psychometric features. It was found that the FAB-T scale has very high inter-rater and intra-rater reliability. Considering the convergent validity, the scale has a high correlation with the BBS. The FAB-T has no floor or ceiling effect. Copyright © 2018 Elsevier B.V. All rights reserved.

  13. The Outdoor MEDIA DOT: The development and inter-rater reliability of a tool designed to measure food and beverage outlets and outdoor advertising.

    Science.gov (United States)

    Poulos, Natalie S; Pasch, Keryn E

    2015-07-01

    Few studies of the food environment have collected primary data, and even fewer have reported reliability of the tool used. This study focused on the development of an innovative electronic data collection tool used to document outdoor food and beverage (FB) advertising and establishments near 43 middle and high schools in the Outdoor MEDIA Study. Tool development used GIS based mapping, an electronic data collection form on handheld devices, and an easily adaptable interface to efficiently collect primary data within the food environment. For the reliability study, two teams of data collectors documented all FB advertising and establishments within one half-mile of six middle schools. Inter-rater reliability was calculated overall and by advertisement or establishment category using percent agreement. A total of 824 advertisements (n=233), establishment advertisements (n=499), and establishments (n=92) were documented (range=8-229 per school). Overall inter-rater reliability of the developed tool ranged from 69-89% for advertisements and establishments. Results suggest that the developed tool is highly reliable and effective for documenting the outdoor FB environment. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. Rater reliability and construct validity of a mobile application for posture analysis.

    Science.gov (United States)

    Szucs, Kimberly A; Brown, Elena V Donoso

    2018-01-01

    [Purpose] Measurement of posture is important for those with a clinical diagnosis as well as researchers aiming to understand the impact of faulty postures on the development of musculoskeletal disorders. A reliable, cost-effective and low tech posture measure may be beneficial for research and clinical applications. The purpose of this study was to determine rater reliability and construct validity of a posture screening mobile application in healthy young adults. [Subjects and Methods] Pictures of subjects were taken in three standing positions. Two raters independently digitized the static standing posture image twice. The app calculated posture variables, including sagittal and coronal plane translations and angulations. Intra- and inter-rater reliability were calculated using the appropriate ICC models for complete agreement. Construct validity was determined through comparison of known groups using repeated measures ANOVA. [Results] Intra-rater reliability ranged from 0.71 to 0.99. Inter-rater reliability was good to excellent for all translations. ICCs were stronger for translations versus angulations. The construct validity analysis found that the app was able to detect the change in the four variables selected. [Conclusion] The posture mobile application has demonstrated strong rater reliability and preliminary evidence of construct validity. This application may have utility in clinical and research settings.

  15. Intra-Rater, Inter-Rater and Test-Retest Reliability of an Instrumented Timed Up and Go (iTUG) Test in Patients with Parkinson's Disease.

    Directory of Open Access Journals (Sweden)

    Rob C van Lummel

    The "Timed Up and Go" (TUG) is a widely used measure of physical functioning in older people and in neurological populations, including Parkinson's Disease. When using an inertial sensor measurement system (instrumented TUG [iTUG]), the individual components of the iTUG and the trunk kinematics can be measured separately, which may provide relevant additional information. The aim of this study was to determine intra-rater, inter-rater and test-retest reliability of the iTUG in patients with Parkinson's Disease. Twenty-eight PD patients, aged 50 years or older, were included. For the iTUG the DynaPort Hybrid (McRoberts, The Hague, The Netherlands) was worn at the lower back. The device measured acceleration and angular velocity in three directions at a rate of 100 samples/s. Patients performed the iTUG five times on two consecutive days. Repeated measurements by the same rater on the same day were used to calculate intra-rater reliability. Repeated measurements by different raters on the same day were used to calculate intra-rater and inter-rater reliability. Repeated measurements by the same rater on different days were used to calculate test-retest reliability. Nineteen ICC values (15%) were ≥ 0.9, which is considered excellent reliability. Sixty-four ICC values (49%) were ≥ 0.70 and < 0.90, which is considered good reliability. Thirty-one ICC values (24%) were ≥ 0.50 and < 0.70, indicating moderate reliability. Sixteen ICC values (12%) were ≥ 0.30 and < 0.50, indicating poor reliability. Two ICC values (2%) were < 0.30, indicating very poor reliability. In conclusion, in patients with Parkinson's disease the intra-rater, inter-rater, and test-retest reliability of the individual components of the instrumented TUG (iTUG) was excellent to good for total duration and for turning durations, and good to low for the sub durations and for the kinematics of the SiSt and StSi. The results of this fully automated analysis of instrumented TUG movements

  16. Validity and Reliability Study of the Korean Tinetti Mobility Test for Parkinson's Disease.

    Science.gov (United States)

    Park, Jinse; Koh, Seong-Beom; Kim, Hee Jin; Oh, Eungseok; Kim, Joong-Seok; Yun, Ji Young; Kwon, Do-Young; Kim, Younsoo; Kim, Ji Seon; Kwon, Kyum-Yil; Park, Jeong-Ho; Youn, Jinyoung; Jang, Wooyoung

    2018-01-01

    Postural instability and gait disturbance are the cardinal symptoms associated with falling among patients with Parkinson's disease (PD). The Tinetti mobility test (TMT) is a well-established measurement tool used to predict falls among elderly people. However, the TMT has not been established or widely used among PD patients in Korea. The purpose of this study was to evaluate the reliability and validity of the Korean version of the TMT for PD patients. Twenty-four patients diagnosed with PD were enrolled in this study. For the interrater reliability test, thirteen clinicians scored the TMT after watching a video clip. We also used the test-retest method to determine intrarater reliability. For concurrent validation, the unified Parkinson's disease rating scale, Hoehn and Yahr staging, Berg Balance Scale, Timed-Up and Go test, 10-m walk test, and gait analysis by three-dimensional motion capture were also used. We analyzed receiver operating characteristic curve to predict falling. The interrater reliability and intrarater reliability of the Korean Tinetti balance scale were 0.97 and 0.98, respectively. The interrater reliability and intra-rater reliability of the Korean Tinetti gait scale were 0.94 and 0.96, respectively. The Korean TMT scores were significantly correlated with the other clinical scales and three-dimensional motion capture. The cutoff values for predicting falling were 14 points (balance subscale) and 10 points (gait subscale). We found that the Korean version of the TMT showed excellent validity and reliability for gait and balance and had high sensitivity and specificity for predicting falls among patients with PD.

  17. Validity and Reliability in Social Science Research

    Science.gov (United States)

    Drost, Ellen A.

    2011-01-01

    In this paper, the author aims to provide novice researchers with an understanding of the general problem of validity in social science research and to acquaint them with approaches to developing strong support for the validity of their research. She provides insight into these two important concepts, namely (1) validity; and (2) reliability, and…

  18. The reliability and validity of the Turkish version of the Neuropsychiatric Inventory-Clinician.

    Science.gov (United States)

    Sahin Cankurtaran, Eylem; Danişman, Mustafa; Tutar, Hasan; Ulusoy Kaymak, Semra

    2015-01-01

    The Neuropsychiatric Inventory-Clinician (NPI-C) scale is one of the best-known scales for evaluating the behavioral and psychological symptoms of dementia. This study aimed to assess the reliability and validity of the Turkish version of the NPI-C scale in patients with Alzheimer disease (AD). The NPI-C scale was administered to 125 patients with AD. For reliability, both Cronbach's α and interrater reliability were analyzed. The Behavioral Pathology in Alzheimer's Disease (BEHAVE-AD) scale was applied for validity and, in addition, the Mini Mental State Examination (MMSE), Instrumental Activities of Daily Living (IADL) scale, and Disability Assessment of Dementia (DAD) scale were completed. The Turkish version of the NPI-C scale showed high internal consistency (Cronbach's α = 0.75) and mostly good interrater reliability. Assessments of validity showed that the NPI-C and corresponding BEHAVE-AD domains were found to be significantly correlated, between 0.925 and 0.195. Moreover, the correlations between NPI-C and MMSE were significant for all domains except the dysphoria, anxiety, and elation/euphoria domains. When we conducted a correlation analysis of NPI-C with IADL, all domains were statistically significantly correlated except aggression, anxiety, elation/euphoria, and dysphoria. The Turkish version of the NPI-C scale was found to be a reliable and valid instrument to assess neuropsychiatric symptoms in Turkish elderly subjects with AD.

  19. The Use of Bayesian Networks to Assess the Quality of Evidence from Research Synthesis: 2. Inter-Rater Reliability and Comparison with Standard GRADE Assessment.

    Directory of Open Access Journals (Sweden)

    Alexis Llewellyn

    The grades of recommendation, assessment, development and evaluation (GRADE) approach is widely implemented in systematic reviews, health technology assessment and guideline development organisations throughout the world. We have previously reported on the development of the Semi-Automated Quality Assessment Tool (SAQAT), which enables a semi-automated validity assessment based on GRADE criteria. The main advantage of our approach is the potential to improve inter-rater agreement of GRADE assessments, particularly when used by less experienced researchers, because such judgements can be complex and challenging to apply without training. This is the first study examining the inter-rater agreement of the SAQAT. We conducted two studies to compare: (a) the inter-rater agreement of two researchers using the SAQAT independently on 28 meta-analyses and (b) the inter-rater agreement between a researcher using the SAQAT (who had no experience of using GRADE) and an experienced member of the GRADE working group conducting a standard GRADE assessment on 15 meta-analyses. There was substantial agreement between independent researchers using the Quality Assessment Tool for all domains (for example, overall GRADE rating: weighted kappa 0.79; 95% CI 0.65 to 0.93). Comparison between the SAQAT and a standard GRADE assessment suggested that inconsistency was parameterised too conservatively by the SAQAT. Therefore the tool was amended. Following amendment we found fair-to-moderate agreement between the standard GRADE assessment and the SAQAT (for example, overall GRADE rating: weighted kappa 0.35; 95% CI 0.09 to 0.87). Despite a need for further research, the SAQAT may aid consistent application of GRADE, particularly by less experienced researchers.

  20. The Reliability and Predictive Validity of the Stalking Risk Profile.

    Science.gov (United States)

    McEwan, Troy E; Shea, Daniel E; Daffern, Michael; MacKenzie, Rachel D; Ogloff, James R P; Mullen, Paul E

    2018-03-01

    This study assessed the reliability and validity of the Stalking Risk Profile (SRP), a structured measure for assessing stalking risks. The SRP was administered at the point of assessment or retrospectively from file review for 241 adult stalkers (91% male) referred to a community-based forensic mental health service. Interrater reliability was high for stalker type, and moderate-to-substantial for risk judgments and domain scores. Evidence for predictive validity and discrimination between stalking recidivists and nonrecidivists for risk judgments depended on follow-up duration. Discrimination was moderate (area under the curve = 0.66-0.68) and positive and negative predictive values good over the full follow-up period (Mdn = 170.43 weeks). At 6 months, discrimination was better than chance only for judgments related to stalking of new victims (area under the curve = 0.75); however, high-risk stalkers still reoffended against their original victim(s) 2 to 4 times as often as low-risk stalkers. Implications for the clinical utility and refinement of the SRP are discussed.

  1. Assessing the suitability of written stroke materials: an evaluation of the interrater reliability of the suitability assessment of materials (SAM) checklist.

    Science.gov (United States)

    Hoffmann, Tammy; Ladner, Yvette

    2012-01-01

    Written materials are frequently used to provide education to stroke patients and their carers. However, poor quality materials are a barrier to effective information provision. A quick and reliable method of evaluating material quality is needed. This study evaluated the interrater reliability of the Suitability Assessment of Materials (SAM) checklist in a sample of written stroke education materials. Two independent raters evaluated the materials (n = 25) using the SAM, and ratings were analyzed to reveal total percentage agreements and weighted kappa values for individual items and overall SAM rating. The majority of the individual SAM items had high interrater reliability, with 17 of the 22 items achieving substantial, almost perfect, or perfect weighted kappa value scores. The overall SAM rating achieved a weighted kappa value of 0.60, with a percentage total agreement of 96%. Health care professionals should evaluate the content and design characteristics of written education materials before using them with patients. A tool such as the SAM checklist can be used; however, raters should exercise caution when interpreting results from items with more subjective scoring criteria. Refinements to the scoring criteria for these items are recommended. The value of the SAM is that it can be used to identify specific elements that should be modified before education materials are provided to patients.

  2. The health preoccupation diagnostic interview: inter-rater reliability of a structured interview for diagnostic assessment of DSM-5 somatic symptom disorder and illness anxiety disorder.

    Science.gov (United States)

    Axelsson, Erland; Andersson, Erik; Ljótsson, Brjánn; Wallhed Finn, Daniel; Hedman, Erik

    2016-06-01

    Somatic symptom disorder (SSD) and illness anxiety disorder (IAD) are two new diagnoses introduced in the DSM-5. There is a need for reliable instruments to facilitate the assessment of these disorders. We therefore developed a structured diagnostic interview, the Health Preoccupation Diagnostic Interview (HPDI), which we hypothesized would reliably differentiate between SSD, IAD, and no diagnosis. Persons with clinically significant health anxiety (n = 52) and healthy controls (n = 52) were interviewed using the HPDI. Diagnoses were then compared with those made by an independent assessor, who listened to audio recordings of the interviews. Ratings generally indicated moderate to almost perfect inter-rater agreement, as illustrated by an overall Cohen's κ of .85. Disagreements primarily concerned (a) the severity of somatic symptoms, (b) the differential diagnosis of panic disorder, and (c) SSD specifiers. We conclude that the HPDI can be used to reliably diagnose DSM-5 SSD and IAD.
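    For readers unfamiliar with the statistic reported above, the following is a minimal sketch of how Cohen's κ is computed for two raters assigning nominal categories. The function name `cohens_kappa` and the toy ratings are hypothetical illustrations; they are not the study's data or the authors' analysis code.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases (nominal labels)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over labels.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)

# Hypothetical toy ratings (NOT taken from the study).
interviewer = ["SSD", "SSD", "IAD", "none", "none", "SSD", "IAD", "none"]
assessor    = ["SSD", "IAD", "IAD", "none", "none", "SSD", "none", "none"]
kappa = cohens_kappa(interviewer, assessor)
print(f"kappa = {kappa:.3f}")
```

    Because κ subtracts the agreement expected by chance, it can be noticeably lower than raw percentage agreement when one category dominates the ratings.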

  3. Reproducibility of tender point examination in chronic low back pain patients as measured by intrarater and inter-rater reliability and agreement

    DEFF Research Database (Denmark)

    Jensen, Ole Kudsk; Callesen, Jacob; Nielsen, Merete Graakjaer

    2013-01-01

    back examination and return-to-work intervention, 43 and 39 patients, respectively (18 women, 46%) entered and completed the study. MAIN OUTCOME MEASURES: The reliability was estimated by the intraclass correlation coefficient (ICC), and agreement was calculated for up to ±3 TPs. Furthermore, the smallest detectable difference was calculated. RESULTS: TP examination was performed twice by two consultants in rheumatology and rehabilitation at 20 min intervals and repeated 1 week later. Intrarater reliability in the more and less experienced rater was ICC 0.84 (95% CI 0.69 to 0.98) and 0.72 (95% CI 0.49 to 0.95), respectively. The figures for inter-rater reliability were intermediate between these figures. In more than 70% of the cases, the raters agreed within ±3 TPs in both men and women and between test days. The smallest detectable difference between raters was 5, and for the more and less...

  4. Inter-rater and test–retest reliability of quality assessments by novice student raters using the Jadad and Newcastle–Ottawa Scales

    Science.gov (United States)

    Oremus, Carolina; Hall, Geoffrey B C; McKinnon, Margaret C

    2012-01-01

    Introduction: Quality assessment of included studies is an important component of systematic reviews. Objective: The authors investigated inter-rater and test–retest reliability for quality assessments conducted by inexperienced student raters. Design: Student raters received a training session on quality assessment using the Jadad Scale for randomised controlled trials and the Newcastle–Ottawa Scale (NOS) for observational studies. Raters were randomly assigned into five pairs and they each independently rated the quality of 13–20 articles. These articles were drawn from a pool of 78 papers examining cognitive impairment following electroconvulsive therapy to treat major depressive disorder. The articles were randomly distributed to the raters. Two months later, each rater re-assessed the quality of half of their assigned articles. Setting: McMaster Integrative Neuroscience Discovery and Study Program. Participants: 10 students taking McMaster Integrative Neuroscience Discovery and Study Program courses. Main outcome measures: The authors measured inter-rater reliability using κ and the intraclass correlation coefficient type 2,1 or ICC(2,1). The authors measured test–retest reliability using ICC(2,1). Results: Inter-rater reliability varied by scale question. For the six-item Jadad Scale, question-specific κs ranged from 0.13 (95% CI −0.11 to 0.37) to 0.56 (95% CI 0.29 to 0.83). The ranges were −0.14 (95% CI −0.28 to 0.00) to 0.39 (95% CI −0.02 to 0.81) for the NOS cohort and −0.20 (95% CI −0.49 to 0.09) to 1.00 (95% CI 1.00 to 1.00) for the NOS case–control. For overall scores on the six-item Jadad Scale, ICC(2,1)s for inter-rater and test–retest reliability (accounting for systematic differences between raters) were 0.32 (95% CI 0.08 to 0.52) and 0.55 (95% CI 0.41 to 0.67), respectively. Corresponding ICC(2,1)s for the NOS cohort were −0.19 (95% CI −0.67 to 0.35) and 0.62 (95% CI 0.25 to 0.83), and for the NOS case–control, the ICC(2

  5. Validity and reliability of the European portuguese version of neuropsychiatric inventory in an institutionalized sample.

    Science.gov (United States)

    Ferreira, Ana Rita; Martins, Sonia; Ribeiro, Orquidea; Fernandes, Lia

    2015-01-01

    Neuropsychiatric symptoms are very common in dementia and have been associated with patient and caregiver distress, increased risk of institutionalization and higher costs of care. In this context, the neuropsychiatric inventory (NPI) is the most widely used comprehensive tool designed to measure neuropsychiatric symptoms in geriatric patients with dementia. The aim of this study was to present the validity and reliability of the European Portuguese version of NPI. A cross-sectional study was carried out with a convenience sample of institutionalized patients (≥ 50 years old) in three nursing homes in Portugal. All patients were also assessed with mini-mental state examination (MMSE) (cognition), geriatric depression scale (GDS) (depression) and adults and older adults functional assessment inventory (IAFAI) (functionality). NPI was administered to a formal caregiver, usually from the clinical staff. Inter-rater and test-retest reliability were assessed in a subsample of 25 randomly selected subjects. The sample included 166 elderly, with a mean age of 80.9 (standard deviation: 10.2) years. Three of the NPI behavioral items had negative correlations with MMSE: delusions (rs = -0.177, P = 0.024), disinhibition (rs = -0.174, P = 0.026) and aberrant motor activity (rs = -0.182, P = 0.020). The NPI subsection of depression/dysphoria correlated positively with GDS total score (rs = 0.166, P = 0.038). NPI showed good internal consistency (overall α = 0.766; frequency α = 0.737; severity α = 0.734). The inter-rater reliability was excellent (intraclass correlation coefficient (ICC): 1.00, 95% confidence interval (CI) 1.00 - 1.00), as was test-retest reliability (ICC: 0.91, 95% CI 0.80 - 0.96). The results found for convergent validity, inter-rater and test-retest reliability showed that this version appears to be a valid and reliable instrument for evaluation of neuropsychiatric symptoms in institutionalized elderly.

  6. Publishing nutrition research: validity, reliability, and diagnostic test assessment in nutrition-related research.

    Science.gov (United States)

    Gleason, Philip M; Harris, Jeffrey; Sheean, Patricia M; Boushey, Carol J; Bruemmer, Barbara

    2010-03-01

    This is the sixth in a series of monographs on research design and analysis. The purpose of this article is to describe and discuss several concepts related to the measurement of nutrition-related characteristics and outcomes, including validity, reliability, and diagnostic tests. The article reviews the methodologic issues related to capturing the various aspects of a given nutrition measure's reliability, including test-retest, inter-item, and interobserver or inter-rater reliability. Similarly, it covers content validity, indicators of absolute vs relative validity, and internal vs external validity. With respect to diagnostic assessment, the article summarizes the concepts of sensitivity and specificity. The hope is that dietetics practitioners will be able to both use high-quality measures of nutrition concepts in their research and recognize these measures in research completed by others. Copyright 2010 American Dietetic Association. Published by Elsevier Inc. All rights reserved.

  7. Reliable and Valid Assessment of Clinical Bronchoscopy Performance

    DEFF Research Database (Denmark)

    Konge, Lars; Larsen, Klaus Richter; Clementsen, Paul

    2012-01-01

    : The interrater reliability was high, with Cronbach's α = 0.86. Assessment of 3 bronchoscopies by a single rater had a generalizability coefficient of 0.84. The correlation between experience and performance was good (Pearson correlation = 0.76). There were significant differences between the groups for all...

  8. Assessment of apraxia: inter-rater reliability of a new apraxia test, association between apraxia and other cognitive deficits and prevalence of apraxia in a rehabilitation setting.

    Science.gov (United States)

    Zwinkels, Angeliek; Geusgens, Chantal; van de Sande, Peter; Van Heugten, Caroline

    2004-11-01

    To investigate the inter-rater reliability of a new apraxia test. Furthermore to examine the association of apraxia with other neuropsychological impairments and the prevalence of apraxia in a rehabilitation setting on the basis of the new test. Cross-sectional cohort study, involving 100 patients with a first stroke admitted to a rehabilitation centre in the Netherlands. General patient characteristics and stroke-related aspects. Cognitive screening involving apraxia, visuospatial scanning, abstract thinking and reasoning, memory, attention, planning and aphasia. The indices for inter-rater agreement range from excellent to poor. Significant correlations are found between apraxia and visuospatial scanning, memory, attention, planning and aphasia. The patients with apraxia perform significantly worse than the patients without apraxia on memory, the time needed to complete the tests for scanning and attention, and aphasia. The prevalence of apraxia is 25.3% in the total group, 51.3% in the left hemisphere stroke patients and 6.0% in the right hemisphere stroke patients. Patients with and without apraxia do not differ significantly concerning age, gender and type of stroke. The apraxia test has been shown to be a reliable instrument. Apraxia is often associated with aphasia, memory problems and mental slowness. This study shows that on the basis of the apraxia test, the prevalence of apraxia among patients in the rehabilitation centre is high, especially among patients with left hemisphere lesions.

  9. An Investigation of Interrater Reliability for the Rorschach Performance Assessment System (R-PAS) in a Nonpatient U.S. Sample.

    Science.gov (United States)

    Kivisalu, Trisha M; Lewey, Jennifer H; Shaffer, Thomas W; Canfield, Merle L

    2016-01-01

    The Rorschach Performance Assessment System (R-PAS) aims to provide an evidence-based approach to administration, coding, and interpretation of the Rorschach Inkblot Method (RIM). R-PAS analyzes individualized communications given by respondents to each card to code a wide pool of possible variables. Due to the large number of possible codes that can be assigned to these responses, it is important to consider the concordance rates among different assessors. This study investigated interrater reliability for R-PAS protocols. Data were analyzed from a nonpatient convenience sample of 50 participants who were recruited through networking, local marketing, and advertising efforts from January 2013 through October 2014. Blind recoding was used and discrepancies between the initial and blind coders' ratings were analyzed for each variable with SPSS yielding percent agreement and intraclass correlation values. Data for Location, Space, Contents, Synthesis, Vague, Pairs, Form Quality, Populars, Determinants, and Cognitive and Thematic codes are presented. Rates of agreement for 1,168 responses were higher for more simplistic coding (e.g., Location), whereas agreement was lower for more complex codes (e.g., Cognitive and Thematic codes). Overall, concordance rates achieved good to excellent agreement. Results suggest R-PAS is an effective method with high interrater reliability supporting its empirical basis.

  10. Validity and reliability of three definitions of hip osteoarthritis: cross sectional and longitudinal approach.

    Science.gov (United States)

    Reijman, M; Hazes, J M W; Pols, H A P; Bernsen, R M D; Koes, B W; Bierma-Zeinstra, S M A

    2004-11-01

    To compare the reliability and validity in a large open population of three frequently used radiological definitions of hip osteoarthritis (OA): Kellgren and Lawrence grade, minimal joint space (MJS), and Croft grade; and to investigate whether the validity of the three definitions of hip OA is sex dependent. from the Rotterdam study (aged ≥ 55 years, n = 3585) were evaluated. The inter-rater reliability was tested in a random set of 148 x rays. The validity was expressed as the ability to identify patients who show clinical symptoms of hip OA (construct validity) and as the ability to predict total hip replacement (THR) at follow up (predictive validity). Inter-rater reliability was similar for the Kellgren and Lawrence grade and MJS (kappa statistics 0.68 and 0.62, respectively) but lower for Croft's grade (kappa statistic, 0.51). The Kellgren and Lawrence grade and MJS showed the strongest associations with clinical symptoms of hip OA. Sex appeared to be an effect modifier for Kellgren and Lawrence and MJS definitions, women showing a stronger association between grading and symptoms than men. However, the sex dependency was attributed to differences in height between women and men. The Kellgren and Lawrence grade showed the highest predictive value for THR at follow up. Based on these findings, Kellgren and Lawrence still appears to be a useful OA definition for epidemiological studies focusing on the presence of hip OA.

  11. [Reliability and validity of warning signs checklist for screening psychological, behavioral and developmental problems of children].

    Science.gov (United States)

    Huang, X N; Zhang, Y; Feng, W W; Wang, H S; Cao, B; Zhang, B; Yang, Y F; Wang, H M; Zheng, Y; Jin, X M; Jia, M X; Zou, X B; Zhao, C X; Robert, J; Jing, Jin

    2017-06-02

    Objective: To evaluate the reliability and validity of the warning signs checklist developed by the National Health and Family Planning Commission of the People's Republic of China (NHFPC), so as to determine the screening effectiveness of warning signs for developmental problems of early childhood. Method: Stratified random sampling was used to assess the reliability and validity of the warning signs checklist, and 2,110 children 0 to 6 years of age (1,513 low-risk subjects and 597 high-risk subjects) were recruited from 11 provinces of China. The reliability evaluation for the warning signs included the test-retest reliability and interrater reliability. With the use of Age and Stage Questionnaire (ASQ) and Gesell Development Diagnosis Scale (GESELL) as the criterion scales, criterion validity was assessed by determining the correlation and consistency between the screening results of warning signs and the criterion scales. Result: In terms of the warning signs, the screening positive rates at different ages ranged from 10.8% (21/141) to 26.2% (51/137). The median (interquartile) testing time for each subject was 1 (0.6) minute. Both the test-retest reliability and interrater reliability of warning signs reached 0.7 or above, indicating that the stability was good. In terms of validity assessment, there was remarkable consistency between ASQ and warning signs, with the Kappa value of 0.63. With the use of GESELL as criterion, it was determined that the sensitivity of warning signs in children with suspected developmental delay was 82.2%, and the specificity was 77.7%. The overall Youden index was 0.6. Conclusion: The reliability and validity of warning signs checklist for screening early childhood developmental problems have met the basic requirements of psychological screening scales, with the characteristics of short testing time and easy operation. Thus, this warning signs checklist can be used for screening psychological and behavioral problems of early childhood.
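    The Youden index quoted in this abstract follows directly from the reported sensitivity and specificity (J = sensitivity + specificity - 1). A short sketch of that arithmetic is given below; the function name `youden_index` is a hypothetical helper, and only the two percentages come from the abstract.

```python
def youden_index(sensitivity, specificity):
    """Youden's J: sensitivity + specificity - 1, with both given as proportions."""
    for value in (sensitivity, specificity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("sensitivity and specificity must lie in [0, 1]")
    return sensitivity + specificity - 1.0

# Values reported in the abstract: sensitivity 82.2%, specificity 77.7%.
j = youden_index(0.822, 0.777)
print(round(j, 3))  # 0.599, which rounds to the reported 0.6
```

    A value of 0 corresponds to a test that performs no better than chance, and 1 to a test with no false positives or false negatives.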

  12. Using the eating disorder examination in the assessment of bulimia and anorexia: issues of reliability and validity.

    Science.gov (United States)

    Guest, T

    2000-01-01

    The Eating Disorder Examination will be assessed according to its reliability and validity in the assessment of anorexia nervosa and bulimia nervosa. A thorough review of the literature was conducted to judge the reliability and validity of the Eating Disorder Examination and its subscales. The review shows that the EDE and its subscales have good interrater reliability and internal consistency reliability. Similarly, high levels of discriminant validity, construct validity, and treatment validity in the assessment of eating disorders were also found. A summary of each study concerning the various types of reliability and validity will be provided. The EDE is considered to be the "gold standard" by which to identify eating disorders, so this tool used in conjunction with other behavioral measures will be imperative for clinical social work practice.

  13. Intra- and inter-rater reliabilities of measurement of ultrasound imaging for muscle thickness and pennation angle of tibialis anterior muscle in stroke patients.

    Science.gov (United States)

    Cho, Ki Hun; Lee, Hwang Jae; Lee, Wan Hee

    2017-07-01

    Dysfunction of skeletal muscle has been commonly reported in stroke patients. The purpose of this study was to investigate the intra- and inter-rater reliabilities of measurement of ultrasound imaging (USI) for pennation angle (PA) and muscle thickness (MT) of tibialis anterior muscle in stroke patients. Thirty-four stroke patients (19 men) participated in this study. USI was used for measurement of PA and MT of the tibialis anterior muscles at rest and during maximum voluntary contraction (MVC). Two examiners acquired images from all participants during two separate testing sessions, seven days apart. Intra-class correlation coefficients (ICCs), confidence interval (CI), standard error of measurement, minimal detectable change, and Bland-Altman plots were used for estimation of reliability. In the intra-rater reliability between measures, for all variables (PA and MT of the paretic and non-paretic sides of tibialis anterior muscles at rest and during MVC), the ICCs ranged between 0.639 and 0.998 and the CI was within an acceptable range of 0.388-0.999. In inter-rater reliability between examiners for the two tests, for all variables, the ICCs ranged between 0.690 and 0.995 and the CI was within an acceptable range of 0.463-0.997. In addition, significant difference was observed between the paretic and non-paretic sides of the tibialis anterior muscle architecture (p stroke patients. In addition, objective and quantitative measurements of tibialis anterior muscle using USI may provide appropriate management for the walking recovery of stroke patients.
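    The abstract lists the standard error of measurement (SEM) and minimal detectable change (MDC) among its reliability statistics without giving formulas. The sketch below uses one commonly used formulation, SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM; whether the authors used exactly this variant is an assumption, and the input values are hypothetical rather than taken from the study.

```python
import math

def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC); assumes 0 <= ICC <= 1 and SD >= 0."""
    if sd < 0 or not 0.0 <= icc <= 1.0:
        raise ValueError("need sd >= 0 and 0 <= icc <= 1")
    return sd * math.sqrt(1.0 - icc)

def minimal_detectable_change_95(sem):
    """MDC at the 95% confidence level: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem

# Hypothetical inputs (NOT from the study): between-subject SD of 2.0 mm, ICC of 0.91.
sem = standard_error_of_measurement(sd=2.0, icc=0.91)
mdc95 = minimal_detectable_change_95(sem)
print(f"SEM = {sem:.2f} mm, MDC95 = {mdc95:.2f} mm")
```

    Under this formulation, a lower ICC or a more variable sample both enlarge the change that must be observed before it can be distinguished from measurement error.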

  14. Validity and Reliability of 2 Goniometric Mobile Apps: Device, Application, and Examiner Factors.

    Science.gov (United States)

    Wellmon, Robert H; Gulick, Dawn T; Paterson, Mark L; Gulick, Colleen N

    2016-12-01

    Smartphones are being used in a variety of practice settings to measure joint range of motion (ROM). A number of factors can affect the validity of the measurements generated. However, there are no studies examining smartphone-based goniometer applications focusing on measurement variability and error arising from the electromechanical properties of the device being used. To examine the concurrent validity and interrater reliability of 2 goniometric mobile applications (Goniometer Records, Goniometer Pro), an inclinometer, and a universal goniometer (UG). Nonexperimental, descriptive validation study. University laboratory. 3 physical therapists having an average of 25 y of experience. Three standardized angles (acute, right, obtuse) were constructed to replicate the movement of a hinge joint in the human body. Angular changes were measured and compared across 3 raters who used 3 different devices (UG, inclinometer, and 2 goniometric apps installed on 3 different smartphones: Apple iPhone 5, LG Android, and Samsung SIII Android). Intraclass correlation coefficients (ICCs) and Bland-Altman plots were used to examine interrater reliability and concurrent validity. Interrater reliability for each of the smartphone apps, inclinometer and UG was excellent (ICC = .995-1.000). Concurrent validity was also good (ICC = .998-.999). Based on the Bland-Altman plots, the means of the differences between the devices were low (range = -0.4° to 1.2°). This study identifies the error inherent in measurement that is independent of patient factors and due to the smartphone, the installed apps, and examiner skill. Less than 2° of measurement variability was attributable to those factors alone. The data suggest that 3 smartphones with the 2 installed apps are a viable substitute for using a UG or an inclinometer when measuring angular changes that typically occur when examining ROM and demonstrate the capacity of multiple examiners to accurately use smartphone-based goniometers.

  15. How do cognitively impaired elderly patients define "testament": reliability and validity of the testament definition scale.

    Science.gov (United States)

    Heinik, J; Werner, P; Lin, R

    1999-01-01

    The testament definition scale (TDS) is a specifically designed six-item scale aimed at measuring the respondent's capacity to define "testament." We assessed the reliability and validity of this new short scale in 31 community-dwelling cognitively impaired elderly patients. Interrater reliability for the six items ranged from .87 to .97. The interrater reliability for the total score was .77. Significant correlations were found between the TDS score and the Mini-Mental State Examination (MMSE) and the Cambridge Cognitive Examination scores (r = .71 and .72 respectively, p = .001). Criterion validity yielded significantly different means for subjects with MMSE scores of 24-30 and 0-23: mean 3.9 and 1.6 respectively (t(20) = 4.7, p = .001). Using a cutoff point of 0-2 vs. 3+, 79% of the subjects were correctly classified as severely cognitively impaired, with only 8.3% false positives, and a positive predictive value of 94%. Thus, TDS was found both reliable and valid. This scale, however, is not synonymous with testamentary capacity. The discussion deals with the methodological limitations of this study, and highlights the practical as well as the theoretical relevance of TDS. Future studies are warranted to elucidate the relationships between TDS and existing legal requirements of testamentary capacity.

  16. Reliability and validity of risk analysis

    International Nuclear Information System (INIS)

    Aven, Terje; Heide, Bjornar

    2009-01-01

    In this paper we investigate to what extent risk analysis meets the scientific quality requirements of reliability and validity. We distinguish between two types of approaches within risk analysis, relative frequency-based approaches and Bayesian approaches. The former category includes both traditional statistical inference methods and the so-called probability of frequency approach. Depending on the risk analysis approach, the aim of the analysis is different, the results are presented in different ways and consequently the meaning of the concepts reliability and validity is not the same.

  17. Correcting Fallacies in Validity, Reliability, and Classification

    Science.gov (United States)

    Sijtsma, Klaas

    2009-01-01

    This article reviews three topics from test theory that continue to raise discussion and controversy and capture test theorists' and constructors' interest. The first topic concerns the discussion of the methodology of investigating and establishing construct validity; the second topic concerns reliability and its misuse, alternative definitions…

  18. Content validity and reliability of test of gross motor development in Chilean children

    Directory of Open Access Journals (Sweden)

    Marcelo Cano-Cappellacci

    2015-01-01

    OBJECTIVE: To validate a Spanish version of the Test of Gross Motor Development (TGMD-2) for the Chilean population. METHODS: Descriptive, transversal, non-experimental validity and reliability study. Four translators, three experts and 92 Chilean children, from five to 10 years, students from a primary school in Santiago, Chile, participated. The Committee of Experts carried out translation, back-translation and revision processes to determine the translinguistic equivalence and content validity of the test, using the content validity index in 2013. In addition, a pilot implementation was carried out to determine test reliability in Spanish, using the intraclass correlation coefficient and Bland-Altman method. We evaluated whether the results presented significant differences by replacing the bat with a racket, using a t-test. RESULTS: We obtained a content validity index higher than 0.80 for language clarity and relevance of the TGMD-2 for children. There were significant differences in the object control subtest when comparing the results with bat and racket. The intraclass correlation coefficient for inter-rater, intra-rater and test-retest reliability was greater than 0.80 in all cases. CONCLUSIONS: The TGMD-2 has appropriate content validity to be applied in the Chilean population. The reliability of this test is within the appropriate parameters and its use could be recommended in this population after the establishment of normative data, setting a further precedent for the validation in other Latin American countries.

  19. NDE reliability and advanced NDE technology validation

    International Nuclear Information System (INIS)

    Doctor, S.R.; Deffenbaugh, J.D.; Good, M.S.; Green, E.R.; Heasler, P.G.; Hutton, P.H.; Reid, L.D.; Simonen, F.A.; Spanner, J.C.; Vo, T.V.

    1989-01-01

    This paper reports on progress for three programs: (1) evaluation and improvement in nondestructive examination reliability for inservice inspection of light water reactors (LWR) (NDE Reliability Program), (2) field validation, acceptance, and training for advanced NDE technology, and (3) evaluation of computer-based NDE techniques and regional support of inspection activities. The NDE Reliability Program objectives are to quantify the reliability of inservice inspection techniques for LWR primary system components through independent research and establish means for obtaining improvements in the reliability of inservice inspections. The areas of significant progress will be described concerning ASME Code activities, re-analysis of the PISC-II data, the equipment interaction matrix study, new inspection criteria, and PISC-III. The objectives of the second program are to develop field procedures for the AE and SAFT-UT techniques, perform field validation testing of these techniques, provide training in the techniques for NRC headquarters and regional staff, and work with the ASME Code for the use of these advanced technologies. The final program's objective is to evaluate the reliability and accuracy of interpretation of results from computer-based ultrasonic inservice inspection systems, and to develop guidelines for NRC staff to monitor and evaluate the effectiveness of inservice inspections conducted on nuclear power reactors. This program started in the last quarter of FY89, and the extent of the program was to prepare a work plan for presentation to and approval from a technical advisory group of NRC staff.

  20. Verification, validation, and reliability of predictions

    International Nuclear Information System (INIS)

    Pigford, T.H.; Chambre, P.L.

    1987-04-01

    The objective of predicting long-term performance should be to make reliable determinations of whether the prediction falls within the criteria for acceptable performance. Establishing reliable predictions of long-term performance of a waste repository requires emphasis on valid theories to predict performance. The validation process must establish the validity of the theory, the parameters used in applying the theory, the arithmetic of calculations, and the interpretation of results; but validation of such performance predictions is not possible unless there are clear criteria for acceptable performance. Validation programs should emphasize identification of the substantive issues of prediction that need to be resolved. Examples relevant to waste package performance are predicting the life of waste containers and the time distribution of container failures, establishing the criteria for defining container failure, validating theories for time-dependent waste dissolution that depend on details of the repository environment, and determining the extent of congruent dissolution of radionuclides in the UO2 matrix of spent fuel. Prediction and validation should go hand in hand and should be done and reviewed frequently, as essential tools for the programs to design and develop repositories. 29 refs

  1. Validity and Inter-Rater Reliability of a Novel Bedside Referral Tool for Spasticity

    Science.gov (United States)

    2018-02-20

    Spasticity, Muscle; Muscular Diseases; Musculoskeletal Disease; Muscle Hypertonia; Muscle Spasticity; Neuromuscular Manifestations; Signs and Symptoms; Nervous System Diseases; Neurologic Manifestations

  2. An initial reliability and validity study of the Interaction, Communication, and Literacy Skills Audit.

    Science.gov (United States)

    El-Choueifati, Nisrine; Purcell, Alison; McCabe, Patricia; Heard, Robert; Munro, Natalie

    2014-06-01

    Early childhood educators (ECEs) have an important role in promoting positive outcomes for children's language and literacy development. This paper reports the development of a new tool, The Interaction Communication and Literacy (ICL) Skills Audit, and pilots its reliability and validity. Intra- and inter-rater reliability was examined by three speech-language pathologists (SLPs). Five skill areas relating to ECE language and literacy practice were rated. The face and content validity of the ICL Skills Audit was examined by expert SLPs (n = 8) and expert ECEs (n = 4) via questionnaire. The overall intra-rater reliability for the ICL Skills Audit was excellent with percentage close agreement (PCA) of 91-94. Inter-rater agreement was PCA 68-80. Expert SLPs and ECEs agreed that the content was comprehensive and practical. Based on this preliminary study, the ICL Skills Audit appears to be a promising tool that can be used by SLPs and ECEs in collaboration to measure the skills of ECEs in the areas of language and literacy support. Future psychometric and outcome research on the revised ICL Skills Audit is warranted.

  3. Evaluation of the Validity and Reliability of the Waterlow Pressure Ulcer Risk Assessment Scale.

    Science.gov (United States)

    Charalambous, Charalambos; Koulori, Agoritsa; Vasilopoulos, Aristidis; Roupa, Zoe

    2018-04-01

    Prevention is the ideal strategy to tackle the problem of pressure ulcers. Pressure ulcer risk assessment scales are one of the most pivotal measures applied to tackle the problem, but much criticism has been raised regarding the validity and reliability of these scales. The objective was to investigate the validity and reliability of the Waterlow pressure ulcer risk assessment scale. The methodology used is a narrative literature review; the bibliography was reviewed through Cinahl, Pubmed, EBSCO, Medline and Google Scholar, and 26 scientific articles were identified. The articles were chosen due to their direct correlation with the objective under study and their scientific relevance. The construct and face validity of the Waterlow appear adequate, but with regard to content validity, changes in the categories of age and gender could be beneficial. The concurrent validity cannot be assessed. The predictive validity of the Waterlow is characterized by high specificity and low sensitivity. The inter-rater reliability has been demonstrated to be inadequate; this may be due to a lack of clear definitions within the categories and differing levels of knowledge between the users. Due to the limitations presented regarding the validity and reliability of the Waterlow pressure ulcer risk assessment scale, the scale should be used in conjunction with clinical assessment to provide optimum results.

  4. Evaluation of the Validity and Reliability of the Waterlow Pressure Ulcer Risk Assessment Scale

    Science.gov (United States)

    Charalambous, Charalambos; Koulori, Agoritsa; Vasilopoulos, Aristidis; Roupa, Zoe

    2018-01-01

    Introduction Prevention is the ideal strategy to tackle the problem of pressure ulcers. Pressure ulcer risk assessment scales are one of the most pivotal measures applied to tackle the problem, but much criticism has been raised regarding the validity and reliability of these scales. Objective To investigate the validity and reliability of the Waterlow pressure ulcer risk assessment scale. Method The methodology used is a narrative literature review; the bibliography was reviewed through Cinahl, Pubmed, EBSCO, Medline and Google Scholar, and 26 scientific articles were identified. The articles were chosen due to their direct correlation with the objective under study and their scientific relevance. Results The construct and face validity of the Waterlow appear adequate, but with regard to content validity, changes in the categories of age and gender could be beneficial. The concurrent validity cannot be assessed. The predictive validity of the Waterlow is characterized by high specificity and low sensitivity. The inter-rater reliability has been demonstrated to be inadequate; this may be due to a lack of clear definitions within the categories and differing levels of knowledge between the users. Conclusion Due to the limitations presented regarding the validity and reliability of the Waterlow pressure ulcer risk assessment scale, the scale should be used in conjunction with clinical assessment to provide optimum results. PMID:29736104

  5. Reliability and validity of a Chinese version of the Diagnostic Interview for Borderlines-Revised.

    Science.gov (United States)

    Wang, Lanlan; Yuan, Chenmei; Qiu, Jianying; Gunderson, John; Zhang, Min; Jiang, Kaida; Leung, Freedom; Zhong, Jie; Xiao, Zeping

    2014-09-01

    Borderline personality disorder (BPD) is the most studied of the axis II disorders. One of the most widely used diagnostic instruments is the Diagnostic Interview for Borderline Patients-Revised (DIB-R). The aim of this study was to test the reliability and validity of the DIB-R for use in the Chinese culture. The reliability and validity of the DIB-R Chinese version were assessed in a sample of 236 outpatients with a probable BPD diagnosis. The Structured Clinical Interview for DSM-IV Personality Disorders (SCID-II) was used as a standard. Test-retest reliability was tested six months later with 20 patients, and inter-rater reliability was tested on 32 patients. The Chinese version of the DIB-R showed good global internal consistency (Cronbach's α of 0.916), good test-retest reliability (Pearson correlation of 0.704), and good inter-rater reliability (intra-class correlation coefficient of 0.892 and kappa of 0.861). When compared with the DSM-IV diagnosis as measured by the SCID-II, the DIB-R showed relatively good sensitivity (0.768) and specificity (0.891) at the cutoff of 7, moderate diagnostic convergence (kappa of 0.631), as well as good discriminating validity. The Chinese version of the DIB-R has good psychometric properties, which renders it a valuable method for examining the presence, the severity, and component phenotypes of BPD in Chinese samples. © 2013 Wiley Publishing Asia Pty Ltd.
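
    As an illustration of how sensitivity and specificity at a score cutoff are typically computed against a reference diagnosis, here is a minimal Python sketch. The function name, the rule that a score at or above the cutoff counts as positive, and the sample data are all assumptions made for illustration; none of them come from the study.

```python
def sens_spec_at_cutoff(scores, has_condition, cutoff):
    """Return (sensitivity, specificity) for a score cutoff.

    Assumption: a score >= cutoff is classified as positive.
    `has_condition` holds the reference-standard diagnosis per patient.
    """
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, has_condition):
        predicted = score >= cutoff
        if truth and predicted:
            tp += 1
        elif truth and not predicted:
            fn += 1
        elif not truth and predicted:
            fp += 1
        else:
            tn += 1
    # Sensitivity: share of true cases detected; specificity: share of non-cases cleared.
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical interview scores and reference diagnoses (not study data).
scores = [9, 8, 7, 6, 5, 3, 7, 2]
has_condition = [True, True, True, True, False, False, False, False]
sens, spec = sens_spec_at_cutoff(scores, has_condition, cutoff=7)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

    Raising the cutoff in such a sketch generally trades sensitivity for specificity, which is why a validation abstract reports both values at a stated cutoff.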

  6. Relative and Absolute Interrater Reliabilities of a Hand-Held Myotonometer to Quantify Mechanical Muscle Properties in Patients with Acute Stroke in an Inpatient Ward

    Directory of Open Access Journals (Sweden)

    Wai Leung Ambrose Lo

    2017-01-01

    Introduction. The reliability of using MyotonPRO to quantify muscle mechanical properties in a ward setting for the acute stroke population remains unknown. Aims. To investigate the within-session relative and absolute interrater reliability of MyotonPRO. Methods. Mechanical properties of biceps brachii, brachioradialis, rectus femoris, and tibialis anterior were recorded at bedside. Participants were within 1 month of the first occurrence of stroke. Relative reliability was assessed by intraclass correlation coefficient (ICC). Absolute reliability was assessed by standard error of measurement (SEM, SEM%), smallest real difference (SRD, SRD%), and the Bland-Altman 95% limits of agreement. Results. ICCs of all studied muscles ranged between 0.63 and 0.97. The SEM of all muscles ranged within 0.30–0.88 Hz for tone, 0.07–0.19 for decrement, 6.42–20.20 N/m for stiffness, and 0.04–0.07 for creep. The SRD of all muscles ranged within 0.70–2.05 Hz for tone, 0.16–0.45 for decrement, 14.98–47.15 N/m for stiffness, and 0.09–0.17 for creep. Conclusions. MyotonPRO demonstrated acceptable relative and absolute reliability in a ward setting for patients with acute stroke. However, results must be interpreted with caution, due to the varying level of consistency between different muscles, as well as between different parameters within a muscle.
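
    The abstract does not state the formulas used, so the following Python sketch relies on one common formulation as an assumption: SEM = SD × √(1 − ICC) and SRD = z × √2 × SEM. The function names and the default z = 1.65 are illustrative choices. The reported tone ranges (SEM 0.30–0.88 Hz, SRD 0.70–2.05 Hz) are numerically consistent with z = 1.65, but that is an inference from the figures, not something the abstract confirms.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the between-subject SD and the ICC."""
    return sd * math.sqrt(1.0 - icc)


def srd_from_sem(sem, z=1.65):
    """Smallest real difference; z = 1.65 is an assumed confidence multiplier."""
    return z * math.sqrt(2.0) * sem


# Reported SEM bounds for tone (Hz) from the abstract; the SRD is recomputed
# under the assumed formula to compare against the reported 0.70-2.05 Hz.
for sem in (0.30, 0.88):
    print(f"SEM={sem:.2f} Hz -> SRD={srd_from_sem(sem):.2f} Hz")
```

    A change between two measurements smaller than the SRD cannot be distinguished from measurement error under these assumptions, which is the practical meaning of absolute reliability.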

  7. Validity and Reliability of Assessing Body Composition Using a Mobile Application.

    Science.gov (United States)

    Macdonald, Elizabeth Z; Vehrs, Pat R; Fellingham, Gilbert W; Eggett, Dennis; George, James D; Hager, Ronald

    2017-12-01

    The purpose of this study was to determine the validity and reliability of the LeanScreen (LS) mobile application that estimates percent body fat (%BF) using estimates of circumferences from photographs. The %BF of 148 weight-stable adults was estimated once using dual-energy x-ray absorptiometry (DXA). Each of two administrators assessed the %BF of each subject twice using the LS app and manually measured circumferences. A mixed-model ANOVA and Bland-Altman analyses were used to compare the estimates of %BF obtained from each method. Interrater and intrarater reliability values were determined using multiple measurements taken by each of the two administrators. The LS app and manually measured circumferences significantly underestimated (P < 0.05) the %BF determined using DXA by an average of -3.26 and -4.82 %BF, respectively. The LS app (6.99 %BF) and manually measured circumferences (6.76 %BF) had large limits of agreement. All interrater and intrarater reliability coefficients of estimates of %BF using the LS app and manually measured circumferences exceeded 0.99. The estimates of %BF from manually measured circumferences and the LS app were highly reliable. However, these field measures are not currently recommended for the assessment of body composition because of significant bias and large limits of agreement.
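
    To make the distinction between bias and limits of agreement concrete, here is a minimal Python sketch of a Bland-Altman calculation: the bias is the mean of the paired differences, and the 95% limits are bias ± 1.96 × SD of those differences. The function name and the sample values are hypothetical and are not taken from the study.

```python
import statistics


def bland_altman(method_a, method_b, z=1.96):
    """Return (bias, lower_limit, upper_limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - z * sd, bias + z * sd


# Hypothetical %BF estimates from a field method and a reference method.
field = [20.0, 25.0, 30.0, 22.0]
reference = [24.0, 27.0, 34.0, 26.0]
bias, lower, upper = bland_altman(field, reference)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

    A method can be highly repeatable between raters and still show a systematic bias against the reference, which is the pattern the abstract describes.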

  8. Measuring Mobility Limitations in Children with Cerebral Palsy: Interrater and Intrarater Reliability of a Mobility Questionnaire (MobQues)

    Science.gov (United States)

    Van Ravesteyn, Nicolien T.; Dallmeijer, Annet J.; Scholtes, Vanessa A.; Roorda, Leo D.; Becher, Jules G.

    2010-01-01

    Aim: The objective of this study was to assess the reliability of a mobility questionnaire (MobQues) that was developed to measure the mobility limitations of children with cerebral palsy (CP) as rated by their parents. A clinical version of the questionnaire, consisting of 47 items (MobQues47), is available, as well as a research version with 28…

  9. The Barthel Index: comparing inter-rater reliability between nurses and doctors in an older adult rehabilitation unit.

    LENUS (Irish Health Repository)

    Hartigan, Irene

    2011-02-01

    To ensure accuracy in recording the Barthel Index (BI) in older people, it is essential to determine who is best placed to administer the index. The aim of this study was to compare doctors' and nurses' reliability in scoring the BI.

  10. Intra- and interrater reliability of the Chicago Classification of achalasia subtypes in pediatric high-resolution esophageal manometry (HRM) recordings

    NARCIS (Netherlands)

    Singendonk, M. M. J.; Rosen, R.; Oors, J.; Rommel, N.; van Wijk, M. P.; Benninga, M. A.; Nurko, S.; Omari, T. I.

    2017-01-01

    Background: Subtyping achalasia by high-resolution manometry (HRM) is clinically relevant as response to therapy and prognosis have been shown to vary accordingly. The aim of this study was to assess inter- and intrarater reliability of diagnosing achalasia and achalasia subtyping in children using the

  11. Investigating the reliability and validity of the waterlow risk assessment scale: a literature review.

    LENUS (Irish Health Repository)

    Walsh, Breda

    2012-02-01

    The aim of this review was to examine health literature on the reliability and validity of the Waterlow pressure sore assessment scale. A systematic review of published studies relating to the topic was conducted and literature was examined for its relevancy to the topic under investigation. Findings suggest that despite the availability of over 40 assessment tools, the Waterlow assessment scale is the most frequently used by health care staff. Research suggests that the Waterlow Scale is an unreliable method of assessing individuals at risk of pressure sore development with all studies indicating a poor interrater reliability status. Its validity has also been criticized because of its high-sensitivity but low-specificity levels.

  12. Investigating the reliability and validity of the waterlow risk assessment scale: a literature review.

    LENUS (Irish Health Repository)

    Walsh, Breda

    2011-05-01

    The aim of this review was to examine health literature on the reliability and validity of the Waterlow pressure sore assessment scale. A systematic review of published studies relating to the topic was conducted and literature was examined for its relevancy to the topic under investigation. Findings suggest that despite the availability of over 40 assessment tools, the Waterlow assessment scale is the most frequently used by health care staff. Research suggests that the Waterlow Scale is an unreliable method of assessing individuals at risk of pressure sore development with all studies indicating a poor interrater reliability status. Its validity has also been criticized because of its high-sensitivity but low-specificity levels.

  13. Intra- and inter-rater reliability of the Sollerman hand function test in patients with chronic stroke

    DEFF Research Database (Denmark)

    Brogårdh, Christina; Persson, Ann L; Sjölund, Bengt H

    2007-01-01

    PURPOSE: To examine whether the Sollerman hand function test is reliable in a test-retest situation in patients with chronic stroke. METHOD: Three independent examiners observed each patient at three experimental sessions; two days in week 1 (short-term test-retest) and one day in week 4 (long-term test-retest). A total of 24 patients with chronic stroke (mean age 59.7 years, mean time since stroke onset 29.6 months) participated. The examiners simultaneously assessed the patients' ability to perform 20 subtests. Both ordinal data (generalized kappa) and total sum scores (Spearman's rank... test seems to be a reliable test in patients with chronic stroke, but we recommend that the same examiner evaluates a patient's hand function pre- and post-treatment.

  14. Validity and reliability of a novel immunosuppressive adverse effects scoring system in renal transplant recipients.

    Science.gov (United States)

    Meaney, Calvin J; Arabi, Ziad; Venuto, Rocco C; Consiglio, Joseph D; Wilding, Gregory E; Tornatore, Kathleen M

    2014-06-12

    After renal transplantation, many patients experience adverse effects from maintenance immunosuppressive drugs. When these adverse effects occur, patient adherence with immunosuppression may be reduced and impact allograft survival. If these adverse effects could be prospectively monitored in an objective manner and possibly prevented, adherence to immunosuppressive regimens could be optimized and allograft survival improved. Prospective, standardized clinical approaches to assess immunosuppressive adverse effects by health care providers are limited. Therefore, we developed and evaluated the application, reliability and validity of a novel adverse effects scoring system in renal transplant recipients receiving calcineurin inhibitor (cyclosporine or tacrolimus) and mycophenolic acid based immunosuppressive therapy. The scoring system included 18 non-renal adverse effects organized into gastrointestinal, central nervous system and aesthetic domains developed by a multidisciplinary physician group. Nephrologists employed this standardized adverse effect evaluation in stable renal transplant patients using physical exam, review of systems, recent laboratory results, and medication adherence assessment during a clinic visit. Stable renal transplant recipients in two clinical studies were evaluated and received immunosuppressive regimens comprised of either cyclosporine or tacrolimus with mycophenolic acid. Face, content, and construct validity were assessed to document these adverse effect evaluations. Inter-rater reliability was determined using the Kappa statistic and intra-class correlation. A total of 58 renal transplant recipients were assessed using the adverse effects scoring system confirming face validity. Nephrologists (subject matter experts) rated the 18 adverse effects as: 3.1 ± 0.75 out of 4 (maximum) regarding clinical importance to verify content validity. The adverse effects scoring system distinguished 1.75-fold increased gastrointestinal adverse

  15. Reliability and validity of the Performance Recorder 1 for measuring isometric knee flexor and extensor strength.

    Science.gov (United States)

    Neil, Sarah E; Myring, Alec; Peeters, Mon Jef; Pirie, Ian; Jacobs, Rachel; Hunt, Michael A; Garland, S Jayne; Campbell, Kristin L

    2013-11-01

    Muscular strength is a key parameter of rehabilitation programs and a strong predictor of functional capacity. Traditional methods to measure strength, such as manual muscle testing (MMT) and hand-held dynamometry (HHD), are limited by the strength and experience of the tester. The Performance Recorder 1 (PR1) is a strength assessment tool attached to resistance training equipment and may be a time- and cost-effective tool to measure strength in clinical practice that overcomes some limitations of MMT and HHD. However, reliability and validity of the PR1 have not been reported. Test-retest and inter-rater reliability were assessed using the PR1 in healthy adults (n = 15) during isometric knee flexion and extension. Criterion-related validity was assessed through comparison of values obtained from the PR1 and Biodex® isokinetic dynamometer. Test-retest reliability was excellent for peak knee flexion (intra-class correlation coefficient [ICC] of 0.96, 95% CI: 0.85, 0.99) and knee extension (ICC = 0.96, 95% CI: 0.87, 0.99). Inter-rater reliability was also excellent for peak knee flexion (ICC = 0.95, 95% CI: 0.85, 0.99) and peak knee extension (ICC = 0.97, 95% CI: 0.91, 0.99). Validity was moderate for peak knee flexion (ICC = 0.75, 95% CI: 0.38, 0.92) but poor for peak knee extension (ICC = 0.37, 95% CI: 0, 0.73). The PR1 provides a reliable measure of isometric knee flexor and extensor strength in healthy adults that could be used in the clinical setting, but absolute values may not be comparable to strength assessment by gold-standard measures.

  16. The reliability, validity, and applicability of an English language version of the Mini-ICF-APP.

    Science.gov (United States)

    Molodynski, Andrew; Linden, Michael; Juckel, George; Yeeles, Ksenija; Anderson, Catriona; Vazquez-Montes, Maria; Burns, Tom

    2013-08-01

    This study aimed at establishing the validity and reliability of an English language version of the Mini-ICF-APP. One hundred and five patients under the care of secondary mental health care services were assessed using the Mini-ICF-APP and several well-established measures of functioning and symptom severity. Forty-seven (45%) patients were interviewed on two occasions to ascertain test-retest reliability and 50 (48%) were interviewed by two researchers simultaneously to determine the instrument's inter-rater reliability. Occupational and sick leave status were also recorded to assess construct validity. The Mini-ICF-APP was found to have substantial internal consistency (Cronbach's α 0.869-0.912) and all 13 items correlated highly with the total score. Analysis also showed that the Mini-ICF-APP had good test-retest (ICC 0.832) and inter-rater (ICC 0.886) reliability. No statistically significant association with length of sick leave was found, but the unemployed scored higher on the Mini-ICF-APP than those in employment (mean 18.4, SD 9.1 vs. 9.4, SD 6.4, p [...] Mini-ICF-APP correlated highly with the other measures of illness severity and functioning considered in the study. The English version of the Mini-ICF-APP is a reliable and valid measure of disorders of capacity as defined by the International Classification of Functioning. Further work is necessary to establish whether the scale could be divided into subscales which would allow the instrument to more sensitively measure an individual's specific impairments.

  17. Reliability and Validity of 3 Methods of Assessing Orthopedic Resident Skill in Shoulder Surgery.

    Science.gov (United States)

    Bernard, Johnathan A; Dattilo, Jonathan R; Srikumaran, Uma; Zikria, Bashir A; Jain, Amit; LaPorte, Dawn M

    Traditional measures for evaluating resident surgical technical skills (e.g., case logs) assess operative volume but not level of surgical proficiency. Our goal was to compare the reliability and validity of 3 tools for measuring surgical skill among orthopedic residents when performing 3 open surgical approaches to the shoulder. A total of 23 residents at different stages of their surgical training were tested for technical skill pertaining to 3 shoulder surgical approaches using the following measures: Objective Structured Assessment of Technical Skills (OSATS) checklists, the Global Rating Scale (GRS), and a final pass/fail assessment determined by 3 upper extremity surgeons. Adverse events were recorded. The Cronbach α coefficient was used to assess reliability of the OSATS checklists and GRS scores. Interrater reliability was calculated with intraclass correlation coefficients. Correlations among OSATS checklist scores, GRS scores, and pass/fail assessment were calculated with Spearman ρ. Validity of OSATS checklists was determined using analysis of variance with postgraduate year (PGY) as a between-subjects factor. Significance was set at p [...] shoulder approaches. Checklist scores showed superior interrater reliability compared with GRS and subjective pass/fail measurements. GRS scores were positively correlated across training years. The incidence of adverse events was significantly higher among PGY-1 and PGY-2 residents compared with more experienced residents. OSATS checklists are a valid and reliable assessment of technical skills across 3 surgical shoulder approaches. However, checklist scores do not measure quality of technique. Documenting adverse events is necessary to assess quality of technique and ultimate pass/fail status. Multiple methods of assessing surgical skill should be considered when evaluating orthopedic resident surgical performance. Copyright © 2016 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights

  18. Developing a contributing factor classification scheme for Rasmussen's AcciMap: Reliability and validity evaluation.

    Science.gov (United States)

    Goode, N; Salmon, P M; Taylor, N Z; Lenné, M G; Finch, C F

    2017-10-01

    One factor potentially limiting the uptake of Rasmussen's (1997) Accimap method by practitioners is the lack of a contributing factor classification scheme to guide accident analyses. This article evaluates the intra- and inter-rater reliability and criterion-referenced validity of a classification scheme developed to support the use of Accimap by led outdoor activity (LOA) practitioners. The classification scheme has two levels: the system level describes the actors, artefacts and activity context in terms of 14 codes; the descriptor level breaks the system level codes down into 107 specific contributing factors. The study involved 11 LOA practitioners using the scheme on two separate occasions to code a pre-determined list of contributing factors identified from four incident reports. Criterion-referenced validity was assessed by comparing the codes selected by LOA practitioners to those selected by the method creators. Mean intra-rater reliability scores at the system (M = 83.6%) and descriptor (M = 74%) levels were acceptable. Mean inter-rater reliability scores were not consistently acceptable for both coding attempts at the system level (M_T1 = 68.8%; M_T2 = 73.9%), and were poor at the descriptor level (M_T1 = 58.5%; M_T2 = 64.1%). Mean criterion-referenced validity scores at the system level were acceptable (M_T1 = 73.9%; M_T2 = 75.3%). However, they were not consistently acceptable at the descriptor level (M_T1 = 67.6%; M_T2 = 70.8%). Overall, the results indicate that the classification scheme does not currently satisfy reliability and validity requirements, and that further work is required. The implications for the design and development of contributing factors classification schemes are discussed. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Reliability and Validity of the Dutch Physical Activity Questionnaires for Children (PAQ-C) and Adolescents (PAQ-A)

    OpenAIRE

    Bervoets, Liene; Van Noten, Caroline; Van Roosbroeck, Sofie; Hansen, Dominique; Van Hoorenbeeck, Kim; Verheyen, Els; Van Hal, Guido; Vankerckhoven, Vanessa

    2014-01-01

    Background This study was designed to validate the Dutch Physical Activity Questionnaires for Children (PAQ-C) and Adolescents (PAQ-A). Methods After adjustment of the original Canadian PAQ-C and PAQ-A (i.e. translation/back-translation and evaluation by expert committee), content validity of both PAQs was assessed and calculated using item-level (I-CVI) and scale-level (S-CVI) content validity indexes. Inter-item and inter-rater reliability of 196 PAQ-C and 95 PAQ-A filled in by both childre...

  20. Inter-Rater Reliability of Historical Data Collected by Non-Medical Research Assistants and Physicians in Patients with Acute Abdominal Pain

    Directory of Open Access Journals (Sweden)

    Mills, Angela M

    2009-02-01

    OBJECTIVES: In many academic emergency departments (ED), physicians are asked to record clinical data for research that may be time consuming and distracting from patient care. We hypothesized that non-medical research assistants (RAs) could obtain historical information from patients with acute abdominal pain as accurately as physicians. METHODS: Prospective comparative study conducted in an academic ED of 29 RAs to 32 resident physicians (RPs) to assess inter-rater reliability in obtaining historical information in abdominal pain patients. Historical features were independently recorded on standardized data forms by an RA and an RP blinded to each other's answers. Discrepancies were resolved by a third person (RA) who asked the patient to state the correct answer on a third questionnaire, constituting the "criterion standard." Inter-rater reliability was assessed using kappa statistics (kappa) and percent crude agreement (CrA). RESULTS: Sixty-five patients were enrolled (mean age 43). Of 43 historical variables assessed, the median agreement was moderate (kappa 0.59 [interquartile range 0.37-0.69]; CrA 85.9%) and varied across data categories: initial pain location (kappa 0.61 [0.59-0.73]; CrA 87.7%), current pain location (kappa 0.60 [0.47-0.67]; CrA 82.8%), past medical history (kappa 0.60 [0.48-0.74]; CrA 93.8%), associated symptoms (kappa 0.38 [0.37-0.74]; CrA 87.7%), and aggravating/alleviating factors (kappa 0.09 [-0.01-0.21]; CrA 61.5%). When there was disagreement between the RP and the RA, the RA more often agreed with the criterion standard (64% [55-71%]) than the RP (36% [29-45%]). CONCLUSION: Non-medical research assistants who focus on clinical research are often more accurate than physicians, who may be distracted by patient care responsibilities, at obtaining historical information from ED patients with abdominal pain.
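
    Because the abstract reports both kappa and crude agreement, a short Python sketch may help show how the two differ: crude agreement is the share of matching answers, while Cohen's kappa subtracts the agreement expected by chance from each rater's marginal frequencies. The function names and the sample answers are hypothetical and are not taken from the study.

```python
from collections import Counter


def crude_agreement(rater1, rater2):
    """Proportion of items on which the two raters gave the same answer."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)


def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters; undefined when chance agreement is 1."""
    n = len(rater1)
    p_o = crude_agreement(rater1, rater2)
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement: product of the raters' marginal proportions per category.
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical yes/no answers recorded for ten patients by two raters.
ra = ["y", "y", "y", "y", "n", "n", "n", "n", "n", "y"]
rp = ["y", "y", "y", "n", "n", "n", "n", "n", "y", "y"]
print(f"CrA={crude_agreement(ra, rp):.2f}, kappa={cohens_kappa(ra, rp):.2f}")
```

    When one answer dominates, chance agreement is high, so a variable can show substantial crude agreement alongside a low kappa, as with the aggravating/alleviating factors category above.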

  1. Validity and reliability of food security measures.

    Science.gov (United States)

    Cafiero, Carlo; Melgar-Quiñonez, Hugo R; Ballard, Terri J; Kepple, Anne W

    2014-12-01

    This paper reviews some of the existing food security indicators, discussing the validity of the underlying concept and the expected reliability of measures under reasonably feasible conditions. The main objective of the paper is to raise awareness on existing trade-offs between different qualities of possible food security measurement tools that must be taken into account when such tools are proposed for practical application, especially for use within an international monitoring framework. The hope is to provide a timely, useful contribution to the process leading to the definition of a food security goal and the associated monitoring framework within the post-2015 Development Agenda. © 2014 New York Academy of Sciences.

  2. Inter-rater Reliability of the Dysphagia Outcome and Severity Scale (DOSS): Effects of Clinical Experience, Audio-Recording and Training.

    Science.gov (United States)

    Zarkada, Angeliki; Regan, Julie

    2017-10-19

    The Dysphagia Outcome and Severity Scale (DOSS) is widely used to measure dysphagia severity based on videofluoroscopy (VFSS). This study investigated inter-rater reliability (IRR) of the DOSS. It also determined the effect of clinical experience, VFSS audio-recording and training on DOSS IRR. A quantitative prospective research design was used. Seventeen speech and language pathologists (SLPs) were recruited from an acute teaching hospital, Dublin (> 3 years' VFSS experience, n = 10) and from a postgraduate dysphagia programme in a university setting ([...] training session on DOSS rating after which DOSS IRR was re-tested. Cohen's kappa coefficient was used to establish IRR. IRR of the DOSS presented only fair agreement (κ = 0.36, p [...] training (κ = 0.328) was significantly better compared with post-training (κ = 0.218) (p < 0.05). Findings raise concerns as the DOSS is frequently used in clinical practice to capture dysphagia severity and to monitor changes.

  3. Elder abuse telephone screen reliability and validity.

    Science.gov (United States)

    Buri, Hilary M; Daly, Jeanette M; Jogerst, Gerald J

    2009-01-01

    (a) To identify reliable and valid questions that identify elder abuse, (b) to assess the reliability and validity of extant self-reported elder abuse screens in a high-risk elderly population, and (c) to describe difficulties of completing and interpreting screens in a high-need elderly population. All elders referred to research-trained social workers in a community service agency were asked to participate. Of the 70 elders asked, 49 participated, 44 completed the first questionnaire, and 32 completed the duplicate second questionnaire. A research assistant administered the telephone questionnaires. Twenty-nine (42%) persons were judged abused, 12 (17%) had abuse reported, and 4 (6%) had abuse substantiated. The elder abuse screen instruments were not found to be predictive of assessed abuse or as predictors of reported abuse; the measures tended toward being inversely predictive. Two questions regarding harm and taking of belongings were significantly different for the assessed abused group. In this small group of high-need community-dwelling elders, the screens were not effective in discriminating between abused and nonabused groups. Better instruments are needed to assess for elder abuse.

  4. Reliability and validity of the German version of the Structured Interview of Personality Organization (STIPO)

    Science.gov (United States)

    2013-01-01

    Background The assessment of personality organization and its observable behavioral manifestations, i.e. personality functioning, has a long tradition in psychodynamic psychiatry. Recently, the DSM-5 Levels of Personality Functioning Scale has moved it into the focus of psychiatric diagnostics. Based on Kernberg’s concept of personality organization the Structured Interview of Personality Organization (STIPO) was developed for diagnosing personality functioning. The STIPO covers seven dimensions: (1) identity, (2) object relations, (3) primitive defenses, (4) coping/rigidity, (5) aggression, (6) moral values, and (7) reality testing and perceptual distortions. The English version of the STIPO has previously revealed satisfying psychometric properties. Methods Validity and reliability of the German version of the 100-item instrument have been evaluated in 122 psychiatric patients. All patients were diagnosed according to the Diagnostic and Statistical Manual for Mental Disorders (DSM-IV) and were assessed by means of the STIPO. Moreover, all patients completed eight questionnaires that served as criteria for external validity of the STIPO. Results Interrater reliability varied between intraclass correlations of .89 and 1.0; Cronbach’s α for the seven dimensions was .69 to .93. All a priori selected questionnaire scales correlated significantly with the corresponding STIPO dimensions. Patients with personality disorder (PD) revealed significantly higher STIPO scores (i.e. worse personality functioning) than patients without PD; patients with cluster B PD showed significantly higher STIPO scores than patients with cluster C PD. Conclusions Interrater reliability, Cronbach’s α, concurrent validity, and differential validity of the STIPO are satisfying. The STIPO represents an appropriate instrument for the assessment of personality functioning in clinical and research settings. PMID:23941404

  5. Reliability, validity and description of timed performance of the Jebsen-Taylor Test in patients with muscular dystrophies.

    Science.gov (United States)

    Artilheiro, Mariana Cunha; Fávero, Francis Meire; Caromano, Fátima Aparecida; Oliveira, Acary de Souza Bulle; Carvas, Nelson; Voos, Mariana Callil; Sá, Cristina Dos Santos Cardoso de

    2017-12-08

    The Jebsen-Taylor Test evaluates upper limb function by measuring timed performance on everyday activities. The test is used to assess and monitor the progression of patients with Parkinson disease, cerebral palsy, stroke and brain injury. To analyze the reliability, internal consistency and validity of the Jebsen-Taylor Test in people with Muscular Dystrophy and to describe and classify upper limb timed performance of people with Muscular Dystrophy. Fifty patients with Muscular Dystrophy were assessed. Non-dominant and dominant upper limb performances on the Jebsen-Taylor Test were filmed. Two raters evaluated timed performance for inter-rater reliability analysis. Test-retest reliability was investigated by using intraclass correlation coefficients. Internal consistency was assessed using the Cronbach alpha. Construct validity was assessed by comparing the Jebsen-Taylor Test with the Performance of Upper Limb. The internal consistency of Jebsen-Taylor Test was good (Cronbach's α=0.98). Inter-rater reliability was very high (0.903-0.999), except for writing, which had an intraclass correlation coefficient of 0.772-1.000. Strong correlations between the Jebsen-Taylor Test and the Performance of Upper Limb Module were found (rho=-0.712). The Jebsen-Taylor Test is a reliable and valid measure of timed performance for people with Muscular Dystrophy. Copyright © 2017 Associação Brasileira de Pesquisa e Pós-Graduação em Fisioterapia. Publicado por Elsevier Editora Ltda. All rights reserved.

  6. [Reliability and validity of the standardized Mini Mental State Examination in the diagnosis of mild dementia in Turkish population].

    Science.gov (United States)

    Güngen, Can; Ertan, Turan; Eker, Engin; Yaşar, Resmiye; Engin, Funda

    2002-01-01

    Reliability and validity of the Mini Mental State Examination in differentiating mild dementia from normal controls in Turkish population. The Standardized Mini Mental State Examination (SMMSE) and its instruction were translated into Turkish. A total of 212 subjects with mean age of 77 +/- 6, were recruited for the study. 71 were diagnosed to be demented and 141 were evaluated as normal controls. The scale total score was analysed for discriminant validity using Student's t-test. Sensitivity, specificity, positive and negative predictive values and kappa score were calculated for all of the scores between 18 and 29. Kappa value was calculated for the comparison of the dementia diagnosis between the two investigators using the best cut off score obtained in the analysis above. Statistical analysis revealed that the Turkish version of the SMMSE has high discriminant validity and interrater reliability in the diagnosis of mild dementia. The cut off score 23/24 was found to have the highest sensitivity (0.91), specificity (0.95), positive and negative predictive values (0.90 and 0.95) and kappa score (0.86). Interrater reliability analysis showed high correlation (r:0.99) and kappa value (0.92). The results of this study showed that the Turkish version of the SMMSE has high reliability and validity for the diagnosis of mild dementia in Turkish population.
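
    The cut-off statistics named in this abstract (sensitivity, specificity, predictive values and kappa) all derive from a single 2x2 table of test result against diagnosis. The sketch below shows the arithmetic. The cell counts are hypothetical: they were chosen only to respect the reported group sizes (71 demented, 141 controls) and to land close to the reported values, and they are not taken from the paper.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased subjects correctly flagged
        "specificity": tn / (tn + fp),  # healthy subjects correctly cleared
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def cohen_kappa_2x2(tp, fp, fn, tn):
    """Cohen's kappa: agreement between test and diagnosis beyond chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of the table.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts (assumption): 71 cases and 141 controls in total.
counts = dict(tp=65, fp=7, fn=6, tn=134)
metrics = diagnostic_metrics(**counts)
kappa = cohen_kappa_2x2(**counts)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"kappa: {kappa:.3f}")
```

    Repeating the calculation for each candidate cut-off between 18 and 29 and comparing the results is one plausible way to arrive at a "best" cut-off; the abstract does not state which selection rule the authors applied.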

  7. The reliability and validity of cervical auscultation in the diagnosis of dysphagia: a systematic review.

    Science.gov (United States)

    Lagarde, Marloes L J; Kamalski, Digna M A; van den Engel-Hoek, Lenie

    2016-02-01

    To systematically review the available evidence for the reliability and validity of cervical auscultation in diagnosing the several aspects of dysphagia in adults and children suffering from dysphagia. Medline (PubMed), Embase and the Cochrane Library databases. The systematic review was carried out applying the steps of the PRISMA-statement. The methodological quality of the included studies was evaluated using the Dutch 'Cochrane checklist for diagnostic accuracy studies'. A total of 90 articles were identified through the search strategy, and after applying the inclusion and exclusion criteria, six articles were included in this review. In the six studies, 197 patients were assessed with cervical auscultation. Two of the six articles were considered to be of 'good' quality and three studies were of 'moderate' quality. One article was excluded because of a 'poor' methodological quality. Sensitivity ranges from 23%-94% and specificity ranges from 50%-74%. Inter-rater reliability was 'poor' or 'fair' in all studies. The intra-rater reliability shows a wide variance among speech language therapists. In this systematic review, conflicting evidence is found for the validity of cervical auscultation. The reliability of cervical auscultation is insufficient when used as a stand-alone tool in the diagnosis of dysphagia in adults. There is no available evidence for the validity and reliability of cervical auscultation in children. Cervical auscultation should not be used as a stand-alone instrument to diagnose dysphagia. © The Author(s) 2015.

  8. The risk of bias in systematic reviews tool showed fair reliability and good construct validity.

    Science.gov (United States)

    Bühn, Stefanie; Mathes, Tim; Prengel, Peggy; Wegewitz, Uta; Ostermann, Thomas; Robens, Sibylle; Pieper, Dawid

    2017-11-01

    There is a movement from generic quality checklists toward a more domain-based approach in critical appraisal tools. This study aimed to report on a first experience with the newly developed risk of bias in systematic reviews (ROBIS) tool and compare it with A Measurement Tool to Assess Systematic Reviews (AMSTAR), that is, the most commonly used tool to assess methodological quality of systematic reviews while assessing validity, reliability, and applicability. Validation study with four reviewers based on 16 systematic reviews in the field of occupational health. Interrater reliability (IRR) of all four raters was highest for domain 2 (Fleiss' kappa κ = 0.56) and lowest for domain 4 (κ = 0.04). For ROBIS, median IRR was κ = 0.52 (range 0.13-0.88) for the experienced pair of raters compared to κ = 0.32 (range 0.12-0.76) for the less experienced pair of raters. The percentage of "yes" scores of each review of ROBIS ratings was strongly correlated with the AMSTAR ratings (r_s = 0.76; P = 0.01). ROBIS has fair reliability and good construct validity to assess the risk of bias in systematic reviews. More validation studies are needed to investigate reliability and applicability, in particular. Copyright © 2017 Elsevier Inc. All rights reserved.
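
    Fleiss' kappa, the statistic reported here for all four raters, extends chance-corrected agreement to more than two raters. A minimal sketch of the standard formula follows. The ratings table is invented for illustration (four raters judging six reviews as low, high or unclear risk) and does not reproduce any data from the study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    counts[i][j] is the number of raters who assigned subject i to
    category j; every row must sum to the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number of ratings")

    # Share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Proportion of agreeing rater pairs for each subject.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects      # mean observed agreement
    p_e = sum(p * p for p in p_j)      # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical table (assumption): columns are low / high / unclear risk.
ratings = [
    [4, 0, 0],
    [3, 1, 0],
    [0, 4, 0],
    [1, 2, 1],
    [0, 1, 3],
    [2, 2, 0],
]
print(f"Fleiss' kappa: {fleiss_kappa(ratings):.3f}")
```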

  9. Reliability and Validity of Korean Version of Apraxia Screen of TULIA (K-AST).

    Science.gov (United States)

    Kim, Soo Jin; Yang, You-Na; Lee, Jong Won; Lee, Jin-Youn; Jeong, Eunhwa; Kim, Bo-Ram; Lee, Jongmin

    2016-10-01

    To evaluate the reliability and validity of Korean version of AST (K-AST) as a bedside screening test of apraxia in patients with stroke for early and reliable detection. AST was translated into Korean, and the translated version received authorization from the author of AST. The performances of K-AST in 26 patients (21 males, 5 females; mean age 65.42±17.31 years) with stroke (23 ischemic, 3 hemorrhagic) were videotaped. To test the reliability and validity of K-AST, the recorded performances were assessed by two physiatrists and two occupational therapists twice at a 1-week interval. The patient performances at admission in Korean version of Mini-Mental State Examination (K-MMSE), self-care and transfer categories of Functional Independence Measure (FIM), and motor praxis area of Loewenstein Occupational Therapy Cognitive Assessment, the second edition (LOTCA-II) were also evaluated. Scores of motor praxis area of LOTCA-II were used to assess the validity of K-AST. Inter-rater reliabilities were 0.983 (preliable and valid test for bedside screening of apraxia.

  10. Are Validity and Reliability "Relevant" in Qualitative Evaluation Research?

    Science.gov (United States)

    Goodwin, Laura D.; Goodwin, William L.

    1984-01-01

    The views of prominent qualitative methodologists on the appropriateness of validity and reliability estimation for the measurement strategies employed in qualitative evaluations are summarized. A case is made for the relevance of validity and reliability estimation. Definitions of validity and reliability for qualitative measurement are presented…

  11. The Construct Validity and Reliability of an Assessment Tool for Competency in Cochlear Implant Surgery

    Directory of Open Access Journals (Sweden)

    Patorn Piromchai

    2014-01-01

    Introduction. We introduce a rating tool that objectively evaluates the skills of surgical trainees performing cochlear implant surgery. Methods. Seven residents and seven experts performed cochlear implant surgery sessions from mastoidectomy to cochleostomy on a standardized virtual reality temporal bone. A total of twenty-eight assessment videos were recorded and two consultant otolaryngologists evaluated the performance of each participant using these videos. Results. Interrater reliability was calculated using the intraclass correlation coefficient for both the global and checklist components of the assessment instrument. The overall agreement was high. The construct validity of this instrument was strongly supported by the significantly higher scores in the expert group for both components. Conclusion. Our results indicate that the proposed assessment tool for cochlear implant surgery is reliable, accurate, and easy to use. This instrument can thus be used to provide objective feedback on overall and task-specific competency in cochlear implantation.

  12. Validation of Land Cover Products Using Reliability Evaluation Methods

    Directory of Open Access Journals (Sweden)

    Wenzhong Shi

    2015-06-01

    Validation of land cover products is a fundamental task prior to data applications. Current validation schemes and methods are, however, suited only for assessing classification accuracy and disregard the reliability of land cover products. The reliability evaluation of land cover products should be undertaken to provide reliable land cover information. In addition, the lack of high-quality reference data often constrains validation and affects the reliability results of land cover products. This study proposes a validation schema to evaluate the reliability of land cover products, including two methods, namely, result reliability evaluation and process reliability evaluation. Result reliability evaluation computes the reliability of land cover products using seven reliability indicators. Process reliability evaluation analyzes the reliability propagation in the data production process to obtain the reliability of land cover products. Fuzzy fault tree analysis is introduced and improved in the reliability analysis of a data production process. Research results show that the proposed reliability evaluation scheme is reasonable and can be applied to validate land cover products. Through the analysis of the seven indicators of result reliability evaluation, more information on land cover can be obtained for strategic decision-making and planning, compared with traditional accuracy assessment methods. Process reliability evaluation without the need for reference data can facilitate the validation and reflect the change trends of reliabilities to some extent.

  13. Assessment of Lower Limb Muscle Strength and Power Using Hand-Held and Fixed Dynamometry: A Reliability and Validity Study

    Science.gov (United States)

    Perraton, Luke G.; Bower, Kelly J.; Adair, Brooke; Pua, Yong-Hao; Williams, Gavin P.; McGaw, Rebekah

    2015-01-01

    Introduction Hand-held dynamometry (HHD) has never previously been used to examine isometric muscle power. Rate of force development (RFD) is often used for muscle power assessment; however, no consensus currently exists on the most appropriate method of calculation. The aim of this study was to examine the reliability of different algorithms for RFD calculation and to examine the intra-rater, inter-rater, and inter-device reliability of HHD as well as the concurrent validity of HHD for the assessment of isometric lower limb muscle strength and power. Methods 30 healthy young adults (age: 23±5yrs, male: 15) were assessed on two sessions. Isometric muscle strength and power were measured using peak force and RFD respectively using two HHDs (Lafayette Model-01165 and Hoggan microFET2) and a criterion-reference KinCom dynamometer. Statistical analysis of reliability and validity comprised intraclass correlation coefficients (ICC), Pearson correlations, concordance correlations, standard error of measurement, and minimal detectable change. Results Comparison of RFD methods revealed that a peak 200ms moving window algorithm provided optimal reliability results. Intra-rater, inter-rater, and inter-device reliability analysis of peak force and RFD revealed mostly good to excellent reliability (coefficients ≥ 0.70) for all muscle groups. Concurrent validity analysis showed moderate to excellent relationships between HHD and fixed dynamometry for the hip and knee (ICCs ≥ 0.70) for both peak force and RFD, with mostly poor to good results shown for the ankle muscles (ICCs = 0.31–0.79). Conclusions Hand-held dynamometry has good to excellent reliability and validity for most measures of isometric lower limb strength and power in a healthy population, particularly for proximal muscle groups. To aid implementation we have created freely available software to extract these variables from data stored on the Lafayette device. Future research should examine the reliability
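
    The abstract names a "peak 200ms moving window" algorithm for RFD without giving its definition. One plausible reading, sketched below, slides a 200 ms window along the force trace and keeps the steepest average slope. The function name, the slope definition (force change across the window divided by window duration) and the synthetic trace are all assumptions made for illustration; this is not the authors' released software.

```python
def peak_rfd(force, sample_rate_hz, window_ms=200.0):
    """Peak rate of force development (N/s) over a sliding window.

    force is a list of force samples in newtons recorded at a constant
    sample_rate_hz. The slope in each window is taken as the force
    change across the window divided by the window duration.
    """
    window = int(round(window_ms / 1000.0 * sample_rate_hz))
    if window < 1 or window >= len(force):
        raise ValueError("window must cover at least 1 sample and fit the trace")
    window_s = window / sample_rate_hz
    return max(
        (force[i + window] - force[i]) / window_s
        for i in range(len(force) - window)
    )


# Synthetic trace (assumption): 100 Hz sampling, force rising by 5 N per
# sample (500 N/s) until it plateaus at 500 N.
trace = [min(5.0 * i, 500.0) for i in range(200)]
print(f"peak RFD: {peak_rfd(trace, sample_rate_hz=100):.1f} N/s")
```

    Averaging over a fixed window rather than differencing adjacent samples makes the estimate less sensitive to sample-level noise, which may be part of why a windowed method performed well for reliability; the abstract itself does not give the reason.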

  14. Monitoring sedation status over time in ICU patients: reliability and validity of the Richmond Agitation-Sedation Scale (RASS).

    Science.gov (United States)

    Ely, E Wesley; Truman, Brenda; Shintani, Ayumi; Thomason, Jason W W; Wheeler, Arthur P; Gordon, Sharon; Francis, Joseph; Speroff, Theodore; Gautam, Shiva; Margolin, Richard; Sessler, Curtis N; Dittus, Robert S; Bernard, Gordon R

    2003-06-11

    Goal-directed delivery of sedative and analgesic medications is recommended as standard care in intensive care units (ICUs) because of the impact these medications have on ventilator weaning and ICU length of stay, but few of the available sedation scales have been appropriately tested for reliability and validity. To test the reliability and validity of the Richmond Agitation-Sedation Scale (RASS). Prospective cohort study. Adult medical and coronary ICUs of a university-based medical center. Thirty-eight medical ICU patients enrolled for reliability testing (46% receiving mechanical ventilation) from July 21, 1999, to September 7, 1999, and an independent cohort of 275 patients receiving mechanical ventilation were enrolled for validity testing from February 1, 2000, to May 3, 2001. Interrater reliability of the RASS, Glasgow Coma Scale (GCS), and Ramsay Scale (RS); validity of the RASS correlated with reference standard ratings, assessments of content of consciousness, GCS scores, doses of sedatives and analgesics, and bispectral electroencephalography. In 290-paired observations by nurses, results of both the RASS and RS demonstrated excellent interrater reliability (weighted kappa, 0.91 and 0.94, respectively), which were both superior to the GCS (weighted kappa, 0.64; P<.001 for both comparisons). Criterion validity was tested in 411-paired observations in the first 96 patients of the validation cohort, in whom the RASS showed significant differences between levels of consciousness (P<.001 for all) and correctly identified fluctuations within patients over time (P<.001). In addition, 5 methods were used to test the construct validity of the RASS, including correlation with an attention screening examination (r = 0.78, P<.001), GCS scores (r = 0.91, P<.001), quantity of different psychoactive medication dosages 8 hours prior to assessment (eg, lorazepam: r = - 0.31, P<.001), successful extubation (P =.07), and bispectral electroencephalography (r = 0.63, P
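
    Weighted kappa, used above for the paired nurse observations, gives partial credit when two ratings on an ordinal scale are close but not identical. The sketch below uses linear disagreement weights. The abstract does not say whether linear or quadratic weights were applied, so that choice is an assumption, and the ratings are invented. The category list reflects the RASS range of -5 to +4.

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Linearly weighted Cohen's kappa for two raters on an ordinal scale."""
    k = len(categories)
    if k < 2 or len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two or more categories and paired ratings")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    disagreement_obs = 0.0
    disagreement_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            disagreement_obs += weight * observed[i][j]
            disagreement_exp += weight * row[i] * col[j]
    return 1.0 - disagreement_obs / disagreement_exp


# Hypothetical paired ratings (assumption), on the RASS scale of -5 to +4.
rass_levels = list(range(-5, 5))
nurse_1 = [-3, -2, 0, 0, 1, -4, 2, -1]
nurse_2 = [-3, -1, 0, 0, 1, -4, 1, -1]
print(f"weighted kappa: {weighted_kappa(nurse_1, nurse_2, rass_levels):.3f}")
```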

  15. Toward feasible, valid, and reliable video-based assessments of technical surgical skills in the operating room

    DEFF Research Database (Denmark)

    Aggarwal, R.; Grantcharov, T.; Moorthy, K.

    2008-01-01

    Objective: To determine the feasibility, validity, inter-rater, and intertest reliability of 4 previously published video-based rating scales, for technical skills assessment on a benchmark laparoscopic procedure. Summary Background Data: Assessment of technical skills is crucial to the demonstration and maintenance of competent healthcare practitioners. Traditional assessment methods are prone to subjectivity through a lack of proven validity and reliability. Methods: Nineteen surgeons (6 novice and 13 experienced) performed a median of 2 laparoscopic cholecystectomies each (range 1-5) on 53... .72). Conclusions: Video-based technical skills evaluation in the operating room is feasible, valid and reliable. Global rating scales hold promise for summative assessment, though further work is necessary to elucidate the value of procedural rating scales. Publication date: 2008/2

  16. [Reliability and validity of the Braden Scale for predicting pressure sore risk].

    Science.gov (United States)

    Boes, C

    2000-12-01

    Various risk assessment tools have been developed, mainly in the USA and Great Britain, for more accurate and objective pressure sore risk assessment. The Braden Scale for Predicting Pressure Sore Risk is one such example. By means of a literature analysis of German and English texts referring to the Braden Scale, the scientific quality criteria of reliability and validity are examined and the consequences for application of the scale in Germany are shown. Analysis of 4 reliability studies shows an exclusive focus on interrater reliability. Further, even though the 19 validity studies were conducted in many different settings, their examination is limited to the criteria of sensitivity and specificity (accuracy). Sensitivity and specificity range from 35% to 100%. The recommended cut-off points range from 10 to 19 points. The studies prove not to be comparable with each other. Furthermore, distortions that affect the accuracy of the scale can be found in these studies. The results of the analysis presented here show insufficient proof of reliability and validity in the American studies. In Germany, the Braden Scale has not yet been tested under scientific criteria. Such testing is needed before the scale is used in different German settings. In the course of such testing, the design and study procedures of the American studies can be used as a basis, as can the problems identified in this analysis.

  17. The MacArthur Competence Assessment Tool-Criminal Adjudication: Factor structure, interrater reliability, and association with clinician opinion of competence in a forensic inpatient sample.

    Science.gov (United States)

    Wood, Mary E; Anderson, Jaime L; Glassmire, David M

    2017-06-01

    Adjudicative competence is the most frequently referred evaluation in the forensic context, and it is because of this that periodic evaluation of competence assessment instruments is imperative. Among those instruments, the MacArthur Competence Assessment Tool-Criminal Adjudication (MacCAT-CA) has demonstrated adequate psychometric properties suggesting its utility in informing the forensic inquiry. The purpose of the current study was to further investigate the psychometric properties and ultimate utility of subscale scores using archival data from a sample of 103 male and female forensic patients who were hospitalized for competence restoration treatment. Results of the present study suggested adequate internal consistency and good model fit for the factor structure. Interrater reliability was evaluated by comparing the absolute agreement of scores derived from 2 independent research assistants for each of the subscales; 2 of the 3 subscales fell within the acceptable range given established interpretative benchmarks for forensic assessment. Of particular interest was that the Appreciation subscale, while heralding the lowest intraclass correlation coefficient, explained the largest proportion of variance in clinician opinion relative to the other 2 subscales. In other words, the most subjective subscale (as evidenced by the lowest intraclass correlation), explained the largest proportion of variance in ultimate opinion. The authors argue that, although these results are an important consideration in these assessments, they are neither surprising nor entirely problematic when considering the case-specific nature of the inquiries on the subscale, as well as the subjectivity of scoring criteria for each of the Appreciation items. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
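
    An absolute-agreement intraclass correlation for two raters, as described above, can be computed from a two-way analysis of variance. The abstract does not say which ICC form was used, so the sketch below assumes the single-measure, two-way random-effects form often written ICC(2,1). The scores are invented and the function name is hypothetical.

```python
def icc_absolute_agreement(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is the score given to subject i by rater j.
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical subscale scores (assumption): 5 patients, 2 research assistants.
subscale = [[10, 11], [14, 14], [8, 10], [12, 12], [6, 7]]
print(f"ICC(2,1): {icc_absolute_agreement(subscale):.3f}")
```

    Because this form counts a systematic offset between raters as disagreement, a subscale whose scoring criteria leave room for judgment can show a lower coefficient even when the raters rank patients in the same order.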

  18. The Irvine, Beatties, and Bresnahan (IBB) Forelimb Recovery Scale: An Assessment of Reliability and Validity

    Science.gov (United States)

    Irvine, Karen-Amanda; Ferguson, Adam R.; Mitchell, Kathleen D.; Beattie, Stephanie B.; Lin, Amity; Stuck, Ellen D.; Huie, J. Russell; Nielson, Jessica L.; Talbott, Jason F.; Inoue, Tomoo; Beattie, Michael S.; Bresnahan, Jacqueline C.

    2014-01-01

    The IBB scale is a recently developed forelimb scale for the assessment of fine control of the forelimb and digits after cervical spinal cord injury [SCI; (1)]. The present paper describes the assessment of inter-rater reliability and face, concurrent and construct validity of this scale following SCI. It demonstrates that the IBB is a reliable and valid scale that is sensitive to severity of SCI and to recovery over time. In addition, the IBB correlates with other outcome measures and is highly predictive of biological measures of tissue pathology. Multivariate analysis using principal component analysis (PCA) demonstrates that the IBB is highly predictive of the syndromic outcome after SCI (2), and is among the best predictors of bio-behavioral function, based on strong construct validity. Altogether, the data suggest that the IBB, especially in concert with other measures, is a reliable and valid tool for assessing neurological deficits in fine motor control of the distal forelimb, and represents a powerful addition to multivariate outcome batteries aimed at documenting recovery of function after cervical SCI in rats. PMID:25071704

  19. A Structured Clinical Interview for Kleptomania (SCI-K): preliminary validity and reliability testing.

    Science.gov (United States)

    Grant, Jon E; Kim, Suck Won; McCabe, James S

    2006-06-01

    Kleptomania presents difficulties in diagnosis for clinicians. This study aimed to develop and test a DSM-IV-based diagnostic instrument for kleptomania. To assess for current kleptomania the Structured Clinical Interview for Kleptomania (SCI-K) was administered to 112 consecutive subjects requesting psychiatric outpatient treatment for a variety of disorders. Reliability and validity were determined. Classification accuracy was examined using the longitudinal course of illness. The SCI-K demonstrated excellent test-retest (Phi coefficient = 0.956 (95% CI = 0.937, 0.970)) and inter-rater reliability (phi coefficient = 0.718 (95% CI = 0.506, 0.848)) in the diagnosis of kleptomania. Concurrent validity was observed with a self-report measure using DSM-IV kleptomania criteria (phi coefficient = 0.769 (95% CI = 0.653, 0.850)). Discriminant validity was observed with a measure of depression (point biserial coefficient = -0.020 (95% CI = -0.205, 0.166)). The SCI-K demonstrated both high sensitivity and specificity based on longitudinal assessment. The SCI-K demonstrated excellent reliability and validity in diagnosing kleptomania in subjects presenting with various psychiatric problems. These findings require replication in larger groups, including non-psychiatric populations, to examine their generalizability. Copyright (c) 2006 John Wiley & Sons, Ltd.

  20. Validity and reliability of using photography for measuring knee range of motion: a methodological study

    Directory of Open Access Journals (Sweden)

    Adie Sam

    2011-04-01

    Abstract Background The clinimetric properties of knee goniometry are essential to appreciate in light of its extensive use in the orthopaedic and rehabilitative communities. Intra-observer reliability is thought to be satisfactory, but the validity and inter-rater reliability of knee goniometry often demonstrate unacceptable levels of variation. This study tests the validity and reliability of measuring knee range of motion using goniometry and photographic records. Methods Design: Methodology study assessing the validity and reliability of one method ('Marker Method'), which uses a skin marker over the greater trochanter, and another method ('Line of Femur Method'), which requires estimation of the line of femur. Setting: Radiology and orthopaedic departments of two teaching hospitals. Participants: 31 volunteers (13 arthritic and 18 healthy subjects). Knee range of motion was measured radiographically and photographically using a goniometer. Three assessors were assessed for reliability and validity. Main outcomes: Agreement between methods and within raters was assessed using concordance correlation coefficients (CCCs). Agreement between raters was assessed using intra-class correlation coefficients (ICCs). 95% limits of agreement for the mean difference for all paired comparisons were computed. Results Validity (referenced to radiographs): Each method for all 3 raters yielded very high CCCs for flexion (0.975 to 0.988), and moderate to substantial CCCs for extension angles (0.478 to 0.678). The mean differences and 95% limits of agreement were narrower for flexion than they were for extension. Intra-rater reliability: For flexion and extension, very high CCCs were attained for all 3 raters for both methods with slightly greater CCCs seen for flexion (CCCs varied from 0.981 to 0.998). Inter-rater reliability: For both methods, very high ICCs (min to max: 0.891 to 0.995) were obtained for flexion and extension. Slightly higher coefficients were obtained
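
    The two agreement statistics used in this study, the concordance correlation coefficient and the 95% limits of agreement, can both be computed directly from paired measurements. The sketch below uses Lin's formula for the CCC and the usual mean difference plus or minus 1.96 standard deviations for the limits. The knee angles are invented for illustration.

```python
import statistics


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired measurements."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference penalises systematic bias, unlike Pearson's r.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


def limits_of_agreement(x, y):
    """Mean difference and 95% limits of agreement (Bland-Altman)."""
    diffs = [a - b for a, b in zip(x, y)]
    mean_diff = statistics.mean(diffs)
    spread = 1.96 * statistics.stdev(diffs)
    return mean_diff, mean_diff - spread, mean_diff + spread


# Hypothetical flexion angles in degrees (assumption): radiograph vs photograph.
radiograph = [118.0, 125.0, 97.0, 132.0, 110.0, 104.0]
photograph = [116.0, 126.0, 99.0, 130.0, 111.0, 101.0]
print(f"CCC: {concordance_correlation(radiograph, photograph):.3f}")
print("mean difference and limits:", limits_of_agreement(radiograph, photograph))
```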

  1. What to Do With "Moderate" Reliability and Validity Coefficients?

    NARCIS (Netherlands)

    Post, Marcel W

    Clinimetric studies may use criteria for test-retest reliability and convergent validity such that correlation coefficients as low as .40 are supportive of reliability and validity. It can be argued that moderate (.40-.60) correlations should not be interpreted in this way and that reliability

  2. Development of a valid and reliable test to assess trauma radiograph interpretation performance

    International Nuclear Information System (INIS)

    Neep, M.J.; Steffens, T.; Riley, V.; Eastgate, P.; McPhail, S.M.

    2017-01-01

    Objectives: The purpose of this investigation was to develop and examine the preliminary validity and reliability among radiographers of a test to assess trauma radiograph interpretation performance suitable for use among health professionals. Methods: Stage 1 examined 14,159 consecutive appendicular and axial examinations from a hospital emergency department over a 12 month period to quantify a typical anatomical region case-mix of trauma radiographs. A sample of radiographic cases representative of affected anatomical regions was then developed into the Image Interpretation Test (IIT). Stage 2 involved prospective investigations of the IIT's reliability (inter-rater, intra-rater, internal consistency) and validity (concurrent) among 41 radiographers. Results: The IIT included 60 cases. The median (interquartile range) clinical experience of participants was 5 (2–10) years. Case scores were internally consistent (Cronbach's alpha = 0.90). Favourable inter-rater reliability (kappa > 0.70 for 58/60 cases, Intra-class correlation coefficient (ICC) > 0.99 for total score) and intra-rater reliability (kappa > 0.90 for 60/60 cases, ICC > 0.99 for total score) was observed. There was a positive association between radiographers' confidence in image interpretation and IIT score (coefficient = 1.52, r-squared = 0.60, p < 0.001). Conclusions: The IIT developed during this investigation included a selection of radiographic cases consistent with anatomical regions represented in an adult trauma case-mix. This study has also provided foundational preliminary evidence to support the reliability and validity of the IIT among radiographers. The findings suggest that it is possible to assess image interpretation performance of adult trauma radiographs with this test. - Highlights: • Development of an Image Interpretation Test (IIT). • Cases consistent with anatomical regions represented in a typical adult trauma case-mix. • Development of a

  3. Reliability, validity and minimal detectable change of the Mini-BESTest in Greek participants with chronic stroke.

    Science.gov (United States)

    Lampropoulou, Sofia I; Billis, Evdokia; Gedikoglou, Ingrid A; Michailidou, Christina; Nowicky, Alexander V; Skrinou, Dimitra; Michailidi, Fotini; Chandrinou, Danae; Meligkoni, Margarita

    2018-02-23

    This study aimed to investigate the psychometric characteristics of reliability, validity and ability to detect change of a newly developed balance assessment tool, the Mini-BESTest, in Greek patients with stroke. A prospective, observational design study with test-retest measures was conducted. A convenience sample of 21 Greek patients with chronic stroke (14 male, 7 female; age of 63 ± 16 years) was recruited. Two independent examiners administered the scale, for the inter-rater reliability, twice within 10 days for the test-retest reliability. Bland Altman Analysis for repeated measures assessed the absolute reliability and the Standard Error of Measurement (SEM) and the Minimum Detectable Change at 95% confidence interval (MDC95%) were established. The Greek Mini-BESTest (Mini-BESTest GR) was correlated with the Greek Berg Balance Scale (BBS GR) for assessing the concurrent validity and with the Timed Up and Go (TUG), the Functional Reach Test (FRT) and the Greek Falls Efficacy Scale-International (FES-I GR) for the convergent validity. The Mini-BESTest GR demonstrated excellent inter-rater reliability (ICC (95%CI) = 0.997 (0.995-0.999), SEM = 0.46) with the scores of two raters within the limits of agreement (mean difference = -0.143 ± 0.727, p > 0.05) and test-retest reliability (ICC (95%CI) = 0.966 (0.926-0.988), SEM = 1.53). Additionally, the Mini-BESTest GR yielded very strong to moderate correlations with BBS GR (r = 0.924, p reliability and the equally good validity of the Mini-BESTest GR, strongly support its utility in Greek people with chronic stroke. Its ability to identify clinically meaningful changes and falls risk needs further investigation.
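
    The SEM and MDC95 mentioned above are commonly derived from the ICC and the between-subject standard deviation, as SEM = SD x sqrt(1 - ICC) and MDC95 = 1.96 x sqrt(2) x SEM. The sketch below applies those formulas to the test-retest ICC of 0.966. The standard deviation of 8.3 points is an assumption, back-calculated so that the SEM matches the reported 1.53; the abstract does not give the SD, and the resulting MDC95 is illustrative rather than a value reported by the study.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), in the units of the scale."""
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem, z=1.96):
    """MDC = z * sqrt(2) * SEM; z = 1.96 gives the 95% level."""
    return z * math.sqrt(2.0) * sem


# Assumed between-subject SD (not stated in the abstract) and the reported ICC.
assumed_sd = 8.3
test_retest_icc = 0.966

sem = standard_error_of_measurement(assumed_sd, test_retest_icc)
mdc95 = minimal_detectable_change(sem)
print(f"SEM: {sem:.2f} points, MDC95: {mdc95:.2f} points")
```

    The square root of 2 appears because a change score is the difference of two measurements, each carrying its own measurement error. Using z = 1.645 in place of 1.96 gives the 90% level instead.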

  4. Test of gross motor development-2 for Filipino children with intellectual disability: validity and reliability.

    Science.gov (United States)

    Capio, Catherine M; Eguia, Kathlynne F; Simons, Johan

    2016-01-01

    This study aimed to examine aspects of validity and reliability of the Test of Gross Motor Development-2 (TGMD-2) in Filipino children with intellectual disability. Content and construct validity were verified, as well as inter-rater and intra-rater reliability. Two paediatric physiotherapists tested 81 children with intellectual disability (mean age = 9.29 ± 2.71 years) on locomotor and object control skills. Analysis of covariance, confirmatory factor analysis and analysis of variance were used to test validity, while Cronbach's alpha, intra-class correlation coefficients (ICC) and Bland-Altman plots were used to examine reliability. Age was a significant predictor of locomotor and object control scores (P = 0.004). The data fit the hypothesised two-factor model with fit indices as follows: χ(2) = 33.525, DF = 34, P = 0.491, χ(2)/DF = 0.986. As hypothesised, gender was a significant predictor for object control skills (P = 0.038). Participants' mean scores were significantly below mastery (locomotor, P intellectual disability.
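
    The model-fit figures quoted above (χ² = 33.525, DF = 34, P = 0.491, χ²/DF = 0.986) can be cross-checked with a few lines of arithmetic. The sketch below uses a closed form for the chi-square upper-tail probability that holds only for even degrees of freedom; the function name is hypothetical, and a statistics library would normally be used instead.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!,
    which is valid only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # k = 0 term of the series
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


chi2, df = 33.525, 34           # values quoted in the abstract
ratio = chi2 / df
p_value = chi2_sf_even_df(chi2, df)
print(f"chi2/df = {ratio:.3f}, p = {p_value:.3f}")
```

    A non-significant P value and a χ²/DF ratio near 1 are both conventionally read as indicating that the hypothesised model is not rejected by the data.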

  5. The reliability and concurrent validity of measurements used to quantify lumbar spine mobility: an analysis of an iphone® application and gravity based inclinometry.

    Science.gov (United States)

    Kolber, Morey J; Pizzini, Matias; Robinson, Ashley; Yanez, Dania; Hanney, William J

    2013-04-01

    PURPOSE/AIM: The purpose of this study was to investigate the reliability, minimal detectable change (MDC), and concurrent validity of active spinal mobility measurements using a gravity-based bubble inclinometer and iPhone® application. MATERIALS/METHODS: Two investigators each used a bubble inclinometer and an iPhone® with inclinometer application to measure total thoracolumbo-pelvic flexion, isolated lumbar flexion, total thoracolumbo-pelvic extension, and thoracolumbar lateral flexion in 30 asymptomatic participants using a blinded repeated measures design. The procedures used in this investigation for measuring spinal mobility yielded good intrarater and interrater reliability with Intraclass Correlation Coefficients (ICC) for bubble inclinometry ≥ 0.81 and the iPhone® ≥ 0.80. The MDC90 for the interrater analysis ranged from 4° to 9°. The concurrent validity between bubble inclinometry and the iPhone® application was good with ICC values of ≥ 0.86. The 95% level of agreement indicates that, although these measuring instruments are equivalent, individual differences of up to 18° may exist when using these devices interchangeably. The bubble inclinometer and iPhone® possess good intrarater and interrater reliability as well as concurrent validity when strict measurement procedures are adhered to. This study provides preliminary evidence to suggest that smart phone applications may offer clinical utility comparable to inclinometry for quantifying spinal mobility. Clinicians should be aware of the potential disagreement when using these devices interchangeably. 2b (Observational study of reliability).
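
    The "95% level of agreement" mentioned above is usually obtained from a Bland-Altman analysis: the mean of the paired differences (bias) plus or minus 1.96 standard deviations of those differences. The following is a minimal sketch of that calculation; the function name and the paired readings are hypothetical and are not data from the study.

```python
import statistics


def limits_of_agreement(method_a, method_b, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of the paired differences; the limits are
    bias -/+ z times the sample standard deviation of the differences.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - z * sd, bias + z * sd


# Hypothetical paired flexion readings in degrees (illustration only).
inclinometer = [10.0, 12.0, 14.0, 16.0]
phone_app = [9.0, 13.0, 13.0, 17.0]
bias, lower, upper = limits_of_agreement(inclinometer, phone_app)
print(f"bias = {bias:.2f}, limits of agreement = {lower:.2f} to {upper:.2f}")
```

    Wide limits can coexist with a high ICC, which is why the abstract cautions against using the two devices interchangeably despite good reliability coefficients.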

  6. Toward a Common Language for Measuring Patient Mobility in the Hospital: Reliability and Construct Validity of Interprofessional Mobility Measures.

    Science.gov (United States)

    Hoyer, Erik H; Young, Daniel L; Klein, Lisa M; Kreif, Julie; Shumock, Kara; Hiser, Stephanie; Friedman, Michael; Lavezza, Annette; Jette, Alan; Chan, Kitty S; Needham, Dale M

    2018-02-01

    The lack of common language among interprofessional inpatient clinical teams is an important barrier to achieving inpatient mobilization. In The Johns Hopkins Hospital, the Activity Measure for Post-Acute Care (AM-PAC) Inpatient Mobility Short Form (IMSF), also called "6-Clicks," and the Johns Hopkins Highest Level of Mobility (JH-HLM) are part of routine clinical practice. The measurement characteristics of these tools when used by both nurses and physical therapists for interprofessional communication or assessment are unknown. The purposes of this study were to evaluate the reliability and minimal detectable change of AM-PAC IMSF and JH-HLM when completed by nurses and physical therapists and to evaluate the construct validity of both measures when used by nurses. A prospective evaluation of a convenience sample was used. The test-retest reliability and the interrater reliability of AM-PAC IMSF and JH-HLM for inpatients in the neuroscience department (n = 118) of an academic medical center were evaluated. Each participant was independently scored twice by a team of 2 nurses and 1 physical therapist; a total of 4 physical therapists and 8 nurses participated in reliability testing. In a separate inpatient study protocol (n = 69), construct validity was evaluated via an assessment of convergent validity with other measures of function (grip strength, Katz Activities of Daily Living Scale, 2-minute walk test, 5-times sit-to-stand test) used by 5 nurses. The test-retest reliability values (intraclass correlation coefficients) for physical therapists and nurses were 0.91 and 0.97, respectively, for AM-PAC IMSF and 0.94 and 0.95, respectively, for JH-HLM. The interrater reliability values (intraclass correlation coefficients) between physical therapists and nurses were 0.96 for AM-PAC IMSF and 0.99 for JH-HLM. Construct validity (Spearman correlations) ranged from 0.25 between JH-HLM and right-hand grip strength to 0.80 between AM-PAC IMSF and the Katz Activities of
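
    Several records in this list report interrater reliability as an intraclass correlation coefficient without naming the ICC model. As one possible illustration, the sketch below computes the Shrout and Fleiss ICC(2,1) (two-way random effects, absolute agreement, single rater) from a subjects-by-raters table. The choice of model and the function name are assumptions; the abstract above does not state which form was used.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores is a list of rows, one row per subject, one column per rater.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores: the second rater is always one point higher.
shifted = [[1, 2], [2, 3], [3, 4]]
print(f"ICC(2,1) with a constant rater offset: {icc_2_1(shifted):.3f}")
```

    Because this form measures absolute agreement, a constant offset between raters lowers the coefficient even when the raters rank every subject identically.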

  7. Validity and reliability of a Malay version of the Lawton instrumental activities of daily living scale among the Malay speaking elderly in Malaysia.

    Science.gov (United States)

    Kadar, Masne; Ibrahim, Suhaili; Razaob, Nor Afifi; Chai, Siaw Chui; Harun, Dzalani

    2018-02-01

    The Lawton Instrumental Activities of Daily Living Scale is a tool often used to assess independence among elderly at home. Its suitability to be used with the elderly population in Malaysia has not been validated. This current study aimed to assess the validity and reliability of the Lawton Instrumental Activities of Daily Living Scale - Malay Version to Malay speaking elderly in Malaysia. This study was divided into three phases: (1) translation and linguistic validity involving both forward and backward translations; (2) establishment of face validity and content validity; and (3) establishment of reliability involving inter-rater, test-retest and internal consistency analyses. Data used for these analyses were obtained by interviewing 65 elderly respondents. Percentages of Content Validity Index for 4 criteria were from 88.89 to 100.0. The Cronbach α coefficient for internal consistency was 0.838. Intra-class Correlation Coefficient of inter-rater reliability and test-retest reliability was 0.957 and 0.950 respectively. The result shows that the Lawton Instrumental Activities of Daily Living Scale - Malay Version has excellent reliability and validity for use with the Malay speaking elderly people in Malaysia. This scale could be used by professionals to assess functional ability of elderly who live independently in community. © 2018 Occupational Therapy Australia.
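
    The internal-consistency figure above is a Cronbach α coefficient. A minimal sketch of the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), is given below; the function name and the example responses are hypothetical and are not the study's data.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists.

    items[i][r] is the score of respondent r on item i; every item list
    must have the same length (one entry per respondent).
    """
    k = len(items)
    sum_item_vars = sum(statistics.variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]   # total score per respondent
    return k / (k - 1) * (1 - sum_item_vars / statistics.variance(totals))


# Hypothetical responses: 2 items answered by 4 respondents.
example_items = [[1, 2, 3, 4], [2, 1, 4, 3]]
print(f"alpha = {cronbach_alpha(example_items):.2f}")
```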

  8. [Quality assurance in coding expertise of hospital cases in the German DRG system. Evaluation of inter-rater reliability in MDK expertise].

    Science.gov (United States)

    Huber, H; Brambrink, M; Funk, R; Rieger, M

    2012-10-01

    The purpose of this study was to evaluate differences in the G-DRG results of a hospital case coded by 2 independent MDK raters. Inter-rater reliability between the 2 raters was calculated by examining the coding of individual hospital cases. The reasons for the non-agreement of the expert evaluations and suggestions to improve the process are discussed. From the expert evaluation pool of the MDK-WL, a random sample of 0.7% of the 57,375 expert evaluations was taken. Distribution equality with the basic total was tested by the χ² test or, respectively, Fisher's exact test. For the total of 402 individual hospital cases, the G-DRG case sums of 2 experts of the MDK were determined independently and the results checked for each individual case for agreement or non-agreement. The corresponding confidence intervals with standard errors were analysed to test if certain major diagnosis categories (MDC) were statistically significantly more affected by differing expert results than others. In 280 of the total 402 tested hospital cases, the 2 MDK raters independently reached the same G-DRG results; in 122 cases the G-DRG case sums determined by the 2 raters differed (agreement 70%; CI 65.2-74.1). Different DRG results between the 2 experts occurred regularly in the entire MDC spectrum. No MDC chapter in which significant differences between the 2 raters arose could be identified. The results of our study demonstrate an almost 70% agreement in the evaluation of hospital cost accounts by 2 independently operating MDK raters. This result leaves room for improvement. Optimisation potentials can be recognised on the basis of the results. Potential for improvement was established in combination with regular further training and the expansion of binding internal code recommendations as well as exchange of code-relevant information among experts in internal forums. The presented model is in principle suitable for cross-border examinations within the MDK system with the advantage that
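
    The agreement figure above (280 of 402 cases; 70%; CI 65.2-74.1) can be reproduced from the counts alone. The abstract does not name the interval method; the sketch below assumes a simple normal-approximation (Wald) interval at the 95% level, which happens to match the reported bounds. The function name is hypothetical.

```python
import math


def wald_interval(successes: int, n: int, z: float = 1.96):
    """Return (proportion, lower, upper) using the normal-approximation interval."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# Counts quoted in the abstract: identical G-DRG result in 280 of 402 cases.
p, lo, hi = wald_interval(280, 402)
print(f"agreement = {100 * p:.1f}%, 95% CI {100 * lo:.1f}-{100 * hi:.1f}")
```

    Raw percent agreement does not correct for agreement expected by chance, which is one reason other records in this list report kappa or ICC instead.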

  9. TWO CRITERIA FOR GOOD MEASUREMENTS IN RESEARCH: VALIDITY AND RELIABILITY

    Directory of Open Access Journals (Sweden)

    Haradhan Kumar Mohajan

    2017-12-01

    Reliability and validity are the two most important and fundamental features in the evaluation of any measurement instrument or tool for good research. The purpose of this research is to discuss the validity and reliability of measurement instruments that are used in research. Validity concerns what an instrument measures, and how well it does so. Reliability concerns the faith that one can have in the data obtained from use of an instrument, that is, the degree to which any measuring tool controls for random error. An attempt is made here to review reliability and validity, and the threats to them, in some detail.

  10. Children's Physical Activity While Gardening: Development of a Valid and Reliable Direct Observation Tool.

    Science.gov (United States)

    Myers, Beth M; Wells, Nancy M

    2015-04-01

    Gardens are a promising intervention to promote physical activity (PA) and foster health. However, because of the unique characteristics of gardening, no extant tool can capture PA, postures, and motions that take place in a garden. The Physical Activity Research and Assessment tool for Garden Observation (PARAGON) was developed to assess children's PA levels, tasks, postures, and motions, associations, and interactions while gardening. PARAGON uses momentary time sampling in which a trained observer watches a focal child for 15 seconds and then records behavior for 15 seconds. Sixty-five children (38 girls, 27 boys) at 4 elementary schools in New York State were observed over 8 days. During the observation, children simultaneously wore Actigraph GT3X+ accelerometers. The overall interrater reliability was 88% agreement, and Ebel was .97. Percent agreement values for activity level (93%), garden tasks (93%), motions (80%), associations (95%), and interactions (91%) also met acceptable criteria. Validity was established by previously validated PA codes and by expected convergent validity with accelerometry. PARAGON is a valid and reliable observation tool for assessing children's PA in the context of gardening.

  11. The Communication Function Classification System: cultural adaptation, validity, and reliability of the Farsi version for patients with cerebral palsy.

    Science.gov (United States)

    Soleymani, Zahra; Joveini, Ghodsiye; Baghestani, Ahmad Reza

    2015-03-01

    This study developed a Farsi language Communication Function Classification System and then tested its reliability and validity. Communication Function Classification System is designed to classify the communication functions of individuals with cerebral palsy. Up until now, there has been no instrument for assessment of this communication function in Iran. The English Communication Function Classification System was translated into Farsi and cross-culturally modified by a panel of experts. Professionals and parents then assessed the content validity of the modified version. A backtranslation of the Farsi version was confirmed by the developer of the English Communication Function Classification System. Face validity was assessed by therapists and parents of 10 patients. The Farsi Communication Function Classification System was administered to 152 individuals with cerebral palsy (age, 2 to 18 years; median age, 10 years; mean age, 9.9 years; standard deviation, 4.3 years). Inter-rater reliability was analyzed between parents, occupational therapists, and speech and language pathologists. The test-retest reliability was assessed for 75 patients with a 14 day interval between tests. The inter-rater reliability of the Communication Function Classification System was 0.81 between speech and language pathologists and occupational therapists, 0.74 between parents and occupational therapists, and 0.88 between parents and speech and language pathologists. The test-retest reliability was 0.96 for occupational therapists, 0.98 for speech and language pathologists, and 0.94 for parents. The findings suggest that the Farsi version of Communication Function Classification System is a reliable and valid measure that can be used in clinical settings to assess communication function in patients with cerebral palsy. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. Reliability and Concurrent Validity of the International Personality ...

    African Journals Online (AJOL)

    Reliability and Concurrent Validity of the International Personality item Pool (IPIP) Big-five Factor Markers in Nigeria. ... Nigerian Journal of Psychiatry ... Aims: The aim of this study was to assess the internal consistency and concurrent validity ...

  13. [Reliability and Validity of the Behavioral Check List for Preschool Children to Measure Attention Deficit Hyperactivity Behaviors].

    Science.gov (United States)

    Tsuno, Kanami; Yoshimasu, Kouichi; Hayashi, Takashi; Tatsuta, Nozomi; Ito, Yuki; Kamijima, Michihiro; Nakai, Kunihiko

    2018-01-01

    Nowadays, attention deficit hyperactivity (ADH) problems are observed commonly among school-age children. However, questionnaires specific to ADH behaviors among preschool children are very few. The aim of this study was to investigate the reliability and validity of the 25-item Behavioral Check List (BCL), which was developed from interviews of parents with children who were diagnosed as having attention-deficit/hyperactivity disorder (ADHD) and measures ADH behaviors in preschool age. We recruited 22 teachers from 10 nurseries/kindergartens in Miyagi Prefecture, Japan. A total of 138 preschool children were assessed using the BCL. To investigate inter-rater reliability, two teachers from each facility assessed seven to twenty children in their class, and intraclass correlation coefficients (ICCs) were calculated. The teachers additionally answered questions in the 1.5-5 Caregiver-Teacher Report Form (C-TRF) to investigate the criterion validity of the BCL. To investigate structural validity, exploratory factor analysis with promax rotation and confirmatory factor analysis were performed. The internal consistency reliability of the BCL was good (α = 0.92) and correlation analyses also confirmed its excellent criterion validity. Although exploratory factor analysis for the BCL yielded a five-factor model that consisted of a factor structure different from that of the original one, the results were similar to the original six factors. The ICCs of the BCL were 0.38-0.99, which was not high enough for inter-rater reliability in some facilities. However, it may be possible to improve this by giving raters adequate explanations when using the BCL. The present study showed acceptable levels of reliability and validity of the BCL among Japanese preschool children.

  14. Is the Bayley Scales of Infant and Toddler Developmental Screening Test, Valid and Reliable for Persian Speaking Children?

    Science.gov (United States)

    Soleimani, Farin; Azari, Nadia; Vameghi, Roshanak; Sajedi, Firoozeh; Shahshahani, Soheila; Karimi, Hossein; Kraskian, Adis; Shahrokhi, Amin; Teymouri, Robab; Gharib, Masoud

    2016-10-01

    Advances in perinatal and neonatal care have substantially improved the survival of at-risk infants over the past two decades. The purpose of this study was to assess the reliability and validity of the Bayley Scales of Infant and Toddler Development Screening Test in Persian-speaking children. This was a cross-sectional prospective study of 403 children aged 1-42 months. The Bayley scales screening instrument, which consists of five domains (cognitive, receptive, and expressive communication and fine and gross motor items), was used to measure infants' and toddlers' development. The psychometric properties examined included the face and content validity of the scale, in addition to cultural and linguistic modifications to the scale and its test-retest and inter-rater reliability. An expert team changed some of the test items relating to cultural and linguistic issues. In almost all the age groups, cultural or linguistic changes were made to items in the communication domains. According to Cronbach's alpha for internal consistency, the reliability of the cognitive scale was r = 0.79, and the reliability of the receptive scale was r = 0.76. The reliability for expressive communication, fine motor, and gross motor scales was r = 0.81, r = 0.80, and r = 0.81, respectively. The construct validity of the tests was confirmed using a factor analysis and comparison of the mean scores of the age groups. The intra- and inter-rater reliabilities of the Bayley Scales were good-to-excellent. The results indicated that the Bayley Scales had a high level of reliability in the present study. Thus, the scale can be used in a Persian population.

  15. The reliability and validity of video analysis for the assessment of the clinical signs of concussion in Australian football.

    Science.gov (United States)

    Makdissi, Michael; Davis, Gavin

    2016-10-01

    The objective of this study was to determine the reliability and validity of identifying clinical signs of concussion using video analysis in Australian football. Prospective cohort study. All impacts and collisions potentially resulting in a concussion were identified during 2012 and 2013 Australian Football League seasons. Consensus definitions were developed for clinical signs associated with concussion. For intra- and inter-rater reliability analysis, two experienced clinicians independently assessed 102 randomly selected videos on two occasions. Sensitivity, specificity, positive and negative predictive values were calculated based on the diagnosis provided by team medical staff. 212 incidents resulting in possible concussion were identified in 414 Australian Football League games. The intra-rater reliability of the video-based identification of signs associated with concussion was good to excellent. Inter-rater reliability was good to excellent for impact seizure, slow to get up, motor incoordination, ragdoll appearance (2 of 4 analyses), clutching at head and facial injury. Inter-rater reliability for loss of responsiveness and blank and vacant look was only fair and did not reach statistical significance. The feature with the highest sensitivity was slow to get up (87%), but this sign had a low specificity (19%). Other video signs had a high specificity but low sensitivity. Blank and vacant look (100%) and motor incoordination (81%) had the highest positive predictive value. Video analysis may be a useful adjunct to the side-line assessment of a possible concussion. Video analysis however should not replace the need for a thorough multimodal clinical assessment. Copyright © 2016 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
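
    Sensitivity, specificity and the predictive values reported above all derive from a 2×2 table of video sign versus clinical diagnosis. The sketch below shows the standard definitions; the function name and the counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard 2x2 diagnostic accuracy measures.

    tp/fn: sign present/absent among cases with the reference diagnosis;
    fp/tn: sign present/absent among cases without it.
    """
    return {
        "sensitivity": tp / (tp + fn),   # proportion of true cases showing the sign
        "specificity": tn / (tn + fp),   # proportion of non-cases without the sign
        "ppv": tp / (tp + fp),           # proportion of sign-positive incidents that are cases
        "npv": tn / (tn + fn),           # proportion of sign-negative incidents that are non-cases
    }


# Hypothetical counts for one video sign (illustration only).
metrics = diagnostic_metrics(tp=20, fp=30, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

    A sign such as "slow to get up" can combine high sensitivity with low specificity when it also appears in many incidents that do not lead to a diagnosis, which is the pattern the abstract describes.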

  16. Reliability and convergent validity of the five-step test in people with chronic stroke.

    Science.gov (United States)

    Ng, Shamay S M; Tse, Mimi M Y; Tam, Eric W C; Lai, Cynthia Y Y

    2018-01-10

    (i) To estimate the intra-rater, inter-rater and test-retest reliabilities of the Five-Step Test (FST), as well as the minimum detectable change in FST completion times in people with stroke. (ii) To estimate the convergent validity of the FST with other measures of stroke-specific impairments. (iii) To identify the best cut-off times for distinguishing FST performance in people with stroke from that of healthy older adults. A cross-sectional study. University-based rehabilitation centre. Forty-eight people with stroke and 39 healthy controls. None. The FST, along with (for the stroke survivors only) scores on the Fugl-Meyer Lower Extremity Assessment (FMA-LE), the Berg Balance Scale (BBS), Limits of Stability (LOS) tests, and Activities-specific Balance Confidence (ABC) scale were tested. The FST showed excellent intra-rater (intra-class correlation coefficient; ICC = 0.866-0.905), inter-rater (ICC = 0.998), and test-retest (ICC = 0.838-0.842) reliabilities. A minimum detectable change of 9.16 s was found for the FST in people with stroke. The FST correlated significantly with the FMA-LE, BBS, and LOS results in the forward and sideways directions (r = -0.411 to -0.716, p people with stroke and healthy older adults. The FST is a reliable, easy-to-administer clinical test for assessing stroke survivors' ability to negotiate steps and stairs.

  17. Reliability and Validity of the Dutch Physical Activity Questionnaires for Children (PAQ-C) and Adolescents (PAQ-A).

    Science.gov (United States)

    Bervoets, Liene; Van Noten, Caroline; Van Roosbroeck, Sofie; Hansen, Dominique; Van Hoorenbeeck, Kim; Verheyen, Els; Van Hal, Guido; Vankerckhoven, Vanessa

    2014-01-01

    This study was designed to validate the Dutch Physical Activity Questionnaires for Children (PAQ-C) and Adolescents (PAQ-A). After adjustment of the original Canadian PAQ-C and PAQ-A (i.e. translation/back-translation and evaluation by expert committee), content validity of both PAQs was assessed and calculated using item-level (I-CVI) and scale-level (S-CVI) content validity indexes. Inter-item and inter-rater reliability of 196 PAQ-C and 95 PAQ-A filled in by both children or adolescents and their parent, were evaluated. Inter-item reliability was calculated by Cronbach's alpha (α) and inter-rater reliability was examined by percent observed agreement and weighted kappa (κ). Concurrent validity of PAQ-A was examined in a subsample of 28 obese and 16 normal-weight children by comparing it with concurrently measured physical activity using a maximal cardiopulmonary exercise test for the assessment of peak oxygen uptake (VO2 peak). For both PAQs, I-CVI ranged 0.67-1.00. S-CVI was 0.89 for PAQ-C and 0.90 for PAQ-A. A total of 192 PAQ-C and 94 PAQ-A were fully completed by both child and parent. Cronbach's α was 0.777 for PAQ-C and 0.758 for PAQ-A. Percent agreement ranged 59.9-74.0% for PAQ-C and 51.1-77.7% for PAQ-A, and weighted κ ranged 0.48-0.69 for PAQ-C and 0.51-0.68 for PAQ-A. The correlation between total PAQ-A score and VO2 peak - corrected for age, gender, height and weight - was 0.516 (p = 0.001). Both PAQs have an excellent content validity, an acceptable inter-item reliability and a moderate to good strength of inter-rater agreement. In addition, total PAQ-A score showed a moderate positive correlation with VO2 peak. Both PAQs have an acceptable to good reliability and validity, however, further validity testing is recommended to provide a more complete assessment of both PAQs.
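
    Child-parent agreement above is summarised with percent agreement and a weighted kappa. The abstract does not state the weighting scheme; the sketch below assumes linear disagreement weights |i − j| on ordinal categories coded 0..k−1. The function name and the example ratings are hypothetical.

```python
def linear_weighted_kappa(rater_a, rater_b, n_categories: int) -> float:
    """Cohen's kappa with linear disagreement weights |i - j|.

    Ratings must be integers in 0..n_categories-1. Computed as
    1 - (observed weighted disagreement) / (chance-expected weighted disagreement).
    """
    n = len(rater_a)
    observed = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [
        sum(observed[i][j] for i in range(n_categories))
        for j in range(n_categories)
    ]

    obs_disagreement = sum(
        abs(i - j) * observed[i][j]
        for i in range(n_categories)
        for j in range(n_categories)
    ) / n
    exp_disagreement = sum(
        abs(i - j) * row_totals[i] * col_totals[j]
        for i in range(n_categories)
        for j in range(n_categories)
    ) / (n * n)
    return 1.0 - obs_disagreement / exp_disagreement


# Hypothetical child and parent answers on a 3-level item.
child = [0, 1, 2, 2]
parent = [0, 1, 2, 1]
print(f"weighted kappa = {linear_weighted_kappa(child, parent, 3):.3f}")
```

    With only two categories the linear weights reduce to the unweighted Cohen's kappa, so the same function covers both cases.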

  18. A Turkish Version of the Critical-Care Pain Observation Tool: Reliability and Validity Assessment.

    Science.gov (United States)

    Aktaş, Yeşim Yaman; Karabulut, Neziha

    2017-08-01

    The study aim was to evaluate the validity and reliability of the Critical-Care Pain Observation Tool in critically ill patients. A repeated measures design was used for the study. A convenience sample of 66 patients who had undergone open-heart surgery in the cardiovascular surgery intensive care unit in Ordu, Turkey, was recruited for the study. The patients were evaluated by using the Critical-Care Pain Observation Tool at rest, during a nociceptive procedure (suctioning), and 20 minutes after the procedure while they were conscious and intubated after surgery. The Turkish version of the Critical-Care Pain Observation Tool has shown statistically acceptable levels of validity and reliability. Inter-rater reliability was supported by moderate-to-high-weighted κ coefficients (weighted κ coefficient = 0.55 to 1.00). For concurrent validity, significant associations were found between the scores on the Critical-Care Pain Observation Tool and the Behavioral Pain Scale scores. Discriminant validity was also supported by higher scores during suctioning (a nociceptive procedure) versus non-nociceptive procedures. The internal consistency of the Critical-Care Pain Observation Tool was 0.72 during a nociceptive procedure and 0.71 during a non-nociceptive procedure. The validity and reliability of the Turkish version of the Critical-Care Pain Observation Tool was determined to be acceptable for pain assessment in critical care, especially for patients who cannot communicate verbally. Copyright © 2016 American Society of PeriAnesthesia Nurses. Published by Elsevier Inc. All rights reserved.

  19. A New Tool for Nutrition App Quality Evaluation (AQEL): Development, Validation, and Reliability Testing.

    Science.gov (United States)

    DiFilippo, Kristen Nicole; Huang, Wenhao; Chapman-Novakofski, Karen M

    2017-10-27

    The extensive availability and increasing use of mobile apps for nutrition-based health interventions makes evaluation of the quality of these apps crucial for integration of apps into nutritional counseling. The goal of this research was the development, validation, and reliability testing of the app quality evaluation (AQEL) tool, an instrument for evaluating apps' educational quality and technical functionality. Items for evaluating app quality were adapted from website evaluations, with additional items added to evaluate the specific characteristics of apps, resulting in 79 initial items. Expert panels of nutrition and technology professionals and app users reviewed items for face and content validation. After recommended revisions, nutrition experts completed a second AQEL review to ensure clarity. On the basis of 150 sets of responses using the revised AQEL, principal component analysis was completed, reducing AQEL into 5 factors that underwent reliability testing, including internal consistency, split-half reliability, test-retest reliability, and interrater reliability (IRR). Two additional modifiable constructs for evaluating apps based on the age and needs of the target audience as selected by the evaluator were also tested for construct reliability. IRR testing using intraclass correlations (ICC) with all 7 constructs was conducted, with 15 dietitians evaluating one app. Development and validation resulted in the 51-item AQEL. These were reduced to 25 items in 5 factors after principal component analysis, plus 9 modifiable items in two constructs that were not included in principal component analysis. Internal consistency and split-half reliability of the following constructs derived from principal components analysis was good (Cronbach alpha >.80, Spearman-Brown coefficient >.80): behavior change potential, support of knowledge acquisition, app function, and skill development. App purpose split-half reliability was .65. Test-retest reliability showed no

  20. Reliability and validity of a novel tool to comprehensively assess food and beverage marketing in recreational sport settings.

    Science.gov (United States)

    Prowse, Rachel J L; Naylor, Patti-Jean; Olstad, Dana Lee; Carson, Valerie; Mâsse, Louise C; Storey, Kate; Kirk, Sara F L; Raine, Kim D

    2018-05-31

    Current methods for evaluating food marketing to children often study a single marketing channel or approach. As the World Health Organization urges the removal of unhealthy food marketing in children's settings, methods that comprehensively explore the exposure and power of food marketing within a setting from multiple marketing channels and approaches are needed. The purpose of this study was to test the inter-rater reliability and the validity of a novel settings-based food marketing audit tool. The Food and beverage Marketing Assessment Tool for Settings (FoodMATS) was developed and its psychometric properties evaluated in five public recreation and sport facilities (sites) and subsequently used in 51 sites across Canada for a cross-sectional analysis of food marketing. Raters recorded the count of food marketing occasions, presence of child-targeted and sports-related marketing techniques, and the physical size of marketing occasions. Marketing occasions were classified by healthfulness. Inter-rater reliability was tested using Cohen's kappa (κ) and intra-class correlations (ICC). FoodMATS scores for each site were calculated using an algorithm that represented the theoretical impact of the marketing environment on food preferences, purchases, and consumption. Higher FoodMATS scores represented sites with higher exposure to, and more powerful (unhealthy, child-targeted, sports-related, large) food marketing. Validity of the scoring algorithm was tested through (1) Pearson's correlations between FoodMATS scores and facility sponsorship dollars, and (2) sequential multiple regression for predicting "Least Healthy" food sales from FoodMATS scores. Inter-rater reliability was very good to excellent (κ = 0.88-1.00, p marketing in recreation facilities, the FoodMATS provides a novel means to comprehensively track changes in food marketing environments that can assist in developing and monitoring the impact of policies and interventions.

  1. Validity and Reliability of a Medicine Ball Explosive Power Test.

    Science.gov (United States)

    Stockbrugger, Barry A.; Haennel, Robert G.

    2001-01-01

    Evaluated the validity and reliability of a medicine ball throw test to evaluate explosive power. Data on competitive sand volleyball players who performed a medicine ball throw and a standard countermovement jump indicated that the medicine ball throw test was a valid and reliable way to assess explosive power for an analogous total-body movement…

  2. Validity and Reliability of the Arabic Token Test for Children

    Science.gov (United States)

    Alkhamra, Rana A.; Al-Jazi, Aya B.

    2016-01-01

    Background: The Token Test for Children (2nd edition) (TTFC) is a measure for assessing receptive language. In this study we describe the translation process, validity and reliability of the Arabic Token Test for Children (A-TTFC). Aims: The aim of this study is to translate, validate and establish the reliability of the Arabic Token Test for…

  3. Conceptualizing Essay Tests' Reliability and Validity: From Research to Theory

    Science.gov (United States)

    Badjadi, Nour El Imane

    2013-01-01

    The current paper on writing assessment surveys the literature on the reliability and validity of essay tests. The paper aims to examine the two concepts in relationship with essay testing as well as to provide a snapshot of the current understandings of the reliability and validity of essay tests as drawn in recent research studies. Bearing in…

  4. Construction of Valid and Reliable Test for Assessment of Students

    Science.gov (United States)

    Osadebe, P. U.

    2015-01-01

    The study was carried out to construct a valid and reliable test in Economics for secondary school students. Two research questions were drawn to guide the establishment of validity and reliability for the Economics Achievement Test (EAT). It is a multiple choice objective test of five options with 100 items. A sample of 1000 students was randomly…

  5. The Validity and Reliability of the Mobbing Scale (MS)

    Science.gov (United States)

    Yaman, Erkan

    2009-01-01

    The aim of this research is to develop the Mobbing Scale and examine its validity and reliability. The sample of the study consisted of 515 persons from Sakarya and Bursa. In this study, construct validity, internal consistency, test-retest reliability, and item analysis of the scale were examined. As a result of factor analysis for construct…

  6. Assessing the environmental characteristics of cycling routes to school: a study on the reliability and validity of a Google Street View-based audit.

    Science.gov (United States)

    Vanwolleghem, Griet; Van Dyck, Delfien; Ducheyne, Fabian; De Bourdeaudhuij, Ilse; Cardon, Greet

    2014-06-10

    Google Street View provides a valuable and efficient alternative to observe the physical environment compared to on-site fieldwork. However, studies on the use, reliability and validity of Google Street View in a cycling-to-school context are lacking. We aimed to study the intra-, inter-rater reliability and criterion validity of EGA-Cycling (Environmental Google Street View Based Audit - Cycling to school), a newly developed audit using Google Street View to assess the physical environment along cycling routes to school. Parents (n = 52) of 11-to-12-year old Flemish children, who mostly cycled to school, completed a questionnaire and identified their child's cycling route to school on a street map. Fifty cycling routes of 11-to-12-year olds were identified and physical environmental characteristics along the identified routes were rated with EGA-Cycling (5 subscales; 37 items), based on Google Street View. To assess reliability, two researchers performed the audit. Criterion validity of the audit was examined by comparing the ratings based on Google Street View with ratings through on-site assessments. Intra-rater reliability was high (kappa range 0.47-1.00). Large variations in the inter-rater reliability (kappa range -0.03-1.00) and criterion validity scores (kappa range -0.06-1.00) were reported, with acceptable inter-rater reliability values for 43% of all items and acceptable criterion validity for 54% of all items. EGA-Cycling can be used to assess physical environmental characteristics along cycling routes to school. However, to assess the micro-environment specifically related to cycling, on-site assessments have to be added.

  7. Validity and reliability of a low-cost digital dynamometer for measuring isometric strength of lower limb.

    Science.gov (United States)

    Romero-Franco, Natalia; Jiménez-Reyes, Pedro; Montaño-Munuera, Juan A

    2017-11-01

    Lower limb isometric strength is a key parameter to monitor the training process or recognise muscle weakness and injury risk. However, valid and reliable methods to evaluate it often require high-cost tools. The aim of this study was to analyse the concurrent validity and reliability of a low-cost digital dynamometer for measuring isometric strength in lower limb. Eleven physically active and healthy participants performed maximal isometric strength for: flexion and extension of ankle, flexion and extension of knee, flexion, extension, adduction, abduction, internal and external rotation of hip. Data obtained by the digital dynamometer were compared with the isokinetic dynamometer to examine its concurrent validity. Data obtained by the digital dynamometer from 2 different evaluators and 2 different sessions were compared to examine its inter-rater and intra-rater reliability. Intra-class correlation (ICC) for validity was excellent in every movement (ICC > 0.9). Intra and inter-tester reliability was excellent for all the movements assessed (ICC > 0.75). The low-cost digital dynamometer demonstrated strong concurrent validity and excellent intra and inter-tester reliability for assessing isometric strength in the main lower limb movements.

  8. Safety, reliability, and validity of a physiologic definition of bronchopulmonary dysplasia.

    Science.gov (United States)

    Walsh, Michele C; Wilson-Costello, Deanna; Zadell, Arlene; Newman, Nancy; Fanaroff, Avroy

    2003-09-01

    Bronchopulmonary dysplasia (BPD) is the focus of many intervention trials, yet the outcome measure when based solely on oxygen administration may be confounded by differing criteria for oxygen administration between physicians. Thus, we wished to define BPD by a standardized oxygen saturation monitoring at 36 weeks corrected age, and compare this physiologic definition with the standard clinical definition of BPD based solely on oxygen administration. A total of 199 consecutive very low birthweight infants (VLBW, 501 to 1500 g birthweight) were assessed prospectively at 36+/-1 weeks corrected age. Neonates on positive pressure support or receiving >30% supplemental oxygen were assigned the outcome BPD. Those receiving or =88% for 60 minutes) or "BPD" (saturation reliability, test-retest reliability, and validity of the physiologic definition vs the clinical definition were assessed. A total of 199 VLBW were assessed, of whom 45 (36%) were diagnosed with BPD by the clinical definition of oxygen use at 36 weeks corrected age. The physiologic definition identified 15 infants treated with oxygen who successfully passed the saturation monitoring test in room air. The physiologic definition diagnosed BPD in 30 (24%) of the cohort. All infants were safely studied. The test was highly reliable (inter-rater reliability, kappa=1.0; test-retest reliability, kappa=0.83) and highly correlated with discharge home in oxygen, length of hospital stay, and hospital readmissions in the first year of life. The physiologic definition of BPD is safe, feasible, reliable, and valid and improves the precision of the diagnosis of BPD. This may be of benefit in future multicenter clinical trials.

  9. Development of a Conservative Model Validation Approach for Reliable Analysis

    Science.gov (United States)

    2015-01-01

    CIE 2015 August 2-5, 2015, Boston, Massachusetts, USA [DRAFT] DETC2015-46982 DEVELOPMENT OF A CONSERVATIVE MODEL VALIDATION APPROACH FOR RELIABLE...obtain a conservative simulation model for reliable design even with limited experimental data. Very little research has taken into account the...3, the proposed conservative model validation is briefly compared to the conventional model validation approach. Section 4 describes how to account

  10. Anxiety Disorders Interview Schedule – Autism Addendum: Reliability and Validity in Children with Autism Spectrum Disorder

    Science.gov (United States)

    Kerns, Connor Morrow; Renno, Patricia; Kendall, Philip C.; Wood, Jeffrey J.; Storch, Eric A.

    2017-01-01

    Objective: Assessing anxiety in autism spectrum disorder (ASD) is inherently challenging due to overlapping (e.g., social avoidance) and ambiguous symptoms (e.g., fears of change). An ASD addendum to the Anxiety Disorders Interview Schedule–Child/Parent, Parent Version (ADIS/ASA) was developed to provide a systematic approach for differentiating traditional anxiety disorders from symptoms of ASD and more ambiguous, ASD-related anxiety symptoms. Method: Inter-rater reliability and convergent and discriminant validity were examined in a sample of 69 youth with ASD (8–13 years, 75% male, IQ: 68–143) seeking treatment for anxiety. The parents of participants completed the ADIS/ASA and a battery of behavioral measures. A second rater independently observed and scored recordings of the original interviews. Results: Findings suggest reliable measurement of comorbid (ICC = 0.85–0.98; κ = 0.67–0.91) as well as ambiguous anxiety-like symptoms (ICC = 0.87–0.95; κ = 0.77–0.90) in children with ASD. Convergent and discriminant validity were supported for the traditional anxiety symptoms on the ADIS/ASA, whereas convergent and discriminant validity were partially supported for the ambiguous anxiety-like symptoms. Conclusions: Results provide evidence for the reliability and validity of the ADIS/ASA as a measure of traditional anxiety categories in youth with ASD, with partial support for the validity of the ambiguous anxiety-like categories. Unlike other measures, the ADIS/ASA differentiates comorbid anxiety disorders from overlapping and ambiguous anxiety-like symptoms in ASD, allowing for more precise measurement and clinical conceptualization. Ambiguous anxiety-like symptoms appear phenomenologically distinct from comorbid anxiety disorders and may reflect either symptoms of ASD or a novel variant of anxiety in ASD. PMID:27925775

  11. Validity and reliability of The Johns Hopkins Adapted Cognitive Exam for critically ill patients.

    Science.gov (United States)

    Lewin, John J; LeDroux, Shannon N; Shermock, Kenneth M; Thompson, Carol B; Goodwin, Haley E; Mirski, Erin A; Gill, Randeep S; Mirski, Marek A

    2012-01-01

    To validate The Johns Hopkins Adapted Cognitive Exam designed to assess and quantify cognition in critically ill patients. Prospective cohort study. Neurosciences, surgical, and medical intensive care units at The Johns Hopkins Hospital. One hundred six adult critically ill patients. One expert neurologic assessment and four measurements of the Adapted Cognitive Exam (all patients). Four measurements of the Folstein Mini-Mental State Examination in nonintubated patients only. Adapted Cognitive Exam and Mini-Mental State Examination were performed by 76 different raters. One hundred six patients were assessed, 46 intubated and 60 nonintubated, resulting in 424 Adapted Cognitive Exam and 240 Mini-Mental State Examination measurements. Criterion validity was assessed by comparing Adapted Cognitive Exam with a neurointensivist's assessment of cognitive status (ρ = 0.83, p validity was assessed by comparing Adapted Cognitive Exam with Mini-Mental State Examination in nonintubated patients (ρ = 0.81, p validity was assessed by surveying raters who used both the Adapted Cognitive Exam and Mini-Mental State Examination and indicated the Adapted Cognitive Exam was an accurate reflection of the patient's cognitive status, more sensitive a marker of cognition than the Mini-Mental State Examination, and easy to use. The Adapted Cognitive Exam demonstrated excellent interrater reliability (intraclass correlation coefficient = 0.997; 95% confidence interval 0.997-0.998) and interitem reliability of each of the five subscales of the Adapted Cognitive Exam and Mini-Mental State Examination (Cronbach's α: range for Adapted Cognitive Exam = 0.83-0.88; range for Mini-Mental State Examination = 0.72-0.81). The Adapted Cognitive Exam is the first valid and reliable examination for the assessment and quantification of cognition in critically ill patients. It provides a useful, objective tool that can be used by any member of the interdisciplinary critical care team to support
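
    The interitem reliability figures in this abstract are Cronbach's α values. As a hedged illustration, the Python sketch below applies the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores) to a small hypothetical score table. The data and the function name `cronbach_alpha` are assumptions for demonstration and do not come from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; `scores` holds one list of k item scores per respondent."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item across respondents (columns of the table).
    item_variances = [variance(item) for item in zip(*scores)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 respondents, 3 items of one subscale.
subscale_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(subscale_scores)
print(f"alpha = {alpha:.3f}")
```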

  12. Assessing communication skills in dietetic consultations: the development of the reliable and valid DIET-COMMS tool.

    Science.gov (United States)

    Whitehead, K A; Langley-Evans, S C; Tischler, V A; Swift, J A

    2014-04-01

    There is an increasing emphasis on the development of communication skills for dietitians but few evidence-based assessment tools available. The present study aimed to develop a dietetic-specific, short, reliable and valid assessment tool for measuring communication skills in patient consultations: DIET-COMMS. A literature review and feedback from 15 qualified dietitians were used to establish face and content validity during the development of DIET-COMMS. In total, 113 dietetic students and qualified dietitians were video-recorded undertaking mock consultations, assessed using DIET-COMMS by the lead author, and used to establish intra-rater reliability, as well as construct and predictive validity. Twenty recorded consultations were reassessed by nine qualified dietitians to assess inter-rater reliability: eight of these assessors were interviewed to determine user evaluation. Significant improvements in DIET-COMMS scores were achieved as students and qualified staff progressed through their training and gained experience, demonstrating construct validity, and also by qualified staff attending a training course, indicating predictive validity (P skills in practice was questioned. DIET-COMMS is a short, user-friendly, reliable and valid tool for measuring communication skills in patient consultations with both pre- and post-registration dietitians. Additional work is required to develop a training package for assessors and to identify how DIET-COMMS assessment can acceptably be incorporated into practice. © 2013 The British Dietetic Association Ltd.

  13. How to assess and compare inter-rater reliability, agreement and correlation of ratings: an exemplary analysis of mother-father and parent-teacher expressive vocabulary rating pairs

    Science.gov (United States)

    Stolarova, Margarita; Wolf, Corinna; Rinker, Tanja; Brielmann, Aenne

    2014-01-01

    This report has two main purposes. First, we combine well-known analytical approaches to conduct a comprehensive assessment of agreement and correlation of rating-pairs and to dis-entangle these often confused concepts, providing a best-practice example on concrete data and a tutorial for future reference. Second, we explore whether a screening questionnaire developed for use with parents can be reliably employed with daycare teachers when assessing early expressive vocabulary. A total of 53 vocabulary rating pairs (34 parent–teacher and 19 mother–father pairs) collected for two-year-old children (12 bilingual) are evaluated. First, inter-rater reliability both within and across subgroups is assessed using the intra-class correlation coefficient (ICC). Next, based on this analysis of reliability and on the test-retest reliability of the employed tool, inter-rater agreement is analyzed, magnitude and direction of rating differences are considered. Finally, Pearson correlation coefficients of standardized vocabulary scores are calculated and compared across subgroups. The results underline the necessity to distinguish between reliability measures, agreement and correlation. They also demonstrate the impact of the employed reliability on agreement evaluations. This study provides evidence that parent–teacher ratings of children's early vocabulary can achieve agreement and correlation comparable to those of mother–father ratings on the assessed vocabulary scale. Bilingualism of the evaluated child decreased the likelihood of raters' agreement. We conclude that future reports of agreement, correlation and reliability of ratings will benefit from better definition of terms and stricter methodological approaches. The methodological tutorial provided here holds the potential to increase comparability across empirical reports and can help improve research practices and knowledge transfer to educational and therapeutic settings. PMID:24994985
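
    The distinction this report draws between correlation and agreement can be shown with a small, hypothetical example. In the Python sketch below, one rater scores every child exactly 8 points higher than the other: the Pearson correlation is perfect, yet no rating pair agrees within a tolerance of 5 points. The scores, the tolerance, and the helper names are assumptions for illustration; the report derives its own agreement criterion from the reliability of the tool, which is not reproduced here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_difference(x, y):
    """Average signed difference (second rater minus first rater)."""
    return sum(b - a for a, b in zip(x, y)) / len(x)


def share_within(x, y, tolerance):
    """Share of rating pairs that differ by no more than `tolerance`."""
    return sum(abs(a - b) <= tolerance for a, b in zip(x, y)) / len(x)


# Hypothetical standardized vocabulary scores for five children.
parent = [40, 45, 50, 55, 60]
teacher = [score + 8 for score in parent]  # systematic upward shift

r = pearson_r(parent, teacher)
bias = mean_difference(parent, teacher)
agreement = share_within(parent, teacher, tolerance=5)
print(r, bias, agreement)
```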

  14. How to assess and compare inter-rater reliability, agreement and correlation of ratings: an exemplary analysis of mother-father and parent-teacher expressive vocabulary rating pairs

    Directory of Open Access Journals (Sweden)

    Margarita Stolarova

    2014-06-01

    This report has two main purposes. First, we combine well-known analytical approaches to conduct a comprehensive assessment of agreement and correlation of rating-pairs and to dis-entangle these often confused concepts, providing a best-practice example on concrete data and a tutorial for future reference. Second, we explore whether a screening questionnaire developed for use with parents can be reliably employed with daycare teachers when assessing early expressive vocabulary. A total of 53 vocabulary rating pairs (34 parent-teacher and 19 mother-father pairs) collected for two-year-old children (12 bilingual) are evaluated. First, inter-rater reliability both within and across subgroups is assessed using the intra-class correlation coefficient (ICC). Next, based on this analysis of reliability and on the test-retest reliability of the employed tool, inter-rater agreement is analyzed, magnitude and direction of rating differences are considered. Finally, Pearson correlation coefficients of standardized vocabulary scores are calculated and compared across subgroups. The results underline the necessity to distinguish between reliability measures, agreement and correlation. They also demonstrate the impact of the employed reliability on agreement evaluations. This study provides evidence that parent-teacher ratings of children’s early vocabulary can achieve agreement and correlation comparable to those of mother-father ratings on the assessed vocabulary scale. Bilingualism of the evaluated child decreased the likelihood of raters’ agreement. We conclude that future reports of agreement, correlation and reliability of ratings will benefit from better definition of terms and stricter methodological approaches. The methodological tutorial provided here holds the potential to increase comparability across empirical reports and can help improve research practices and knowledge transfer to educational and therapeutic settings.

  16. Establishment of the reliability and validity of the Stress Index for Children or Adolescents with Tourette Syndrome (SICATS).

    Science.gov (United States)

    Chao, Kuo-Yu; Wang, Huei-Shyong; Chang, Hsueh-Ling; Wang, Yi-Wen; See, Lai-Chu

    2010-02-01

    The aim of this study was to evaluate the validity and reliability of the stress index for 10-18-year-old children or adolescents with Tourette syndrome. Tourette syndrome is a chronic tic disorder that occurs in childhood. Children with Tourette syndrome exhibit sudden and unexpected vocalizations or movements that may affect their daily activities and create barriers to social interaction. Therefore, a self-report stress index is needed so that children with Tourette syndrome can quickly measure their stress. Eight experts rated the appropriateness, comprehensiveness and relevance of the questionnaire to establish content validity. A total of 116 paediatric patients filled out the stress index for 10-18-year-old children or adolescents with Tourette syndrome to evaluate its construct validity using exploratory factor analysis and internal consistency. Data from 90 pairs of paediatric patients and their caregivers were used to evaluate the inter-rater reliability. The criterion validity index ranged from 80% to 98%. One item was deleted because of a small item-to-total correlation. Therefore, 26 items made up the final stress index for 10-18-year-old children or adolescents with Tourette syndrome. In exploratory factor analysis, four factors (unfairly treated, psychological, symptom control and future concern) were obtained and accounted for 52.3% of the total variance. Cronbach's alpha of the stress index for 10-18-year-old children or adolescents with Tourette syndrome was 0.89. The inter-rater reliability of the stress index for 10-18-year-old children or adolescents with Tourette syndrome (Pearson correlation coefficient between patients and their caregivers) was 0.56. The stress index for 10-18-year-old children or adolescents with Tourette syndrome is a self-administered tool to assess the stress of children or adolescents with Tourette syndrome. Validity (content and construct) and reliability (internal consistency and inter-rater

  17. Reliability and validity of CODA motion analysis system for measuring cervical range of motion in patients with cervical spondylosis and anterior cervical fusion.

    Science.gov (United States)

    Gao, Zhongyang; Song, Hui; Ren, Fenggang; Li, Yuhuan; Wang, Dong; He, Xijing

    2017-12-01

    The aim of the present study was to evaluate the reliability of the Cartesian Optoelectronic Dynamic Anthropometer (CODA) motion system in measuring the cervical range of motion (ROM) and verify the construct validity of the CODA motion system. A total of 26 patients with cervical spondylosis and 22 patients with anterior cervical fusion were enrolled and the CODA motion analysis system was used to measure the three-dimensional cervical ROM. Intra- and inter-rater reliability was assessed by intraclass correlation coefficients (ICCs), standard error of measurement (SEm), limits of agreement (LOA) and minimal detectable change (MDC). Independent samples t-tests were performed to examine the differences of cervical ROM between cervical spondylosis and anterior cervical fusion patients. The results revealed that in the cervical spondylosis group, the reliability was almost perfect (intra-rater reliability: ICC, 0.87-0.95; LOA, -12.86-13.70; SEm, 2.97-4.58; inter-rater reliability: ICC, 0.84-0.95; LOA, -13.09-13.48; SEm, 3.13-4.32). In the anterior cervical fusion group, the reliability was high (intra-rater reliability: ICC, 0.88-0.97; LOA, -10.65-11.08; SEm, 2.10-3.77; inter-rater reliability: ICC, 0.86-0.96; LOA, -10.91-13.66; SEm, 2.20-4.45). The cervical ROM in the cervical spondylosis group was significantly higher than that in the anterior cervical fusion group in all directions except for left rotation. In conclusion, the CODA motion analysis system is highly reliable in measuring cervical ROM and the construct validity was verified, as the system was sufficiently sensitive to distinguish between the cervical spondylosis and anterior cervical fusion groups based on their ROM.
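
    The SEm and MDC reported in this abstract are commonly derived from the ICC and the between-subject standard deviation. The Python sketch below uses the widely cited forms SEm = SD · √(1 − ICC) and MDC95 = 1.96 · √2 · SEm. The abstract does not say which formulas the authors applied, so treat these as assumptions; the input values and function names are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd, icc):
    """SEm from the between-subject SD and a reliability coefficient."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * sqrt(1.0 - icc)


def minimal_detectable_change(sem, z=1.96):
    """MDC at the confidence level implied by z (1.96 for 95%)."""
    # The factor sqrt(2) reflects error in both of the two measurements compared.
    return z * sqrt(2.0) * sem


# Hypothetical inputs: SD of 10 degrees of rotation, ICC of 0.91.
sem = standard_error_of_measurement(sd=10.0, icc=0.91)
mdc95 = minimal_detectable_change(sem)
print(f"SEm = {sem:.2f} degrees, MDC95 = {mdc95:.2f} degrees")
```

    With these made-up inputs, a change in range of motion smaller than roughly 8.3 degrees could not be distinguished from measurement error.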

  18. Validity and reliability of global operative assessment of laparoscopic skills (GOALS) in novice trainees performing a laparoscopic cholecystectomy.

    Science.gov (United States)

    Kramp, Kelvin H; van Det, Marc J; Hoff, Christiaan; Lamme, Bas; Veeger, Nic J G M; Pierie, Jean-Pierre E N

    2015-01-01

    Global Operative Assessment of Laparoscopic Skills (GOALS) assessment has been designed to evaluate skills in laparoscopic surgery. A longitudinal blinded study of randomized video fragments was conducted to estimate the validity and reliability of GOALS in novice trainees. In total, 10 trainees each performed 6 consecutive laparoscopic cholecystectomies. Sixty procedures were recorded on video. Video fragments of (1) opening of the peritoneum; (2) dissection of Calot's triangle and achievement of critical view of safety; and (3) dissection of the gallbladder from the liver bed were blinded, randomized, and rated by 2 consultant surgeons using GOALS. Also, a grade was given for overall competence. The correlation of GOALS with live observation Objective Structured Assessment of Technical Skills (OSATS) scores was calculated. Construct validity was estimated using the Friedman 2-way analysis of variance by ranks and the Wilcoxon signed-rank test. The interrater reliability was calculated using the absolute and consistency agreement 2-way random-effects model intraclass correlation coefficient. A high correlation was found between mean GOALS score (r = 0.879, p = 0.021) and mean OSATS score. The GOALS score increased significantly across the 6 procedures (p = 0.002). The trainees performed significantly better on their sixth when compared with their first cholecystectomy (p = 0.004). The consistency agreement interrater reliability was 0.37 for the mean GOALS score (p = 0.002) and 0.55 for overall competence (p < 0.001) of the 3 video fragments. The validity observed in this randomized blinded longitudinal study supports the existing evidence that GOALS is a valid tool for assessment of novice trainees. A relatively low reliability was found in this study. Copyright © 2014 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.
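
    This abstract refers to both consistency and absolute agreement forms of the two-way intraclass correlation coefficient. The Python sketch below computes the single-measure versions of both from the mean squares of a subjects-by-raters table. The ratings are hypothetical, the function name `two_way_icc` is made up, and whether the study used single or average measures is not stated, so single measures are an assumption.

```python
def two_way_icc(ratings):
    """Return (consistency, absolute agreement) single-measure two-way ICCs.

    `ratings` holds one row per subject and one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 subjects and raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares for subjects (rows), raters (columns) and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Consistency ignores systematic rater offsets; absolute agreement penalizes them.
    consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    absolute = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )
    return consistency, absolute


# Hypothetical scores: the second rater is systematically more lenient.
video_scores = [[4, 6], [5, 7], [6, 9], [7, 9], [8, 11]]
icc_consistency, icc_absolute = two_way_icc(video_scores)
print(f"consistency = {icc_consistency:.3f}, absolute = {icc_absolute:.3f}")
```

    In this made-up table the raters rank the trainees almost identically, so the consistency ICC is high, while the systematic offset between raters pulls the absolute agreement ICC down.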

  19. Reliability and Validity of Qualitative and Operational Research Paradigm

    Directory of Open Access Journals (Sweden)

    Muhammad Bashir

    2008-01-01

    Both qualitative and quantitative paradigms seek the same result: the truth. Qualitative studies are tools for understanding and describing the world of human experience. Because we maintain our humanity throughout the research process, it is largely impossible to escape subjective experience, even for the most experienced researchers. Reliability and validity are issues that have been described in great detail by advocates of quantitative research. The validity and the norms of rigor that are applied to quantitative research are not entirely applicable to qualitative research. Validity in qualitative research means the extent to which the data are plausible, credible and trustworthy, and thus can be defended when challenged. Reliability and validity remain appropriate concepts for attaining rigor in qualitative research. Qualitative researchers have to reclaim responsibility for reliability and validity by implementing verification strategies that are integral and self-correcting during the conduct of inquiry itself. This ensures the attainment of rigor using strategies inherent within each qualitative design, and moves the responsibility for incorporating and maintaining reliability and validity from external reviewers’ judgments to the investigators themselves. Opinions on validity differ: some suggest that the concept of validity is incompatible with qualitative research and should be abandoned, while others argue that efforts should be made to ensure validity so as to lend credibility to the results. This paper is an attempt to clarify the meaning and use of reliability and validity in the qualitative research paradigm.

  20. Assessment of the severity of dementia: validity and reliability of the Chinese (Cantonese) version of the Hierarchic Dementia Scale (CV-HDS).

    Science.gov (United States)

    Poon, Vickie Wan-kei; Lam, Linda Chiu-wa; Wong, Samuel Yeung-shan

    2008-09-01

    With the rapid growth of the older population, early detection of cognitive deficits is crucial in slowing the functional deterioration of elderly persons. The objective was to examine the validity and reliability of the Chinese (Cantonese) version of the Hierarchic Dementia Scale (CV-HDS) for Chinese older persons in Hong Kong. The HDS was translated into Cantonese Chinese. The content and cultural validity were evaluated by six expert panel members. Sixty-two participants with a diagnosis of dementia were recruited for evaluation. Inter-rater reliability, test-retest reliability, internal consistency and concurrent validity were examined. The CV-HDS demonstrated satisfactory psychometric properties. Inter-rater reliability and test-retest reliability were high (alpha=0.89 and alpha=0.94 respectively). A high value of Cronbach's alpha (alpha=0.94) demonstrated good internal consistency. The concurrent validity of the CV-HDS, through correlation of its scores with those of the Chinese version of the Mini Mental Status Examination, was established (ranged from r=0.58 to r=0.78, pCantonese speaking Chinese people with dementia. It facilitates treatment planning to optimize the effects of functional training and rehabilitation.

  1. Automated bony region identification using artificial neural networks: reliability and validation measurements

    Energy Technology Data Exchange (ETDEWEB)

    Gassman, Esther E.; Kallemeyn, Nicole A.; DeVries, Nicole A.; Shivanna, Kiran H. [The University of Iowa, Department of Biomedical Engineering, Seamans Center for the Engineering Arts and Sciences, Iowa City, IA (United States); The University of Iowa, Center for Computer-Aided Design, Iowa City, IA (United States); Powell, Stephanie M. [The University of Iowa, Department of Biomedical Engineering, Seamans Center for the Engineering Arts and Sciences, Iowa City, IA (United States); University of Iowa Hospitals and Clinics, The University of Iowa, Department of Radiology, Iowa City, IA (United States); Magnotta, Vincent A. [The University of Iowa, Department of Biomedical Engineering, Seamans Center for the Engineering Arts and Sciences, Iowa City, IA (United States); The University of Iowa, Center for Computer-Aided Design, Iowa City, IA (United States); University of Iowa Hospitals and Clinics, The University of Iowa, Department of Radiology, Iowa City, IA (United States); Ramme, Austin J. [University of Iowa Hospitals and Clinics, The University of Iowa, Department of Radiology, Iowa City, IA (United States); Adams, Brian D. [The University of Iowa, Department of Biomedical Engineering, Seamans Center for the Engineering Arts and Sciences, Iowa City, IA (United States); University of Iowa Hospitals and Clinics, The University of Iowa, Department of Orthopaedics and Rehabilitation, Iowa City, IA (United States); Grosland, Nicole M. [The University of Iowa, Department of Biomedical Engineering, Seamans Center for the Engineering Arts and Sciences, Iowa City, IA (United States); University of Iowa Hospitals and Clinics, The University of Iowa, Department of Orthopaedics and Rehabilitation, Iowa City, IA (United States); The University of Iowa, Center for Computer-Aided Design, Iowa City, IA (United States)

    2008-04-15

    The objective was to develop tools for automating the identification of bony structures, to assess the reliability of this technique against manual raters, and to validate the resulting regions of interest against physical surface scans obtained from the same specimen. Artificial intelligence-based algorithms have been used for image segmentation, specifically artificial neural networks (ANNs). For this study, an ANN was created and trained to identify the phalanges of the human hand. The relative overlap between the ANN and a manual tracer was 0.87, 0.82, and 0.76, for the proximal, middle, and distal index phalanx bones respectively. Compared with the physical surface scans, the ANN-generated surface representations differed on average by 0.35 mm, 0.29 mm, and 0.40 mm for the proximal, middle, and distal phalanges respectively. Furthermore, the ANN proved to segment the structures in less than one-tenth of the time required by a manual rater. The ANN has proven to be a reliable and valid means of segmenting the phalanx bones from CT images. Employing automated methods such as the ANN for segmentation, eliminates the likelihood of rater drift and inter-rater variability. Automated methods also decrease the amount of time and manual effort required to extract the data of interest, thereby making the feasibility of patient-specific modeling a reality. (orig.)
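
    The abstract reports a "relative overlap" between the ANN segmentation and a manual tracer but does not define it. One common reading is intersection over union of the two label masks; the Python sketch below adopts that definition as an assumption. The voxel coordinates and the function name `relative_overlap` are hypothetical.

```python
def relative_overlap(mask_a, mask_b):
    """Intersection over union of two collections of labelled voxel coordinates."""
    a, b = set(mask_a), set(mask_b)
    union = a | b
    if not union:
        raise ValueError("both masks are empty")
    return len(a & b) / len(union)


# Hypothetical voxel coordinates (x, y, z) labelled as bone by each method.
ann_mask = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)}
manual_mask = {(1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)}
overlap = relative_overlap(ann_mask, manual_mask)
print(f"relative overlap = {overlap:.2f}")
```

    Here three voxels are shared out of five labelled in total, giving 0.6; identical masks would give 1.0 and disjoint masks 0.0.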

  2. Automated bony region identification using artificial neural networks: reliability and validation measurements

    International Nuclear Information System (INIS)

    Gassman, Esther E.; Kallemeyn, Nicole A.; DeVries, Nicole A.; Shivanna, Kiran H.; Powell, Stephanie M.; Magnotta, Vincent A.; Ramme, Austin J.; Adams, Brian D.; Grosland, Nicole M.

    2008-01-01

    The objective was to develop tools for automating the identification of bony structures, to assess the reliability of this technique against manual raters, and to validate the resulting regions of interest against physical surface scans obtained from the same specimen. Artificial intelligence-based algorithms have been used for image segmentation, specifically artificial neural networks (ANNs). For this study, an ANN was created and trained to identify the phalanges of the human hand. The relative overlap between the ANN and a manual tracer was 0.87, 0.82, and 0.76, for the proximal, middle, and distal index phalanx bones respectively. Compared with the physical surface scans, the ANN-generated surface representations differed on average by 0.35 mm, 0.29 mm, and 0.40 mm for the proximal, middle, and distal phalanges respectively. Furthermore, the ANN proved to segment the structures in less than one-tenth of the time required by a manual rater. The ANN has proven to be a reliable and valid means of segmenting the phalanx bones from CT images. Employing automated methods such as the ANN for segmentation eliminates the likelihood of rater drift and inter-rater variability. Automated methods also decrease the amount of time and manual effort required to extract the data of interest, thereby making patient-specific modeling feasible. (orig.)
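
The abstract above reports a "relative overlap" between the ANN and a manual tracer without defining it. A minimal sketch follows, assuming the measure is intersection over union of the two segmented voxel sets; that definition, the function name `relative_overlap`, and the voxel coordinates are assumptions for illustration, not taken from the record.

```python
def relative_overlap(mask_a, mask_b):
    """Intersection over union of two collections of voxel indices.

    Assumed definition: |A intersect B| / |A union B|.
    """
    a, b = set(mask_a), set(mask_b)
    union = a | b
    if not union:
        raise ValueError("both masks are empty")
    return len(a & b) / len(union)


# Hypothetical (x, y, z) voxel coordinates, for illustration only.
ann_voxels = {(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)}
manual_voxels = {(0, 1, 0), (1, 0, 0), (1, 1, 0), (2, 1, 0)}

overlap = relative_overlap(ann_voxels, manual_voxels)
print(f"relative overlap = {overlap:.2f}")
```

A value of 1 would mean the two segmentations coincide exactly, and 0 would mean they share no voxels.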

  3. Reliability and validity of the photogrammetry for scoliosis evaluation: a cross-sectional prospective study.

    Science.gov (United States)

    Saad, Karen Ruggeri; Colombo, Alexandra S; João, Silvia M Amado

    2009-01-01

    The purpose of this study was to investigate the reliability and validity of photogrammetry in measuring the lateral spinal inclination angles. Forty subjects (32 females and 8 males) with a mean age of 23.4 ± 11.2 years had their scoliosis evaluated by radiographs of their trunk, determined by the Cobb angle method, and by photogrammetry. The statistical methods used included Cronbach alpha, Pearson/Spearman correlation coefficients, and regression analyses. The Cronbach alpha values indicated high internal consistency for the photogrammetric measures, which indicated that the sample was bias free. The radiograph method proved to be more precise, with intrarater reliabilities of 0.936, 0.975, and 0.945 for the thoracic, lumbar, and thoracolumbar curves, respectively, and interrater reliabilities of 0.942 and 0.879 for the angular measures of the thoracic and thoracolumbar segments, respectively. The regression analyses revealed a high determination coefficient although limited to the adjusted linear model between the radiographic and photographic measures. It was found that with more severe scoliosis, the lateral curve measures obtained with the photogrammetry were for the thoracic and lumbar regions (R = 0.619 and 0.551). The photogrammetric measures were found to be reproducible in this study and could be used as supplementary information to decrease the number of radiographs necessary for the monitoring of scoliosis.

  4. Validity and Reliability of Field-Based Measures for Assessing Movement Skill Competency in Lifelong Physical Activities: A Systematic Review.

    Science.gov (United States)

    Hulteen, Ryan M; Lander, Natalie J; Morgan, Philip J; Barnett, Lisa M; Robertson, Samuel J; Lubans, David R

    2015-10-01

    It has been suggested that young people should develop competence in a variety of 'lifelong physical activities' to ensure that they can be active across the lifespan. The primary aim of this systematic review is to report the methodological properties, validity, reliability, and test duration of field-based measures that assess movement skill competency in lifelong physical activities. A secondary aim was to clearly define those characteristics unique to lifelong physical activities. A search of four electronic databases (Scopus, SPORTDiscus, ProQuest, and PubMed) was conducted between June 2014 and April 2015 with no date restrictions. Studies addressing the validity and/or reliability of lifelong physical activity tests were reviewed. Included articles were required to assess lifelong physical activities using process-oriented measures, as well as report either one type of validity or reliability. Assessment criteria for methodological quality were adapted from a checklist used in a previous review of sport skill outcome assessments. Movement skill assessments for eight different lifelong physical activities (badminton, cycling, dance, golf, racquetball, resistance training, swimming, and tennis) in 17 studies were identified for inclusion. Methodological quality, validity, reliability, and test duration (time to assess a single participant), for each article were assessed. Moderate to excellent reliability results were found in 16 of 17 studies, with 71% reporting inter-rater reliability and 41% reporting intra-rater reliability. Only four studies in this review reported test-retest reliability. Ten studies reported validity results; content validity was cited in 41% of these studies. Construct validity was reported in 24% of studies, while criterion validity was only reported in 12% of studies. Numerous assessments for lifelong physical activities may exist, yet only assessments for eight lifelong physical activities were included in this review

  5. Assessment of teacher competence using video portfolios: reliability, construct validity and consequential validity

    NARCIS (Netherlands)

    Admiraal, W.; Hoeksma, M.; van de Kamp, M.-T.; van Duin, G.

    2011-01-01

    The richness and complexity of video portfolios endanger both the reliability and validity of the assessment of teacher competencies. In a post-graduate teacher education program, the assessment of video portfolios was evaluated for its reliability, construct validity, and consequential validity.

  6. Validity and Reliability of the Turkish Chronic Pain Acceptance Questionnaire

    Science.gov (United States)

    Akmaz, Hazel Ekin; Uyar, Meltem; Kuzeyli Yıldırım, Yasemin; Akın Korhan, Esra

    2018-05-29

    Pain acceptance is the process of giving up the struggle with pain and learning to live a worthwhile life despite it. In assessing patients with chronic pain in Turkey, making a diagnosis and tracking the effectiveness of treatment is done with scales that have been translated into Turkish. However, there is as yet no valid and reliable scale in Turkish to assess the acceptance of pain. To validate a Turkish version of the Chronic Pain Acceptance Questionnaire developed by McCracken and colleagues. Methodological and cross sectional study. A simple randomized sampling method was used in selecting the study sample. The sample was composed of 201 patients, more than 10 times the number of items examined for validity and reliability in the study, which totaled 20. A patient identification form, the Chronic Pain Acceptance Questionnaire, and the Brief Pain Inventory were used to collect data. Data were collected by face-to-face interviews. In the validity testing, the content validity index was used to evaluate linguistic equivalence, content validity, construct validity, and expert views. In reliability testing of the scale, Cronbach’s α coefficient was calculated, and item analysis and split-test reliability methods were used. Principal component analysis and varimax rotation were used in factor analysis and to examine factor structure for construct concept validity. The item analysis established that the scale, all items, and item-total correlations were satisfactory. The mean total score of the scale was 21.78. The internal consistency coefficient was 0.94, and the correlation between the two halves of the scale was 0.89. The Chronic Pain Acceptance Questionnaire, which is intended to be used in Turkey upon confirmation of its validity and reliability, is an evaluation instrument with sufficient validity and reliability, and it can be reliably used to examine patients’ acceptance of chronic pain.
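
The abstract above reports Cronbach's α and a correlation between the two halves of the scale. A minimal sketch of both statistics follows, assuming an odd/even item split for the halves (the record does not say how the scale was split). The function names and the response matrix are hypothetical and are not study data.

```python
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items matrix of item scores."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_r(scores):
    """Correlation between odd-item and even-item half scores (assumed split)."""
    odd_half = [sum(row[0::2]) for row in scores]
    even_half = [sum(row[1::2]) for row in scores]
    return pearson_r(odd_half, even_half)


# Hypothetical responses: 5 respondents x 4 items.
responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 3],
    [3, 4, 3, 4],
    [4, 4, 5, 5],
    [5, 5, 4, 5],
]
print(f"alpha = {cronbach_alpha(responses):.3f}")
print(f"split-half r = {split_half_r(responses):.3f}")
```

The split-half correlation describes a test of half length; studies often apply a Spearman-Brown correction afterwards, which this sketch leaves out.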

  7. Validity and Reliability of the Turkish Chronic Pain Acceptance Questionnaire

    Directory of Open Access Journals (Sweden)

    Hazel Ekin Akmaz

    2018-05-01

    Background: Pain acceptance is the process of giving up the struggle with pain and learning to live a worthwhile life despite it. In assessing patients with chronic pain in Turkey, making a diagnosis and tracking the effectiveness of treatment is done with scales that have been translated into Turkish. However, there is as yet no valid and reliable scale in Turkish to assess the acceptance of pain. Aims: To validate a Turkish version of the Chronic Pain Acceptance Questionnaire developed by McCracken and colleagues. Study Design: Methodological and cross sectional study. Methods: A simple randomized sampling method was used in selecting the study sample. The sample was composed of 201 patients, more than 10 times the number of items examined for validity and reliability in the study, which totaled 20. A patient identification form, the Chronic Pain Acceptance Questionnaire, and the Brief Pain Inventory were used to collect data. Data were collected by face-to-face interviews. In the validity testing, the content validity index was used to evaluate linguistic equivalence, content validity, construct validity, and expert views. In reliability testing of the scale, Cronbach’s α coefficient was calculated, and item analysis and split-test reliability methods were used. Principal component analysis and varimax rotation were used in factor analysis and to examine factor structure for construct concept validity. Results: The item analysis established that the scale, all items, and item-total correlations were satisfactory. The mean total score of the scale was 21.78. The internal consistency coefficient was 0.94, and the correlation between the two halves of the scale was 0.89. Conclusion: The Chronic Pain Acceptance Questionnaire, which is intended to be used in Turkey upon confirmation of its validity and reliability, is an evaluation instrument with sufficient validity and reliability, and it can be reliably used to examine patients’ acceptance

  8. Reliability and construct validity for scale of rejection of Christianity.

    Science.gov (United States)

    Robbins, Mandy; Francis, Leslie J; Bradford, Amanda

    2003-02-01

    A sample of 16 male and 30 female undergraduates completed the Greer and Francis Scale of Rejection of Christianity. The data support the internal consistency reliability and construct validity of the scale for this sample.

  9. The reliability and validity of a sexual functioning questionnaire.

    Science.gov (United States)

    Corty, E W; Althof, S E; Kurit, D M

    1996-01-01

    The present study assessed the reliability and validity of a measure of sexual functioning, the CMSH-SFQ, for male patients and their partners. The CMSH-SFQ measures erectile and orgasmic functioning, sexual drive, frequency of sexual behavior, and sexual satisfaction. Test-retest reliability was assessed with 19 males and 19 females for the baseline CMSH-SFQ. Criterion validity was measured by comparing the answers of 25 male patients to those of their partners at baseline and follow-up. The majority of items had acceptable levels of reliability and validity. The CMSH-SFQ provides a reliable and valid device that can be used to measure global sexual functioning in men and their partners and may be used to evaluate the efficacy of treatments for sexual dysfunctions. Limitations and suggestions for use of the CMSH-SFQ are addressed.

  10. Reliability and validity of the McDonald Play Inventory.

    Science.gov (United States)

    McDonald, Ann E; Vigen, Cheryl

    2012-01-01

    This study examined the ability of a two-part self-report instrument, the McDonald Play Inventory, to reliably and validly measure the play activities and play styles of 7- to 11-yr-old children and to discriminate between the play of neurotypical children and children with known learning and developmental disabilities. A total of 124 children ages 7-11 recruited from a sample of convenience and a subsample of 17 parents participated in this study. Reliability estimates yielded moderate correlations for internal consistency, total test intercorrelations, and test-retest reliability. Validity estimates were established for content and construct validity. The results suggest that a self-report instrument yields reliable and valid measures of a child's perceived play performance and discriminates between the play of children with and without disabilities. Copyright © 2012 by the American Occupational Therapy Association, Inc.

  11. Modeling, implementation, and validation of arterial travel time reliability.

    Science.gov (United States)

    2013-11-01

    Previous research funded by Florida Department of Transportation (FDOT) developed a method for estimating travel time reliability for arterials. This method was not initially implemented or validated using field data. This project evaluated and r...

  12. A clinician-administered severity rating scale for illness anxiety: development, reliability, and validity of the H-YBOCS-M.

    Science.gov (United States)

    Skritskaya, Natalia A; Carson-Wong, Amanda R; Moeller, James R; Shen, Sa; Barsky, Arthur J; Fallon, Brian A

    2012-07-01

    Clinician-administered measures to assess severity of illness anxiety and response to treatment are few. The authors evaluated a modified version of the hypochondriasis-Y-BOCS (H-YBOCS-M), a 19-item, semistructured, clinician-administered instrument designed to rate severity of illness-related thoughts, behaviors, and avoidance. The scale was administered to 195 treatment-seeking adults with DSM-IV hypochondriasis. Test-retest reliability was assessed in a subsample of 20 patients. Interrater reliability was assessed by 27 interviews independently rated by four raters. Sensitivity to change was evaluated in a subsample of 149 patients. Convergent and discriminant validity was examined by comparing H-YBOCS-M scores to other measures administered. Item clustering was examined with confirmatory and exploratory factor analyses. The H-YBOCS-M demonstrated good internal consistency, interrater and test-retest reliability, and sensitivity to symptom change with treatment. Construct validity was supported by significantly higher correlations with scores on other measures of hypochondriasis than with nonhypochondriacal measures. Improvement over time in response to treatment correlated with improvement both on measures of hypochondriasis and on measures of somatization, depression, anxiety, and functional status. Confirmatory factor analysis did not show adequate fit for a three-factor model. Exploratory factor analysis revealed a five-factor solution with the first two factors consistent with the separation of the H-YBOCS-M items into the subscales of illness-related avoidance and compulsions. H-YBOCS-M appears to be valid, reliable, and appropriate as an outcome measure for treatment studies of illness anxiety. Study results highlight "avoidance" as a key feature of illness anxiety, with potentially important nosologic and treatment implications. © 2012 Wiley Periodicals, Inc.

  13. Telepsychiatry clinical decision support system used by non-psychiatrists in remote areas: Validity & reliability of diagnostic module

    Science.gov (United States)

    Malhotra, Savita; Chakrabarti, Subho; Shah, Ruchita; Sharma, Minali; Sharma, Kanu Priya; Malhotra, Akanksha; Upadhyaya, Suneet K.; Margoob, Mushtaq A.; Maqbool, Dar; Jassal, Gopal D.

    2017-01-01

    Background & objectives: A knowledge-based, logically-linked online telepsychiatric decision support system for diagnosis and treatment of mental disorders was developed and validated. We evaluated diagnostic accuracy and reliability of the application at remote sites when used by non-psychiatrists who underwent a brief training in its use through video-conferencing. Methods: The study was conducted at a nodal telepsychiatry centre, and three geographically remote peripheral centres. The diagnostic tool of application had a screening followed by detailed criteria-wise diagnostic modules for 18 psychiatric disorders. A total of 100 consecutive consenting adult outpatients attending remote telepsychiatry centres were included. To assess inter-rater reliability, patients were interviewed face to face by non-specialists at remote sites using the application (active interviewer) and simultaneously on online application via video-conferencing by a passive assessor at nodal centre. Another interviewer at the nodal centre rated the patient using Mini-International Neuropsychiatric Interview (MINI) for diagnostic validation. Results: Screening sub-module had high sensitivity (80-100%), low positive predictive values (PPV) (0.10-0.71) but high negative predictive value (NPV) (0.97-1) for most disorders. For the diagnostic sub-modules, Cohen's kappa was >0.4 for all disorders, with kappa of 0.7-1.0 for most disorders. PPV and NPV were high for most disorders. Inter-rater agreement analysis revealed kappa >0.6 for all disorders. Interpretation & conclusions: Diagnostic tool showed acceptable to good validity and reliability when used by non-specialists at remote sites. Our findings show that diagnostic tool of the telepsychiatry application has potential to empower non-psychiatrist doctors and paramedics to diagnose psychiatric disorders accurately and reliably in remote sites. PMID:29265020
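
The screening results above combine high sensitivity and high NPV with low PPV, a pattern that typically appears when a disorder is uncommon in the sample. A minimal sketch of how these quantities follow from a 2x2 table is given below; the function name and the counts are hypothetical and are not data from the study.

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 table of test vs. reference."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among reference positives
        "specificity": tn / (tn + fp),  # true negatives among reference negatives
        "ppv": tp / (tp + fp),          # reference positives among test positives
        "npv": tn / (tn + fn),          # reference negatives among test negatives
    }


# Hypothetical counts for one disorder among 100 patients (10 true cases).
metrics = screening_metrics(tp=9, fp=21, fn=1, tn=69)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With only 10 true cases in 100, even a modest number of false positives pulls PPV down while sensitivity and NPV stay high.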

  14. Reliability and validity of the upper-body dressing scale in Japanese patients with vascular dementia with hemiparesis.

    Science.gov (United States)

    Endo, Arisa; Suzuki, Makoto; Akagi, Atsumi; Chiba, Naoyuki; Ishizaka, Ikuyo; Matsunaga, Atsuhiko; Fukuda, Michinari

    2015-03-01

    The purpose of this study was to examine the reliability and validity of the Upper-body Dressing Scale (UBDS) for buttoned shirt dressing, which evaluates the learning process of new component actions of upper-body dressing in patients diagnosed with dementia and hemiparesis. This was a preliminary correlational study of concurrent validity and reliability in which 10 vascular dementia patients with hemiparesis were enrolled and assessed repeatedly by six occupational therapists by means of the UBDS and the dressing item of the Functional Independence Measure (FIM). Intraclass correlation coefficient was 0.97 for intra-rater reliability and 0.99 for inter-rater reliability. The level of correlation between UBDS score and FIM dressing item scores was -0.93. UBDS scores for paralytic hand passed into the sleeve and sleeve pulled up beyond the shoulder joint were worse than the scores for the other components of the task. The UBDS has good reliability and validity for vascular dementia patients with hemiparesis. Further research is needed to investigate the relation between UBDS score and the effect of intervention and to clarify sensitivity or responsiveness of the scale to clinical change. Copyright © 2014 John Wiley & Sons, Ltd.
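
The abstract above reports intraclass correlation coefficients but does not say which ICC form was used. The sketch below assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), computed from ANOVA mean squares. The function name and the ratings are illustrative and are not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a subjects x raters matrix (assumed form, see lead-in)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative ratings: 6 subjects scored by 4 raters.
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(f"ICC(2,1) = {icc_2_1(example_ratings):.2f}")
```

Because this form penalizes systematic differences between raters, it can be much lower than a consistency-type ICC computed on the same ratings.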

  15. Converting three general-cognitive function scales into Persian and assessment of their validity and reliability

    Directory of Open Access Journals (Sweden)

    Payam Moin

    2011-01-01

    Objectives: Glasgow Outcome Scale Extended (GOSE), Galveston Orientation and Amnesia Test (GOAT) and Disability Rating Scale (DRS) are three popular outcome measure tools used principally in traumatic brain injury (TBI) patients. We conducted this study to provide a Farsi version of these outcome scales for use in Iran. Methods: Following a comprehensive literature review, Farsi transcripts were prepared by "forward-backward" translation and reviewed by subject experts. After a pretest on a few patients, the final versions were obtained. 38 patients with closed head injury were interviewed simultaneously by two interviewers. Main statistics used to assess validity and reliability included factor analysis for construct validity, Cronbach's alpha for internal consistency, and Pearson correlation and kappa coefficient for inter-rater agreement. Results: Factor analysis for Farsi-GOAT (FGOAT) revealed 5 independent factors with a total distribution variance of 80.2%. For Farsi-DRS (FDRS), 3 independent factors were found with a 92.3% variance. The Cronbach's alpha (95% confidence interval) was 0.84 (0.763-0.919) and 0.91 (0.901-0.919) for FGOAT and FDRS, respectively. Pearson correlation between total scores of two raters was 0.98 and 0.97 for FGOAT and FDRS, respectively. Kappa coefficient (95% CI) between outcome rankings of raters was 0.73 (0.618-0.852) and 0.68 (0.594-0.770) for FGOAT and FDRS, respectively. As for the Farsi-GOSE scale, the kappa value was 0.4 (0.285-0.507) for 8-level outcome ranking and improved to 0.7 (0.585-0.817) for the 5-level scale. We found a good correlation between FDRS and FGOSE predicted prognoses (Spearman's rho = 0.74, 95% CI: 0.676-0.802). Conclusions: FDRS and FGOAT had appropriate validity and reliability. The 8-level outcome FGOSE scale disclosed a low inter-rater agreement, but a suitable observer agreement was achieved when the 5-level outcome was applied.

  16. Eating Disorder Diagnostic Scale: Additional Evidence of Reliability and Validity

    Science.gov (United States)

    Stice, Eric; Fisher, Melissa; Martinez, Erin

    2004-01-01

    The authors conducted 4 studies investigating the reliability and validity of the Eating Disorder Diagnostic Scale (EDDS; E. Stice, C. F. Telch, & S. L. Rizvi, 2000), a brief self-report measure for diagnosing anorexia nervosa, bulimia nervosa, and binge eating disorder. Study 1 found that the EDDS showed criterion validity with interview-based…

  17. The Danish anal sphincter rupture questionnaire: Validity and reliability

    DEFF Research Database (Denmark)

    Due, Ulla; Ottesen, Marianne

    2008-01-01

    Objective. To revise, validate and test for reliability an anal sphincter rupture questionnaire in relation to construct, content and face validity. Setting and background. Since 1996 women with anal sphincter rupture (ASR) at one of the public university hospitals in Copenhagen, Denmark have bee...

  18. Reliability and validity of the Incontinence Quiz-Turkish version.

    Science.gov (United States)

    Kara, Kerime C; Çıtak Karakaya, İlkim; Tunalı, Nur; Karakaya, Mehmet G

    2018-01-01

    The aim of this study was to investigate the reliability and validity of the Turkish version of the Incontinence Quiz, which was developed by Branch et al. (1994), to assess women's knowledge of and attitudes toward urinary incontinence. Comprehensibility of the Turkish version of the 14-item Incontinence Quiz, which was prepared following translation-back translation procedures, was tested on a pilot group of eight women, and its internal reliability, test-retest reliability and construct validity were assessed in 150 women who attended the gynecology clinics of three hospitals in İçel, Turkey. Physical and sociodemographic characteristics and presence of incontinence complaints were also recorded. Data were analyzed at the 0.05 alpha level, using SPSS version 22. The scale had good reliability and validity. The internal reliability coefficient (Cronbach α) was 0.80, test-retest correlation coefficients were 0.83-0.94; and with regard to construct validity, Kaiser-Meyer-Olkin coefficient was 0.76 and Bartlett sphericity test was 562.777 (P = 0.000). The Turkish version of the Incontinence Quiz had a four-factor structure, with eigenvalues ranging from 1.17 to 4.08. The Incontinence Quiz-Turkish version is a highly comprehensible, reliable and valid scale, which may be used to assess Turkish-speaking women's knowledge of and attitudes toward urinary incontinence. © 2017 Japan Society of Obstetrics and Gynecology.

  19. Content validity and reliability of the Copenhagen social relations questionnaire

    DEFF Research Database (Denmark)

    Lund, Rikke; Nielsen, Lene Snabe; Henriksen, Pia Wichmann

    2014-01-01

    OBJECTIVE: The aim of the present article is to describe the face and content validity as well as reliability of the Copenhagen Social Relations Questionnaire (CSRQ). METHOD: The face and content validity test was based on focus group discussions and individual interviews with 31 informants...... from the interviews. Two additional themes not covered by CSRQ on dynamics and reciprocity of social relations were identified. DISCUSSION: CSRQ holds satisfactory face and content validity as well as reliability, and is suitable for measuring structure and function of social relations including...

  20. Validity and reliability of a new tool to evaluate handwriting difficulties in Parkinson's disease.

    Directory of Open Access Journals (Sweden)

    Evelien Nackaerts

    Handwriting in Parkinson's disease (PD) features specific abnormalities which are difficult to assess in clinical practice since no specific tool for evaluation of spontaneous movement is currently available. This study aims to validate the 'Systematic Screening of Handwriting Difficulties' (SOS-test) in patients with PD. Handwriting performance of 87 patients and 26 healthy age-matched controls was examined using the SOS-test. Sixty-seven patients were tested a second time within a period of one month. Participants were asked to copy as much as possible of a text within 5 minutes with the instruction to write as neatly and quickly as in daily life. Writing speed (letters in 5 minutes), size (mm) and quality of handwriting were compared. Correlation analysis was performed between SOS outcomes and other fine motor skill measurements and disease characteristics. Intrarater, interrater and test-retest reliability were assessed using the intraclass correlation coefficient (ICC) and Spearman correlation coefficient. Patients with PD had a smaller (p = 0.043) and slower (p 0.769 for both groups. The SOS-test is a short and effective tool to detect handwriting problems in PD with excellent reliability. It can therefore be recommended as a clinical instrument for standardized screening of handwriting deficits in PD.

  1. Validity and Reliability of the 8-Item Work Limitations Questionnaire.

    Science.gov (United States)

    Walker, Timothy J; Tullar, Jessica M; Diamond, Pamela M; Kohl, Harold W; Amick, Benjamin C

    2017-12-01

    Purpose: To evaluate factorial validity, scale reliability, test-retest reliability, convergent validity, and discriminant validity of the 8-item Work Limitations Questionnaire (WLQ) among employees from a public university system. Methods: A secondary analysis using de-identified data from employees who completed an annual Health Assessment between the years 2009-2015 tested research aims. Confirmatory factor analysis (CFA) (n = 10,165) tested the latent structure of the 8-item WLQ. Scale reliability was determined using a CFA-based approach while test-retest reliability was determined using the intraclass correlation coefficient. Convergent/discriminant validity was tested by evaluating relations between the 8-item WLQ with health/performance variables for convergent validity (health-related work performance, number of chronic conditions, and general health) and demographic variables for discriminant validity (gender and institution type). Results: A 1-factor model with three correlated residuals demonstrated excellent model fit (CFI = 0.99, TLI = 0.99, RMSEA = 0.03, and SRMR = 0.01). The scale reliability was acceptable (0.69, 95% CI 0.68-0.70) and the test-retest reliability was very good (ICC = 0.78). Low-to-moderate associations were observed between the 8-item WLQ and the health/performance variables while weak associations were observed between the demographic variables. Conclusions: The 8-item WLQ demonstrated sufficient reliability and validity among employees from a public university system. Results suggest the 8-item WLQ is a usable alternative for studies when the more comprehensive 25-item WLQ is not available.

  2. The validity and reliability of the diagnosis of hyperkinetic disorders in the Danish Psychiatric Central Research Registry

    DEFF Research Database (Denmark)

    Jensen, Christina Mohr; Vinkel Koch, S; Lauritsen, Marlene Briciet

    2016-01-01

    were used to validate the diagnosis. Patient files were systematically scored for the presence of ICD-10 criteria for HD and oppositional defiant disorder/conduct disorder (ODD/CD; F91). Further to this, an inter-rater reliability study was also conducted, whereby two experienced child and adolescent......OBJECTIVE: To validate the diagnosis of hyperkinetic disorders (HD) in the Danish Psychiatric Central Research Registry (DPCRR) for children and adolescents aged 4 to 15 given in the years 1995 to 2005. METHOD: From a total of 4568 participants, a representative random subsample of n=387 patients...... it was not possible to reach a conclusion for 5.1% of the cases, 3.8% of the diagnoses were registration errors, and in 4.3% of the files the diagnosis had to be rejected. Inter-rater agreement was high (κ=0.83, z=10.9, Pvalidity of hyperkinetic disorders, unspecified (F90.9) was lower and comorbid CD...

  3. An Analytic Creativity Assessment Scale for Digital Game Story Design: Construct Validity, Internal Consistency and Interrater Reliability

    Science.gov (United States)

    Chuang, Tsung-Yen; Huang, Yun-Hsuan

    2015-01-01

    Mobile technology has rapidly made digital games a popular entertainment to this digital generation, and thus digital game design received considerable attention in both the game industry and design education. Digital game design involves diverse dimensions in which digital game story design (DGSD) particularly attracts our interest, as the…

  4. Reliability and validity of four alternative definitions of rapid-cycling bipolar disorder.

    Science.gov (United States)

    Maj, M; Pirozzi, R; Formicola, A M; Tortorella, A

    1999-09-01

    This study tested the reliability and validity of four definitions of rapid cycling. Two trained psychiatrists, using the Schedule for Affective Disorders and Schizophrenia, independently assessed 210 patients with bipolar disorder. They checked whether each patient met four definitions of rapid cycling: one consistent with DSM-IV criteria, one waiving criteria for duration of affective episodes, one waiving such criteria and requiring at least one switch from mania to depression or vice versa during the reference year, and one waiving duration criteria and requiring at least 8 weeks of fully symptomatic affective illness during the reference year. The interrater reliability was calculated by Cohen's kappa statistic. Patients who met each definition according to both psychiatrists were compared to those who did not meet any definition (nonrapid-cycling group) on demographic and clinical variables. All patients were followed up for 1 year. Kappa values were 0.93, 0.73, 0.75, and 0.80, respectively, for the four definitions of rapid cycling. The groups meeting the second and third definitions included significantly more female and bipolar II patients than did the nonrapid-cycling group. Those two groups also had the lowest proportion of patients with a favorable lithium prophylaxis outcome and the highest stability of the rapid-cycling pattern on follow-up. The four groups of rapid-cycling patients did not differ significantly among themselves on any of the assessed variables. The expression "rapid cycling" encompasses a spectrum of conditions. The DSM-IV definition, although quite reliable, covers only part of this spectrum, and the conditions that are excluded are very typical in terms of key validators and are relatively stable over time.
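
Interrater reliability in the abstract above is expressed as Cohen's kappa, which compares the observed agreement between two raters with the agreement expected by chance from their marginal frequencies. A minimal sketch follows; the function name and the ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per case."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    # Undefined (division by zero) if both raters use one identical category.
    return (observed - expected) / (1 - expected)


# Hypothetical judgments for 50 patients: 1 = meets definition, 0 = does not.
psychiatrist_1 = [1] * 25 + [0] * 25
psychiatrist_2 = [1] * 20 + [0] * 5 + [1] * 10 + [0] * 15

kappa = cohens_kappa(psychiatrist_1, psychiatrist_2)
print(f"kappa = {kappa:.2f}")
```

In this example the raters agree on 70% of cases, but half of that agreement is expected by chance, so kappa is well below the raw agreement rate.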

  5. Korean Version of the Delirium Rating Scale-Revised-98: Reliability and Validity

    Science.gov (United States)

    Ryu, Jian; Lee, Jinyoung; Kim, Hwi-Jung; Shin, Im Hee; Kim, Jeong-Lan; Trzepacz, Paula T.

    2011-01-01

    Objective: The aims of the present study were 1) to standardize the validity and reliability of the Korean version of Delirium Rating Scale-Revised-98 (DRS-R98-K) and 2) to establish the optimum cut-off value, sensitivity, and specificity for discriminating delirium from other non-delirious psychiatric conditions. Methods: Using DSM-IV criteria, 157 subjects (69 delirium, 29 dementia, 32 schizophrenia, and 27 other psychiatric patients) were enrolled. Subjects were evaluated using DRS-R98-K, DRS-K, Mini-Mental State Examination (MMSE-K), and Clinical Global Impression-Severity (CGI-S) scale. Results: DRS-R98-K total and severity scores showed high correlations with DRS-K. They were significantly different across all groups (p=0.000). However, neither MMSE-K nor CGI-S distinguished delirium from dementia. All DRS-R98-K diagnostic items (#14-16) and items #1 and 2 significantly discriminated delirium from dementia. Cronbach's alpha coefficient revealed high internal consistency for DRS-R98-K total (r=0.91) and severity (r=0.89) scales. Interrater reliability (ICC between 0.96 and 1) was very high. Using receiver operating characteristic analysis, the area under the curve of DRS-R98-K total score was 0.948 between the delirium group and all other groups and 0.873 between the delirium and dementia groups. The best cut-off scores in DRS-R98-K total score were 18.5 and 19.5 between the delirium and the other three groups and 20.5 between the delirium and dementia groups. Conclusion: We demonstrated that DRS-R98-K is a valid and reliable instrument for assessing delirium severity and diagnosis and discriminating delirium from dementia and other psychiatric disorders in Korean patients. PMID:21519534
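
    The internal-consistency statistic reported in this record can be sketched as follows. This is a minimal illustration of the usual Cronbach's alpha formula (item variances compared with the variance of the total score), not the authors' analysis code; the function name `cronbach_alpha` and the small score matrix are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; `scores` is a list of respondents, each a list of k item scores."""
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(scores[0])
    # Transpose so that each tuple holds one item's scores across respondents.
    items = list(zip(*scores))
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical data: five respondents answering three items.
example_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
    [5, 4, 5],
]
alpha = cronbach_alpha(example_scores)
print(f"alpha = {alpha:.2f}")
```

    Alpha approaches 1 when the items rise and fall together across respondents, which is why it is read as a measure of internal consistency rather than of agreement between raters.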

  6. The Danish anal sphincter rupture questionnaire: Validity and reliability

    DEFF Research Database (Denmark)

    Due, Ulla; Ottesen, Marianne

    2008-01-01

    Objective. To revise, validate and test for reliability an anal sphincter rupture questionnaire in relation to construct, content and face validity. Setting and background. Since 1996 women with anal sphincter rupture (ASR) at one of the public university hospitals in Copenhagen, Denmark have been offered pelvic floor muscle examination and instruction by a specialist physiotherapist. In relation to that, a non-validated questionnaire about anal and urinary incontinence was to be answered six months after childbirth. Method. The original questionnaire was revised and a pilot test was performed... main questions but one. Two questions needed further explanation. Seven women made minor errors. Conclusion. The validated Danish questionnaire has a good construct, content and face validity. It is a well accepted, reliable, simple and clinically relevant screening tool. It reveals physical problems...

  7. Reliability and cross-cultural validation of the Turkish version of Manual Ability Classification System (MACS) for children with cerebral palsy.

    Science.gov (United States)

    Akpinar, Pinar; Tezel, Canan G; Eliasson, Ann-Christin; Icagasioglu, Afitap

    2010-01-01

    To determine the reliability and cross-cultural validation of the Turkish translation of the Manual Ability Classification System (MACS) for children with cerebral palsy (CP) and to investigate the relation to gross motor function and other comorbidities. After the forward and backward translation procedures, inter-rater and test-retest reliability were assessed between parents, physiotherapists and physicians using the intra-class correlation coefficient (ICC). Children (N = 118, 4 to 18 years, mean age 9 years 4 months; 68 boys, 50 girls) with various types of CP were classified. Additional data on the Gross Motor Function Classification System (GMFCS), intellectual delay, visual acuity, and epilepsy were collected. The inter-rater reliability was high; the ICC ranged from 0.89 to 0.96 among different professionals and parents. Between two persons of the same profession it ranged from 0.97 to 0.98. For the test-retest reliability it ranged from 0.91 to 0.98. Total agreement between the GMFCS and the MACS occurred in only 45% of the children. The level of the MACS was found to correlate with the accompanying comorbidities, namely intellectual delay and epilepsy. The Turkish version of the MACS is found to be valid and reliable, and is suggested to be appropriate for the assessment of manual ability within the Turkish population.

  8. Validity and Reliability of the Upper Extremity Work Demands Scale.

    Science.gov (United States)

    Jacobs, Nora W; Berduszek, Redmar J; Dijkstra, Pieter U; van der Sluis, Corry K

    2017-12-01

    Purpose: To evaluate validity and reliability of the upper extremity work demands (UEWD) scale. Methods: Participants from different levels of physical work demands, based on the Dictionary of Occupational Titles categories, were included. A historical database of 74 workers was added for factor analysis. Criterion validity was evaluated by comparing observed and self-reported UEWD scores. To assess structural validity, a factor analysis was executed. For reliability, the difference between two self-reported UEWD scores, the smallest detectable change (SDC), test-retest reliability and internal consistency were determined. Results: Fifty-four participants were observed at work and 51 of them filled in the UEWD twice with a mean interval of 16.6 days (SD 3.3, range = 10-25 days). Criterion validity of the UEWD scale was moderate (r = .44, p = .001). Factor analysis revealed that 'force and posture' and 'repetition' subscales could be distinguished with Cronbach's alpha of .79 and .84, respectively. Reliability was good; there was no significant difference between repeated measurements. An SDC of 5.0 was found. Test-retest reliability was good (intraclass correlation coefficient for agreement = .84) and all item-total correlations were >.30. There were two pairs of highly related items. Conclusion: Reliability of the UEWD scale was good, but criterion validity was moderate. Based on current results, a modified UEWD scale (2 items removed, 1 item reworded, divided into 2 subscales) was proposed. Since observation appeared to be an inappropriate gold standard, we advise investigating other types of validity, such as construct validity, in further research.

  9. Reliability and Validity of the Clinical Dementia Rating for Community-Living Elderly Subjects without an Informant

    Directory of Open Access Journals (Sweden)

    Ma Shwe Zin Nyunt

    2013-10-01

    Background: The Clinical Dementia Rating (CDR) scale is widely used to assess cognitive impairment in Alzheimer's disease. It requires collateral information from a reliable informant, who is not available in many instances. We adapted the original CDR scale for use with elderly subjects without an informant (CDR-NI) and evaluated its reliability and validity for assessing mild cognitive impairment (MCI) and dementia among community-dwelling elderly subjects. Method: At two consecutive visits 1 week apart, nurses trained in CDR assessment interviewed, observed and rated cognitive and functional performance according to a protocol in 90 elderly subjects with suboptimal cognitive performance [Mini-Mental State Examination (MMSE) … Results: The CDR-NI scores (0, 0.5, 1) showed good internal consistency (Cronbach's α 0.83-0.84), inter-rater reliability (κ 0.77-1.00 for six domains and 0.95 for global rating) and test-retest reliability (κ 0.75-1.00 for six domains and 0.80 for global rating), good agreement (κ 0.79) with the clinical assessment status of MCI (n = 37) and dementia (n = 4), and significant differences in the mean scores for MMSE, MOCA and Instrumental Activities of Daily Living (ANOVA global p … Conclusion: Owing to the protocol of the interviews, assessments and structured observations gathered during the two visits, CDR-NI provides valid and reliable assessment of MCI and dementia in community-living elderly subjects without an informant.

  10. Reliable and Valid Assessment of Point-of-care Ultrasonography

    DEFF Research Database (Denmark)

    Todsen, Tobias; Tolsgaard, Martin Grønnebæk; Olsen, Beth Härstedt

    2015-01-01

    OBJECTIVE: To explore the reliability and validity of the Objective Structured Assessment of Ultrasound Skills (OSAUS) scale for point-of-care ultrasonography (POC US) performance. BACKGROUND: POC US is increasingly used by clinicians and is an essential part of the management of acute surgical conditions. However, the quality of performance is highly operator-dependent. Therefore, reliable and valid assessment of trainees' ultrasonography competence is needed to ensure patient safety. METHODS: Twenty-four physicians, representing novices, intermediates, and experts in POC US, scanned 4 different... physicians' OSAUS scores with diagnostic accuracy. RESULTS: The generalizability coefficient was high (0.81) and a D-study demonstrated that 1 assessor and 5 cases would result in similar reliability. The construct validity of the OSAUS scale was supported by a significant difference in the mean scores...

  11. The reliability, minimal detectable change and concurrent validity of a gravity-based bubble inclinometer and iphone application for measuring standing lumbar lordosis.

    Science.gov (United States)

    Salamh, Paul A; Kolber, Morey

    2014-01-01

    To investigate the reliability, minimal detectable change (MDC90) and concurrent validity of a gravity-based bubble inclinometer (inclinometer) and iPhone® application for measuring standing lumbar lordosis. Two investigators used both an inclinometer and an iPhone® with an inclinometer application to measure lumbar lordosis of 30 asymptomatic participants. ICC models 3,k and 2,k were used for the intrarater and interrater analysis, respectively. Good interrater and intrarater reliability was present for the inclinometer with Intraclass Correlation Coefficients (ICC) of 0.90 and 0.85, respectively and the iPhone® application with ICC values of 0.96 and 0.81. The minimal detectable change (MDC90) indicates that a change greater than or equal to 7° and 6° is needed to exceed the threshold of error using the iPhone® and inclinometer, respectively. The concurrent validity between the two instruments was good with a Pearson product-moment coefficient of correlation (r) of 0.86 for both raters. Ninety-five percent limits of agreement identified differences ranging from 9° greater in regards to the iPhone® to 8° less regarding the inclinometer. Both the inclinometer and iPhone® application possess good interrater reliability, intrarater reliability and concurrent validity for measuring standing lumbar lordosis. This investigation provides preliminary evidence to suggest that smart phone applications may offer clinical utility comparable to inclinometry for quantifying standing lumbar lordosis. Clinicians should recognize potential individual differences when using these devices interchangeably.
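
    The record above reports MDC90 values without showing the calculation. The sketch below uses the formula commonly applied in reliability studies, SEM = SD × √(1 − ICC) and MDC90 = 1.645 × √2 × SEM; whether the authors used exactly this form is an assumption, and the standard deviation of 7.0° is a hypothetical value chosen only for illustration (the abstract does not report it).

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject standard deviation and a reliability coefficient."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sd, icc, z=1.645):
    """MDC at the confidence level implied by z (1.645 corresponds to MDC90)."""
    # sqrt(2) accounts for measurement error in both of the two measurements compared.
    return z * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


# Hypothetical SD of 7.0 degrees combined with an ICC of 0.81.
mdc90 = minimal_detectable_change(sd=7.0, icc=0.81)
print(f"MDC90 = {mdc90:.1f} degrees")
```

    A change between two measurements smaller than the MDC cannot be distinguished from measurement error at the chosen confidence level, which is why a higher ICC yields a smaller MDC for the same SD.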

  12. Reliable and valid assessment of performance in thoracoscopy

    DEFF Research Database (Denmark)

    Konge, Lars; Lehnert, Per; Hansen, Henrik Jessen

    2012-01-01

    BACKGROUND: As we move toward competency-based education in medicine, we have lagged in developing competency-based evaluation methods. In the era of minimally invasive surgery, there is a need for a reliable and valid tool dedicated to measure competence in video-assisted thoracoscopic surgery. The purpose of this study is to create such an assessment tool, and to explore its reliability and validity. METHODS: An expert group of physicians created an assessment tool consisting of 10 items rated on a five-point rating scale. The following factors were included: economy and confidence of movement...

  13. Reliability and Validity of Digital Imagery Methodology for Measuring Starting Portions and Plate Waste from School Salad Bars.

    Science.gov (United States)

    Bean, Melanie K; Raynor, Hollie A; Thornton, Laura M; Sova, Alexandra; Dunne Stewart, Mary; Mazzeo, Suzanne E

    2018-04-12

    Scientifically sound methods for investigating dietary consumption patterns from self-serve salad bars are needed to inform school policies and programs. To examine the reliability and validity of digital imagery for determining starting portions and plate waste of self-serve salad bar vegetables (which have variable starting portions) compared with manual weights. In a laboratory setting, 30 mock salads with 73 vegetables were made, and consumption was simulated. Each component (initial and removed portion) was weighed; photographs of weighed reference portions and pre- and post-consumption mock salads were taken. Seven trained independent raters visually assessed images to estimate starting portions to the nearest ¼ cup and percentage consumed in 20% increments. These values were converted to grams for comparison with weighed values. Intraclass correlations between weighed and digital imagery-assessed portions and plate waste were used to assess interrater reliability and validity. Pearson's correlations between weights and digital imagery assessments were also examined. Paired samples t tests were used to evaluate mean differences (in grams) between digital imagery-assessed portions and measured weights. Interrater reliabilities were excellent for starting portions and plate waste with digital imagery. For accuracy, intraclass correlations were moderate, with lower accuracy for determining starting portions of leafy greens compared with other vegetables. However, accuracy of digital imagery-assessed plate waste was excellent. Digital imagery assessments were not significantly different from measured weights for estimating overall vegetable starting portions or waste; however, digital imagery assessments slightly underestimated starting portions (by 3.5 g) and waste (by 2.1 g) of leafy greens. This investigation provides preliminary support for use of digital imagery in estimating starting portions and plate waste from school salad bars. Results might inform

  14. Discomfort Intolerance Scale: A Study of Reliability and Validity

    Directory of Open Access Journals (Sweden)

    Kadir ÖZDEL

    2012-03-01

    Objective: The Discomfort Intolerance Scale (DIS) was developed by Norman B. Schmidt et al. (2006) to assess individual differences in the capacity to withstand physical perturbations or uncomfortable bodily states. The aim of this study is to investigate the validity and reliability of the Discomfort Intolerance Scale-Turkish Version (RDÖ). Method: A total of 225 students (male=167, female=58) from two different universities participated in this study. To determine criterion validity, the Beck Anxiety Inventory (BAI) and State-Trait Anxiety Inventory (STAI) were used. Construct validity was evaluated by factor analysis after the Kaiser-Meyer-Olkin (KMO) and Bartlett tests had been performed. To assess test-retest reliability, the scale was re-applied to 54 participants 6 weeks later. Results: To assess the construct validity of the DIS, factor analyses were performed using principal components analysis with varimax rotation. The factor analysis resulted in two factors named “discomfort (in)tolerance” and “discomfort avoidance”. The Cronbach’s alpha coefficients for the entire scale, the discomfort (in)tolerance subscale, and the discomfort avoidance subscale were .592, .670, and .600, respectively. Correlations between the two factors of the DIS, discomfort intolerance and discomfort avoidance, and the Trait Anxiety Inventory of the STAI were statistically significant at the level of 0.05. Test-retest reliability was statistically significant at the level of 0.01. Conclusion: Analysis demonstrated that the DIS had a satisfactory level of reliability and validity in Turkish university students.

  15. Reliable and valid assessment of Lichtenstein hernia repair skills

    DEFF Research Database (Denmark)

    Carlsen, C G; Lindorff Larsen, Karen; Funch-Jensen, P

    2014-01-01

    PURPOSE: Lichtenstein hernia repair is a common surgical procedure and one of the first procedures performed by a surgical trainee. However, formal assessment tools developed for this procedure are few and sparsely validated. The aim of this study was to determine the reliability and validity of an assessment tool designed to measure surgical skills in Lichtenstein hernia repair. METHODS: Key issues were identified through a focus group interview. On this basis, an assessment tool with eight items was designed. Ten surgeons and surgical trainees were video recorded while performing Lichtenstein hernia... a significant difference between the three groups which indicates construct validity, p … skills can be assessed blindly by a single rater in a reliable and valid fashion with the new procedure-specific assessment tool. We recommend this tool for future assessment...

  16. KAMUTHE video microanalysis system for use in Brazil: translation, cross-cultural adaptation and evidence of validity and reliability

    Directory of Open Access Journals (Sweden)

    Gustavo Schulz Gattino

    2016-11-01

    Background: KAMUTHE is a video microanalysis system which observes preverbal communication within the music therapy setting. This system is indicated for children with autism spectrum disorder (ASD) or multiple disabilities. The purpose of this study was to translate KAMUTHE, adapt it to the Brazilian Portuguese language and analyze some psychometric properties (reliability and validity evidence) of its administration in Brazil for individuals with ASD. Participants and procedure: Translation, back translation, analysis by judges, and pilot application were performed to obtain evidence of content and face validity. The second part of this study was to administer KAMUTHE to 39 consecutive children with ASD. An individual session of improvisational music therapy was applied to assess the different behaviors included in KAMUTHE. The intra-rater reliability, concurrent validity and convergent validity were analyzed. Results: Translation and cross-cultural adaptation were followed and some cultural adaptations were needed. Inter-rater reliability was very good (ICCs 0.95-0.99) for the three child behaviors analyzed. Criterion validity with a moderate negative association was found (r = -.38, p = .017) comparing the behavior “Gazes at therapist” and the level of ASD according to the Childhood Autism Rating Scale (CARS). Convergent validity was established between the behavior “Gazes at therapist” and the two nonlinguistic communication scales (social interaction and interests) of the Children’s Communication Checklist (CCC) with a moderate correlation (r = -.43, p = .005). Conclusions: The administration of the KAMUTHE video microanalysis system showed positive results in children with ASD. Further studies are needed to improve the reliability and validity of the instrument in Brazil.

  17. DEĞERLENDİRİCİLER ARASI GÜVENİLİRLİK VE TATMİN BAĞLAMINDA 360 DERECE PERFORMANS DEĞERLENDİRME - 360-DEGREE PERFORMANCE APPRAISAL IN THE CONTEXT OF INTERRATER RELIABILITY AND SATISFACTION

    Directory of Open Access Journals (Sweden)

    Adem BALTACI

    2014-03-01

    Abstract: The 360-degree appraisal system, viewed as today’s most popular appraisal system, gets its strength from the view that results from different sources would be much more objective and inclusive. Yet, the question of exactly which rating source provides relatively more valid and reliable information remains to be answered. This uncertainty notwithstanding, the 360-degree performance appraisal system leads to higher satisfaction with the system as it allows employees to assess both themselves and others. Against this background, this study addresses the 360-degree appraisal system with a focus on satisfaction with the appraisal system and interrater reliability. To this end, the appraisal results of the employees of a company that uses this system were examined, and a questionnaire measuring the employees' satisfaction with the system was administered. The analyses showed that demographic variables can affect the ratings coming from different sources, though not the performance scores themselves. The examination also showed that supervisors gave the ratings closest to the employees' actual performance scores. In addition, a strong relationship was found between satisfaction with the system and the employees' performance.

  18. Reliability and Validity Assessment of a Linear Position Transducer

    Science.gov (United States)

    Garnacho-Castaño, Manuel V.; López-Lastra, Silvia; Maté-Muñoz, José L.

    2015-01-01

    The objectives of the study were to determine the validity and reliability of peak velocity (PV), average velocity (AV), peak power (PP) and average power (AP) measurements made using a linear position transducer. Validity was assessed by comparing measurements simultaneously obtained using the Tendo Weightlifting Analyzer System and T-Force Dynamic Measurement System (Ergotech, Murcia, Spain) during two resistance exercises, bench press (BP) and full back squat (BS), performed by 71 trained male subjects. For the reliability study, a further 32 men completed both lifts using the Tendo Weightlifting Analyzer System in two identical testing sessions one week apart (session 1 vs. session 2). Intraclass correlation coefficients (ICCs) indicating the validity of the Tendo Weightlifting Analyzer System were high, with values ranging from 0.853 to 0.989. Systematic biases and random errors were low to moderate for almost all variables, being higher in the case of PP (bias ±157.56 W; error ±131.84 W). Proportional biases were identified for almost all variables. Test-retest reliability was strong with ICCs ranging from 0.922 to 0.988. Reliability results also showed minimal systematic biases and random errors, which were only significant for PP (bias -19.19 W; error ±67.57 W). Only PV recorded in the BS showed no significant proportional bias. The Tendo Weightlifting Analyzer System emerged as a reliable system for measuring movement velocity and estimating power in resistance exercises. The low biases and random errors observed here (mainly AV, AP) make this device a useful tool for monitoring resistance training. Key points: This study determined the validity and reliability of peak velocity, average velocity, peak power and average power measurements made using a linear position transducer. The Tendo Weightlifting Analyzer System emerged as a reliable system for measuring movement velocity and power. PMID:25729300

  19. Reliability and validity of television food advertising questionnaire in Malaysia.

    Science.gov (United States)

    Zalma, Abdul Razak; Safiah, Md Yusof; Ajau, Danis; Khairil Anuar, Md Isa

    2015-09-01

    Interventions to counter the influence of television food advertising amongst children are important. Thus, a reliable and valid instrument to assess its effect is needed. The objective of this study was to determine the reliability and validity of such a questionnaire. The questionnaire was administered twice to 32 primary schoolchildren aged 10-11 years in Selangor, Malaysia. The interval between the first and second administration was 2 weeks. The test-retest method was used to examine the reliability of the questionnaire. Intra-rater reliability was determined by kappa coefficient and internal consistency by Cronbach's alpha coefficient. Construct validity was evaluated using factor analysis. The test-retest correlation showed moderate-to-high reliability for all scores (r = 0.40*, p = 0.02 to r = 0.95**, p = 0.00), with one exception, consumption of fast foods (r = 0.24, p = 0.20). Kappa coefficient showed acceptable-to-strong intra-rater reliability (K = 0.40-0.92), except for two items under knowledge on television food advertising (K = 0.26 and K = 0.21) and one item under preference for healthier foods (K = 0.33). Cronbach's alpha coefficient indicated acceptable internal consistency for all scores (0.45-0.60). After deleting two items under Consumption of Commonly Advertised Food, the items showed moderate-to-high loading (0.52, 0.84, 0.42 and 0.42) with the Scree plot showing that there was only one factor. The Kaiser-Meyer-Olkin was 0.60, showing that the sample was adequate for factor analysis. The questionnaire on television food advertising is reliable and valid to assess the effect of media literacy education on television food advertising on schoolchildren.

  20. Rating of Everyday Arm-Use in the Community and Home (REACH) scale for capturing affected arm-use after stroke: development, reliability, and validity.

    Directory of Open Access Journals (Sweden)

    Lisa A Simpson

    To develop a brief, valid and reliable tool [the Rating of Everyday Arm-use in the Community and Home (REACH) scale] to classify affected upper limb use after stroke outside the clinical setting. Focus groups with clinicians, patients and caregivers (n = 33) and a literature review were employed to develop the REACH scale. A sample of community-dwelling individuals with stroke was used to assess the validity (n = 96) and inter-rater reliability (n = 73) of the new scale. The REACH consists of separate scales for dominant and non-dominant affected upper limbs, and takes five minutes to administer. Each scale consists of six categories that capture 'no use' to 'full use'. The intraclass correlation coefficient and weighted kappa for inter-rater reliability were 0.97 (95% confidence interval: 0.95-0.98) and 0.91 (0.89-0.93), respectively. REACH scores correlated with external measures of upper extremity use, function and impairment (rho = 0.64-0.94). The REACH scale is a reliable, quick-to-administer tool that has strong relationships to other measures of upper limb use, function and impairment. By providing a rich description of how the affected upper limb is used outside of the clinical setting, the REACH scale fills an important gap among current measures of upper limb use and is useful for understanding the long term effects of stroke rehabilitation.

  1. Factor validity and reliability of the aberrant behavior checklist-community (ABC-C) in an Indian population with intellectual disability.

    Science.gov (United States)

    Lehotkay, R; Saraswathi Devi, T; Raju, M V R; Bada, P K; Nuti, S; Kempf, N; Carminati, G Galli

    2015-03-01

    In this study, realised in collaboration with the department of psychology and parapsychology of Andhra University, validation of the Aberrant Behavior Checklist-Community (ABC-C) in Telugu, the official language of Andhra Pradesh, one of India's 28 states, was carried out. To assess the factor validity and reliability of this Telugu version, 120 participants with moderate to profound intellectual disability (94 men and 26 women, mean age 25.2, SD 7.1) were rated by the staff of the Lebenshilfe Institution for Mentally Handicapped in Visakhapatnam, Andhra Pradesh, India. Rating data were analysed with a confirmatory factor analysis. The internal consistency was estimated by Cronbach's alpha. To confirm the test-retest reliability, 50 participants were rated twice with an interval of 4 weeks, and 50 were rated by pairs of raters to assess inter-rater reliability. Confirmatory factor analysis revealed that the root mean square error of approximation (RMSEA) was equal to 0.06, the comparative fit index (CFI) was equal to 0.77, and the Tucker Lewis index (TLI) was equal to 0.77, which indicated that the model with five correlated factors had a good fit. Coefficient alpha ranged from 0.85 to 0.92 across the five subscales. Spearman's rank correlation coefficients for inter-rater reliability tests ranged from 0.65 to 0.75, and the correlations for test-retest reliability ranged from 0.58 to 0.76. All reliability coefficients were statistically significant (P … reliability of Telugu version of the ABC-C evidenced factor validity and reliability comparable to the original English version and appears to be useful for assessing behaviour disorders in Indian people with intellectual disabilities.

  2. The reliability and validity of the standardized Mensendieck test in relation to disability in patients with chronic pain.

    Science.gov (United States)

    Keessen, Paul; Maaskant, Jolanda; Visser, Bart

    2018-08-01

    The standardized Mensendieck test (SMT) was developed to quantify posture, movement, gait, and respiration. In the hands of an experienced therapist, the SMT is proven to be a reliable tool. It is unclear whether posture, movement, gait, and respiration are related to the degree of functional disability in patients with chronic pain. The objective of this study was to assess the reliability and convergent validity of the SMT in a heterogeneous sample of 50 patients with chronic pain. Internal consistency was determined by Cronbach's α and interrater reliability by the intraclass correlation coefficient (ICC). Convergent validity was assessed by determining the Spearman rank correlation coefficient between the movement quality measured in the SMT and functional limitation measured on the disability rating index (DRI). The internal consistency was Cronbach's α 0.91. Substantial reliability was found for the items: movement (ICC = 0.68), gait (ICC = 0.69), sitting posture (ICC = 0.63), and respiration (ICC = 0.64). Insufficient reliability was found for standing posture (ICC = 0.23). A moderate correlation was found between average test score SMT and the DRI (r = -0.37) and respiration and DRI (r = -0.45). The SMT is a reasonably reliable tool to assess movement, gait, sitting posture, and respiration. None of the items in the domain standing posture has sufficient reliability. A thorough study of this domain should be considered. The results show little evidence for convergent validity. Several items of the SMT correlated moderately with functional limitation with the DRI. These items were global movement, hip flexion, pelvis rotation, and all respiration items.

  3. Development, content validity and test-retest reliability of the Lifelong Physical Activity Skills Battery in adolescents.

    Science.gov (United States)

    Hulteen, Ryan M; Barnett, Lisa M; Morgan, Philip J; Robinson, Leah E; Barton, Christian J; Wrotniak, Brian H; Lubans, David R

    2018-03-28

    Numerous skill batteries assess fundamental motor skill (e.g., kick, hop) competence. Few skill batteries examine lifelong physical activity skill competence (e.g., resistance training). This study aimed to develop and assess the content validity, test-retest and inter-rater reliability of the "Lifelong Physical Activity Skills Battery". Development of the skill battery occurred in three stages: i) systematic reviews of lifelong physical activity participation rates and existing motor skill assessment tools, ii) practitioner consultation and iii) research expert consultation. The final battery included eight skills: grapevine, golf swing, jog, push-up, squat, tennis forehand, upward dog and warrior I. Adolescents (28 boys, 29 girls; M = 15.8 years, SD = 0.4 years) completed the Lifelong Physical Activity Skills Battery on two occasions two weeks apart. The skill battery was highly reliable (ICC = 0.84, 95% CI = 0.72-0.90) with individual skill reliability scores ranging from moderate (warrior I; ICC = 0.56) to high (tennis forehand; ICC = 0.82). Typical error (4.0; 95% CI 3.4-5.0) and proportional bias (r = -0.21, p = .323) were low. This study has provided preliminary evidence for the content validity and reliability of the Lifelong Physical Activity Skills Battery in an adolescent population.

  4. Validity and reliability of the Portuguese version of the quality of life in epilepsy inventory (QOLIE-31) for Brazil.

    Science.gov (United States)

    da Silva, Tatiana Indelicato; Ciconelli, Rozana Mesquita; Alonso, Neide Barreira; Azevedo, Auro Mauro; Westphal-Guitti, Ana Carolina; Pascalicchio, Tatiana Frascarelli; Marques, Carolina Mattos; Caboclo, Luís Otávio Sales Ferreira; Cramer, Joyce A; Sakamoto, Américo Ceiki; Yacubian, Elza Márcia Targas

    2007-03-01

    We report the cultural adaptation and psychometric properties of the Quality of Life in Epilepsy-31 Inventory (QOLIE-31) for the Portuguese language and Brazilian culture. This study involved 150 outpatients: 50 presurgical patients with refractory temporal lobe epilepsy (TLE) related to mesial temporal sclerosis (MTS), 50 patients with juvenile myoclonic epilepsy (JME), and 50 seizure-free patients with TLE. They completed the QOLIE-31, Nottingham Health Profile (NHP), Beck Depression Inventory (BDI), and Adverse Events Profile (AEP) and underwent a neuropsychological evaluation (NE). Internal consistency reliability, interrater and test-retest reliability, and construct validity were assessed. QOLIE-31 mean scores were 33.1 (Social Function), 68.9 (Overall Quality of Life), 56.5 (Seizure Worry), 64.1 (Emotional Well-Being), 63.7 (Energy/Fatigue), 38.9 (Cognitive Function), and 49.7 (Medication Effects). Internal consistency was high (Cronbach's alpha), as were the associations between QOLIE-31 and the BDI, NHP, AEP, and NE. The Portuguese/Brazilian version of the QOLIE-31 inventory showed good reliability, validity, and construct validity.

  5. Reliability and validity of the international dementia alliance schedule for the assessment and staging of care in China.

    Science.gov (United States)

    Wang, Xiao; Sun, Zhenghai; Xiong, Lingchuan; Semrau, Maya; He, Jianhua; Li, Yang; Zhu, Jianzhong; Zhang, Nan; Wang, Aimin; Jiang, Qinpu; Mu, Nan; Zhao, Yuping; Chen, Wei; Wu, Donghui; Zheng, Zhanjie; Sun, Yongan; Zhang, Jing; Xu, Jun; Meng, Xue; Zhao, Mei; Zhang, Haifeng; Lv, Xiaozhen; Sartorius, Norman; Li, Tao; Yu, Xin; Wang, Huali

    2017-11-21

    Clinical and social services both are important for dementia care. The International Dementia Alliance (IDEAL) Schedule for the Assessment and Staging of Care was developed to guide clinical and social care for dementia. Our study aimed to assess the validity and reliability of the IDEAL schedule in China. Two hundred eighty-two dementia patients and their caregivers were recruited from 15 hospitals in China. Each patient-caregiver dyad was assessed with the IDEAL schedule by a rater and an observer simultaneously. The Clinical Dementia Rating (CDR), Mini-Mental Status Examination (MMSE), and Caregiver Burden Inventory (CBI) were assessed for criterion validity. IDEAL repeated assessment was conducted 7-10 days after the initial interview for 62 dyads. Two hundred seventy-seven patient-caregiver dyads completed the IDEAL assessment. Inter-rater reliability for the total score of the IDEAL schedule was 0.93 (95%CI = 0.92-0.95). The inter-class coefficient for the total score of IDEAL was 0.95 for the interviewers and 0.93 for the silent raters. The IDEAL total score correlated with the global CDR score (ρ = 0.72, p valid and reliable tool for the staging of care for dementia in the Chinese population.

  6. Optimal number of tests to achieve and validate product reliability

    International Nuclear Information System (INIS)

    Ahmed, Hussam; Chateauneuf, Alaa

    2014-01-01

    The reliability validation of engineering products and systems is mandatory for choosing the best cost-effective design among a series of alternatives. Decisions at early design stages have a large effect on the overall life cycle performance and cost of products. In this paper, an optimization-based formulation is proposed by coupling the costs of product design and validation testing, in order to ensure the product reliability with the minimum number of tests. This formulation addresses the question about the number of tests to be specified through reliability demonstration necessary to validate the product under appropriate confidence level. The proposed formulation takes into account the product cost, the failure cost and the testing cost. The optimization problem can be considered as a decision making system according to the hierarchy of structural reliability measures. The numerical examples show the interest of coupling design and testing parameters. - Highlights: • Coupled formulation for design and testing costs, with lifetime degradation. • Cost-effective testing optimization to achieve reliability target. • Solution procedure for nested aleatoric and epistemic variable spaces
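
    For orientation only, the question of how many tests a reliability demonstration needs has a classical closed-form answer in the simplest case. The sketch below uses the standard zero-failure (success-run) relation n ≥ ln(1 − C) / ln(R); it is not the coupled cost-optimization formulation proposed in the paper, and the function name `zero_failure_tests` is hypothetical.

```python
import math


def zero_failure_tests(reliability, confidence):
    """Smallest number of failure-free tests that demonstrates the target
    reliability at the given confidence level (success-run relation).

    This is the textbook binomial zero-failure case, used here as an
    assumption; the paper's optimization model is more general.
    """
    if not (0 < reliability < 1 and 0 < confidence < 1):
        raise ValueError("reliability and confidence must lie in (0, 1)")
    return math.ceil(math.log(1 - confidence) / math.log(reliability))


print(zero_failure_tests(0.90, 0.90))  # 22
print(zero_failure_tests(0.95, 0.90))  # 45
```

    The steep growth in test count as the target reliability rises is one reason the testing cost is worth weighing against design cost, which is the trade-off the abstract describes.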

  7. Test of Creative Imagination: Validity and Reliability Study

    Science.gov (United States)

    Gundogan, Aysun; Ari, Meziyet; Gonen, Mubeccel

    2013-01-01

    The purpose of this study was to investigate the validity and reliability of the Test of Creative Imagination. The study was conducted with 1000 children aged 9-14 who were studying in six primary schools in the city center of Denizli Province, chosen by cluster ratio sampling. In the study, it was revealed that the…

  8. Validity and Reliability of Internalized Stigma of Mental Illness (Cantonese)

    Science.gov (United States)

    Young, Daniel Kim-Wan; Ng, Petrus Y. N.; Pan, Jia-Yan; Cheng, Daphne

    2017-01-01

    Purpose: This study aims to translate and test the reliability and validity of the Internalized Stigma of Mental Illness-Cantonese (ISMI-C). Methods: The original English version of ISMI is translated into the ISMI-C by going through forward and backward translation procedure. A cross-sectional research design is adopted that involved 295…

  9. Two ankle joint laxity testers: reliability and validity

    NARCIS (Netherlands)

    Kerkhoffs, Gino M. M. J.; Blankevoort, Leendert; Sierevelt, Inger N.; Corvelein, Ruby; Janssen, Guido H. W.; van Dijk, C. Niek

    2005-01-01

    Two test devices were manufactured to objectively measure ankle joint laxity: the dynamic anterior ankle tester (DAAT) and the quasi-static anterior ankle tester (QAAT). The primary aim was to analyse the reliability of both testers. The secondary aim was to assess validity in correlation with TELOS

  10. Reliability and Validity of 10 Different Standard Setting Procedures.

    Science.gov (United States)

    Halpin, Glennelle; Halpin, Gerald

    Research indicating that different cut-off points result from the use of different standard-setting techniques leaves decision makers with a disturbing dilemma: Which standard-setting method is best? This investigation of the reliability and validity of 10 different standard-setting approaches was designed to provide information that might help…

  11. Basic School Skills Inventory-3: Validity and Reliability Study

    Science.gov (United States)

    Yildiz, F. Ülkü; Çagdas, Aysel; Kayili, Gökhan

    2017-01-01

    The purpose of this study is to perform the validity-reliability analysis of the three subtests of Basic School Skills Inventory 3--Mathematics, Classroom Behavior and Daily Life skills--and do its adaptation for four to six year-old Turkish children. The sample of the study included 595 four to six year-old Turkish children attending public and…

  12. Valid and Reliable Science Content Assessments for Science Teachers

    Science.gov (United States)

    Tretter, Thomas R.; Brown, Sherri L.; Bush, William S.; Saderholm, Jon C.; Holmes, Vicki-Lynn

    2013-01-01

    Science teachers' content knowledge is an important influence on student learning, highlighting an ongoing need for programs, and assessments of those programs, designed to support teacher learning of science. Valid and reliable assessments of teacher science knowledge are needed for direct measurement of this crucial variable. This paper…

  13. Palliative Sedation: Reliability and Validity of Sedation Scales

    NARCIS (Netherlands)

    Arevalo Romero, J.; Brinkkemper, T.; van der Heide, A.; Rietjens, J.A.; Ribbe, M.W.; Deliens, L.; Loer, S.A.; Zuurmond, W.W.A.; Perez, R.S.G.M.

    2012-01-01

    Context: Observer-based sedation scales have been used to provide a measurable estimate of the comfort of nonalert patients in palliative sedation. However, their usefulness and appropriateness in this setting has not been demonstrated. Objectives: To study the reliability and validity of

  14. Reliability and validity of emergency department triage systems

    NARCIS (Netherlands)

    van der Wulp, I.

    2010-01-01

    Reliability and validity of triage systems is important because this can affect patient safety. In this thesis, these aspects of two emergency department (ED) triage systems were studied as well as methodological aspects in these types of studies. The consistency, reproducibility, and criterion

  15. Development, Reliability, and Validity of a Child Dissociation Scale.

    Science.gov (United States)

    Putnam, Frank W.; And Others

    1993-01-01

    Evaluation of the Child Dissociative Checklist found it to be a reliable and valid observer report measure of dissociation in children, including sexually abused girls and children with dissociative disorder and with multiple personality disorder. The checklist, which is appended, is intended as a clinical screening instrument and research measure…

  16. Reliability and validity of ten consumer activity trackers

    NARCIS (Netherlands)

    Kooiman, Thea; Dontje, Manon L.; Sprenger, Siska; Krijnen, Wim; van der Schans, Cees; de Groot, Martijn

    2015-01-01

    Background: Activity trackers can potentially stimulate users to increase their physical activity behavior. The aim of this study was to examine the reliability and validity of ten consumer activity trackers for measuring step count in both laboratory and free-living conditions. Method: Healthy

  17. Reliability and validity of subjective assessment of lumbar lordosis in ...

    African Journals Online (AJOL)

    Radiological assessment of lumbar lordotic curve aids in early diagnosis of conditions even before neurologic changes set in. Objective: To ascertain the level of reliability and validity of subjective assessment of lumbar lordosis in conventional radiography. Design: A blinded, repeated-measures diagnostic test was carried ...

  18. Construct validity and reliability of automated body reaction test ...

    African Journals Online (AJOL)

    The Automated Body Reaction Test (ABRT) is a new skills and physical assessment instrument that measures the ability to react and move quickly and accurately in response to a stimulus. A total of 474 subjects aged 7-17 years were randomly selected for the construct validity (n=330) and reliability (n=144) analyses. The ABRT ...

  19. Turkish Metalinguistic Awareness Scale: A Validity and Reliability Study

    Science.gov (United States)

    Varisoglu, Behice

    2018-01-01

    The aim of this study is to develop a useful, valid and reliable measurement tool that will help teacher candidates determine their Turkish metalinguistic awareness. During the development of the scale, a pool of items was created by scanning the relevant literature and examining other awareness scales. The materials prepared were re-examined…

  20. Health Service Quality Scale: Brazilian Portuguese translation, reliability and validity.

    Science.gov (United States)

    Rocha, Luiz Roberto Martins; Veiga, Daniela Francescato; e Oliveira, Paulo Rocha; Song, Elaine Horibe; Ferreira, Lydia Masako

    2013-01-17

    The Health Service Quality Scale is a multidimensional hierarchical scale that is based on interdisciplinary approach. This instrument was specifically created for measuring health service quality based on marketing and health care concepts. The aim of this study was to translate and culturally adapt the Health Service Quality Scale into Brazilian Portuguese and to assess the validity and reliability of the Brazilian Portuguese version of the instrument. We conducted a cross-sectional, observational study, with public health system patients in a Brazilian university hospital. Validity was assessed using Pearson's correlation coefficient to measure the strength of the association between the Brazilian Portuguese version of the instrument and the SERVQUAL scale. Internal consistency was evaluated using Cronbach's alpha coefficient; the intraclass (ICC) and Pearson's correlation coefficients were used for test-retest reliability. One hundred and sixteen consecutive postoperative patients completed the questionnaire. Pearson's correlation coefficient for validity was 0.20. Cronbach's alpha for the first and second administrations of the final version of the instrument were 0.982 and 0.986, respectively. For test-retest reliability, Pearson's correlation coefficient was 0.89 and ICC was 0.90. The culturally adapted, Brazilian Portuguese version of the Health Service Quality Scale is a valid and reliable instrument to measure health service quality.

  1. Factorial validation and reliability analysis of the brain fag syndrome ...

    African Journals Online (AJOL)

    Results: Two valid factors emerged, with items 1-3 and items 4, 5 & 7 loading on them respectively, making the BFSS a two-dimensional (multidimensional) scale which measures 2 aspects of brain fag [labeled burning sensation and crawling sensation respectively]. The reliability analysis yielded a Cronbach Alpha coefficient of ...

  2. Reliability and Validity of Curriculum-Based Informal Reading Inventories.

    Science.gov (United States)

    Fuchs, Lynn; And Others

    A study was conducted to explore the reliability and validity of three prominent procedures used in informal reading inventories (IRIs): (1) choosing a 95% word recognition accuracy standard for determining student instructional level, (2) arbitrarily selecting a passage to represent the difficulty level of a basal reader, and (3) employing…

  3. Health service quality scale: Brazilian Portuguese translation, reliability and validity

    Science.gov (United States)

    2013-01-01

    Background: The Health Service Quality Scale is a multidimensional hierarchical scale that is based on interdisciplinary approach. This instrument was specifically created for measuring health service quality based on marketing and health care concepts. The aim of this study was to translate and culturally adapt the Health Service Quality Scale into Brazilian Portuguese and to assess the validity and reliability of the Brazilian Portuguese version of the instrument. Methods: We conducted a cross-sectional, observational study, with public health system patients in a Brazilian university hospital. Validity was assessed using Pearson’s correlation coefficient to measure the strength of the association between the Brazilian Portuguese version of the instrument and the SERVQUAL scale. Internal consistency was evaluated using Cronbach’s alpha coefficient; the intraclass (ICC) and Pearson’s correlation coefficients were used for test-retest reliability. Results: One hundred and sixteen consecutive postoperative patients completed the questionnaire. Pearson’s correlation coefficient for validity was 0.20. Cronbach's alpha for the first and second administrations of the final version of the instrument were 0.982 and 0.986, respectively. For test-retest reliability, Pearson’s correlation coefficient was 0.89 and ICC was 0.90. Conclusions: The culturally adapted, Brazilian Portuguese version of the Health Service Quality Scale is a valid and reliable instrument to measure health service quality. PMID:23327598

  4. Reliability and validity of logotest among Nigerian population ...

    African Journals Online (AJOL)

    In facilitating cross-cultural study in the field of psychology and Logotherapy, the reliability and validity of the logotest which measures inner meaning fulfillment was carried out among 885 University of Ibadan students, 439 males and 434 females, aged between 15 and 60 years old with mean X age of 6.0. Data analyses ...

  5. Reliability and validity of a treatment fidelity assessment for motivational interviewing targeting sexual risk behaviors in people living with HIV/AIDS.

    Science.gov (United States)

    Seng, Elizabeth K; Lovejoy, Travis I

    2013-12-01

    This study psychometrically evaluates the Motivational Interviewing Treatment Integrity Code (MITI) to assess fidelity to motivational interviewing to reduce sexual risk behaviors in people living with HIV/AIDS. 74 sessions from a pilot randomized controlled trial of motivational interviewing to reduce sexual risk behaviors in people living with HIV were coded with the MITI. Participants reported sexual behavior at baseline, 3 months, and 6 months. Regarding reliability, excellent inter-rater reliability was achieved for measures of behavior frequency across the 12 sessions coded by both coders; global scales demonstrated poor intraclass correlations, but adequate percent agreement. Regarding validity, principal components analyses indicated that a two-factor model accounted for an adequate amount of variance in the data. These factors were associated with decreases in sexual risk behaviors after treatment. The MITI is a reliable and valid measurement of treatment fidelity for motivational interviewing targeting sexual risk behaviors in people living with HIV/AIDS.

  6. Validity and reliability of the NAB Naming Test.

    Science.gov (United States)

    Sachs, Bonnie C; Rush, Beth K; Pedraza, Otto

    2016-05-01

    Confrontation naming is commonly assessed in neuropsychological practice, but few standardized measures of naming exist and those that do are susceptible to the effects of education and culture. The Neuropsychological Assessment Battery (NAB) Naming Test is a 31-item measure used to assess confrontation naming. Despite adequate psychometric information provided by the test publisher, there has been limited independent validation of the test. In this study, we investigated the convergent and discriminant validity, internal consistency, and alternate forms reliability of the NAB Naming Test in a sample of adults (Form 1: n = 247, Form 2: n = 151) clinically referred for neuropsychological evaluation. Results indicate adequate-to-good internal consistency and alternate forms reliability. We also found strong convergent validity as demonstrated by relationships with other neurocognitive measures. We found preliminary evidence that the NAB Naming Test demonstrates a more pronounced ceiling effect than other commonly used measures of naming. To our knowledge, this represents the largest published independent validation study of the NAB Naming Test in a clinical sample. Our findings suggest that the NAB Naming Test demonstrates adequate validity and reliability and merits consideration in the test arsenal of clinical neuropsychologists.

  7. Validity and Reliability of Agoraphobic Cognitions Questionnaire-Turkish Version

    Directory of Open Access Journals (Sweden)

    Ayşegül KART

    2013-11-01

    Objective: The aim of this study is to investigate the validity and reliability of the Agoraphobic Cognitions Questionnaire-Turkish Version (ACQ). Method: The ACQ was administered to 92 patients with agoraphobia or panic disorder with agoraphobia. The BSQ Turkish version was completed by translation, back-translation and pilot assessment. Reliability of the ACQ was analyzed by test-retest correlation, the split-half technique, and Cronbach’s alpha coefficient. Construct validity was evaluated by factor analysis after the Kaiser-Meyer-Olkin (KMO) and Bartlett tests had been performed. Principal component analysis and varimax rotation were used for factor analysis. Results: 64% of the patients evaluated in the study were female and 36% were male. Ages ranged from 18 to 58, with a mean age of 31.5±10.4. The Cronbach’s alpha coefficient was 0.91. Analysis of test-retest evaluations revealed statistically significant correlations ranging between 24% and 84% for the questionnaire components. In the split-half analysis, the reliability coefficients of the half questionnaires were 0.77 and 0.91, and the Spearman-Brown coefficient was 0.87. To assess the construct validity of the ACQ, factor analysis was performed and two basic factors were found. These two factors explained 57.6% of the total variance (Factor 1: 34.6%, Factor 2: 23%). Conclusion: Our findings support that the ACQ-Turkish version has a satisfactory level of reliability and validity.
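
    The split-half analysis mentioned above relies on the Spearman-Brown formula, which steps the correlation between two half-tests up to an estimate for the full-length test. A minimal sketch of that formula follows; the function name `spearman_brown` and the input value are hypothetical, and the abstract does not state which half-test statistic its own coefficient was derived from.

```python
def spearman_brown(r_half, length_factor=2.0):
    """Spearman-Brown prophecy formula.

    r_half: reliability (correlation) of the shorter test, e.g. the
        correlation between two halves of a questionnaire.
    length_factor: how many times longer the target test is; 2.0 is the
        usual split-half correction.
    """
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)


# Hypothetical half-test correlation of 0.70.
print(round(spearman_brown(0.70), 2))  # about 0.82
```

    With `length_factor` left at 2.0 this reduces to the familiar 2r / (1 + r); other factors predict the reliability of a lengthened or shortened instrument.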

  8. Validity evidence and reliability of a simulated patient feedback instrument.

    Science.gov (United States)

    Schlegel, Claudia; Woermann, Ulrich; Rethans, Jan-Joost; van der Vleuten, Cees

    2012-01-27

    In the training of healthcare professionals, one of the advantages of communication training with simulated patients (SPs) is the SP's ability to provide direct feedback to students after a simulated clinical encounter. The quality of SP feedback must be monitored, especially because it is well known that feedback can have a profound effect on student performance. Due to the current lack of valid and reliable instruments to assess the quality of SP feedback, our study examined the validity and reliability of one potential instrument, the 'modified Quality of Simulated Patient Feedback Form' (mQSF). Content validity of the mQSF was assessed by inviting experts in the area of simulated clinical encounters to rate the importance of the mQSF items. Moreover, generalizability theory was used to examine the reliability of the mQSF. Our data came from videotapes of clinical encounters between six simulated patients and six students and the ensuing feedback from the SPs to the students. Ten faculty members judged the SP feedback according to the items on the mQSF. Three weeks later, this procedure was repeated with the same faculty members and recordings. All but two items of the mQSF received importance ratings of > 2.5 on a four-point rating scale. A generalizability coefficient of 0.77 was established with two judges observing one encounter. The findings for content validity and reliability with two judges suggest that the mQSF is a valid and reliable instrument to assess the quality of feedback provided by simulated patients.

  9. Assessing movement quality in persons with severe mental illness - Reliability and validity of the Body Awareness Scale Movement Quality and Experience.

    Science.gov (United States)

    Hedlund, Lena; Gyllensten, Amanda Lundvik; Waldegren, Tomas; Hansson, Lars

    2016-05-01

    Motor disturbances and disturbed self-recognition are common features that affect mobility in persons with schizophrenia spectrum disorder and bipolar disorder. Physiotherapists in Scandinavia assess and treat movement difficulties in persons with severe mental illness. The Body Awareness Scale Movement Quality and Experience (BAS MQ-E) is a new and shortened version of the commonly used Body Awareness Scale-Health (BAS-H). The purpose of this study was to investigate the inter-rater reliability and the concurrent validity of BAS MQ-E in persons with severe mental illness. The concurrent validity was examined by investigating the relationships between neurological soft signs, alexithymia, fatigue, anxiety, and mastery. Sixty-two persons with severe mental illness participated in the study. The results showed a satisfactory inter-rater reliability (n = 53) and a concurrent validity (n = 62) with neurological soft signs, especially cognitive and perceptual based signs. There was also a concurrent validity linked to physical fatigue and aspects of alexithymia. The scores of BAS MQ-E were in general higher for persons with schizophrenia compared to persons with other diagnoses within the schizophrenia spectrum disorders and bipolar disorder. The clinical implications are presented in the discussion.

  10. Reliability and validity of the Parenting Scale of Inconsistency.

    Science.gov (United States)

    Yoshizumi, Takahiro; Murase, Satomi; Murakami, Takashi; Takai, Jiro

    2006-08-01

    The purposes of the present study were to develop a Parenting Scale of Inconsistency and to evaluate its initial reliability and validity. The 12 items assess the inconsistency among parents' moods, behaviors, and attitudes toward children. In the primary study, 517 participants completed three measures: the new Parenting Scale of Inconsistency, the Parental Bonding Instrument, and the Depression Scale of the General Health Questionnaire. The Parenting Scale of Inconsistency had good test-retest reliability of .85 and internal consistency of .88 (Cronbach coefficient alpha). Construct validity was good as Inconsistency scores were significantly correlated with the Care and Overprotection scores of the Parental Bonding Instrument and with the Depression scores. Moreover, Inconsistency scores' relation with a dimension of parenting style distinct from Care and Overprotection suggested that the Parenting Scale of Inconsistency had factorial validity. This scale seems a potential measure for examining the relationships between inconsistent parenting and the mental health of children.

  11. Calf-raise senior: a new test for assessment of plantar flexor muscle strength in older adults: protocol, validity, and reliability.

    Science.gov (United States)

    André, Helô-Isa; Carnide, Filomena; Borja, Edgar; Ramalho, Fátima; Santos-Rocha, Rita; Veloso, António P

    2016-01-01

    This study aimed to develop a new field test protocol with a standardized measurement of strength and power in plantar flexor muscles targeted to functionally independent older adults, the calf-raise senior (CRS) test, and also evaluate its reliability and validity. Forty-one subjects aged 65 years and older of both sexes participated in five different cross-sectional studies: 1) pilot (n=12); 2) inter- and intrarater agreement (n=12); 3) construct (n=41); 4) criterion validity (n=33); and 5) test-retest reliability (n=41). Different motion parameters were compared in order to define a specifically designed protocol for seniors. Two raters evaluated each participant twice, and the results of the same individual were compared between raters and participants to assess the interrater and intrarater agreement. The validity and reliability studies involved three testing sessions that lasted 2 weeks, including a battery of functional fitness tests, the CRS test on two occasions, accelerometry, and strength assessments in an isokinetic dynamometer. The CRS test presented an excellent test-retest reliability (intraclass correlation coefficient [ICC] = 0.90, standard error of measurement = 2.0) and interrater reliability (ICC = 0.93-0.96), as well as a good intrarater agreement (ICC = 0.79-0.84). Participants with better results in the CRS test were younger and presented higher levels of physical activity and functional fitness. A significant association between test results and all strength parameters (isometric, r = 0.87, r² = 0.75; isokinetic, r = 0.86, r² = 0.74; and rate of force development, r = 0.77, r² = 0.59) was shown. This study was successful in demonstrating that the CRS test can meet the scientific criteria of validity and reliability. The test can be a good indicator of ankle strength in older adults and proved to discriminate significantly between individuals with improved functionality and levels of physical activity.
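
    Interrater reliability in the record above is summarized with an intraclass correlation coefficient. The abstract does not say which ICC form was used, so the sketch below assumes one common choice, the two-way random-effects, absolute-agreement, single-rater form often written ICC(2,1). The function name `icc_2_1` and the ratings table are illustrative and are not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one per subject, each row holding the score
    given by each of the k raters (complete data assumed).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subjects mean square
    msc = ss_cols / (k - 1)              # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Illustrative table: 6 subjects rated by 4 raters.
table = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(table), 2))  # about 0.29
```

    In this illustrative table the raters rank subjects similarly but differ systematically in level, which an absolute-agreement form penalizes; a consistency form of the ICC would come out higher on the same data.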

  12. Reliable and valid assessment of competence in endoscopic ultrasonography and fine-needle aspiration for mediastinal staging of non-small cell lung cancer.

    Science.gov (United States)

    Konge, L; Vilmann, P; Clementsen, P; Annema, J T; Ringsted, C

    2012-10-01

    Fine-needle aspiration (FNA) guided by endoscopic ultrasonography (EUS) is important in mediastinal staging of non-small cell lung cancer (NSCLC). Training standards and implementation strategies of this technique are currently under discussion. The aim of this study was to explore the reliability and validity of a newly developed EUS Assessment Tool (EUSAT) designed to measure competence in EUS - FNA for mediastinal staging of NSCLC. A total of 30 patients with proven or suspected NSCLC underwent EUS - FNA for mediastinal staging by three trainees and three experienced physicians. Their performances were assessed prospectively by three experts in EUS under direct observation and again 2 months later in a blinded fashion using digital video-recordings. Based on the assessments, intra-rater reliability, inter-rater reliability, and construct validity were explored. The intra-rater reliability was good (Cronbach's α = 0.80), but comparison of results based on direct observations and blinded video-recordings indicated a significant bias favoring consultants (P = 0.022). Inter-rater reliability was very good (Cronbach's α = 0.93). However, one rater assessing five procedures or two raters each assessing four procedures were necessary to secure a generalizability coefficient of 0.80. The assessment tool demonstrated construct validity by discriminating between trainees and experienced physicians (P = 0.034). Competency in mediastinal staging of NSCLC using EUS and EUS - FNA can be assessed in a reliable and valid way using the EUSAT assessment tool. Measuring and defining competency and training requirements could improve EUS quality and benefit patient care. © Georg Thieme Verlag KG Stuttgart · New York.

  13. Assessing physiotherapists' communication skills for promoting patient autonomy for self-management: reliability and validity of the communication evaluation in rehabilitation tool.

    Science.gov (United States)

    Murray, Aileen; Hall, Amanda; Williams, Geoffrey C; McDonough, Suzanne M; Ntoumanis, Nikos; Taylor, Ian; Jackson, Ben; Copsey, Bethan; Hurley, Deirdre A; Matthews, James

    2018-02-27

    To assess the inter-rater reliability and concurrent validity of the Communication Evaluation in Rehabilitation Tool, which aims to externally assess physiotherapists' competency in using Self-Determination Theory-based communication strategies in practice. Audio recordings of initial consultations between 24 physiotherapists and 24 patients with chronic low back pain in four hospitals in Ireland were obtained as part of a larger randomised controlled trial. Three raters, all of whom had PhDs in psychology and expertise in motivation and physical activity, independently listened to the 24 audio recordings and completed the 18-item Communication Evaluation in Rehabilitation Tool. Inter-rater reliability between all three raters was assessed using intraclass correlation coefficients. Concurrent validity was assessed using Pearson's r correlations with a reference standard, the Health Care Climate Questionnaire. The total score for the Communication Evaluation in Rehabilitation Tool is an average of all 18 items. Total scores demonstrated good inter-rater reliability (Intraclass Correlation Coefficient (ICC) = 0.8) and concurrent validity with the Health Care Climate Questionnaire total score (range: r = 0.7-0.88). Item-level scores of the Communication Evaluation in Rehabilitation Tool identified five items that need improvement. Results provide preliminary evidence to support future use and testing of the Communication Evaluation in Rehabilitation Tool. Implications for Rehabilitation: Promoting patient autonomy is a learned skill, and while interventions exist to train clinicians in these skills, there are no tools to assess how well clinicians use these skills when interacting with a patient. The lack of robust assessment has severe implications regarding both the fidelity of clinician training packages and resulting outcomes for promoting patient autonomy. This study has developed a novel measurement tool, the Communication Evaluation in Rehabilitation Tool, and a

  14. Reliability and concurrent validity of the iPhone® Compass application to measure thoracic rotation range of motion (ROM) in healthy participants

    Science.gov (United States)

    Schram, Ben; Cox, Alistair J.; Anderson, Sarah L.; Keogh, Justin

    2018-01-01

    Background: Several water-based sports (swimming, surfing and stand up paddle boarding) require adequate thoracic mobility (specifically rotation) in order to perform the appropriate activity requirements. The measurement of thoracic spine rotation is problematic for clinicians due to a lack of convenient and reliable measurement techniques. More recently, smartphones have been used to quantify movement in various joints in the body; however, there appears to be a paucity of research using smartphones to assess thoracic spine movement. Therefore, the aim of this study is to determine the reliability (intra and inter rater) and validity of the iPhone® app (Compass) when assessing thoracic spine rotation ROM in healthy individuals. Methods: A total of thirty participants were recruited for this study. Thoracic spine rotation ROM was measured using both the current clinical gold standard, a universal goniometer (UG), and the Smart Phone Compass app. Intra-rater and inter-rater reliability were determined with an Intraclass Correlation Coefficient (ICC) and associated 95% confidence intervals (CI). Validation of the Compass app in comparison to the UG was measured using Pearson’s correlation coefficient and levels of agreement were identified with Bland–Altman plots and 95% limits of agreement. Results: The UG and Compass app measurements both had excellent reproducibility for intra-rater (ICC 0.94–0.98) and inter-rater reliability (ICC 0.72–0.89). However, the Compass app measurements had higher intra-rater reliability (ICC = 0.96–0.98; 95% CI [0.93–0.99]; vs. ICC = 0.94–0.98; 95% CI [0.88–0.99]) and inter-rater reliability (ICC = 0.87–0.89; 95% CI [0.74–0.95] vs. ICC = 0.72–0.82; 95% CI [0.21–0.94]). A strong and significant correlation was found between the UG and the Compass app, demonstrating good concurrent validity (r = 0.835, p reliable tool for measuring thoracic spine rotation which produces greater

  15. A Systematic Review of the Reliability and Validity of Behavioural Tests Used to Assess Behavioural Characteristics Important in Working Dogs.

    Science.gov (United States)

    Brady, Karen; Cracknell, Nina; Zulch, Helen; Mills, Daniel Simon

    2018-01-01

    Working dogs are selected based on predictions from tests that they will be able to perform specific tasks in often challenging environments. However, withdrawal from service in working dogs is still a big problem, bringing into question the reliability of the selection tests used to make these predictions. A systematic review was undertaken aimed at bringing together available information on the reliability and predictive validity of the assessment of behavioural characteristics used with working dogs to establish the quality of selection tests currently available for use to predict success in working dogs. The search procedures resulted in 16 papers meeting the criteria for inclusion. A large range of behaviour tests and parameters were used in the identified papers, and so behaviour tests and their underpinning constructs were grouped on the basis of their relationship with positive core affect (willingness to work, human-directed social behaviour, object-directed play tendencies) and negative core affect (human-directed aggression, approach withdrawal tendencies, sensitivity to aversives). We then examined the papers for reports of inter-rater reliability, within-session intra-rater reliability, test-retest validity and predictive validity. The review revealed a widespread lack of information relating to the reliability and validity of measures to assess behaviour and inconsistencies in terminologies, study parameters and indices of success. There is a need to standardise the reporting of these aspects of behavioural tests in order to improve the knowledge base of what characteristics are predictive of optimal performance in working dog roles, improving selection processes and reducing working dog redundancy. We suggest the use of a framework based on explaining the direct or indirect relationship of the test with core affect.

  16. Preliminary findings on the reliability and validity of the Cantonese Birmingham Cognitive Screen in patients with acute ischemic stroke.

    Science.gov (United States)

    Pan, Xiaoping; Chen, Haobo; Bickerton, Wai-Ling; Lau, Johnny King Lam; Kong, Anthony Pak Hin; Rotshtein, Pia; Guo, Aihua; Hu, Jianxi; Humphreys, Glyn W

    2015-01-01

    There are no currently effective cognitive assessment tools for patients who have suffered stroke in the People's Republic of China. The Birmingham Cognitive Screen (BCoS) has been shown to be a promising tool for revealing patients' poststroke cognitive deficits in specific domains, which facilitates more individually designed rehabilitation in the long run. Hence we examined the reliability and validity of a Cantonese version BCoS in patients with acute ischemic stroke, in Guangzhou. A total of 98 patients with acute ischemic stroke were assessed with the Cantonese version of the BCoS, and an additional 133 healthy individuals were recruited as controls. Apart from the BCoS, the patients also completed a number of external cognitive tests, including the Montreal Cognitive Assessment Test (MoCA), Mini Mental State Examination (MMSE), Albert's cancellation test, the Rey-Osterrieth Complex Figure Test, and six gesture matching tasks. Cutoff scores for failing each subtest, ie, deficits, were computed based on the performance of the controls. The validity and reliability of the Cantonese BCoS were examined, as well as interrater and test-retest reliability. We also compared the proportions of cases being classified as deficits in controlled attention, memory, character writing, and praxis, between patients with and without spoken language impairment. Analyses showed high test-retest reliability and agreement across independent raters on the qualitative aspects of measurement. Significant correlations were observed between the subtests of the Cantonese BCoS and the other external cognitive tests, providing evidence for convergent validity of the Cantonese BCoS. The screen was also able to generate measures of cognitive functions that were relatively uncontaminated by the presence of aphasia. This study suggests good reliability and validity of the Cantonese version of the BCoS. The Cantonese BCoS is a very promising tool for the detection of cognitive problems in

  17. Reliability and validity of the workplace social distance scale.

    Science.gov (United States)

    Yoshii, Hatsumi; Mandai, Nozomu; Saito, Hidemitsu; Akazawa, Kouhei

    2014-10-29

    Self-stigma, defined by a negative attitude toward oneself combined with the consciousness of being a target of prejudice, is a critical problem for psychiatric patients. Self-stigma studies among psychiatric patients have indicated that high stigma is predictive of detrimental effects such as the delay of treatment and decreases in social participation in patients, and levels of self-stigma should be statistically evaluated. In this study, we developed the Workplace Social Distance Scale (WSDS), rephrasing the eight items of the Japanese version of the Social Distance Scale (SDSJ) to apply to the work setting in Japan. We examined the reliability and validity of the WSDS among 83 psychiatric patients. Factor analysis extracted three factors from the scale items: "work relations," "shallow relationships," and "employment." These factors are similar to the assessment factors of the SDSJ. Cronbach's alpha coefficient for the WSDS was 0.753. The split-half reliability for the WSDS was 0.801, indicating significant correlations. In addition, the WSDS was significantly correlated with the SDSJ. These findings suggest that the WSDS represents an approximation of self-stigma in the workplace among psychiatric patients. Our study assessed the reliability and validity of the WSDS for measuring self-stigma in Japan. Future studies should investigate the reliability and validity of the scale in other countries.
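
    As a rough illustration of the internal-consistency statistic reported above, Cronbach's alpha can be computed from item-level scores as k/(k-1) times one minus the ratio of summed item variances to the variance of the total score. The sketch below assumes complete data with one list of scores per item; the function name `cronbach_alpha` and the demo numbers are hypothetical and are not taken from the WSDS study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of item-score lists, one inner list per item,
    aligned so that position i in every list is respondent i.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Total score per respondent (sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    # Sum of the sample variances of the individual items.
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Made-up responses from three respondents to two items (illustration only).
demo = [[1, 2, 3], [1, 3, 2]]
print(round(cronbach_alpha(demo), 3))  # 0.667
```

    Sample variances (n - 1 denominator) are used for both the items and the totals; using population variances throughout would give the same ratio.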

  18. Reliability and Validity of Athletes Disability Index Questionnaire.

    Science.gov (United States)

    Noormohammadpour, Pardis; Hosseini Khezri, Alireza; Farahbakhsh, Farzin; Mansournia, Mohammad Ali; Smuck, Matthew; Kordi, Ramin

    2018-03-01

    The purpose of this study was to evaluate validity and reliability of a new proposed questionnaire for assessment of functional disability in athletes with low back pain (LBP). Validity and reliability study. Elite athletes participating in different fields of sports. Participants were 165 male and female athletes (between 12 and 50 years old) with LBP. Athlete Disability Index (ADI) Questionnaire which is developed by the authors for assessing LBP-related disability in athletes, Oswestry Disability Index (ODI), and the Roland-Morris Disability Questionnaire (RDQ). Self-reported responses were collected regarding LBP-related disability through ADI, ODI, and RDQ. The test-retest reliability was strong, and intraclass correlation value ranged between 0.74 and 0.94. The Cronbach alpha coefficient value of 0.91 (P visual analog scale was r = 0.626 (P disability levels were mild in the large majority of subjects (91.5% and 86.0%, respectively). Alternatively, disability assessments by the ADI did not cluster at the mild level and ranged more broadly from mild to very high. The ADI is a reliable and valid instrument for assessing disability in athletes with LBP. Compared with the available LBP disability questionnaires used in the general population, ADI can more precisely stratify the disability levels of athletes due to LBP.

  19. Development of a Standardized Kalamazoo Communication Skills Assessment Tool for Radiologists: Validation, Multisource Reliability, and Lessons Learned.

    Science.gov (United States)

    Brown, Stephen D; Rider, Elizabeth A; Jamieson, Katherine; Meyer, Elaine C; Callahan, Michael J; DeBenedectis, Carolynn M; Bixby, Sarah D; Walters, Michele; Forman, Sara F; Varrin, Pamela H; Forbes, Peter; Roussin, Christopher J

    2017-08-01

    The purpose of this study was to develop and test a standardized communication skills assessment instrument for radiology. The Delphi method was used to validate the Kalamazoo Communication Skills Assessment instrument for radiology by revising and achieving consensus on the 43 items of the preexisting instrument among an interdisciplinary team of experts consisting of five radiologists and four nonradiologists (two men, seven women). Reviewers assessed the applicability of the instrument to evaluation of conversations between radiology trainees and trained actors portraying concerned parents in enactments about bad news, radiation risks, and diagnostic errors that were video recorded during a communication workshop. Interrater reliability was assessed by use of the revised instrument to rate a series of enactments between trainees and actors video recorded in a hospital-based simulator center. Eight raters evaluated each of seven different video-recorded interactions between physicians and parent-actors. The final instrument contained 43 items. After three review rounds, 42 of 43 (98%) items had an average rating of relevant or very relevant for bad news conversations. All items were rated as relevant or very relevant for conversations about error disclosure and radiation risk. Reliability and rater agreement measures were moderate. The intraclass correlation coefficient range was 0.07-0.58; mean, 0.30; SD, 0.13; and median, 0.30. The range of weighted kappa values was 0.03-0.47; mean, 0.23; SD, 0.12; and median, 0.22. Ratings varied significantly among conversations (χ²(6) = 1186; p communication skills assessment instrument is highly relevant for radiology, having moderate interrater reliability. These findings have important implications for assessing the relational competencies of radiology trainees.

  20. Validity and Reliability of Baseline Testing in a Standardized Environment.

    Science.gov (United States)

    Higgins, Kathryn L; Caze, Todd; Maerlender, Arthur

    2017-08-11

    The Immediate Postconcussion Assessment and Cognitive Testing (ImPACT) is a computerized neuropsychological test battery commonly used to determine cognitive recovery from concussion based on comparing post-injury scores to baseline scores. This model is based on the premise that ImPACT baseline test scores are a valid and reliable measure of optimal cognitive function at baseline. Growing evidence suggests that this premise may not be accurate and that a large contributor to invalid and unreliable baseline test scores may be the protocol and environment in which baseline tests are administered. This study examined the effects of a standardized environment and administration protocol on the reliability and performance validity of athletes' baseline test scores on ImPACT by comparing scores obtained in two different group-testing settings. Three hundred sixty-one Division 1 cohort-matched collegiate athletes' baseline data were assessed using a variety of indicators of potential performance invalidity; internal reliability was also examined. Thirty-one to thirty-nine percent of the baseline cases had at least one indicator of low performance validity, but there were no significant differences in validity indicators based on the environment in which the testing was conducted. Internal consistency reliability scores were in the acceptable to good range, with no significant differences between administration conditions. These results suggest that athletes may be reliably performing at levels lower than their best effort would produce. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  1. Learning Style Scales: a valid and reliable questionnaire

    Directory of Open Access Journals (Sweden)

    Abdolghani Abdollahimohammad

    2014-08-01

    Purpose: Learning-style instruments assist students in developing their own learning strategies and outcomes, in eliminating learning barriers, and in acknowledging peer diversity. Only a few psychometrically validated learning-style instruments are available. This study aimed to develop a valid and reliable learning-style instrument for nursing students. Methods: A cross-sectional survey study was conducted in two nursing schools in two countries. A purposive sample of 156 undergraduate nursing students participated in the study. Face and content validity was obtained from an expert panel. The LSS construct was established using principal axis factoring (PAF) with oblimin rotation, a scree plot test, and parallel analysis (PA). The reliability of LSS was tested using Cronbach’s α, corrected item-total correlation, and test-retest. Results: Factor analysis revealed five components, confirmed by PA and a relatively clear curve on the scree plot. Component strength and interpretability were also confirmed. The factors were labeled as perceptive, solitary, analytic, competitive, and imaginative learning styles. Cronbach’s α was > 0.70 for all subscales in both study populations. The corrected item-total correlations were > 0.30 for the items in each component. Conclusion: The LSS is a valid and reliable inventory for evaluating learning style preferences in nursing students in various multicultural environments.

  2. Learning Style Scales: a valid and reliable questionnaire.

    Science.gov (United States)

    Abdollahimohammad, Abdolghani; Ja'afar, Rogayah

    2014-01-01

    Learning-style instruments assist students in developing their own learning strategies and outcomes, in eliminating learning barriers, and in acknowledging peer diversity. Only a few psychometrically validated learning-style instruments are available. This study aimed to develop a valid and reliable learning-style instrument for nursing students. A cross-sectional survey study was conducted in two nursing schools in two countries. A purposive sample of 156 undergraduate nursing students participated in the study. Face and content validity was obtained from an expert panel. The LSS construct was established using principal axis factoring (PAF) with oblimin rotation, a scree plot test, and parallel analysis (PA). The reliability of LSS was tested using Cronbach's α, corrected item-total correlation, and test-retest. Factor analysis revealed five components, confirmed by PA and a relatively clear curve on the scree plot. Component strength and interpretability were also confirmed. The factors were labeled as perceptive, solitary, analytic, competitive, and imaginative learning styles. Cronbach's α was >0.70 for all subscales in both study populations. The corrected item-total correlations were >0.30 for the items in each component. The LSS is a valid and reliable inventory for evaluating learning style preferences in nursing students in various multicultural environments.

  3. Distress Tolerance Scale: A Study of Reliability and Validity

    Directory of Open Access Journals (Sweden)

    Ahmet Emre SARGIN

    2012-11-01

    Objective: The Distress Tolerance Scale (DTS) was developed by Simons and Gaher in order to measure individual differences in the capacity for distress tolerance. The aim of this study is to assess the reliability and validity of the Turkish version of DTS. Method: One hundred and sixty-seven university students (male=66, female=101) participated in this study. Beck Anxiety Inventory (BAI), State-Trait Anxiety Inventory (STAI) and Discomfort Intolerance Scale (DIS) were used to determine the criterion validity. Construct validity was evaluated with factor analysis after the Kaiser-Meyer-Olkin (KMO) and Bartlett tests had been performed. To assess the test-retest reliability, the scale was re-applied to 79 participants six weeks later. Results: To assess construct validity, factor analyses were performed using principal components analysis with varimax rotation. While there were factors in the original study, our factor analysis resulted in three factors. Cronbach’s alpha coefficients for the entire scale and the tolerance, regulation, and self-efficacy subscales were .89, .90, .80 and .64, respectively. There were correlations at the level of 0.01 between the Trait Anxiety Inventory of STAI and BAI, and all the subscales of DTS, and also between the State Anxiety Inventory and the regulation subscale. Both of the subscales of DIS were correlated with the entire scale and all the subscales except regulation at the level of 0.05. Test-retest reliability was statistically significant at the level of 0.01. Conclusion: Analysis demonstrated that DTS had a satisfactory level of reliability and validity in Turkish university students.

  4. Validity, Reliability, and Sensitivity of a Volleyball Intermittent Endurance Test.

    Science.gov (United States)

    Rodríguez-Marroyo, Jose A; Medina-Carrillo, Javier; García-López, Juan; Morante, Juan C; Villa, José G; Foster, Carl

    2017-03-01

    To analyze the concurrent and construct validity of a volleyball intermittent endurance test (VIET). The VIET's test-retest reliability and sensitivity to assess seasonal changes was also studied. During the preseason, 71 volleyball players of different competitive levels took part in this study. All performed the VIET and a graded treadmill test with gas-exchange measurement (GXT). Thirty-one of the players performed an additional VIET to analyze the test-retest reliability. To test the VIET's sensitivity, 28 players repeated the VIET and GXT at the end of their season. Significant (P volleyball players.

  5. Short-interval test-retest interrater reliability of the Dutch version of the structured clinical interview for DSM-IV personality disorders (SCID-II)

    NARCIS (Netherlands)

    Weertman, A; ArntZ, A; Dreessen, L; van Velzen, C; Vertommen, S

    2003-01-01

    This study examined the short-interval test-retest reliability of the Structured Clinical Interview (SCID-II: First, Spitzer, Gibbon, & Williams, 1995) for DSM-IV personality disorders (PDs). The SCID-II was administered to 69 in- and outpatients on two occasions separated by 1 to 6 weeks. The

  6. Inter-rater reliability of direct observations of the physical and psychosocial working conditions in eldercare: An evaluation in the DOSES project

    NARCIS (Netherlands)

    Karstad, K. (Kristina); Rugulies, R. (Reiner); Skotte, J. (Jørgen); Munch, P.K. (Pernille Kold); Greiner, B.A. (Birgit A.); Burdorf, A. (Alex); Søgaard, K. (Karen); A. Holtermann (Andreas)

    2018-01-01

    The aim of the study was to develop and evaluate the reliability of the “Danish observational study of eldercare work and musculoskeletal disorders” (DOSES) observation instrument to assess physical and psychosocial risk factors for musculoskeletal disorders (MSD) in eldercare work.

  7. Validity and Reliability of the Achilles Tendon Total Rupture Score

    DEFF Research Database (Denmark)

    Ganestam, Ann; Barfod, Kristoffer; Klit, Jakob

    2013-01-01

    The best treatment of acute Achilles tendon rupture remains debated. Patient-reported outcome measures have become cornerstones in treatment evaluations. The Achilles tendon total rupture score (ATRS) has been developed for this purpose but requires additional validation. The purpose of the present study was to validate a Danish translation of the ATRS. The ATRS was translated into Danish according to internationally adopted standards. Of 142 patients, 90 with previous rupture of the Achilles tendon participated in the validity study and 52 in the reliability study. The ATRS showed moderately... = .07). The limits of agreement were ±18.53. A strong correlation was found between test and retest (intercorrelation coefficient .908); the standard error of measurement was 6.7, and the minimal detectable change was 18.5. The Danish version of the ATRS showed moderately strong criterion validity...
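
    The relation between the standard error of measurement (SEM) and the minimal detectable change (MDC) quoted above can be sketched as follows. This assumes the commonly used 95% formula MDC95 = 1.96 × √2 × SEM, and SEM = SD × √(1 − reliability); the abstract does not state which formulas the authors used, so both functions (`mdc95`, `sem_from_sd`) are hypothetical illustrations.

```python
import math


def sem_from_sd(sd, reliability):
    """Standard error of measurement from a score SD and a reliability coefficient (e.g. an ICC)."""
    return sd * math.sqrt(1 - reliability)


def mdc95(sem):
    """Minimal detectable change at the 95% level: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2) * sem


# With the SEM of 6.7 reported in the abstract:
print(round(mdc95(6.7), 1))  # 18.6
```

    Under these assumptions the result lands close to the reported 18.5; the small gap would be consistent with the SEM having been rounded to one decimal before publication.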

  8. The reliability and validity of using the urine dipstick test by patient self-assessment for urinary tract infection screening in spinal cord injury patients.

    Science.gov (United States)

    Duanngai, Krit; Sirasaporn, Patpiya; Ngaosinchai, Siriwan Surapaitoon

    2017-01-01

    The aim of this study is to evaluate the reliability of the urine dipstick test by patients' self-assessment for urinary tract infection (UTI) screening and to determine the validity of the urine dipstick test. Rehabilitation Department, Srinagarind Hospital, Thailand. A diagnostic study. This study compared the urine dipstick test (index test) with the National Institute on Disability and Rehabilitation Research (NIDRR) criteria (gold standard test) in spinal cord injury (SCI) patients. The urine dipstick test gave positive or negative results, whereas the NIDRR criteria classified patients as UTI or no UTI. The interrater reliability was measured using the kappa statistic, whereas the validity of the urine dipstick test was reported in terms of sensitivity, specificity, positive likelihood ratio (+LR), negative likelihood ratio (-LR), positive predictive value (PPV), and negative predictive value (NPV). Among the 56 participants, the kappa values of the urine dipstick test for leukocyte esterase, nitrite, and combined leukocyte esterase and nitrite were 0.09, 0.21, and 0.52, respectively. The nitrite urine dipstick test showed the highest sensitivity (90%). The combined leukocyte esterase and nitrite urine dipstick test gave the highest specificity (87%), PPV (60%), NPV (93%), and +LR (5.63). The interrater reliability of the combined leukocyte esterase and nitrite urine dipstick test showed moderate agreement. The combined leukocyte esterase and nitrite urine dipstick test showed a high level of both sensitivity and specificity. The combined leukocyte esterase and nitrite urine dipstick test should be promoted for patients' self-assessment for UTI screening in SCI patients.
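
    The validity measures listed in this abstract all derive from a 2×2 table of index-test results against the reference standard. A minimal sketch, using standard textbook definitions and hypothetical counts (the study's actual cell counts are not given in the abstract), could look like this; the function name `diagnostic_metrics` is an illustrative assumption.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures from a 2x2 table.

    tp: index test positive, reference positive
    fp: index test positive, reference negative
    fn: index test negative, reference positive
    tn: index test negative, reference negative
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Likelihood ratios are undefined when specificity is exactly 1 or 0.
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


# Hypothetical counts for illustration only (not data from the study).
for name, value in diagnostic_metrics(tp=8, fp=5, fn=2, tn=35).items():
    print(f"{name}: {value:.3f}")
```

    Unlike sensitivity and specificity, PPV and NPV depend on how common the condition is in the sample, so they do not transfer directly to populations with a different UTI prevalence.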

  9. Development, reliability, and validity of the Posttraumatic Stress Disorder Interview for Vietnamese refugees: a diagnostic instrument for Vietnamese refugees.

    Science.gov (United States)

    Dao, Tam K; Poritz, Julia M P; Moody, Rachel P; Szeto, Kim

    2012-08-01

    The Posttraumatic Stress Disorder Interview for Vietnamese Refugees (PTSD-IVR) was created specifically to assess for the presence of current and lifetime history of premigration, migration, encampment, and postmigration traumas in Vietnamese refugees. The purpose of the present study was to describe the development of and investigate the interrater and test-retest reliability of the PTSD-IVR and its validity in relation to the diagnoses obtained from the Longitudinal, Expert, and All Data (LEAD; Spitzer, 1983) standard. Clinicians conducted the diagnosis process with 127 Vietnamese refugees using the LEAD standard and the PTSD-IVR. Assessment of the reliability and validity of the PTSD-IVR yielded good to excellent AUC (area under the receiver operating characteristic curve; .86, .87) and κ values (.66, .74) indicating the reliability of the PTSD-IVR and the agreement between the LEAD procedure and the PTSD-IVR. The results of the present study suggest that the PTSD-IVR performs successfully as a diagnostic instrument specifically created for Vietnamese refugees in their native language. Copyright © 2012 International Society for Traumatic Stress Studies.

  10. Validity and reliability of chronic tic disorder and obsessive-compulsive disorder diagnoses in the Swedish National Patient Register.

    Science.gov (United States)

    Rück, Christian; Larsson, K Johan; Lind, Kristina; Perez-Vigil, Ana; Isomura, Kayoko; Sariaslan, Amir; Lichtenstein, Paul; Mataix-Cols, David

    2015-06-22

    The usefulness of cases diagnosed in administrative registers for research purposes is dependent on diagnostic validity. This study aimed to investigate the validity and inter-rater reliability of recorded diagnoses of tic disorders and obsessive-compulsive disorder (OCD) in the Swedish National Patient Register (NPR). Chart review of randomly selected register cases and controls. 100 tic disorder cases and 100 OCD cases were randomly selected from the NPR based on codes from the International Classification of Diseases (ICD) 8th, 9th and 10th editions, together with 50 epilepsy and 50 depression control cases. The obtained psychiatric records were blindly assessed by 2 senior psychiatrists according to the criteria of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR) and ICD-10. Positive predictive value (PPV; cases diagnosed correctly divided by the sum of true positives and false positives). Between 1969 and 2009, the NPR included 7286 tic disorder and 24,757 OCD cases. The vast majority (91.3% of tic cases and 80.1% of OCD cases) are coded with the most recent ICD version (ICD-10). For tic disorders, the PPV was high across all ICD versions (PPV=89% in ICD-8, 86% in ICD-9 and 97% in ICD-10). For OCD, only ICD-10 codes had high validity (PPV=91-96%). None of the epilepsy or depression control cases were wrongly diagnosed as having tic disorders or OCD, respectively. Inter-rater reliability was outstanding for both tic disorders (κ=1) and OCD (κ=0.98). The validity and reliability of ICD codes for tic disorders and OCD in the Swedish NPR are generally high. We propose simple algorithms to further increase the confidence in the validity of these codes for epidemiological research. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
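
    For two raters assigning categorical diagnoses, agreement of the kind summarised by κ above is often quantified with Cohen's kappa: observed agreement corrected for the agreement expected by chance. The sketch below assumes unweighted Cohen's kappa for two raters; the abstract does not specify the exact variant used, and the function name and labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single identical label")
    return (observed - expected) / (1 - expected)


# Made-up chart-review labels from two raters (illustration only).
a = ["tic", "tic", "ocd", "ocd"]
b = ["tic", "ocd", "ocd", "ocd"]
print(cohen_kappa(a, b))  # 0.5
```

    A value of 1 means the raters agreed on every case, while 0 means agreement no better than chance given each rater's label frequencies.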

  11. Reliability, Construct Validity and Interpretability of the Brazilian version of the Rapid Upper Limb Assessment (RULA) and Strain Index (SI).

    Science.gov (United States)

    Valentim, Daniela Pereira; Sato, Tatiana de Oliveira; Comper, Maria Luiza Caíres; Silva, Anderson Martins da; Boas, Cristiana Villas; Padula, Rosimeire Simprini

    There are very few observational methods for analysis of biomechanical exposure available in Brazilian Portuguese. This study aimed to cross-culturally adapt and test the measurement properties of the Rapid Upper Limb Assessment (RULA) and Strain Index (SI). The cross-cultural adaptation and the testing of measurement properties followed the Beaton et al. and COSMIN guidelines, respectively. Several tasks that required static posture and/or repetitive motion of upper limbs were evaluated (n>100). The intra-rater reliability for the RULA ranged from poor to almost perfect (k: 0.00-0.93), and for the SI from poor to excellent (ICC(2,1): 0.05-0.99). The inter-rater reliability was very poor for RULA (k: -0.12 to 0.13) and ranged from very poor to moderate for SI (ICC(2,1): 0.00-0.53). The agreement was good for RULA (75-100% intra-rater, and 42.24-100% inter-rater) and for SI (EPM: -1.03% to 1.97% intra-rater, and -0.17% to 1.51% inter-rater). The internal consistency was appropriate for RULA (α=0.88) and low for SI (α=0.65). Moderate construct validity was observed between RULA and SI in wrist/hand-wrist posture (rho: 0.61) and strength/intensity of exertion (rho: 0.39). The adapted versions of the RULA and SI presented semantic and cultural equivalence for Brazilian Portuguese. Reliability estimates for the RULA and SI ranged from very poor to almost perfect. The internal consistency of the RULA was better than that of the SI. The correlation between methods was moderate only for muscle request/movement repetition. Prior training is mandatory for the use of observational methods for biomechanical exposure assessment, although it does not guarantee good reproducibility of these measures. Copyright © 2017 Associação Brasileira de Pesquisa e Pós-Graduação em Fisioterapia. Publicado por Elsevier Editora Ltda. All rights reserved.
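
    The ICC values reported above appear to be the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). As a rough sketch under that assumption, it can be computed from the mean squares of a subjects-by-raters table with no missing cells. The function name `icc_2_1` and the ratings below are illustrative and are not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one per subject; each row holds one score per rater.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-subjects mean square
    ms_cols = ss_cols / (k - 1)                 # between-raters mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative ratings: 6 subjects scored by 4 raters.
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(scores), 2))  # 0.29
```

    In this example the raters rank the subjects fairly consistently but differ a lot in their average level, which is why an absolute-agreement coefficient comes out low.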

  12. The Trojan Lifetime Champions Health Survey: Development, Validity, and Reliability

    Science.gov (United States)

    Sorenson, Shawn C.; Romano, Russell; Scholefield, Robin M.; Schroeder, E. Todd; Azen, Stanley P.; Salem, George J.

    2015-01-01

    Context: Self-report questionnaires are an important method of evaluating lifespan health, exercise, and health-related quality of life (HRQL) outcomes among elite, competitive athletes. Few instruments, however, have undergone formal characterization of their psychometric properties within this population. Objective: To evaluate the validity and reliability of a novel health and exercise questionnaire, the Trojan Lifetime Champions (TLC) Health Survey. Design: Descriptive laboratory study. Setting: A large National Collegiate Athletic Association Division I university. Patients or Other Participants: A total of 63 university alumni (age range, 24 to 84 years), including former varsity collegiate athletes and a control group of nonathletes. Intervention(s): Participants completed the TLC Health Survey twice at a mean interval of 23 days with randomization to the paper or electronic version of the instrument. Main Outcome Measure(s): Content validity, feasibility of administration, test-retest reliability, parallel-form reliability between paper and electronic forms, and estimates of systematic and typical error versus differences of clinical interest were assessed across a broad range of health, exercise, and HRQL measures. Results: Correlation coefficients, including intraclass correlation coefficients (ICCs) for continuous variables and κ agreement statistics for ordinal variables, for test-retest reliability averaged 0.86, 0.90, 0.80, and 0.74 for HRQL, lifetime health, recent health, and exercise variables, respectively. Correlation coefficients, again ICCs and κ, for parallel-form reliability (ie, equivalence) between paper and electronic versions averaged 0.90, 0.85, 0.85, and 0.81 for HRQL, lifetime health, recent health, and exercise variables, respectively. Typical measurement error was less than the a priori thresholds of clinical interest, and we found minimal evidence of systematic test-retest error. We found strong evidence of content validity, convergent

  13. Validity and reliability of a novel 3D scanner for assessment of the shape and volume of amputees' residual limb models.

    Directory of Open Access Journals (Sweden)

    Elena Seminati

    Full Text Available Objective assessment methods to monitor residual limb volume following lower-limb amputation are required to enhance practitioner-led prosthetic fitting. Computer aided systems, including 3D scanners, present numerous advantages and the recent Artec Eva scanner, based on laser free technology, could potentially be an effective solution for monitoring residual limb volumes. The aim of this study was to assess the validity and reliability of the Artec Eva scanner (practical measurement) against a high precision laser 3D scanner (criterion measurement) for the determination of residual limb model shape and volume. Three observers completed three repeat assessments of ten residual limb models, using both the scanners. Validity of the Artec Eva scanner was assessed (mean percentage error <2%) and Bland-Altman statistics were adopted to assess the agreement between the two scanners. Intra and inter-rater reliability (repeatability coefficient <5%) of the Artec Eva scanner was calculated for measuring indices of residual limb model volume and shape (i.e. residual limb cross sectional areas and perimeters). Residual limb model volumes ranged from 885 to 4399 ml. Mean percentage error of the Artec Eva scanner (validity) was 1.4% of the criterion volumes. Correlation coefficients between the Artec Eva and the Romer determined variables were higher than 0.9. Volume intra-rater and inter-rater reliability coefficients were 0.5% and 0.7%, respectively. Shape percentage maximal error was 2% at the distal end of the residual limb, with intra-rater reliability coefficients presenting the lowest errors (0.2%), both for cross sectional areas and perimeters of the residual limb models. The Artec Eva scanner is a valid and reliable method for assessing residual limb model shapes and volumes. 
While the method needs to be tested on human residual limbs and the results compared with the current system used in clinical practice, it has the potential to quantify shape and volume
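
    The agreement statistics named in this abstract can be sketched as follows. This is a minimal illustration, assuming the usual Bland-Altman definition (bias plus or minus 1.96 sample standard deviations of the paired differences) and assuming that "mean percentage error" means the mean absolute difference relative to the criterion; the abstract does not state either formula, and the volumes below are hypothetical, not study data.

```python
from statistics import mean, stdev


def bland_altman(practical, criterion):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [p - c for p, c in zip(practical, criterion)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample SD of the paired differences
    return bias, bias - spread, bias + spread


def mean_percentage_error(practical, criterion):
    """Mean absolute difference as a percentage of the criterion (assumed definition)."""
    return mean(abs(p - c) / c * 100 for p, c in zip(practical, criterion))


# Hypothetical volumes in ml, NOT data from the study.
practical = [900.0, 1510.0, 2480.0, 4400.0]
criterion = [885.0, 1500.0, 2500.0, 4399.0]

bias, lower, upper = bland_altman(practical, criterion)
print(f"bias = {bias:.1f} ml, limits of agreement = ({lower:.1f}, {upper:.1f}) ml")
print(f"mean percentage error = {mean_percentage_error(practical, criterion):.2f}%")
```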

  14. Pain Assessment in Critically Ill Adult Patients: Validity and Reliability Research of the Turkish Version of the Critical-Care Pain Observation Tool

    Directory of Open Access Journals (Sweden)

    Onur Gündoğan

    2016-12-01

    Full Text Available Objective: The Critical-Care Pain Observation Tool (CPOT) and the Behavioral Pain Scale (BPS) are behavioral pain assessment scales for unconscious intensive care unit (ICU) patients. The aim is to determine the validity and reliability of the CPOT in Turkish in mechanically ventilated adult ICU patients. Material and Method: This prospective observational cohort study included 50 mechanically ventilated mixed ICU patients who were unable to report pain. The CPOT and BPS were translated into Turkish and language validity was assessed by ten intensive care specialists. Pain was assessed in the course of painless and painful routine care procedures using the CPOT and the BPS by a resident and an intensivist concomitantly. Test reliability, interrater reliability, and validity of the CPOT and the BPS were evaluated. Results: The mean age was 57.4 years and the mean APACHE II score was 18.7. A total of 100 assessments were recorded from 50 patients using CPOT and BPS. Scores of CPOT and BPS during the painful procedures were both significantly higher than during painless procedures. The agreement between CPOT and BPS during painful and painless stimuli ranged as follows: sensitivity 66.7%-90.3%; specificity 89.7%-97.9%; kappa value 0.712-0.892. The agreement between resident and intensivist during painful and painless stimuli ranged from 97% to 100% and the kappa value was between 0.904 and 1.0. Conclusion: The Turkish version of the CPOT showed good correlation with the BPS. Interrater reliability between resident and intensivist was good. The study showed that the Turkish versions of the BPS and CPOT are reliable and valid tools to assess pain in daily clinical practice for intubated and unconscious ICU patients who are mechanically ventilated.
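
    The percent agreement and kappa values reported between the two raters can be illustrated with a short sketch. It assumes the unweighted Cohen's kappa for two raters; the abstract does not say which kappa variant was used, and the ratings below are hypothetical, not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    n = len(rater_a)
    # Observed proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical ratings, NOT data from the study.
resident = ["pain", "pain", "none", "none", "pain", "none", "none", "pain"]
intensivist = ["pain", "pain", "none", "none", "pain", "none", "pain", "pain"]

agreement = sum(a == b for a, b in zip(resident, intensivist)) / len(resident)
print(f"agreement = {agreement:.1%}, kappa = {cohens_kappa(resident, intensivist):.2f}")
```

    Kappa is lower than raw agreement because it discounts the agreement two raters would reach by chance alone.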

  15. The reliability and validity of the Saliba Postural Classification System.

    Science.gov (United States)

    Collins, Cristiana Kahl; Johnson, Vicky Saliba; Godwin, Ellen M; Pappas, Evangelos

    2016-07-01

    To determine the reliability and validity of the Saliba Postural Classification System (SPCS). Two physical therapists classified pictures of 100 volunteer participants standing in their habitual posture for inter and intra-tester reliability. For validity, 54 participants stood on a force plate in a habitual and a corrected posture, while a vertical force was applied through the shoulders until the clinician felt a postural give. Data were extracted at the time the give was felt and at a time in the corrected posture that matched the peak vertical ground reaction force (VGRF) in the habitual posture. Inter-tester reliability demonstrated 75% agreement with a Kappa = 0.64 (95% CI = 0.524-0.756, SE = 0.059). Intra-tester reliability demonstrated 87% agreement with a Kappa = 0.8 (95% CI = 0.702-0.898, SE = 0.05) and 80% agreement with a Kappa = 0.706 (95% CI = 0.594-0.818, SE = 0.057). The examiner applied a significantly higher (p < 0.001) peak vertical force in the corrected posture prior to a postural give when compared to the habitual posture. Within the corrected posture, the %VGRF was higher when the test was ongoing vs. when a postural give was felt (p < 0.001). The %VGRF was not different between the two postures when comparing the peaks (p = 0.214). The SPCS has substantial agreement for inter- and intra-tester reliability and is largely a valid postural classification system as determined by the larger vertical forces in the corrected postures. Further studies on the correlation between the SPCS and diagnostic classifications are indicated.
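
    The confidence intervals in this abstract are consistent with the normal-approximation interval, kappa plus or minus 1.96 standard errors. The sketch below recomputes them from the reported kappa and SE values; the use of that formula is an inference from the numbers, not something the abstract states, and the function name is illustrative.

```python
def kappa_ci(kappa, se, z=1.96):
    """Normal-approximation confidence interval: kappa +/- z * SE."""
    return kappa - z * se, kappa + z * se


# Kappa and SE pairs as reported in the abstract.
for kappa, se in [(0.64, 0.059), (0.8, 0.05), (0.706, 0.057)]:
    low, high = kappa_ci(kappa, se)
    print(f"kappa = {kappa}: 95% CI = {low:.3f}-{high:.3f}")
```

    Under this assumption the third interval comes out as 0.594-0.818.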

  16. Reliability and Validity Assessment of a Linear Position Transducer

    Directory of Open Access Journals (Sweden)

    Manuel V. Garnacho-Castaño

    2015-03-01

    Full Text Available The objectives of the study were to determine the validity and reliability of peak velocity (PV), average velocity (AV), peak power (PP) and average power (AP) measurements made using a linear position transducer. Validity was assessed by comparing measurements simultaneously obtained using the Tendo Weightlifting Analyzer System and T-Force Dynamic Measurement System (Ergotech, Murcia, Spain) during two resistance exercises, bench press (BP) and full back squat (BS), performed by 71 trained male subjects. For the reliability study, a further 32 men completed both lifts using the Tendo Weightlifting Analyzer System in two identical testing sessions one week apart (session 1 vs. session 2). Intraclass correlation coefficients (ICCs) indicating the validity of the Tendo Weightlifting Analyzer System were high, with values ranging from 0.853 to 0.989. Systematic biases and random errors were low to moderate for almost all variables, being higher in the case of PP (bias ±157.56 W; error ±131.84 W). Proportional biases were identified for almost all variables. Test-retest reliability was strong with ICCs ranging from 0.922 to 0.988. Reliability results also showed minimal systematic biases and random errors, which were only significant for PP (bias -19.19 W; error ±67.57 W). Only PV recorded in the BS showed no significant proportional bias. The Tendo Weightlifting Analyzer System emerged as a reliable system for measuring movement velocity and estimating power in resistance exercises. The low biases and random errors observed here (mainly AV, AP) make this device a useful tool for monitoring resistance training.
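
    A test-retest ICC of the kind reported here can be computed from a subjects-by-sessions table of scores. The sketch below assumes the ICC(2,1) form (two-way random effects, absolute agreement, single measurement); the abstract does not say which ICC form the authors used, and the velocities are hypothetical, not study data.

```python
def icc_2_1(scores):
    """ICC(2,1) from a table with one row per subject, one column per session."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, sessions, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical peak velocities (m/s) for five lifters in two sessions.
sessions = [[1.10, 1.12], [0.95, 0.97], [1.30, 1.27], [0.88, 0.90], [1.05, 1.08]]
print(f"ICC(2,1) = {icc_2_1(sessions):.3f}")
```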

  17. Reliability and validity of internalized stigmatization scale in psoriasis

    OpenAIRE

    Erkan Alpsoy; Yeşim Şenol; Aslı Bilgiç Temel; G. Özge Baysal; Ayşe Akman Karakaş

    2015-01-01

    Background and design. Internalized stigma involves endorsing negative feelings and beliefs such as insignificance, shame and withdrawal triggered by applying these negative stereotypes to oneself. Internalized Stigma Scale has not been applied to psoriasis patients. We aimed to evaluate the reliability and validity of Internalized Stigma Scale in psoriasis patients. Materials and Methods. 100 consecutive, volunteer psoriasis patients (48 female, 52 male; aged, 40.59±15.44 years) were enro...

  18. Reliability, Validity and Factor Structure of Drug Abuse Screening Test

    OpenAIRE

    Sayed Hadi Sayed Alitabar; Mojtaba Habibi; Maryam Falahatpisheh; Musa Arvin

    2016-01-01

    Background and Objective: Given the increase in substance use in the country, more research on this phenomenon is necessary. This study investigates the validity, reliability and confirmatory factor structure of the Drug Abuse Screening Test (DAST). Materials and Methods: The sample consisted of 381 patients (143 women and 238 men) selected by multi-stage cluster sampling from areas 2, 6 and 12 of Tehran; from each region, 6 drug rehabilitation centers were randomly selected. T...

  19. The birth satisfaction scale: Turkish adaptation, validation and reliability study

    Science.gov (United States)

    Cetin, Fatma Cosar; Sezer, Ayse; Merih, Yeliz Dogan

    2015-01-01

    OBJECTIVE: The objective of this study is to investigate the validity and the reliability of the Birth Satisfaction Scale (BSS) and to adapt it into the Turkish language. This scale is used for measuring maternal satisfaction with birth in order to evaluate women’s birth perceptions. METHODS: The study included 150 women who attended an inpatient postpartum clinic. The participants filled in an information form and the BSS questionnaire forms. The properties of the scale were tested by conducting reliability and validation analyses. RESULTS: BSS entails 30 Likert-type questions. It was developed by Hollins Martin and Fleming. Total scale scores ranged between 30–150 points. Higher scale scores indicate greater birth satisfaction. Three overarching themes were identified in the scale: service provision (home assessment, birth environment, support, relationships with health care professionals); personal attributes (ability to cope during labour, feeling in control, childbirth preparation, relationship with baby); and stress experienced during labour (distress, obstetric injuries, receiving sufficient medical care, obstetric intervention, pain, prolonged labour and baby’s health). Cronbach’s alpha coefficient was 0.62. CONCLUSION: According to the present study, BSS entails 30 Likert-type questions and evaluates women’s birth perceptions. The Turkish version of BSS has been proven to be a valid and a reliable scale. PMID:28058355

  20. Mammography image assessment; validity and reliability of current scheme

    International Nuclear Information System (INIS)

    Hill, C.; Robinson, L.

    2015-01-01

    Mammographers currently score their own images according to criteria set out by Regional Quality Assurance. The criteria used are based on the ‘Perfect, Good, Moderate, Inadequate’ (PGMI) marking criteria established by the National Health Service Breast Screening Programme (NHSBSP) in their Quality Assurance Guidelines of 2006. This document discusses the validity and reliability of the current mammography image assessment scheme. Commencing with a critical review of the literature, this document sets out to highlight problems with the national approach to the use of marking schemes. The findings suggest that the ‘PGMI’ scheme is flawed in terms of reliability and validity and is not universally applied across the UK. There also appear to be differences in schemes used by trainees and qualified mammographers. Initial recommendations are to be made in collaboration with colleagues within the National Health Service Breast Screening Programme (NHSBSP), Higher Education Centres, College of Radiographers and the Royal College of Radiologists in order to identify a mammography image appraisal scheme that is fit for purpose. - Highlights: • Currently no robust evidence based marking tools in use for the assessment of images in mammography. • Is current system valid, reliable and robust? • How can the current image assessment tool be improved? • Should students and qualified mammographers use the same tool? • What marking criteria are available for image assessment?

  1. Assessment of the nursing care product (APROCENF): a reliability and construct validity study.

    Science.gov (United States)

    Cucolo, Danielle Fabiana; Perroca, Márcia Galan

    2017-04-06

    To verify the reliability and construct validity estimates of the "Assessment of nursing care product" scale (APROCENF) and its applicability. This validation study included a sample of 40 (inter-rater reliability) and 172 (construct validity) assessments performed by nurses at the end of the work shift at nine inpatient services of a teaching hospital in the Brazilian Southeast. The data were collected between February and September 2014, with interruptions. Cronbach's alpha and Spearman's correlation coefficients (internal consistency) were calculated, as well as the intraclass correlation and the weighted kappa index (inter-rater reliability). Exploratory factor analysis was used with principal component extraction and varimax rotation (construct validity). The internal consistency revealed an alpha coefficient of 0.85, item-item correlation ranging between 0.13 and 0.61 and item-total correlation between 0.43 and 0.69. Inter-rater equivalence was obtained and all items evidenced significant factor loadings. This research evidenced the reliability and construct validity of the scale to assess the nursing care product. Its application in nursing practice permits identifying improvements needed in the production process, contributing to management and care decisions.

  2. Emergency Severity Index version 4: a valid and reliable tool in pediatric emergency department triage.

    Science.gov (United States)

    Green, Nicole A; Durani, Yamini; Brecher, Deena; DePiero, Andrew; Loiselle, John; Attia, Magdy

    2012-08-01

    The Emergency Severity Index version 4 (ESI v.4) is the most recently implemented 5-level triage system. The validity and reliability of this triage tool in the pediatric population have not been extensively established. The goals of this study were to assess the validity of ESI v.4 in predicting hospital admission, emergency department (ED) length of stay (LOS), and number of resources utilized, as well as its reliability in a prospective cohort of pediatric patients. The first arm of the study was a retrospective chart review of 780 pediatric patients presenting to a pediatric ED to determine the validity of ESI v.4. Abstracted data included acuity level assigned by the triage nurse using ESI v.4 algorithm, disposition (admission vs discharge), LOS, and number of resources utilized in the ED. To analyze the validity of ESI v.4, patients were divided into 2 groups for comparison: higher-acuity patients (ESI levels 1, 2, and 3) and lower-acuity patients (ESI levels 4 and 5). Pearson χ² analysis was performed for categorical variables. For continuous variables, we conducted a comparison of means based on parametric distribution of variables. The second arm was a prospective cohort study to determine the interrater reliability of ESI v.4 among and between pediatric triage (PT) nurses and pediatric emergency medicine (PEM) physicians. Three raters (2 PT nurses and 1 PEM physician) independently assigned triage scores to 100 patients; κ and interclass correlation coefficient were calculated among PT nurses and between the primary PT nurses and physicians. In the validity arm, the distribution of ESI score levels among the 780 cases was as follows: ESI 1: 2 (0.25%); ESI 2: 73 (9.4%); ESI 3: 289 (37%); ESI 4: 251 (32%); and ESI 5: 165 (21%). Hospital admission rates by ESI level were 1: 100%, 2: 42%, 3: 14.9%, 4: 1.2%, and 5: 0.6%. 
The admission rate of the higher-acuity group (76/364, 21%) was significantly greater than the lower-acuity group (4/415, 0.96%), P group was

  3. Validity and reliability of the Achilles tendon total rupture score.

    Science.gov (United States)

    Ganestam, Ann; Barfod, Kristoffer; Klit, Jakob; Troelsen, Anders

    2013-01-01

    The best treatment of acute Achilles tendon rupture remains debated. Patient-reported outcome measures have become cornerstones in treatment evaluations. The Achilles tendon total rupture score (ATRS) has been developed for this purpose but requires additional validation. The purpose of the present study was to validate a Danish translation of the ATRS. The ATRS was translated into Danish according to internationally adopted standards. Of 142 patients, 90 with previous rupture of the Achilles tendon participated in the validity study and 52 in the reliability study. The ATRS showed moderately strong correlations with the physical subscores of the Medical Outcomes Study 36-item Short-Form Health Survey (r = .70 to .75; p questionnaire (r = .71; p validity. For study and follow-up purposes, the ATRS seems reliable for comparisons of groups of patients. Its usability is limited for repeated assessment of individual patients. The development of analysis guidelines would be desirable. Copyright © 2013 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.

  4. Validity and reliability testing of the Prenatal Psychosocial Profile.

    Science.gov (United States)

    Curry, M A; Campbell, R A; Christian, M

    1994-04-01

    Two studies of low-income pregnant women (N = 179) were done to examine the validity and reliability of the Prenatal Psychosocial Profile (PPP). The PPP, a composite of the Rosenberg Self-Esteem Scale, the Support Behaviors Inventory, and a newly developed measure of stress, is a brief, comprehensive clinical assessment of psychosocial risk during pregnancy. Construct validity of the stress scale was supported by theoretically predicted negative correlations with self-esteem, partner support, and support from others (N = 91). Convergent validity of the stress scale was demonstrated by a correlation of .71 with the Difficult Life Circumstances Scale. Adequate levels of internal consistency were found. Interrelationships between the four subscales were consistent with the underlying conceptualization, and there was beginning evidence of the factorial independence of the subscales.

  5. The Quality of Life Scale (QOLS): Reliability, Validity, and Utilization

    Directory of Open Access Journals (Sweden)

    Anderson Kathryn L

    2003-10-01

    Full Text Available Abstract The Quality of Life Scale (QOLS), created originally by American psychologist John Flanagan in the 1970s, has been adapted for use in chronic illness groups. This paper reviews the development and psychometric testing of the QOLS. A descriptive review of the published literature was undertaken and findings summarized in the frequently asked questions format. Reliability, content and construct validity testing has been performed on the QOLS and a number of translations have been made. The QOLS has low to moderate correlations with physical health status and disease measures. However, content validity analysis indicates that the instrument measures domains that diverse patient groups with chronic illness define as quality of life. The QOLS is a valid instrument for measuring quality of life across patient groups and cultures and is conceptually distinct from health status or other causal indicators of quality of life.

  6. Modified sphygmomanometer test for the assessment of strength of the trunk, upper and lower limbs muscles in subjects with subacute stroke: reliability and validity.

    Science.gov (United States)

    Aguiar, Larissa T; Lara, Eliza M; Martins, Julia C; Teixeira-Salmela, Luci F; Quintino, Ludmylla F; Christo, Paulo P; DE Morais Fairaa, Christina

    2016-10-01

    Limitations in activities have been related to weakness of the upper limbs (UL), lower limbs (LL) and trunk muscles after stroke. Therefore, the measurement of strength after stroke becomes essential. The Modified Sphygmomanometer Test (MST) is an alternative method for the measurement of strength, since it is cheap and provides objective values. However, no studies have investigated the measurement properties of the MST in sub-acute stroke. To investigate the test-retest and inter-rater reliabilities and criterion-related validity of the MST for the measurement of strength of the UL, LL, and trunk muscles in subjects with sub-acute stroke, and verify whether the number of trials would affect the results. Diagnostic accuracy. Local community, out-patient clinics, and university laboratory. Sixty-five subjects with sub-acute stroke (62±14 years) participated in the present study. The strength of 36 muscular groups was measured with the MST and dynamometers (criterion standard). To investigate whether the number of trials would affect the results, analysis of variance was applied. For the test-retest and inter-rater reliabilities and criterion-related validity of the MST, intra-class correlation coefficients (ICC), Pearson correlation coefficients, and coefficients of determination were calculated. Similar results were found for all muscular groups and number of trials (0.01≤F≤0.14; 0.87≤p≤0.99) with significant and adequate values of test-retest (0.57≤ICC≤0.98) (exception: first trial of the non-paretic ankle dorsiflexors) and inter-rater (0.50≤ICC≤0.99) (exception: non-paretic ankle plantar flexors) reliabilities and validity (0.70≤r≤0.95; p≤0.001). The values obtained with the MST were good predictors of those obtained with the dynamometers (0.54≤r²≤0.90). In general, the MST showed adequate reliabilities and criterion-related validity for measuring strength of subjects with sub-acute stroke, and only one trial, after familiarization

  7. Reliability

    OpenAIRE

    Condon, David; Revelle, William

    2017-01-01

    Separating the signal in a test from the irrelevant noise is a challenge for all measurement. Low test reliability limits test validity, attenuates important relationships, and can lead to regression artifacts. Multiple approaches to the assessment and improvement of reliability are discussed. The advantages and disadvantages of several different approaches to reliability are considered. Practical advice on how to assess reliability using open source software is provided.
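
    As one concrete instance of assessing reliability in software, the sketch below computes Cronbach's alpha, a common internal-consistency estimate, using only the Python standard library. It is an illustration and not the authors' own code or recommended tool; the function name and the item scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item,
    with every list covering the same respondents in the same order."""
    k = len(items)
    # Total score per respondent, summed across items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Hypothetical scores: three items answered by four respondents.
items = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]]
print(f"alpha = {cronbach_alpha(items):.3f}")
```

    Alpha rises when items covary strongly relative to their individual variances, so it reflects consistency across items rather than stability over time.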

  8. Reliability and Validity of the Temperament and Character Inventory

    Directory of Open Access Journals (Sweden)

    Mahboubeh Dadfar

    2010-10-01

    Full Text Available Objective: The Temperament and Character Inventory (TCI) was developed to assess temperament including Novelty Seeking (NS), Harm Avoidance (HA), Reward Dependence (RD), Persistence (PS), and Character including Self-Directedness (SD), Cooperativeness (CO) and Self Transcendence (ST) dimensions of Cloninger's biopsychosocial model of personality in adults. The purpose of this study was to evaluate the reliability and validity of this inventory. Materials & Methods: In this validity test and standardization study, after translation of TCI into Farsi and back translation, the final form was prepared and administered to 220 students who were selected via simple sampling. Cronbach's alpha procedure and the test-retest method were used to assess the reliability, and factor analysis with promax rotation was utilized to determine the validity of the inventory. Inter-scale correlations and correlations of age with the TCI scales were calculated using Pearson correlation. Comparisons of TCI scores between sexes and across cultures were done using the independent t-test. Results: The alpha coefficients for the inventory ranged from 0.44 for the Persistence scale to 0.81 for the ST scale with a median of 0.68. The overall alpha coefficient for the whole inventory was 0.74. The Pearson correlation coefficient for the test-retest on 31 students after two months ranged from 0.53 for Novelty Seeking and Persistence to 0.82 for Harm Avoidance scales and from 0.24 for disorderliness vs regimentation (NS4) to 0.86 for fear of uncertainty vs self-confidence (HA2) subscales. The factor analysis showed six factors. Significant correlations were obtained between the scales of Self-Directedness and Harm Avoidance (0.57) and between Self-Directedness and Cooperativeness (0.46). Conclusion: The current study confirms that the Persian version of the Temperament and Character Inventory has satisfactory psychometric properties and acceptable reliability and validity for use in a university student population.

  9. Reliability and validity of the Mywellness Key physical activity monitor

    Directory of Open Access Journals (Sweden)

    Sieverdes JC

    2013-01-01

    Full Text Available John C Sieverdes,1 Eric E Wickel,2 Gregory A Hand,3 Marco Bergamin,4 Robert R Moran,5 Steven N Blair3,5. 1Medical University of South Carolina, College of Nursing and Medicine, Charleston, SC; 2University of Tulsa, Exercise and Sport Science, Tulsa, OK; 3University of South Carolina, Department of Exercise Science, Division of Health Aspects of Physical Activity, Arnold School of Public Health, Columbia, SC, USA; 4University of Padova, Department of Medicine, Sports Medicine Division, Padova, Italy; 5University of South Carolina, Department of Epidemiology and Biostatistics, Arnold School of Public Health, Columbia, SC, USA. Background: This study evaluated the reliability and criterion validity of the Mywellness Key accelerometer (MWK) using treadmill protocols and indirect calorimetry. Methods: Twenty-five participants completed two four-stage 20-minute treadmill protocols while wearing two MWK accelerometers. Reliability was assessed using raw counts. Validity was assessed by comparing the estimated VO2 calculated from the MWK with values from respiratory gas exchange. Results: Good overall and point estimates of reliability were found for the MWK (all intraclass correlations > 0.93). Generalizability theory coefficients showed lower values for running speed (0.70) versus walking speed (all > 0.84), with the majority of the overall percentage of variability derived from the participant (68%–88% of the total 100%). Acceptable validity was found overall (Pearson’s r = 0.895–0.902, P < 0.0001), with an overall mean absolute error of 16.22% and a coefficient of variance of 16.92%. Bland-Altman plots showed an overestimation of energy expenditure during the running speed, but total kilocalories were underestimated during the protocol by approximately 10%. Conclusion: Good validity was found during light and moderate walking, while running was slightly overestimated. The MWK may be useful for clinicians and researchers interested in promotion or assessment

  10. Reliability and Validity of a Survey of Cat Caregivers on Their Cats’ Socialization Level in the Cat’s Normal Environment

    Directory of Open Access Journals (Sweden)

    Margaret Slater

    2013-12-01

    Full Text Available Stray cats routinely enter animal welfare organizations each year and shelters are challenged with determining the level of human socialization these cats may possess as quickly as possible. However, there is currently no standard process to guide this determination. This study describes the development and validation of a caregiver survey designed to be filled out by a cat’s caregiver so it accurately describes a cat’s personality, background, and full range of behavior with people when in its normal environment. The results from this survey provided the basis for a socialization score that ranged from unsocialized to well socialized with people. The quality of the survey was evaluated based on inter-rater and test-retest reliability and internal consistency and estimates of construct and criterion validity. In general, our results showed moderate to high levels of inter-rater (median of 0.803, range 0.211–0.957) and test-retest agreement (median 0.92, range 0.211–0.999). Cronbach’s alpha showed high internal consistency (0.962). Estimates of validity did not highlight any major shortcomings. This survey will be used to develop and validate an effective assessment process that accurately differentiates cats by their socialization levels towards humans based on direct observation of cats’ behavior in an animal shelter.

  11. Reliability, Validity and Factor Structure of Drug Abuse Screening Test

    Directory of Open Access Journals (Sweden)

    Sayed Hadi Sayed Alitabar

    2016-05-01

    Full Text Available Background and Objective: Given the increase in substance use in the country, more research on this phenomenon is necessary. This study investigates the validity, reliability and confirmatory factor structure of the Drug Abuse Screening Test (DAST). Materials and Methods: The sample consisted of 381 patients (143 women and 238 men) selected by multi-stage cluster sampling from areas 2, 6 and 12 of Tehran; from each region, 6 drug rehabilitation centers were randomly selected. The DAST was used as the instrument. Divergent and convergent validity of this scale was assessed with the Problems Assessment for Substance Using Psychiatric Patients (PASUPP) and the Relapse Prediction Scale (RPS). Results: The first-order factor structure of the DAST was confirmed using confirmatory factor analysis. The DAST had good internal consistency (Cronbach’s alpha) and one-week test-retest reliability (0.9 and 0.8, respectively). This scale also correlated positively with the PASUPP and the RPS (P<0.01). Conclusion: The overall results showed that the Drug Abuse Screening Test is valid in Iranian society. This self-report scale can be considered a useful tool for research purposes and addiction.

  12. Autism detection in early childhood (ADEC): reliability and validity data for a Level 2 screening tool for autistic disorder.

    Science.gov (United States)

    Nah, Yong-Hwee; Young, Robyn L; Brewer, Neil; Berlingeri, Genna

    2014-03-01

    The Autism Detection in Early Childhood (ADEC; Young, 2007) was developed as a Level 2 clinician-administered autistic disorder (AD) screening tool that was time-efficient, suitable for children under 3 years, easy to administer, and suitable for persons with minimal training and experience with AD. A best estimate clinical Diagnostic and Statistical Manual of Mental Disorders (4th ed., text rev.; DSM-IV-TR; American Psychiatric Association, 2000) diagnosis of AD was made for 70 children using all available information and assessment results, except for the ADEC data. A screening study compared these children on the ADEC with 57 children with other developmental disorders and 64 typically developing children. Results indicated high internal consistency (α = .91). Interrater reliability and test-retest reliability of the ADEC were also adequate. ADEC scores reliably discriminated different diagnostic groups after controlling for nonverbal IQ and Vineland Adaptive Behavior Composite scores. Construct validity (using exploratory factor analysis) and concurrent validity using performance on the Autism Diagnostic Observation Schedule (Lord et al., 2000), the Autism Diagnostic Interview-Revised (Le Couteur, Lord, & Rutter, 2003), and DSM-IV-TR criteria were also demonstrated. Signal detection analysis identified the optimal ADEC cutoff score, with the ADEC identifying all children who had an AD (N = 70, sensitivity = 1.0) but overincluding children with other disabilities (N = 13, specificity ranging from .74 to .90). Together, the reliability and validity data indicate that the ADEC has potential to be established as a suitable and efficient screening tool for infants with AD. 2014 APA

  13. Validation and reliability of a modified sphygmomanometer for the assessment of handgrip strength in Parkinson's disease

    Directory of Open Access Journals (Sweden)

    Soraia M. Silva

    2015-04-01

    BACKGROUND: Handgrip strength is currently considered a predictor of overall muscle strength and functional capacity. Therefore, it is important to find reliable and affordable instruments for this analysis, such as the modified sphygmomanometer test (MST). OBJECTIVES: To assess the concurrent criterion validity of the MST, to compare the MST with the Jamar dynamometer, and to analyze the reproducibility (i.e., reliability and agreement) of the MST in individuals with Parkinson's disease (PD). METHOD: The authors recruited 50 subjects, 24 with PD (65.5±6.2 years of age) and 26 healthy elderly subjects (63.4±7.2 years of age). The handgrip strength was measured using the Jamar dynamometer and modified sphygmomanometer. The concurrent criterion validity was analyzed using Pearson's correlation coefficient and a simple linear regression test. The reproducibility of the MST was evaluated with the intraclass correlation coefficient (ICC(2,1)), the standard error of measurement (SEM), the minimal detectable change (MDC), and the Bland-Altman plot. For all of the analyses, a significance level of α≤0.05 was adopted. RESULTS: There was a significant correlation of moderate magnitude (r≥0.45) between the MST and the Jamar dynamometer. The MST had excellent reliability (ICC(2,1)≥0.7). The SEM and the MDC were adequate; however, the Bland-Altman plot indicated an unsatisfactory interrater agreement. CONCLUSIONS: The MST exhibited adequate validity and excellent reliability and is, therefore, suitable for monitoring the handgrip strength in PD. However, if the goal is to compare the measurements between examiners, the authors recommend that the data be interpreted with caution.
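
    The Bland-Altman analysis mentioned above summarizes agreement between two examiners as a mean difference (bias) and 95% limits of agreement. The following is a minimal sketch of that calculation using the conventional 1.96 multiplier; the readings are hypothetical values invented for illustration and are not data from the study.

```python
from statistics import mean, stdev


def bland_altman(rater_a, rater_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(rater_a, rater_b)]
    bias = mean(diffs)
    # 95% limits of agreement: bias +/- 1.96 * SD of the differences
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical paired readings from two examiners (illustrative only)
rater_a = [30.0, 42.0, 25.0, 38.0, 33.0]
rater_b = [28.0, 45.0, 24.0, 35.0, 36.0]

bias, lower, upper = bland_altman(rater_a, rater_b)
print(f"bias = {bias:.2f}, limits of agreement = [{lower:.2f}, {upper:.2f}]")
```

    Wide limits of agreement can coexist with a high ICC, which is consistent with the abstract reporting excellent reliability alongside unsatisfactory interrater agreement.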

  14. Development of the Modified Four Square Step Test and its reliability and validity in people with stroke.

    Science.gov (United States)

    Roos, Margaret A; Reisman, Darcy S; Hicks, Gregory; Rose, William; Rudolph, Katherine S

    2016-01-01

    Adults with stroke have difficulty avoiding obstacles when walking, especially when a time constraint is imposed. The Four Square Step Test (FSST) evaluates dynamic balance by requiring individuals to step over canes in multiple directions while being timed, but many people with stroke are unable to complete it. The purposes of this study were to (1) modify the FSST by replacing the canes with tape so that more persons with stroke could successfully complete the test and (2) examine the reliability and validity of the modified version. Fifty-five subjects completed the Modified FSST (mFSST) by stepping over tape in all four directions while being timed. The mFSST resulted in significantly greater numbers of subjects completing the test than the FSST (39/55 [71%] and 33/55 [60%], respectively) (p < 0.04). The test-retest, intrarater, and interrater reliability of the mFSST were excellent (intraclass correlation coefficient ranges: 0.81-0.99). Construct and concurrent validity of the mFSST were also established. The minimal detectable change was 6.73 s. The mFSST, an ideal measure of dynamic balance, can identify progress in people with stroke in varied settings and can be completed by a wide range of people with stroke in approximately 5 min with the use of minimal equipment (tape, stop watch).

  15. Workplace Bullying Scale: The Study of Validity and Reliability

    Directory of Open Access Journals (Sweden)

    Nizamettin Doğar

    2015-01-01

    The aim of this research is to adapt the Workplace Bullying Scale (Tınaz, Gök & Karatuna, 2013) to the Albanian language and to examine its psychometric properties. The research was conducted with 386 people from different sectors in Albania. Results of exploratory and confirmatory factor analyses demonstrated that the Albanian scale yielded 2 factors, differing from the original form because of cultural differences. Internal consistency coefficients were 0.890 and 0.801, and split-half reliability coefficients were 0.864 and 0.808. Confirmatory factor analysis results ranged from 0.40 to 0.73. Corrected item-total correlations ranged from 0.339 to 0.672, and according to t-test results, the differences between each item's means for the upper 27% and lower 27% groups were significant. Thus, the Workplace Bullying Scale can be used as a valid and reliable instrument in the social sciences in Albania.

  16. Reliability, validity, and minimal detectable change of the push-off test scores in assessing upper extremity weight-bearing ability.

    Science.gov (United States)

    Mehta, Saurabh P; George, Hannah R; Goering, Christian A; Shafer, Danielle R; Koester, Alan; Novotny, Steven

    2017-11-01

    Clinical measurement study. The push-off test (POT) was recently conceived and found to be reliable and valid for assessing weight bearing through an injured wrist or elbow. However, further research with a larger sample can lend credence to the preliminary findings supporting the use of the POT. This study examined the interrater reliability, construct validity, and measurement error of the POT in patients with wrist conditions. Participants with musculoskeletal (MSK) wrist conditions were recruited. Performance on the POT, grip strength, and isometric strength of the wrist extensors were assessed. The shortened version of the Disabilities of the Arm, Shoulder and Hand and the numeric pain rating scale were completed. The intraclass correlation coefficient assessed interrater reliability of the POT. Pearson correlation coefficients (r) examined the concurrent relationships between the POT and other measures. The standard error of measurement and the minimal detectable change at 90% confidence interval were assessed as measurement error and index of true change for the POT. A total of 50 participants with different elbow or wrist conditions (age: 48.1 ± 16.6 years) were included in this study. The results of this study strongly supported the interrater reliability (intraclass correlation coefficient: 0.96 and 0.93 for the affected and unaffected sides, respectively) of the POT in patients with wrist MSK conditions. The POT showed convergent relationships with the grip strength on the injured side (r = 0.89) and the wrist extensor strength (r = 0.7). The POT showed a small standard error of measurement (1.9 kg). The minimal detectable change at 90% confidence interval for the POT was 4.4 kg for the sample. This study provides additional evidence to support the reliability and validity of the POT. This is the first study that provides the values for the measurement error and true change on the POT scores in patients with wrist MSK conditions. Further research should examine the
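
    The reported minimal detectable change can be related to the standard error of measurement with the conventional formula MDC = z x sqrt(2) x SEM. The sketch below assumes that formula and z = 1.645 for a 90% confidence level; the abstract does not state which computation the authors used, so this is an illustrative check rather than a reproduction of their method.

```python
import math


def minimal_detectable_change(sem: float, z: float = 1.645) -> float:
    """MDC from the standard error of measurement (SEM).

    z = 1.645 corresponds to a 90% confidence level; sqrt(2) accounts
    for measurement error in both the first and the repeated score.
    """
    return z * math.sqrt(2) * sem


# SEM of 1.9 kg is the value reported in the abstract.
mdc90 = minimal_detectable_change(1.9)
print(f"MDC90 = {mdc90:.1f} kg")
```

    Under these assumptions the result rounds to the 4.4 kg given in the abstract, so a change smaller than that in an individual patient could plausibly be measurement noise.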

  17. Validity, Reliability, and Inertia of Four Different Temperature Capsule Systems.

    Science.gov (United States)

    Bongers, Coen C W G; Daanen, Hein A M; Bogerd, Cornelis P; Hopman, Maria T E; Eijsvogels, Thijs M H

    2018-01-01

    Telemetric temperature capsule systems are wireless, relatively noninvasive, and easily applicable in field conditions and have therefore great advantages for monitoring core body temperature. However, the accuracy and responsiveness of available capsule systems have not been compared previously. Therefore, the aim of this study was to examine the validity, reliability, and inertia characteristics of four ingestible temperature capsule systems (i.e., CorTemp, e-Celsius, myTemp, and VitalSense). Ten temperature capsules were examined for each system in a temperature-controlled water bath during three trials. The water bath temperature gradually increased from 33°C to 44°C in trials 1 and 2 to assess the validity and reliability, and from 36°C to 42°C in trial 3 to assess the inertia characteristics of the temperature capsules. A systematic difference between capsule and water bath temperature was found for CorTemp (0.077°C ± 0.040°C), e-Celsius (-0.081°C ± 0.055°C), myTemp (-0.003°C ± 0.006°C), and VitalSense (-0.017°C ± 0.023°C; P 0.05). Comparable inertia characteristics were found for CorTemp (25 ± 4 s), e-Celsius (21 ± 13 s), and myTemp (19 ± 2 s), whereas the VitalSense system responded more slowly (39 ± 6 s) to changes in water bath temperature (P inertia were observed between capsule systems, an excellent validity, test-retest reliability, and inertia was found for each system between 36°C and 44°C after removal of outliers.

  18. The scoring of arousal in sleep: reliability, validity, and alternatives.

    Science.gov (United States)

    Bonnet, Michael H; Doghramji, Karl; Roehrs, Timothy; Stepanski, Edward J; Sheldon, Stephen H; Walters, Arthur S; Wise, Merrill; Chesson, Andrew L

    2007-03-15

    The reliability and validity of EEG arousals and other types of arousal are reviewed. Brief arousals during sleep had been observed for many years, but the evolution of sleep medicine in the 1980s directed new attention to these events. Early studies at that time in animals and humans linked brief EEG arousals and associated fragmentation of sleep to daytime sleepiness and degraded performance. Increasing interest in scoring of EEG arousals led the ASDA to publish a scoring manual in 1992. The current review summarizes numerous studies that have examined scoring reliability for these EEG arousals. Validity of EEG arousals was explored by review of studies that empirically varied arousals and found deficits similar to those found after total sleep deprivation depending upon the rate and extent of sleep fragmentation. Additional data from patients with clinical sleep disorders prior to and after effective treatment has also shown a continuing relationship between reduction in pathology-related arousals and improved sleep and daytime function. Finally, many suggestions have been made to refine arousal scoring to include additional elements (e.g., CAP), change the time frame, or focus on other physiological responses such as heart rate or blood pressure changes. Evidence to support the reliability and validity of these measures is presented. It was concluded that the scoring of EEG arousals has added much to our understanding of the sleep process but that significant work on the neurophysiology of arousal needs to be done. Additional refinement of arousal scoring will provide improved insight into sleep pathology and recovery.

  19. Validity and Reliability of Nintendo Wii Fit Balance Scores

    Science.gov (United States)

    Wikstrom, Erik A.

    2012-01-01

    Context: Interactive gaming systems have the potential to help rehabilitate patients with musculoskeletal conditions. The Nintendo Wii Balance Board, which is part of the Wii Fit game, could be an effective tool to monitor progress during rehabilitation because the board and game can provide objective measures of balance. However, the validity and reliability of Wii Fit balance scores remain unknown. Objective: To determine the concurrent validity of balance scores produced by the Wii Fit game and the intrasession and intersession reliability of Wii Fit balance scores. Design: Descriptive laboratory study. Setting: Sports medicine research laboratory. Patients or Other Participants: Forty-five recreationally active participants (age  =  27.0 ± 9.8 years, height  =  170.9 ± 9.2 cm, mass  =  72.4 ± 11.8 kg) with a heterogeneous history of lower extremity injury. Intervention(s): Participants completed a single-limb–stance task on a force plate and the Star Excursion Balance Test (SEBT) during the first test session. Twelve Wii Fit balance activities were completed during 2 test sessions separated by 1 week. Main Outcome Measure(s): Postural sway in the anteroposterior (AP) and mediolateral (ML) directions and the AP, ML, and resultant center-of-pressure (COP) excursions were calculated from the single-limb stance. The normalized reach distance was recorded for the anterior, posteromedial, and posterolateral directions of the SEBT. Wii Fit balance scores that the game software generated also were recorded. Results: All 96 of the calculated correlation coefficients among Wii Fit activity outcomes and established balance outcomes were interpreted as poor (r Wii Fit balance activity scores ranged from good (intraclass correlation coefficient [ICC]  =  0.80) to poor (ICC  =  0.39), with 8 activities having poor intrasession reliability. Similarly, 11 of the 12 Wii Fit balance activity scores demonstrated poor intersession reliability, with

  20. Validity and reliability of Nintendo Wii Fit balance scores.

    Science.gov (United States)

    Wikstrom, Erik A

    2012-01-01

    Interactive gaming systems have the potential to help rehabilitate patients with musculoskeletal conditions. The Nintendo Wii Balance Board, which is part of the Wii Fit game, could be an effective tool to monitor progress during rehabilitation because the board and game can provide objective measures of balance. However, the validity and reliability of Wii Fit balance scores remain unknown. To determine the concurrent validity of balance scores produced by the Wii Fit game and the intrasession and intersession reliability of Wii Fit balance scores. Descriptive laboratory study. Sports medicine research laboratory. Forty-five recreationally active participants (age = 27.0 ± 9.8 years, height = 170.9 ± 9.2 cm, mass = 72.4 ± 11.8 kg) with a heterogeneous history of lower extremity injury. Participants completed a single-limb-stance task on a force plate and the Star Excursion Balance Test (SEBT) during the first test session. Twelve Wii Fit balance activities were completed during 2 test sessions separated by 1 week. Postural sway in the anteroposterior (AP) and mediolateral (ML) directions and the AP, ML, and resultant center-of-pressure (COP) excursions were calculated from the single-limb stance. The normalized reach distance was recorded for the anterior, posteromedial, and posterolateral directions of the SEBT. Wii Fit balance scores that the game software generated also were recorded. All 96 of the calculated correlation coefficients among Wii Fit activity outcomes and established balance outcomes were interpreted as poor (r Wii Fit balance activity scores ranged from good (intraclass correlation coefficient [ICC] = 0.80) to poor (ICC = 0.39), with 8 activities having poor intrasession reliability. Similarly, 11 of the 12 Wii Fit balance activity scores demonstrated poor intersession reliability, with scores ranging from fair (ICC = 0.74) to poor (ICC = 0.29). Wii Fit balance activity scores had poor concurrent validity relative to COP outcomes and SEBT

  1. Measuring older adults' sedentary time: reliability, validity, and responsiveness.

    Science.gov (United States)

    Gardiner, Paul A; Clark, Bronwyn K; Healy, Genevieve N; Eakin, Elizabeth G; Winkler, Elisabeth A H; Owen, Neville

    2011-11-01

    With evidence that prolonged sitting has deleterious health consequences, decreasing sedentary time is a potentially important preventive health target. High-quality measures, particularly for use with older adults, who are the most sedentary population group, are needed to evaluate the effect of sedentary behavior interventions. We examined the reliability, validity, and responsiveness to change of a self-report sedentary behavior questionnaire that assessed time spent in behaviors common among older adults: watching television, computer use, reading, socializing, transport and hobbies, and a summary measure (total sedentary time). In the context of a sedentary behavior intervention, nonworking older adults (n = 48, age = 73 ± 8 yr (mean ± SD)) completed the questionnaire on three occasions during a 2-wk period (7 d between administrations) and wore an accelerometer (ActiGraph model GT1M) for two periods of 6 d. Test-retest reliability (for the individual items and the summary measure) and validity (self-reported total sedentary time compared with accelerometer-derived sedentary time) were assessed during the 1-wk preintervention period, using Spearman (ρ) correlations and 95% confidence intervals (CI). Responsiveness to change after the intervention was assessed using the responsiveness statistic (RS). Test-retest reliability was excellent for television viewing time (ρ (95% CI) = 0.78 (0.63-0.89)), computer use (ρ (95% CI) = 0.90 (0.83-0.94)), and reading (ρ (95% CI) = 0.77 (0.62-0.86)); acceptable for hobbies (ρ (95% CI) = 0.61 (0.39-0.76)); and poor for socializing and transport (ρ < 0.45). Total sedentary time had acceptable test-retest reliability (ρ (95% CI) = 0.52 (0.27-0.70)) and validity (ρ (95% CI) = 0.30 (0.02-0.54)). Self-report total sedentary time was similarly responsive to change (RS = 0.47) as accelerometer-derived sedentary time (RS = 0.39). The summary measure of total sedentary time has good repeatability and modest validity and is

  2. Rating scales for dystonia in cerebral palsy: reliability and validity

    OpenAIRE

    Monbaliu, Elegast; Ortibus, Els; Roelens, F; Desloovere, Kaat; Declerck, Jan; Prinzie, Peter; De Cock, Paul; Feys, Hilde

    2010-01-01

    AIM: This study investigated the reliability and validity of the Barry-Albright Dystonia Scale (BADS), the Burke-Fahn-Marsden Movement Scale (BFMMS), and the Unified Dystonia Rating Scale (UDRS) in patients with bilateral dystonic cerebral palsy (CP). METHOD: Three raters independently scored videotapes of 10 patients (five males, five females; mean age 13 y 3 mo, SD 5 y 2 mo, range 5-22 y). One patient each was classified at levels I-IV in the Gross Motor Function Classification System a...

  3. DEVELOPING VISUAL PRESENTATION ATTITUDE RUBRIC: VALIDITY AND RELIABILITY STUDY

    OpenAIRE

    ATEŞ, Hatice KADIOĞLU; ADA, Sefer; BAYSAL, Z. Nurdan

    2015-01-01

    The aim of this study is to develop a valid and reliable visual presentation attitude rubric for 4th grade students. A total of 218 students from Engin Can Güre, located in Esenler, Istanbul, took part in the study. While preparing this assessment tool, which has 34 criteria, the views of 6 university lecturers who are experts in their field were obtained. The answer key sheet has 4 Likert-type options. The rubric was first tested with the Kaiser-Meyer-Olkin and Bartlett's tests an...

  4. The reliability and validity of the Tokyo Autistic Behaviour Scale.

    Science.gov (United States)

    Kurita, H; Miyake, Y

    1990-03-01

    The Tokyo Autistic Behavior Scale (TABS) consisting of 39 items provisionally grouped in four areas--interpersonal-social relationship, language-communication, habit-mannerism and others--is an instrument used by a child's caretaker to rate the child's autistic behaviors on a 3-point scale. Test-retest reliability was satisfactory (i.e., an r for a total score was .94). Among six DSM-III diagnostic groups, infantile autism showed a significantly higher total TABS score than the other five groups, and a taxonomic validity coefficient was .54. An r between total scores of the TABS and the Childhood Autism Rating Scale--Tokyo Version was .59. The area scores showed a lower validity than the total score. The TABS appears to be a useful instrument to assess autistic behavior.

  5. Feelings about culture scales: development, factor structure, reliability, and validity.

    Science.gov (United States)

    Maffini, Cara S; Wong, Y Joel

    2015-04-01

    Although measures of cultural identity, values, and behavior exist in the multicultural psychological literature, there is currently no measure that explicitly assesses ethnic minority individuals' positive and negative affect toward culture. Therefore, we developed 2 new measures called the Feelings About Culture Scale--Ethnic Culture and Feelings About Culture Scale--Mainstream American Culture and tested their psychometric properties. In 6 studies, we piloted the measures, conducted factor analyses to clarify their factor structure, and examined reliability and validity. The factor structure revealed 2 dimensions reflecting positive and negative affect for each measure. Results provided evidence for convergent, discriminant, criterion-related, and incremental validity as well as the reliability of the scales. The Feelings About Culture Scales are the first known measures to examine both positive and negative affect toward an individual's ethnic culture and mainstream American culture. The focus on affect captures dimensions of psychological experiences that differ from cognitive and behavioral constructs often used to measure cultural orientation. These measures can serve as a valuable contribution to both research and counseling by providing insight into the nuanced affective experiences ethnic minority individuals have toward culture. (c) 2015 APA, all rights reserved.

  6. The reliability, validity, sensitivity, specificity and predictive values of the Chinese version of the Rowland Universal Dementia Assessment Scale.

    Science.gov (United States)

    Chen, Chia-Wei; Chu, Hsin; Tsai, Chia-Fen; Yang, Hui-Ling; Tsai, Jui-Chen; Chung, Min-Huey; Liao, Yuan-Mei; Chi, Mei-Ju; Chou, Kuei-Ru

    2015-11-01

    The purpose of this study was to translate the Rowland Universal Dementia Assessment Scale into Chinese and to evaluate the psychometric properties (reliability and validity) and the diagnostic properties (sensitivity, specificity and predictive values) of the Chinese version of the Rowland Universal Dementia Assessment Scale. The accurate detection of early dementia requires screening tools with favourable cross-cultural linguistic and appropriate sensitivity, specificity, and predictive values, particularly for Chinese-speaking populations. This was a cross-sectional, descriptive study. Overall, 130 participants suspected to have cognitive impairment were enrolled in the study. A test-retest for determining reliability was scheduled four weeks after the initial test. Content validity was determined by five experts, whereas construct validity was established by using contrasted group technique. The participants' clinical diagnoses were used as the standard in calculating the sensitivity, specificity, positive predictive value and negative predictive value. The study revealed that the Chinese version of the Rowland Universal Dementia Assessment Scale exhibited a test-retest reliability of 0.90, an internal consistency reliability of 0.71, an inter-rater reliability (kappa value) of 0.88 and a content validity index of 0.97. Both the patients and healthy contrast group exhibited significant differences in their cognitive ability. The optimal cut-off points for the Chinese version of the Rowland Universal Dementia Assessment Scale in the test for mild cognitive impairment and dementia were 24 and 22, respectively; moreover, for these two conditions, the sensitivities of the scale were 0.79 and 0.76, the specificities were 0.91 and 0.81, the areas under the curve were 0.85 and 0.78, the positive predictive values were 0.99 and 0.83 and the negative predictive values were 0.96 and 0.91 respectively. 
    The Chinese version of the Rowland Universal Dementia Assessment Scale

  7. Preliminary findings on the reliability and validity of the Cantonese Birmingham Cognitive Screen in patients with acute ischemic stroke

    Directory of Open Access Journals (Sweden)

    Pan X

    2015-09-01

    Xiaoping Pan,1,* Haobo Chen,1,2,* Wai-Ling Bickerton,2 Johnny King Lam Lau,2 Anthony Pak Hin Kong,3 Pia Rotshtein,2 Aihua Guo,1 Jianxi Hu,1 Glyn W Humphreys4. 1Department of Neurology, Guangzhou First People’s Hospital, Guangzhou Medical University, Guangzhou, People’s Republic of China; 2School of Psychology, University of Birmingham, Birmingham, UK; 3Department of Communication Sciences and Disorders, University of Central Florida, Orlando, FL, USA; 4Department of Experimental Psychology, University of Oxford, Oxford, UK. *These authors contributed equally to this work. Background: There are currently no effective cognitive assessment tools for patients who have suffered stroke in the People’s Republic of China. The Birmingham Cognitive Screen (BCoS) has been shown to be a promising tool for revealing patients’ poststroke cognitive deficits in specific domains, which facilitates more individually designed rehabilitation in the long run. Hence we examined the reliability and validity of a Cantonese version of the BCoS in patients with acute ischemic stroke in Guangzhou. Method: A total of 98 patients with acute ischemic stroke were assessed with the Cantonese version of the BCoS, and an additional 133 healthy individuals were recruited as controls. Apart from the BCoS, the patients also completed a number of external cognitive tests, including the Montreal Cognitive Assessment Test (MoCA), Mini Mental State Examination (MMSE), Albert’s cancellation test, the Rey–Osterrieth Complex Figure Test, and six gesture matching tasks. Cutoff scores for failing each subtest, ie, deficits, were computed based on the performance of the controls. The validity and reliability of the Cantonese BCoS were examined, as well as interrater and test–retest reliability. We also compared the proportions of cases being classified as deficits in controlled attention, memory, character writing, and praxis, between patients with and without spoken language impairment

  8. Reliability and validity of two isometric squat tests.

    Science.gov (United States)

    Blazevich, Anthony J; Gill, Nicholas; Newton, Robert U

    2002-05-01

    The purpose of the present study was first to examine the reliability of isometric squat (IS) and isometric forward hack squat (IFHS) tests to determine if repeated measures on the same subjects yielded reliable results. The second purpose was to examine the relation between isometric and dynamic measures of strength to assess validity. Fourteen male subjects performed maximal IS and IFHS tests on 2 occasions and 1 repetition maximum (1-RM) free-weight squat and forward hack squat (FHS) tests on 1 occasion. The 2 tests were found to be highly reliable (intraclass correlation coefficient [ICC](IS) = 0.97 and ICC(IFHS) = 1.00). There was a strong relation between average IS and 1-RM squat performance, and between IFHS and 1-RM FHS performance (r(squat) = 0.77, r(FHS) = 0.76; p squat and FHS test performances (r squat and FHS test performance can be attributed to differences in the movement patterns of the tests

  9. Cultural Responsive Teaching Readiness Scale Validity and Reliability Study

    Directory of Open Access Journals (Sweden)

    Kasım KARATAŞ

    2017-12-01

    The aim of this research is to develop a measurement instrument that will determine the culturally responsive teaching readiness level of teacher candidates. The study group consisted of a total of 231 candidate teachers, of which 83 were males and 148 were females, who were attending their final year of class teacher education programs at various Turkish universities during the 2016-2017 education year. In the first phase, a 33-item draft form was presented to experts for review. Based on the feedback received, revisions were made and the final scale was applied to a group of 231 candidate teachers. To analyze the data obtained from the application, Exploratory Factor Analysis (EFA) was performed. The EFA produced 21 items within a two-factor structure: “Personal Readiness” and “Professional Readiness.” It was observed that the sub-factors were components of the “cultural responsive teaching readiness” dimension, and that the goodness of fit measures obtained as a result of the First and Second Level Confirmatory Factor Analyses (CFA) were high. In addition, the reliability coefficients were found to be high. Based on these findings, this study concludes that the Cultural Responsive Teaching Readiness scale is both valid and reliable.

  10. Reliability and validity of the Haitian Creole PHQ-9.

    Science.gov (United States)

    Marc, Linda G; Henderson, Whitney R; Desrosiers, Astrid; Testa, Marcia A; Jean, Samuel E; Akom, Eniko Edit

    2014-12-01

    There is limited information on depression in Haitians and this is partly attributable to the absence of culturally and linguistically adapted measures for depression. To perform a psychometric evaluation of the Haitian-Creole version of the PHQ-9 administered to men who have sex with men (MSM) in the Republic of Haiti. This study uses a cross-sectional design and data are from the Integrated Behavioral and Biological HIV Survey (IBBS) for MSM in Haiti. Inclusion criteria required that participants be male, ≥ 18 years, report sexual relations with a male partner in the last 12 months, and lived in Haiti during the past 3 months. Respondent Driven Sampling was used for participant recruitment. A structured questionnaire was verbally administered in Haitian-Creole capturing information on sociodemographics, sexual behaviors, human immunodeficiency virus (HIV) status and depressive symptomatology using the PHQ-9. Psychometric analyses of the translated PHQ-9 assessed unidimensionality, factor structure, reliability, construct validity, and differential item functioning (DIF) across subgroups (age, educational level, sexual orientation and HIV status). In a study population of 1,028 MSM, the Haitian-Creole version of the PHQ-9 is unidimensional, has moderately high internal consistency reliability (α = 0.78), and shows evidence of construct validity where HIV-positive subjects have greater depression (p = 0.002). There is no evidence of DIF across age, education, sexual orientation or HIV status. HIV-positive MSM are twice as likely to screen positive for moderately severe and severe depressive symptoms compared to their HIV-negative counterparts. There is strong evidence for the psychometric adequacy of the translated PHQ-9 screening tool as a measure of depression with MSM in Haiti. Future research is necessary to examine the predictive validity of depression for subsequent health behaviors or clinical outcomes among Haitian MSM.

  11. The Validity and Reliability of Autism Behavior Checklist

    Directory of Open Access Journals (Sweden)

    Negin Yousefi

    2015-11-01

    Objectives: The aim of this study was to evaluate the psychometric features of the Persian version of the Autism Behavior Checklist (ABC). Method: The International Quality of Life Assessment (IQOLA) approach was used to translate the English ABC into Persian. A total sample of 184 parents of children, including 114 children with autism disorder (mean age = 7.21, SD = 1.65) and 70 typically developing children (mean age = 6.82, SD = 1.75), completed the ABC. Internal consistency, test-retest reliability, concurrent and discriminant validity, and cut-off score were assessed. Results: The results of this study revealed that the Persian version of the ABC has an acceptable degree of internal consistency (.73). Test–retest comparisons using interclass correlation confirmed the instrument’s time stability (.83). The instrument’s concurrent validity with the Gilliam Autism Rating Scale (GARS) was verified; the correlation between total scores was .94. For discriminant validity, the autism group had significantly higher scores compared to the normal group. Receiver Operating Characteristic (ROC) analysis revealed that individuals with total scores below 25 are less likely to be in the autism group. Conclusion: The Persian version of the ABC can be used as an initial screening tool in clinical contexts.
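
    Internal consistency figures such as the .73 reported above are usually Cronbach's alpha. The sketch below shows the standard formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores); it assumes alpha is the coefficient meant here, and the item scores are hypothetical values made up for illustration, not checklist data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items.

    item_scores holds one list per item, each with one score per
    respondent (all lists the same length).
    """
    k = len(item_scores)
    item_var_sum = sum(variance(item) for item in item_scores)
    # Total score per respondent, summed across items
    totals = [sum(scores) for scores in zip(*item_scores)]
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical scores: 3 items answered by 5 respondents
items = [
    [2, 3, 1, 4, 2],
    [3, 3, 1, 4, 1],
    [2, 4, 2, 3, 2],
]

alpha = cronbach_alpha(items)
print(f"Cronbach's alpha = {alpha:.2f}")
```

    Alpha rises when items covary strongly relative to their individual variances, which is why it is read as a measure of how consistently the items tap a single construct.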

  12. Ethical Implications of Validity-vs.-Reliability Trade-Offs in Educational Research

    Science.gov (United States)

    Fendler, Lynn

    2016-01-01

    In educational research that calls itself empirical, the relationship between validity and reliability is that of trade-off: the stronger the bases for validity, the weaker the bases for reliability (and vice versa). Validity and reliability are widely regarded as basic criteria for evaluating research; however, there are ethical implications of…

  13. The validity, reliability and normative scores of the parent, teacher and self report versions of the Strengths and Difficulties Questionnaire in China

    Directory of Open Access Journals (Sweden)

    Coghill David

    2008-04-01

    Background: The Strengths and Difficulties Questionnaire (SDQ) has become one of the most widely used measurement tools in child and adolescent mental health work across the globe. The SDQ was originally developed and validated within the UK, and whilst its reliability and validity have been replicated in several countries, important cross-cultural issues have been raised. We describe normative data, reliability and validity of the Chinese translation of the SDQ (parent, teacher and self report versions) in a large group of children from Shanghai. Methods: The SDQ was administered to the parents and teachers of students from 12 of Shanghai's 19 districts, aged between 3 and 17 years old, and to those young people aged between 11 and 17 years. Retest data were collected from parents and teachers for 45 students six weeks later. Data were analysed to describe normative scores, bandings and cut-offs for normal, borderline and abnormal scores. Reliability was assessed from analyses of internal consistency, inter-rater agreement, and temporal stability. Structural validity, convergent and discriminant validity were assessed. Results: Full parent and teacher data were available for 1965 subjects and self report data for 690 subjects. Normative data for this Chinese urban population with bandings and cut-offs for borderline and abnormal scores are described. Principal components analysis indicates partial agreement with the original five-factor subscale structure; however, this appears to hold more strongly for the Prosocial Behaviour, Hyperactivity – Inattention and Emotional Symptoms subscales than for Conduct Problems and Peer Problems. Internal consistency, as measured by Cronbach's α coefficient, was generally low, ranging between 0.30 and 0.83, with only parent and teacher Hyperactivity – Inattention and teacher Prosocial Behaviour subscales having α > 0.7. Inter-rater correlations were similar to those reported previously (range 0.23 – 0

  14. Reliability and Validity of the Inline Skating Skill Test

    Science.gov (United States)

    Radman, Ivan; Ruzic, Lana; Padovan, Viktoria; Cigrovski, Vjekoslav; Podnar, Hrvoje

    2016-01-01

    This study aimed to examine the reliability and validity of the inline skating skill test. Based on previous skating experience, forty-two skaters (26 female and 16 male) were randomized into two groups (competitive level vs. recreational level). They performed the test four times, with a recovery time of 45 minutes between sessions. Prior to testing, the participants rated their skating skill using a scale from 1 to 10. The protocol included performance time measurement through a course, combining different skating techniques. Trivial changes in performance time between the repeated sessions were determined in both competitive females/males and recreational females/males (-1.7% [95% CI: -5.8–2.6%] – 2.2% [95% CI: 0.0–4.5%]). In all four subgroups, the skill test had a low mean within-individual variation (1.6% [95% CI: 1.2–2.4%] – 2.7% [95% CI: 2.1–4.0%]) and high mean inter-session correlation (ICC = 0.97 [95% CI: 0.92–0.99] – 0.99 [95% CI: 0.98–1.00]). The comparison of detected typical errors and smallest worthwhile changes (calculated as standard deviations × 0.2) revealed that the skill test was able to track changes in skaters’ performances. Competitive-level skaters needed shorter time (24.4–26.4%, all p skating skills in amateur competitive and recreational level skaters. Further studies are needed to evaluate the reproducibility of this skill test in different populations including elite inline skaters. Key points: The study evaluated the reliability and construct validity of a newly developed inline skating skill test. The evaluated test is the first protocol designed to assess specific inline skating skill. Two groups of amateur skaters with different skating proficiency repeated the skill test on four separate occasions. The results suggest that the evaluated test is reliable and valid for evaluating inline skating skill in amateur skaters. PMID:27803616
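
    The abstract states that the smallest worthwhile change (SWC) was calculated as standard deviation × 0.2 and then compared with the typical error. A rough sketch of that arithmetic follows. The decision rule used here (typical error no larger than the SWC means the test can track changes) is a common convention and an assumption, not something the abstract spells out; the function names and the performance times are hypothetical.

```python
import statistics


def smallest_worthwhile_change(scores, factor=0.2):
    """SWC as factor x between-subject sample standard deviation."""
    return factor * statistics.stdev(scores)


def can_track_changes(typical_error, swc):
    """Assumed rule: the test is sensitive when typical error <= SWC."""
    return typical_error <= swc


# Hypothetical course times in seconds for five skaters (not study data).
times = [30.0, 32.0, 34.0, 36.0, 38.0]
swc = smallest_worthwhile_change(times)

# Hypothetical typical error of 0.5 s compared against the SWC.
sensitive = can_track_changes(0.5, swc)
```

    With these made-up numbers the sample standard deviation is the square root of 10, so the SWC is roughly 0.63 s and a typical error of 0.5 s would fall below it.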

  15. Validity and reliability of eating disorder assessments used with athletes: A review

    Directory of Open Access Journals (Sweden)

    Zachary Pope

    2015-09-01

    Conclusion: Only seven studies calculated validity coefficients within the study whereas 47 cited the validity coefficient. Twenty-six calculated a reliability coefficient whereas 47 cited the reliability of the ED measures. Four studies found validity evidence for the EAT, EDI, BULIT-R, QEDD, and EDE-Q in an athlete population. Few studies reviewed calculated validity and reliability coefficients of ED measures. Cross-validation of these measures in athlete populations is clearly needed.

  16. Exploration of the (interrater) reliability and latent factor structure of the Alcohol Use Disorders Identification Test (AUDIT) and the Drug Use Disorders Identification Test (DUDIT) in a sample of Dutch probationers

    NARCIS (Netherlands)

    Noteborn, M.G.C.; Hildebrand, M.

    2015-01-01

    Background: The use of brief, reliable, valid, and practical measures of substance use is critical for conducting individual (risk and need) assessments in probation practice. In this exploratory study, the basic psychometric properties of the Alcohol Use Disorders Identification Test (AUDIT) and

  17. Validity and reliability of an application review process using dedicated reviewers in one stage of a multi-stage admissions model.

    Science.gov (United States)

    Zeeman, Jacqueline M; McLaughlin, Jacqueline E; Cox, Wendy C

    2017-11-01

    With increased emphasis placed on non-academic skills in the workplace, a need exists to identify an admissions process that evaluates these skills. This study assessed the validity and reliability of an application review process involving three dedicated application reviewers in a multi-stage admissions model. A multi-stage admissions model was utilized during the 2014-2015 admissions cycle. After advancing through the academic review, each application was independently reviewed by two dedicated application reviewers utilizing a six-construct rubric (written communication, extracurricular and community service activities, leadership experience, pharmacy career appreciation, research experience, and resiliency). Rubric scores were extrapolated to a three-tier ranking to select candidates for on-site interviews. Kappa statistics were used to assess interrater reliability. A three-facet Many-Facet Rasch Model (MFRM) determined reviewer severity, candidate suitability, and rubric construct difficulty. The kappa statistic for candidates' tier rank score (n = 388 candidates) was 0.692 with a perfect agreement frequency of 84.3%. There was substantial interrater reliability between reviewers for the tier ranking (kappa: 0.654-0.710). Highest construct agreement occurred in written communication (kappa: 0.924-0.984). A three-facet MFRM analysis explained 36.9% of variance in the ratings, with 0.06% reflecting application reviewer scoring patterns (i.e., severity or leniency), 22.8% reflecting candidate suitability, and 14.1% reflecting construct difficulty. Utilization of dedicated application reviewers and a defined tiered rubric provided a valid and reliable method to effectively evaluate candidates during the application review process. These analyses provide insight into opportunities for improving the application review process among schools and colleges of pharmacy. Copyright © 2017 Elsevier Inc. All rights reserved.
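
    The abstract reports kappa statistics for agreement between reviewers without naming the variant. As an illustration only, assuming the unweighted Cohen's kappa for two raters, the calculation compares observed agreement with the agreement expected by chance. The function name and the tier rankings below are hypothetical and are not the study's data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the raters agree exactly.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical three-tier rankings of ten applications by two reviewers.
reviewer_1 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1]
reviewer_2 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 2]
kappa = cohen_kappa(reviewer_1, reviewer_2)
```

    Here the reviewers agree on 9 of 10 applications (observed agreement 0.90) and chance agreement is 0.33, so kappa is 0.57 / 0.67, about 0.85. For ordered tiers, a weighted kappa would also be a reasonable choice.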

  18. Validity And Reliability Of The Stages Cycling Power Meter.

    Science.gov (United States)

    Granier, Cyril; Hausswirth, Christophe; Dorel, Sylvain; Yann, Le Meur

    2017-09-06

    This study aimed to determine the validity and the reliability of the Stages power meter crank system (Boulder, United States) during several laboratory cycling tasks. Eleven trained participants completed laboratory cycling trials on an indoor cycle fitted with SRM Professional and Stages systems. The trials consisted of an incremental test at 100W, 200W, 300W, 400W and four 7s sprints. The level of pedaling asymmetry was determined for each cycling intensity during a similar protocol completed on a Lode Excalibur Sport ergometer. The reliability of Stages and SRM power meters was compared by repeating the incremental test during a test-retest protocol on a Cyclus 2 ergometer. Over power ranges of 100-1250W the Stages system produced trivial to small differences compared to the SRM (standardized typical error values of 0.06, 0.24 and 0.08 for the incremental, sprint and combined trials, respectively). A large correlation was reported between the difference in power output (PO) between the two systems and the level of pedaling asymmetry (r=0.58, p system according to the level of pedaling asymmetry provided only marginal improvements in PO measures. The reliability of the Stages power meter at the sub-maximal intensities was similar to the SRM Professional model (coefficient of variation: 2.1 and 1.3% for Stages and SRM, respectively). The Stages system is a suitable device for PO measurements, except when a typical error of measurement power ranges of 100-1250W is expected.

  19. The reliability and validity of the informant AD8 by comparison with a series of cognitive assessment tools in primary healthcare.

    Science.gov (United States)

    Shaik, Muhammad Amin; Xu, Xin; Chan, Qun Lin; Hui, Richard Jor Yeong; Chong, Steven Shih Tsze; Chen, Christopher Li-Hsian; Dong, YanHong

    2016-03-01

    The validity and reliability of the informant AD8 in primary healthcare have not been established. Therefore, the present study examined the validity and reliability of the informant AD8 in government-subsidized primary healthcare centers in Singapore. Eligible patients (≥60 years old) were recruited from primary healthcare centers and their informants received the AD8. Patient-informant dyads who agreed to further cognitive assessments received the Mini-Mental State Examination (MMSE), Montreal Cognitive Assessment (MoCA), Clinical Dementia Rating (CDR), and a locally validated formal neuropsychological battery at a research center in a tertiary hospital. 1,082 informants completed AD8 assessment at two primary healthcare centers. Of these, 309 patient-informant dyads were further assessed, of whom 243 (78.6%) were CDR = 0; 22 (7.1%) were CDR = 0.5; and 44 (14.2%) were CDR≥1. The mean administration time of the informant AD8 was 2.3 ± 1.0 minutes. The informant AD8 demonstrated good internal consistency (Cronbach's α = 0.85); inter-rater reliability (Intraclass Correlation Coefficient (ICC) = 0.85); and test-retest reliability (weighted κ = 0.80). Concurrent validity, as measured by the correlation between total AD8 scores and CDR global (R = 0.65, p validity, as measured by convergent validity (R ≥ 0.4) between individual items of AD8 with CDR and neuropsychological domains was acceptable. The informant AD8 demonstrated good concurrent and construct validity and is a reliable measure to detect cognitive dysfunction in primary healthcare.

  20. [Nursery Teacher's Stress Scale (NTSS): reliability and validity].

    Science.gov (United States)

    Akada, Taro

    2010-06-01

    This study describes the development and evaluation of the Nursery Teacher's Stress Scale (NTSS), which explores the relation between daily hassles at work and work-related stress. In Analysis 1, 29 items were chosen to construct the NTSS. Six factors were identified: I. Stress relating to child care; II. Stress from human relations at work; III. Stress from staff-parent relations; IV. Stress from lack of time; V. Stress relating to compensation; and VI. Stress from the difference between individual beliefs and school policy. All these factors had high degrees of internal consistency. In Analysis 2, the concurrent validity of the NTSS was examined. The results showed that the NTSS total scores were significantly correlated with the Job Stress Scale-Revised Version (job stressor scale, r = .68), the Pre-school Teacher-efficacy Scale (r = -.21), and the WHO-five Well-Being Index Japanese Version (r = -.40). Work stresses are affected by several daily hassles at work. The NTSS has acceptable reliability and validity, and can be used to improve nursery teachers' mental health.

  1. [Reliability and validity of the Severe Impairment Battery, short form (SIB-s), in patients with dementia in Spain].

    Science.gov (United States)

    Cruz-Orduña, Isabel; Agüera-Ortiz, Luis F; Montorio-Cerrato, Ignacio; León-Salas, Beatriz; Valle de Juan, M Cristina; Martínez-Martín, Pablo

    2015-01-01

    People with progressive dementia evolve into a state where traditional neuropsychological tests are not effective. The Severe Impairment Battery (SIB) and its short form (SIB-s) were developed for evaluating the cognitive status of patients with severe dementia. The aim was to evaluate the psychometric attributes of the SIB-s in patients with severe dementia. 127 institutionalized patients (female: 86.6%; mean age: 82.6 ± 7.5 years old) with dementia were assessed with the SIB-s, the Global Deterioration Scale (GDS), Mini-Mental State Examination (MMSE), Severe Mini-Mental State Examination (sMMSE), Barthel Index and FAST. SIB-s acceptability, reliability, validity and precision were analyzed. The mean total score for the scale was 19.1 ± 15.34 (range: 0-48). The floor effect was 18.1%, only marginally higher than the desirable 15%. Factor analysis identified a single factor explaining 68% of the total variance of the scale. Cronbach's alpha coefficient was 0.96 and the corrected item-total correlation ranged from 0.27 to 0.83. The item homogeneity value was 0.43. Test-retest and inter-rater reliability for the total score were satisfactory (ICC: 0.96 and 0.95, respectively). The SIB-s showed moderate correlation with functional dependency scales (Barthel Index: 0.48, FAST: -0.74). Standard error of measurement was 3.07 for the total score. The SIB-s is a relatively brief, reliable and valid instrument for evaluating patients with severe dementia in the Spanish population.
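
    The reported standard error of measurement (3.07) is consistent with the common formula SEM = SD × √(1 − reliability) applied to the figures in the abstract (SD = 15.34, reliability = 0.96). The abstract does not say which reliability coefficient the authors used, so the sketch below is an assumption-based check rather than a reproduction of their method; the function name is hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability in [0, 1]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Figures taken from the abstract: total-score SD and a reliability of 0.96.
sem = standard_error_of_measurement(15.34, 0.96)
sem_rounded = round(sem, 2)
```

    With these inputs the result is about 3.07, matching the value in the abstract.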

  2. Development of a new assessment tool for cervical myelopathy using hand-tracking sensor: Part 1: validity and reliability.

    Science.gov (United States)

    Alagha, M Abdulhadi; Alagha, Mahmoud A; Dunstan, Eleanor; Sperwer, Olaf; Timmins, Kate A; Boszczyk, Bronek M

    2017-04-01

    To assess the reliability and validity of a hand motion sensor, the Leap Motion Controller (LMC), in the 15-s hand grip-and-release test, as compared against human inspection of an external digital camera recording. Fifty healthy participants were asked to fully grip-and-release their dominant hand as rapidly as possible for two trials with a 10-min rest in-between, while wearing a non-metal wrist splint. Each test lasted for 15 s, and a digital camera was used to film the anterolateral side of the hand on the first test. Three assessors counted the frequency of grip-and-release (G-R) cycles independently and in a blinded fashion. The mean of the three counts was compared with that measured by LMC using the Bland-Altman method. Test-retest reliability was examined by comparing the two 15-s tests. The mean number of G-R cycles recorded was: 47.8 ± 6.4 (test 1, video observer); 47.7 ± 6.5 (test 1, LMC); and 50.2 ± 6.5 (test 2, LMC). Bland-Altman analysis indicated good agreement, with a low bias (0.15 cycles) and narrow limits of agreement. The ICC showed high inter-rater agreement and the coefficient of repeatability for the number of cycles was ±5.393, with a mean bias of 3.63. LMC appears to be valid and reliable in the 15-s grip-and-release test. This serves as a first step towards the development of an objective myelopathy assessment device and platform for the assessment of neuromotor hand function in general. Further assessment in a clinical setting and to gauge healthy benchmark values is warranted.
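
    The Bland-Altman comparison mentioned above summarizes agreement between two methods as a bias (mean difference) and limits of agreement. A minimal sketch follows, assuming the conventional limits of bias ± 1.96 × SD of the paired differences. The abstract does not state the exact limits used, and the function name and cycle counts below are hypothetical, not study data.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical grip-and-release counts from video observers and a sensor.
video_counts = [48, 50, 45, 52]
sensor_counts = [47, 50, 46, 51]
bias, lower, upper = bland_altman(video_counts, sensor_counts)
```

    In this toy example the bias is 0.25 cycles. A bias near zero with narrow limits is what the abstract describes as good agreement.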

  3. Development of Reliable and Validated Tools to Evaluate Technical Resuscitation Skills in a Pediatric Simulation Setting: Resuscitation and Emergency Simulation Checklist for Assessment in Pediatrics.

    Science.gov (United States)

    Faudeux, Camille; Tran, Antoine; Dupont, Audrey; Desmontils, Jonathan; Montaudié, Isabelle; Bréaud, Jean; Braun, Marc; Fournier, Jean-Paul; Bérard, Etienne; Berlengi, Noémie; Schweitzer, Cyril; Haas, Hervé; Caci, Hervé; Gatin, Amélie; Giovannini-Chami, Lisa

    2017-09-01

    To develop a reliable and validated tool to evaluate technical resuscitation skills in a pediatric simulation setting. Four Resuscitation and Emergency Simulation Checklist for Assessment in Pediatrics (RESCAPE) evaluation tools were created, following international guidelines: intraosseous needle insertion, bag mask ventilation, endotracheal intubation, and cardiac massage. We applied a modified Delphi methodology evaluation to binary rating items. Reliability was assessed comparing the ratings of 2 observers (1 in real time and 1 after a video-recorded review). The tools were assessed for content, construct, and criterion validity, and for sensitivity to change. Inter-rater reliability, evaluated with Cohen kappa coefficients, was perfect or near-perfect (>