WorldWideScience

Sample records for validity interrater reliability

  1. Quantitative measurement of hypertrophic scar: interrater reliability and concurrent validity.

    Science.gov (United States)

    Nedelec, Bernadette; Correa, José A; Rachelska, Grazyna; Armour, Alexis; LaSalle, Léo

    2008-01-01

    Research into the pathophysiology and treatment of hypertrophic scar (HSc) remains limited by the heterogeneity of scar and the imprecision with which its severity is measured. The objective of this study was to test the interrater reliability and concurrent validity of the Cutometer measurement of elasticity, the Mexameter measurement of erythema and pigmentation, and total thickness measure of the DermaScan C relative to the modified Vancouver Scar Scale (mVSS) in patient-matched normal skin, normal scar, and HSc. Three independent investigators evaluated 128 sites (severe HSc, moderate or mild HSc, donor site, and normal skin) on 32 burn survivors using all of the above measurement tools. The intraclass correlation coefficient, which was used to measure interrater reliability, reflects the inherent amount of error in the measure and is considered acceptable when it is >0.75. Interrater reliability of the totals of the height, pliability, and vascularity subscales of the mVSS fell below the acceptable limit (approximately 0.50). The individual subscales of the mVSS fell well below the acceptable level (0.89) for each study site with the exception of severe scar. Mexameter and DermaScan C reliability measurements were acceptable for all sites (>0.82). Concurrent validity correlations with the mVSS were significant except for the comparison of the mVSS pliability subscale and the Cutometer maximum deformation measure comparison in severe scar. In conclusion, the Mexameter and DermaScan C measurements of scar color and thickness of all sites, as well as the Cutometer measurement of elasticity in all but the most severe scars, show high interrater reliability. Their significant concurrent validity with the mVSS confirms that these tools are measuring the same traits as the mVSS, and in a more objective way.
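
    As an illustration of the statistic used above, the following is a minimal sketch of one common form of the intraclass correlation coefficient, ICC(2,1) (two-way random effects, absolute agreement, single rater). The abstract does not say which ICC form the authors used, so the choice of form, the function name `icc_2_1`, and the example scores are all assumptions made for illustration.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per measured site,
             each row holding one score per rater.
    """
    n = len(ratings)       # number of sites
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-site mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores: 4 sites, each rated by 3 investigators
scores = [[10, 11, 10], [20, 19, 21], [30, 31, 29], [15, 15, 16]]
icc = icc_2_1(scores)
print(f"ICC(2,1) = {icc:.2f}, acceptable (> 0.75): {icc > 0.75}")
```

    The comparison against 0.75 mirrors the acceptability threshold stated in the abstract; other ICC forms would give different values on the same data.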

  2. The PRECIS-2 tool has good interrater reliability and modest discriminant validity.

    Science.gov (United States)

    Loudon, Kirsty; Zwarenstein, Merrick; Sullivan, Frank M; Donnan, Peter T; Gágyor, Ildikó; Hobbelen, Hans J S M; Althabe, Fernando; Krishnan, Jerry A; Treweek, Shaun

    2017-08-01

    PRagmatic Explanatory Continuum Indicator Summary (PRECIS)-2 is a tool that could improve design insight for trialists. Our aim was to validate the PRECIS-2 tool, unlike its predecessor, testing the discriminant validity and interrater reliability. Over 80 international trialists, methodologists, clinicians, and policymakers created PRECIS-2 helping to ensure face validity and content validity. The interrater reliability of PRECIS-2 was measured using 19 experienced trialists who used PRECIS-2 to score a diverse sample of 15 randomized controlled trial protocols. Discriminant validity was tested with two raters to independently determine if the trial protocols were more pragmatic or more explanatory, with scores from the 19 raters for the 15 trials as predictors of pragmatism. Interrater reliability was generally good, with seven of nine domains having an intraclass correlation coefficient over 0.65. Flexibility (adherence) and recruitment had wide confidence intervals, but raters found these difficult to rate and wanted more information. Each of the nine PRECIS-2 domains could be used to differentiate between trials taking more pragmatic or more explanatory approaches with better than chance discrimination for all domains. We have assessed the validity and reliability of PRECIS-2. An elaboration study and web site provide guidance to help future users of the tool which is continuing to be tested by trial teams, systematic reviewers, and funders. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. Clinical Functional Capacity Testing in Patients With Facioscapulohumeral Muscular Dystrophy: Construct Validity and Interrater Reliability of Antigravity Tests.

    Science.gov (United States)

    Rijken, Noortje H; van Engelen, Baziel G; Weerdesteyn, Vivian; Geurts, Alexander C

    2015-12-01

    OBJECTIVE: To evaluate the construct validity and interrater reliability of 4 simple antigravity tests in a small group of patients with facioscapulohumeral muscular dystrophy (FSHD). DESIGN: Case-control study. SETTING: University medical center. PARTICIPANTS: Patients with various severity levels of FSHD (n=9) and healthy control subjects (n=10) were included (N=19). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: A 4-point ordinal scale was designed to grade performance on the following 4 antigravity tests: sit to stance, stance to sit, step up, and step down. In addition, the 6-minute walk test, 10-m walking test, Berg Balance Scale, and timed Up and Go test were administered as conventional tests. Construct validity was determined by linear regression analysis using the Clinical Severity Score (CSS) as the dependent variable. Interrater agreement was tested using a κ analysis. RESULTS: Patients with FSHD performed worse on all 4 antigravity tests compared with the controls. Stronger correlations were found within than between test categories (antigravity vs conventional). The antigravity tests revealed the highest explained variance with regard to the CSS (R²=.86, P=.014). Interrater agreement was generally good. CONCLUSIONS: The results of this exploratory study support the construct validity and interrater reliability of the proposed antigravity tests for the assessment of functional capacity in patients with FSHD taking into account the use of compensatory strategies. Future research should further validate these results in a larger sample of patients with FSHD. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.

  4. Inter-Rater Reliability and Validity of the Australian Football League’s Kicking and Handball Tests

    Science.gov (United States)

    Cripps, Ashley J.; Hopper, Luke S.; Joyce, Christopher

    2015-01-01

    Talent identification tests used at the Australian Football League’s National Draft Combine assess the capacities of athletes to compete at a professional level. Tests created for the National Draft Combine are also commonly used for talent identification and athlete development in development pathways. The skills tests created by the Australian Football League required players to either handball (striking the ball with the hand) or kick to a series of 6 randomly generated targets. Assessors subjectively rate each skill execution giving a 0-5 score for each disposal. This study aimed to investigate the inter-rater reliability and validity of the skills tests at an adolescent sub-elite level. Male Australian footballers were recruited from sub-elite adolescent teams (n = 121, age = 15.7 ± 0.3 years, height = 1.77 ± 0.07 m, mass = 69.17 ± 8.08 kg). The coaches (n = 7) of each team were also recruited. Inter-rater reliability was assessed using intraclass correlation coefficients (ICC) and Limits of Agreement statistics. Both the kicking (ICC = 0.96, p handball tests (ICC = 0.89, p handball test. Key points: The skill tests created by the AFL demonstrated acceptable levels of relative and absolute inter-rater reliability. Both the AFL’s skills tests are able to differentiate between athletes’ dominant and non-dominant limbs. However, only the kicking test could consistently differentiate between score outcomes over a range of Australian Football specific disposal distances. Both tests demonstrated poor concurrent validity, with no correlation found between coaches’ perceptions of technical skills and actual skill outcomes measured. PMID:26336356

  5. Inter-rater and intra-rater reliability of the Bahasa Melayu version of Rose Angina Questionnaire.

    Science.gov (United States)

    Hassan, N B; Choudhury, S R; Naing, L; Conroy, R M; Rahman, A R A

    2007-01-01

    The objective of the study was to translate the Rose Questionnaire (RQ) into a Bahasa Melayu version, adapt it cross-culturally, and measure its inter-rater and intra-rater reliability. This cross-sectional study was conducted in the respondents' homes or workplaces in Kelantan, Malaysia. One hundred respondents aged 30 and above with different socio-demographic status were interviewed for face validity. For each of the inter-rater and intra-rater reliability assessments, a sample of 150 respondents was interviewed. Inter-rater and intra-rater reliabilities were assessed by Cohen's kappa. The overall inter-rater agreement by the five pairs of interviewers at points one and two was 0.86, and intra-rater reliability by the five interviewers on the seven-item questionnaire at points one and two was 0.88, as measured by the kappa coefficient. The translated Malay version of the RQ demonstrated almost perfect inter-rater and intra-rater reliability; further validation of this translated questionnaire, such as sensitivity and specificity analysis, is highly recommended.
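
    To make the kappa coefficient reported above concrete, here is a minimal sketch of Cohen's kappa for two raters, together with the widely used Landis and Koch descriptive bands (under which values above 0.80 are labelled "almost perfect"). The function names and the interview answers are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical labels.

    Note: undefined (division by zero) when expected agreement is exactly 1.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal frequencies
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Descriptive band for a kappa value (Landis and Koch convention)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical yes/no answers recorded by two interviewers
interviewer_1 = ["yes", "yes", "no", "no", "yes", "no", "no", "no"]
interviewer_2 = ["yes", "yes", "no", "no", "no", "no", "no", "no"]
k = cohens_kappa(interviewer_1, interviewer_2)
print(f"kappa = {k:.2f} ({landis_koch_label(k)})")
```

    Applied to the coefficients in the abstract, `landis_koch_label(0.86)` and `landis_koch_label(0.88)` both fall in the "almost perfect" band, which matches the authors' wording.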

  6. Clinical Functional Capacity Testing in Patients With Facioscapulohumeral Muscular Dystrophy: Construct Validity and Interrater Reliability of Antigravity Tests

    NARCIS (Netherlands)

    Rijken, N.H.M.; Engelen, B.G.M. van; Weerdesteyn, V.G.M.; Geurts, A.C.H.

    2015-01-01

    OBJECTIVE: To evaluate the construct validity and interrater reliability of 4 simple antigravity tests in a small group of patients with facioscapulohumeral muscular dystrophy (FSHD). DESIGN: Case-control study. SETTING: University medical center. PARTICIPANTS: Patients with various severity levels

  7. Test-retest and interrater reliability of the functional lower extremity evaluation.

    Science.gov (United States)

    Haitz, Karyn; Shultz, Rebecca; Hodgins, Melissa; Matheson, Gordon O

    2014-12-01

    Repeated-measures clinical measurement reliability study. To establish the reliability and face validity of the Functional Lower Extremity Evaluation (FLEE). The FLEE is a 45-minute battery of 8 standardized functional performance tests that measures 3 components of lower extremity function: control, power, and endurance. The reliability and normative values for the FLEE in healthy athletes are unknown. A face validity survey for the FLEE was sent to sports medicine personnel to evaluate the level of importance and frequency of clinical usage of each test included in the FLEE. The FLEE was then administered and rated for 40 uninjured athletes. To assess test-retest reliability, each athlete was tested twice, 1 week apart, by the same rater. To assess interrater reliability, 3 raters scored each athlete during 1 of the testing sessions. Intraclass correlation coefficients were used to assess the test-retest and interrater reliability of each of the FLEE tests. In the face validity survey, the FLEE tests were rated as highly important by 58% to 71% of respondents but frequently used by only 26% to 45% of respondents. Interrater reliability intraclass correlation coefficients ranged from 0.83 to 1.00, and test-retest reliability ranged from 0.71 to 0.95. The FLEE tests are considered clinically important for assessing lower extremity function by sports medicine personnel but are underused. The FLEE also is a reliable assessment tool. Future studies are required to determine if use of the FLEE to make return-to-play decisions may reduce reinjury rates.

  8. Validity and Interrater Reliability of the Visual Quarter-Waste Method for Assessing Food Waste in Middle School and High School Cafeteria Settings.

    Science.gov (United States)

    Getts, Katherine M; Quinn, Emilee L; Johnson, Donna B; Otten, Jennifer J

    2017-11-01

    Measuring food waste (ie, plate waste) in school cafeterias is an important tool to evaluate the effectiveness of school nutrition policies and interventions aimed at increasing consumption of healthier meals. Visual assessment methods are frequently applied in plate waste studies because they are more convenient than weighing. The visual quarter-waste method has become a common tool in studies of school meal waste and consumption, but previous studies of its validity and reliability have used correlation coefficients, which measure association but not necessarily agreement. The aims of this study were to determine, using a statistic measuring interrater agreement, whether the visual quarter-waste method is valid and reliable for assessing food waste in a school cafeteria setting when compared with the gold standard of weighed plate waste. To evaluate validity, researchers used the visual quarter-waste method and weighed food waste from 748 trays at four middle schools and five high schools in one school district in Washington State during May 2014. To assess interrater reliability, researcher pairs independently assessed 59 of the same trays using the visual quarter-waste method. Both validity and reliability were assessed using a weighted κ coefficient. For validity, as compared with the measured weight, 45% of foods assessed using the visual quarter-waste method were in almost perfect agreement, 42% of foods were in substantial agreement, 10% were in moderate agreement, and 3% were in slight agreement. For interrater reliability between pairs of visual assessors, 46% of foods were in perfect agreement, 31% were in almost perfect agreement, 15% were in substantial agreement, and 8% were in moderate agreement. These results suggest that the visual quarter-waste method is a valid and reliable tool for measuring plate waste in school cafeteria settings. Copyright © 2017 Academy of Nutrition and Dietetics. Published by Elsevier Inc. All rights reserved.
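
    The weighted κ used above gives partial credit when two ordinal ratings are close, which an unweighted κ does not. The following is a minimal sketch assuming linear disagreement weights and five ordered waste categories (none, one quarter, half, three quarters, all); the abstract does not state the weighting scheme, and the function name and tray ratings are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, n_categories):
    """Linear-weighted kappa for two raters on ordinal codes 0..n_categories-1."""
    n = len(rater_a)
    k = n_categories
    # Observed joint proportions of (rating by A, rating by B)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1 / n
    marg_a = [sum(observed[i]) for i in range(k)]
    marg_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_obs = 0.0
    weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)   # 0 for exact match, 1 for maximal gap
            weighted_obs += w * observed[i][j]
            weighted_exp += w * marg_a[i] * marg_b[j]
    return 1 - weighted_obs / weighted_exp


# Hypothetical ratings of five trays: 0 = none wasted ... 4 = all wasted
assessor_1 = [0, 1, 2, 3, 4]
near_miss = [1, 1, 2, 3, 4]   # one tray off by a single category
far_miss = [4, 1, 2, 3, 4]    # one tray off by four categories
print(weighted_kappa(assessor_1, near_miss, 5))
print(weighted_kappa(assessor_1, far_miss, 5))
```

    Both comparison lists disagree with the first assessor on exactly one tray, yet the near miss is penalised far less than the far miss, which is the property that makes a weighted statistic suitable for quarter-step visual estimates.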

  9. Face validity and inter-rater reliability of the Danish version of the modified-Yale Preoperative Anxiety Scale

    DEFF Research Database (Denmark)

    Skovby, Pernille; Rask, Charlotte Ulrikka; Dall, Rolf

    2014-01-01

    -YPAS to Danish cultural and linguistic conditions and to test face validity and inter-reliability in a clinical setting. Materials and methods The translation was performed in accordance with WHO guidelines. Face validity as well as linguistic difficulties of the Danish version was tested and solved in a focus...... of the m-YPAS as suitable and relevant, i.e. the face validity satisfactory. Inter-rater reliability analysis revealed that inter-observer agreement at induction 1 were good to very good (kw: 0.63–0.98) and at induction 2, the agreement was good to very good (kw: 0.72–0.96). ICC for the overall weighted...... anxiety score was in: induction 1:0.92 and induction 2: 0.92 Conclusion Standardized and validated assessment tools are needed to evaluate interventions aiming to reduce preoperative anxiety in children. The Danish m-YPAS had a satisfactory face validity and inter-reliability, based on a minor empirical...

  10. A pediatric FOUR score coma scale: interrater reliability and predictive validity.

    Science.gov (United States)

    Czaikowski, Brianna L; Liang, Hong; Stewart, C Todd

    2014-04-01

    The Full Outline of UnResponsiveness (FOUR) Score is a coma scale that consists of four components (eye and motor response, brainstem reflexes, and respiration). It was originally validated among the adult population and recently in a pediatric population. To enhance clinical assessment of pediatric intensive care unit patients, including those intubated and/or sedated, at our children's hospital, we modified the FOUR Score Scale for this population. This modified scale would provide many of the same advantages as the original, such as interrater reliability, simplicity, and elimination of the verbal component that is not compatible with the Glasgow Coma Scale (GCS), creating a more valuable neurological assessment tool for the nursing community. Our goal was to potentially provide greater information than the formally used GCS when assessing critically ill, neurologically impaired patients, including those sedated and/or intubated. Experienced pediatric intensive care unit nurses were trained as "expert raters." Two different nurses assessed each subject using the Pediatric FOUR Score Scale (PFSS), GCS, and Richmond Agitation Sedation Scale at three different time points. Data were compared with the Pediatric Cerebral Performance Category (PCPC) assessed by another nurse. Our hypothesis was that the PFSS and PCPC should highly correlate and the GCS and PCPC should correlate lower. Study results show that the PFSS is excellent for interrater reliability for trained nurse-rater pairs and prediction of poor outcome and in-hospital mortality, under various situations, but there were no statistically significant differences between the PFSS and the GCS. However, the PFSS does have the potential to provide greater neurological assessment in the intubated and/or sedated patient based on the outcomes of our study.

  11. Validation and inter-rater reliability of a three item falls risk screening tool

    Directory of Open Access Journals (Sweden)

    Catherine Maree Said

    2017-11-01

    Background: Falls screening tools are routinely used in hospital settings and the psychometric properties of tools should be examined in the setting in which they are used. The aim of this study was to explore the concurrent and predictive validity of the Austin Health Falls Risk Screening Tool (AHFRST), compared with The Northern Hospital Modified St Thomas’s Risk Assessment Tool (TNH-STRATIFY), and the inter-rater reliability of the AHFRST. Methods: A research physiotherapist used the AHFRST and TNH-STRATIFY to classify 130 participants admitted to Austin Health (five acute wards, n = 115; two subacute wards, n = 15; median length of stay 6 days, IQR 3–12) as ‘High’ or ‘Low’ falls risk. The AHFRST was also completed by nursing staff on patient admission. Falls data was collected from the hospital incident reporting system. Results: Six falls occurred during the study period (fall rate of 4.6 falls per 1000 bed days). There was substantial agreement between the AHFRST and the TNH-STRATIFY (Kappa = 0.68, 95% CI 0.52–0.78). Both tools had poor predictive validity, with low specificity (AHFRST 46.0%, 95% CI 37.0–55.1; TNH-STRATIFY 34.7%, 95% CI 26.4–43.7) and positive predictive values (AHFRST 5.6%, 95% CI 1.6–13.8; TNH-STRATIFY 6.9%, 95% CI 2.6–14.4). The AHFRST showed moderate inter-rater reliability (Kappa = 0.54, 95% CI = 0.36–0.67, p < 0.001), although 18 patients did not have the AHFRST completed by nursing staff. Conclusions: There was an acceptable level of agreement between the 3 item AHFRST classification of falls risk and the longer, 9 item TNH-STRATIFY classification. However, both tools demonstrated limited predictive validity in the Austin Health population. The results highlight the importance of evaluating the validity of falls screening tools, and the clinical utility of these tools should be reconsidered.
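
    The predictive validity figures above (specificity, positive predictive value) all derive from a 2×2 table of screening result against outcome. A minimal sketch follows; the function name is hypothetical and the counts are invented for illustration, not reconstructed from the study.

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard 2x2 screening metrics.

    tp: flagged high risk and fell      fp: flagged high risk, did not fall
    fn: flagged low risk but fell       tn: flagged low risk, did not fall
    """
    return {
        "sensitivity": tp / (tp + fn),   # fallers correctly flagged
        "specificity": tn / (tn + fp),   # non-fallers correctly cleared
        "ppv": tp / (tp + fp),           # flagged patients who actually fell
        "npv": tn / (tn + fn),           # cleared patients who did not fall
    }


# Invented counts for a rare outcome: 10 fallers among 100 patients
metrics = screening_metrics(tp=8, fp=60, fn=2, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

    With a rare outcome, even a tool that flags most fallers produces many false positives, so positive predictive value stays low; this is the general pattern behind single-digit values such as those reported in the abstract.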

  12. Concurrent validity and interrater reliability of a new smartphone application to assess 3D active cervical range of motion in patients with neck pain.

    Science.gov (United States)

    Stenneberg, Martijn S; Busstra, Harm; Eskes, Michel; van Trijffel, Emiel; Cattrysse, Erik; Scholten-Peeters, Gwendolijne G M; de Bie, Rob A

    2018-04-01

    There is a lack of valid, reliable, and feasible instruments for measuring planar active cervical range of motion (aCROM) and associated 3D coupling motions in patients with neck pain. Smartphones have advanced sensors and appear to be suitable for these measurements. To estimate the concurrent validity and interrater reliability of a new iPhone application for assessing planar aCROM and associated 3D coupling motions in patients with neck pain, using an electromagnetic tracking device as a reference test. Cross-sectional study. Two samples of neck pain patients were recruited; 30 patients for the validity study and 26 patients for the reliability study. Validity was estimated using intraclass correlation coefficients (ICCs), and by calculating 95% limits of agreement (LoA). To estimate interrater reliability, ICCs were calculated. Cervical 3D coupling motions were analyzed by calculating the cross-correlation coefficients and ratio between the main motions and coupled motions for both instruments. ICCs for concurrent validity and interrater reliability ranged from 0.90 to 0.99. The width of the 95% LoA ranged from about 5° for right lateral bending to 11° for total rotation. No significant differences were found between both devices for associated coupling motion analysis. The iPhone application appears to be a useful discriminative tool for the measurement of planar aCROM and associated coupling motions in patients with neck pain. It fulfills the need for a valid, reliable, and feasible instrument in clinical practice and research. Therapists and researchers should consider measurement error when interpreting scores. Copyright © 2017 Elsevier Ltd. All rights reserved.
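
    The 95% limits of agreement mentioned above are conventionally computed as the mean difference between two instruments plus or minus 1.96 standard deviations of the differences. A minimal sketch under that conventional (Bland-Altman) definition follows; the function name and the angle readings are hypothetical, and the abstract does not describe the authors' exact computation.

```python
from statistics import mean, stdev


def limits_of_agreement(method_a, method_b):
    """Return (lower, upper) 95% limits of agreement for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)                 # systematic difference between methods
    spread = 1.96 * stdev(diffs)       # sample SD of the paired differences
    return bias - spread, bias + spread


# Hypothetical rotation angles (degrees) from an app and a reference device
app_deg = [70.0, 65.0, 80.0, 72.0, 68.0]
reference_deg = [69.0, 66.0, 79.0, 73.0, 67.0]
lower, upper = limits_of_agreement(app_deg, reference_deg)
print(f"95% LoA: {lower:.1f} to {upper:.1f} degrees, width {upper - lower:.1f}")
```

    The width `upper - lower` is the quantity the abstract reports as ranging from about 5° to 11°; a narrower interval means the two instruments can be used more interchangeably.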

  13. Inter-rater and intra-rater reliability of a movement control test in shoulder.

    Science.gov (United States)

    Rajasekar, S; Bangera, Rakshith K; Sekaran, Padmanaban

    2017-07-01

    Movement faults are commonly observed in patients with musculoskeletal pain. The Kinetic Medial Rotation Test (KMRT) is a movement control test used to identify movement faults of the scapula and gleno-humeral joints during arm movement. Objective tests such as the KMRT need to be reliable and valid for the results to be applied across different clinical settings and patient populations. The primary objective of the present study was to determine the intra-rater and inter-rater reliability of KMRT in subjects with and without shoulder pain. Sixty subjects were included in this study based on specific inclusion and exclusion criteria. Two musculoskeletal physiotherapists with different levels of clinical experience performed the tests. The intra-rater reliability was tested in twenty asymptomatic subjects by a single assessor at two week intervals. An equal number of subjects with and without shoulder pain were tested by both the assessors to determine the inter-rater reliability. Both components of the KMRT, the Gleno- Humeral Anterior Translation (GHAT) and the Scapular Forward Tilt (SCFT) were tested. The Kappa values for inter-rater reliability of the GHAT and SCFT were K = 0.68 & K = 0.65 respectively in subjects with shoulder pain. In asymptomatic subjects, the inter-rater reliability of GHAT was K = 0.61 and SCFT was K = 0.85. Intra-rater reliability ranged from K = 0.66 for GHAT to K = 0.87 for SCFT. Our study found substantial agreement in inter-rater reliability of KMRT in subjects with shoulder pain, whereas substantial to near perfect agreement was found in intra-rater and inter-rater reliability of KMRT in subjects without shoulder pain. Copyright © 2017 Elsevier Ltd. All rights reserved.

  14. Manual muscle testing and hand-held dynamometry in people with inflammatory myopathy: An intra- and interrater reliability and validity study.

    Science.gov (United States)

    Baschung Pfister, Pierrette; de Bruin, Eling D; Sterkele, Iris; Maurer, Britta; de Bie, Rob A; Knols, Ruud H

    2018-01-01

    Manual muscle testing (MMT) and hand-held dynamometry (HHD) are commonly used in people with inflammatory myopathy (IM), but their clinimetric properties have not yet been sufficiently studied. To evaluate the reliability and validity of MMT and HHD, maximum isometric strength was measured in eight muscle groups across three measurement events. To evaluate reliability of HHD, intra-class correlation coefficients (ICC), the standard error of measurements (SEM) and smallest detectable changes (SDC) were calculated. To measure reliability of MMT, linear Cohen's Kappa was computed for single muscle groups and ICC for total score. Additionally, correlations between MMT8 and HHD were evaluated with Spearman Correlation Coefficients. Fifty people with myositis (56±14 years, 76% female) were included in the study. Intra- and interrater reliability of HHD yielded excellent ICCs (0.75-0.97) for all muscle groups, except for interrater reliability of ankle extension (0.61). The corresponding SEMs% ranged from 8 to 28% and the SDCs% from 23 to 65%. MMT8 total score revealed excellent intra- and interrater reliability (ICC>0.9). Intrarater reliability of single muscle groups was substantial for shoulder and hip abduction, elbow and neck flexion, and hip extension (0.64-0.69); moderate for wrist (0.53) and knee extension (0.49); and fair for ankle extension (0.35). Interrater reliability was moderate for neck flexion (0.54) and hip abduction (0.44); fair for shoulder abduction, elbow flexion, wrist and ankle extension (0.20-0.33); and slight for knee extension (0.08). Correlations between the two tests were low for wrist, knee, ankle, and hip extension; moderate for elbow flexion, neck flexion and hip abduction; and good for shoulder abduction. In conclusion, the MMT8 total score is a reliable assessment to consider general muscle weakness in people with myositis but not for single muscle groups. In contrast, our results confirm that HHD can be recommended to evaluate strength of

  15. Nurses assessing pain with the Nociception Coma Scale: interrater reliability and validity

    NARCIS (Netherlands)

    Vink, Peter; Eskes, Anne Maria; Lindeboom, Robert; van den Munckhof, Pepijn; Vermeulen, Hester

    2014-01-01

    The Nociception Coma Scale (NCS) is a pain observation tool, developed for patients with disorders of consciousness (DOC) due to acquired brain injury (ABI). The aim of this study was to assess the interrater reliability of the NCS and NCS-R among nurses for the assessment of pain in ABI patients

  16. Interrater reliability of the mind map assessment rubric in a cohort of medical students

    Directory of Open Access Journals (Sweden)

    Zipp Genevieve

    2009-04-01

    Background: Learning strategies are thinking tools that students can use to actively acquire information. Examples of learning strategies include mnemonics, charts, and maps. One strategy that may help students master the tsunami of information presented in medical school is the mind map learning strategy. Currently, there is no valid and reliable rubric to grade mind maps and this may contribute to their underutilization in medicine. Because concept maps and mind maps engage learners similarly at a metacognitive level, a valid and reliable concept map assessment scoring system was adapted to form the mind map assessment rubric (MMAR). The MMAR can assess mind map depth based upon concept-links, cross-links, hierarchies, examples, pictures, and colors. The purpose of this study was to examine interrater reliability of the MMAR. Methods: This exploratory study was conducted at a US medical school as part of a larger investigation on learning strategies. Sixty-six (N = 66) first-year medical students were given a 394-word text passage followed by a 30-minute presentation on mind mapping. After the presentation, subjects were again given the text passage and instructed to create mind maps based upon the passage. The mind maps were collected and independently scored using the MMAR by 3 examiners. Interrater reliability was measured using the intraclass correlation coefficient (ICC) statistic. Statistics were calculated using SPSS version 12.0 (Chicago, IL). Results: Analysis of the mind maps revealed the following: concept-links ICC = .05 (95% CI, -.42 to .38), cross-links ICC = .58 (95% CI, .37 to .73), hierarchies ICC = .23 (95% CI, -.15 to .50), examples ICC = .53 (95% CI, .29 to .69), pictures ICC = .86 (95% CI, .79 to .91), colors ICC = .73 (95% CI, .59 to .82), and total score ICC = .86 (95% CI, .79 to .91). Conclusion: The high ICC value for total mind map score indicates strong MMAR interrater reliability.
Pictures and colors demonstrated moderate

  17. Interrater reliability of the mind map assessment rubric in a cohort of medical students.

    Science.gov (United States)

    D'Antoni, Anthony V; Zipp, Genevieve Pinto; Olson, Valerie G

    2009-04-28

    Learning strategies are thinking tools that students can use to actively acquire information. Examples of learning strategies include mnemonics, charts, and maps. One strategy that may help students master the tsunami of information presented in medical school is the mind map learning strategy. Currently, there is no valid and reliable rubric to grade mind maps and this may contribute to their underutilization in medicine. Because concept maps and mind maps engage learners similarly at a metacognitive level, a valid and reliable concept map assessment scoring system was adapted to form the mind map assessment rubric (MMAR). The MMAR can assess mind map depth based upon concept-links, cross-links, hierarchies, examples, pictures, and colors. The purpose of this study was to examine interrater reliability of the MMAR. This exploratory study was conducted at a US medical school as part of a larger investigation on learning strategies. Sixty-six (N = 66) first-year medical students were given a 394-word text passage followed by a 30-minute presentation on mind mapping. After the presentation, subjects were again given the text passage and instructed to create mind maps based upon the passage. The mind maps were collected and independently scored using the MMAR by 3 examiners. Interrater reliability was measured using the intraclass correlation coefficient (ICC) statistic. Statistics were calculated using SPSS version 12.0 (Chicago, IL). Analysis of the mind maps revealed the following: concept-links ICC = .05 (95% CI, -.42 to .38), cross-links ICC = .58 (95% CI, .37 to .73), hierarchies ICC = .23 (95% CI, -.15 to .50), examples ICC = .53 (95% CI, .29 to .69), pictures ICC = .86 (95% CI, .79 to .91), colors ICC = .73 (95% CI, .59 to .82), and total score ICC = .86 (95% CI, .79 to .91). The high ICC value for total mind map score indicates strong MMAR interrater reliability. Pictures and colors demonstrated moderate to strong interrater reliability. 
We conclude that the

  18. Test-re-test reliability and inter-rater reliability of a digital pelvic inclinometer in young, healthy males and females.

    Science.gov (United States)

    Beardsley, Chris; Egerton, Tim; Skinner, Brendon

    2016-01-01

    Objective. The purpose of this study was to investigate the reliability of a digital pelvic inclinometer (DPI) for measuring sagittal plane pelvic tilt in 18 young, healthy males and females. Method. The inter-rater reliability and test-re-test reliabilities of the DPI for measuring pelvic tilt in standing on both the right and left sides of the pelvis were measured by two raters carrying out two rating sessions of the same subjects, three weeks apart. Results. For measuring pelvic tilt, inter-rater reliability was designated as good on both sides (ICC = 0.81-0.88), test-re-test reliability within a single rating session was designated as good on both sides (ICC = 0.88-0.95), and test-re-test reliability between two rating sessions was designated as moderate on the left side (ICC = 0.65) and good on the right side (ICC = 0.85). Conclusion. Inter-rater reliability and test-re-test reliability within a single rating session of the DPI in measuring pelvic tilt were both good, while test-re-test reliability between rating sessions was moderate-to-good. Caution is required regarding the interpretation of the test-re-test reliability within a single rating session, as the raters were not blinded. Further research is required to establish validity.

  19. Validity and inter-rater reliability of medio-lateral knee motion observed during a single-limb mini squat

    DEFF Research Database (Denmark)

    Ageberg, Eva; Bennell, Kim L; Hunt, Michael A

    2010-01-01

    Muscle function may influence the risk of knee injury and outcomes following injury. Clinical tests, such as a single-limb mini squat, resemble conditions of daily life and are easy to administer. Fewer squats per 30 seconds indicate poorer function. However, the quality of movement, such as the medio-lateral knee motion, may also be important. The aim was to validate an observational clinical test of assessing the medio-lateral knee motion, using a three-dimensional (3-D) motion analysis system. In addition, the inter-rater reliability was evaluated.

  20. Interrater reliability of a Pilates movement-based classification system.

    Science.gov (United States)

    Yu, Kwan Kenny; Tulloch, Evelyn; Hendrick, Paul

    2015-01-01

    To determine the interrater reliability for identification of a specific movement pattern using a Pilates Classification system. Videos of 5 subjects performing specific movement tasks were sent to raters trained in the DMA-CP classification system. Ninety-six raters completed the survey. Interrater reliability for the detection of a directional bias was excellent (Pi = 0.92 and K(free) = 0.89). Interrater reliability for classifying an individual into a specific subgroup was moderate (Pi = 0.64, K(free) = 0.55); however, raters who had completed levels 1-4 of the DMA-CP training and reported using the assessment daily demonstrated excellent reliability (Pi = 0.89 and K(free) = 0.87). The reliability of the classification system demonstrated almost perfect agreement in determining the existence of a specific movement pattern and classifying into a subgroup for experienced raters. There was a trend for greater reliability associated with increased levels of training and experience of the raters. Copyright © 2014 Elsevier Ltd. All rights reserved.

  1. Interrater and test-retest reliability and validity of the Norwegian version of the BESTest and mini-BESTest in people with increased risk of falling.

    Science.gov (United States)

    Hamre, Charlotta; Botolfsen, Pernille; Tangen, Gro Gujord; Helbostad, Jorunn L

    2017-04-20

    The Balance Evaluation Systems Test (BESTest) was developed to assess underlying systems for balance control in order to be able to individually tailor rehabilitation interventions to people with balance disorders. A short form, the Mini-BESTest, was developed as a screening test. The study aimed to assess interrater and test-retest reliability of the Norwegian version of the BESTest and the Mini-BESTest in community-dwelling people with increased risk of falling and to assess concurrent validity with the Fall Efficacy Scale-International (FES-I), and it was an observational study with a cross-sectional design. Forty-two persons with increased risk of falling (elderly over 65 years of age, persons with a history of stroke or Multiple Sclerosis) were assessed twice by two raters. Relative reliability was analysed with Intraclass Correlation Coefficient (ICC), and absolute reliability with standard error of measurement (SEM) and smallest detectable change (SDC). Concurrent validity was assessed against the FES-I using Spearman's rho. The BESTest showed very good interrater reliability (ICC = 0.98, SEM = 1.79, SDC95 = 5.0) and test-retest reliability (rater A/rater B = ICC = 0.89/0.89, SEM = 3.9/4.3, SDC95 = 10.8/11.8). The Mini-BESTest also showed very good interrater reliability (ICC = 0.95, SEM = 1.19, SDC95 = 3.3) and test-retest reliability (rater A/rater B = ICC = 0.85/0.84, SEM = 1.8/1.9, SDC95 = 4.9/5.2). The correlations were moderate between the FES-I and both the BESTest and the Mini-BESTest (Spearman's rho -0.51 and -0.50, p test-retest reliability when assessed in a heterogeneous sample of people with increased risk of falling. The concurrent validity measured against the FES-I showed moderate correlation. The results are comparable with earlier studies and indicate that the Norwegian versions can be used in daily clinic and in research.
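The interrater SDC95 values reported above are consistent with the commonly used relation SDC95 = 1.96 × √2 × SEM. The abstract does not state which formula the authors used, so the relation is an assumption here, and `sdc95` is a hypothetical helper name. A minimal sketch that checks the relation against the reported interrater SEM values:

```python
import math

def sdc95(sem: float) -> float:
    """Smallest detectable change at the 95% level.

    Assumed relation (not stated in the abstract): SDC95 = 1.96 * sqrt(2) * SEM.
    """
    return 1.96 * math.sqrt(2.0) * sem

# Interrater SEM values as reported in the abstract.
reported = [("BESTest interrater", 1.79), ("Mini-BESTest interrater", 1.19)]
for label, sem in reported:
    print(f"{label}: SEM = {sem}, SDC95 = {sdc95(sem):.1f}")
```

Under this assumption the sketch gives 5.0 and 3.3, matching the reported interrater values. The test-retest values agree to within rounding of the published SEM.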

  2. Test–re-test reliability and inter-rater reliability of a digital pelvic inclinometer in young, healthy males and females

    Directory of Open Access Journals (Sweden)

    Chris Beardsley

    2016-03-01

    Objective. The purpose of this study was to investigate the reliability of a digital pelvic inclinometer (DPI) for measuring sagittal plane pelvic tilt in 18 young, healthy males and females. Method. The inter-rater reliability and test–re-test reliabilities of the DPI for measuring pelvic tilt in standing on both the right and left sides of the pelvis were measured by two raters carrying out two rating sessions of the same subjects, three weeks apart. Results. For measuring pelvic tilt, inter-rater reliability was designated as good on both sides (ICC = 0.81–0.88), test–re-test reliability within a single rating session was designated as good on both sides (ICC = 0.88–0.95), and test–re-test reliability between two rating sessions was designated as moderate on the left side (ICC = 0.65) and good on the right side (ICC = 0.85). Conclusion. Inter-rater reliability and test–re-test reliability within a single rating session of the DPI in measuring pelvic tilt were both good, while test–re-test reliability between rating sessions was moderate-to-good. Caution is required regarding the interpretation of the test–re-test reliability within a single rating session, as the raters were not blinded. Further research is required to establish validity.

  3. [Evaluation of Suicide Risk Levels in Hospitals: Validity and Reliability Tests].

    Science.gov (United States)

    Macagnino, Sandro; Steinert, Tilman; Uhlmann, Carmen

    2018-05-01

    Examination of in-hospital suicide risk levels concerning their validity and their reliability. The internal suicide risk levels were evaluated in a cross-sectional study of 163 inpatients. A reliability check was performed by determining the interrater reliability of senior physician, therapist and the responsible nurse. Within the scope of the validity check, we conducted analyses of criterion validity and construct validity. For the total sample an "acceptable" to "good" interrater reliability (Kendall's W = .77) of suicide risk levels was obtained. Schizophrenic disorders showed the lowest values; for personality disorders we found the highest level of interrater reliability. When examining the criterion validity, Item 9 of the BDI-II is substantially correlated with our suicide risk levels (ρ m  = .54, p validity check, affective disorders showed the highest correlation (ρ = .77), compatible also with "convergent validity". They differed from schizophrenic disorders, which showed the least concordance (ρ = .43). In-hospital suicide risk levels may represent an important contribution to the assessment of suicidal behavior of inpatients undergoing psychiatric treatment due to their overall good validity and reliability. © Georg Thieme Verlag KG Stuttgart · New York.

  4. High inter-rater reliability, agreement, and convergent validity of Constant score in patients with clavicle fractures

    DEFF Research Database (Denmark)

    Ban, Ilija; Troelsen, Anders; Kristensen, Morten Tange

    2016-01-01

    BACKGROUND: The Constant score (CS) has been the primary endpoint in most studies on clavicle fractures. However, the CS was not developed to assess patients with clavicle fractures. Our aim was to examine inter-rater reliability and agreement of the CS in patients with clavicle fractures...... standardized CS assessment at a mean of 6.8 weeks (SD, 1.0 weeks) after injury. Reliability and agreement of the CS were determined by 2 raters. The intraclass correlation coefficient (ICC2,1), standard error of measurement, minimal detectable change, Cronbach α coefficient, and Pearson correlation coefficient...... were estimated. RESULTS: Inter-rater reliability of the total CS was excellent (intraclass correlation coefficient, 0.94; 95% confidence interval, 0.88-0.97), with no systematic difference between the 2 raters (P = .75). The standard error of measurement (measurement error at the group level) was 4...

  5. Intrarater and interrater reliability for measurements in videofluoroscopy of swallowing

    International Nuclear Information System (INIS)

    Baijens, Laura; Barikroo, Ali; Pilz, Walmari

    2013-01-01

    Objective: Intrarater and interrater reliability is crucial to the quality of diagnostic and therapy-effect studies. This paper reports on a systematic review of studies on intrarater and interrater reliability for measurements in videofluoroscopy of swallowing. The aim of this review was to summarize and qualitatively analyze published studies on that topic. Materials and methods: Those published up to March 2013 were found through a comprehensive electronic database search using PubMed, Embase, and The Cochrane Library. Two reviewers independently assessed the studies using strict inclusion criteria. Results: Nineteen studies were included and then qualitatively analyzed. In several of these, methodological problems were found. Moreover, intrarater and interrater reliability varied with the measure applied. A meta-analysis was not carried out as studies were not of sufficient quality to warrant doing so. Conclusion: In order to achieve reliable measurements in videofluoroscopy of swallowing, it is recommended that raters use well-defined guidelines for the levels of ordinal visuoperceptual variables. Furthermore, in order to make the measurements reliable (intrarater and interrater) it is recommended that, following protocolled pre-experimental training, the raters should have maximum consensus about the definition of the measured variables

  6. Intrarater and interrater reliability for measurements in videofluoroscopy of swallowing

    Energy Technology Data Exchange (ETDEWEB)

    Baijens, Laura, E-mail: laura.baijens@mumc.nl [Department of Otorhinolaryngology, Head and Neck Surgery, Maastricht University Medical Center, Maastricht (Netherlands); Barikroo, Ali, E-mail: a.Barikroo@ufl.edu [Swallowing Research Laboratory, Department of Speech, Language and Hearing Sciences, College of Public Health and Health Professions, University of Florida, Gainesville, FL (United States); Pilz, Walmari, E-mail: walmari.pilz@mumc.nl [Department of Otorhinolaryngology, Head and Neck Surgery, Maastricht University Medical Center, Maastricht (Netherlands)

    2013-10-01

    Objective: Intrarater and interrater reliability is crucial to the quality of diagnostic and therapy-effect studies. This paper reports on a systematic review of studies on intrarater and interrater reliability for measurements in videofluoroscopy of swallowing. The aim of this review was to summarize and qualitatively analyze published studies on that topic. Materials and methods: Those published up to March 2013 were found through a comprehensive electronic database search using PubMed, Embase, and The Cochrane Library. Two reviewers independently assessed the studies using strict inclusion criteria. Results: Nineteen studies were included and then qualitatively analyzed. In several of these, methodological problems were found. Moreover, intrarater and interrater reliability varied with the measure applied. A meta-analysis was not carried out as studies were not of sufficient quality to warrant doing so. Conclusion: In order to achieve reliable measurements in videofluoroscopy of swallowing, it is recommended that raters use well-defined guidelines for the levels of ordinal visuoperceptual variables. Furthermore, in order to make the measurements reliable (intrarater and interrater) it is recommended that, following protocolled pre-experimental training, the raters should have maximum consensus about the definition of the measured variables.

  7. Inter-rater reliability of shoulder measurements in middle-aged women.

    Science.gov (United States)

    De Groef, A; Van Kampen, M; Vervloesem, N; Clabau, E; Christiaens, M-R; Neven, P; Geraerts, I; Struyf, F; Devoogdt, N

    2017-06-01

    To investigate inter-rater reliability of a set of shoulder measurements including inclinometry [shoulder range of motion (ROM)], acromion-table distance and pectoralis minor muscle length (static scapular positioning), upward rotation with two inclinometers (scapular kinematics) and pain pressure thresholds (muscle tenderness) in middle-aged women. Observational study. Thirty symptom-free middle-aged women (first cohort) were measured by two raters. All measurements with an intraclass correlation coefficient (ICC) below 0.75 were retested after an additional training period in a second cohort of 30 symptom-free middle-aged women. Inter-rater reliability of all variables was measured with the ICC (95% confidence interval) and standard error of measurement (SEM). Acromion-table distance (ICC=0.91, SEM 0.22 to 0.28% of body length), pectoralis minor muscle length (ICC=0.91, SEM 0.16% of body length), pain pressure thresholds (ICC=0.78 to 0.85, SEM 0.39 to 0.70kg) and abduction ROM (ICC=0.77, SEM 5°) showed good to excellent inter-rater reliability in the first cohort. After an additional training period, forward flexion ROM showed good inter-rater reliability (ICC=0.83, SEM 5°), scapular upward rotation in resting position showed moderate reliability (ICC=0.52, SEM 2°), and other scaption angles showed weak reliability (ICC=0.26 to 0.43, SEM 3 to 8°). In a battery of clinical tools to evaluate factors contributing to shoulder pain, static scapular positioning and pressure pain thresholds were found to have good to excellent inter-rater reliability in middle-aged women. Additional training is recommended for measurements with a gravity inclinometer. Copyright © 2016 Chartered Society of Physiotherapy. Published by Elsevier Ltd. All rights reserved.

  8. Inter-rater reliability of case-note audit: a systematic review.

    Science.gov (United States)

    Lilford, Richard; Edwards, Alex; Girling, Alan; Hofer, Timothy; Di Tanna, Gian Luca; Petty, Jane; Nicholl, Jon

    2007-07-01

    The quality of clinical care is often assessed by retrospective examination of case-notes (charts, medical records). Our objective was to determine the inter-rater reliability of case-note audit. We conducted a systematic review of the inter-rater reliability of case-note audit. Analysis was restricted to 26 papers reporting comparisons of two or three raters making independent judgements about the quality of care. Sixty-six separate comparisons were possible, since some papers reported more than one measurement of reliability. Mean kappa values ranged from 0.32 to 0.70. These may be inflated due to publication bias. Measured reliabilities were found to be higher for case-note reviews based on explicit, as opposed to implicit, criteria and for reviews that focused on outcome (including adverse effects) rather than process errors. We found an association between kappa and the prevalence of errors (poor quality care), suggesting alternatives such as tetrachoric and polychoric correlation coefficients be considered to assess inter-rater reliability. Comparative studies should take into account the relationship between kappa and the prevalence of the events being measured.
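The dependence of kappa on prevalence noted above can be illustrated with two hypothetical 2×2 tables that share the same observed agreement (90%) but differ in how common the rated event is. The counts below are invented for illustration and are not data from the review; `cohen_kappa_2x2` is a hypothetical helper implementing the standard two-rater Cohen's kappa.

```python
def cohen_kappa_2x2(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for two raters and a binary judgement.

    a = both raters flag an error, b = only rater 1 flags it,
    c = only rater 2 flags it,     d = neither rater flags it.
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    p1_yes = (a + b) / n          # rater 1 marginal proportion of "error"
    p2_yes = (a + c) / n          # rater 2 marginal proportion of "error"
    p_expected = p1_yes * p2_yes + (1 - p1_yes) * (1 - p2_yes)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical counts: both tables have 90 of 100 cases rated identically.
balanced = cohen_kappa_2x2(45, 5, 5, 45)   # event rated present in about half of cases
skewed = cohen_kappa_2x2(85, 5, 5, 5)      # event rated present in about 90% of cases
print(f"balanced prevalence: kappa = {balanced:.2f}")
print(f"skewed prevalence:   kappa = {skewed:.2f}")
```

With identical raw agreement, kappa drops from 0.80 to about 0.44 in the skewed table, which is the kind of prevalence effect the authors ask comparative studies to take into account.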

  9. Hypsarrhythmia assessment exhibits poor interrater reliability: a threat to clinical trial validity.

    Science.gov (United States)

    Hussain, Shaun A; Kwong, Grace; Millichap, John J; Mytinger, John R; Ryan, Nicole; Matsumoto, Joyce H; Wu, Joyce Y; Lerner, Jason T; Sankar, Raman

    2015-01-01

    Hypsarrhythmia is the classic interictal electroencephalographic pattern associated with infantile spasms, and characterized by high voltage, disorganization, and multifocal independent epileptiform discharges. Given this seemingly simple definition, one might expect excellent interrater reliability (IRR) in the identification of this pattern. Alternatively, it may be argued that assessments of voltage and disorganization are fairly subjective, and thus quite challenging in borderline cases. We sought to test the IRR of hypsarrhythmia assessment in a systematic fashion. Six blinded pediatric electroencephalographers from four centers reviewed 22 electroencephalography (EEG) samples from patients with infantile spasms. Each sample was 5 min in duration and included only wakefulness. Raters determined if each EEG was abnormal and if hypsarrhythmia was present/absent, and characterized relevant features: voltage, organization, epileptiform discharges, slowing, interictal attenuations, symmetry, and synchrony. In addition, raters indicated their level of confidence for each assessment. Multirater kappa statistics (κ) were calculated for the assessment of hypsarrhythmia and each feature. Although IRR was favorable in determining whether a study was normal or abnormal (κ=0.89), reliability was unfavorable for assessment of hypsarrhythmia (κ=0.40), modified hypsarrhythmia (κ=0.47), high voltage (κ=0.37), disorganization (κ=0.22), multifocal epileptiform discharges (κ=0.68), interictal voltage attenuations (κ=0.21), slowing (κ=0.20), asymmetry (κ=0.26), and asynchrony (κ=0.08). Despite generally unsatisfactory interrater agreement, raters consistently reported high confidence in assessments. This study contradicts the view that hypsarrhythmia assessment is straightforward. Even small variability in the identification of hypsarrhythmia has potentially deleterious consequences for clinical care, as its presence or absence impacts decisions to pursue high-risk and
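The abstract does not say which multirater kappa was used. One common choice is Fleiss' kappa, sketched below as an assumption rather than as the authors' method. The rating counts are hypothetical (six raters, four EEG samples, categories "hypsarrhythmia present" and "absent") and are not taken from the study.

```python
from typing import Sequence

def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa; counts[i][j] = number of raters assigning subject i to category j.

    Every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Per-subject proportion of agreeing rater pairs.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Overall proportion of ratings falling in each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in p_j)
    return (p_bar - p_expected) / (1 - p_expected)

# Hypothetical ratings: [present, absent] counts from six raters per EEG sample.
ratings = [[6, 0], [5, 1], [3, 3], [0, 6]]
print(f"Fleiss' kappa = {fleiss_kappa(ratings):.2f}")
```

For these invented counts the statistic is 0.52: two unanimous samples are offset by one evenly split sample, so agreement beyond chance is only moderate.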

  10. Education Research: Bias and poor interrater reliability in evaluating the neurology clinical skills examination

    Science.gov (United States)

    Schuh, L A.; London, Z; Neel, R; Brock, C; Kissela, B M.; Schultz, L; Gelb, D J.

    2009-01-01

    Objective: The American Board of Psychiatry and Neurology (ABPN) has recently replaced the traditional, centralized oral examination with the locally administered Neurology Clinical Skills Examination (NEX). The ABPN postulated the experience with the NEX would be similar to the Mini-Clinical Evaluation Exercise, a reliable and valid assessment tool. The reliability and validity of the NEX has not been established. Methods: NEX encounters were videotaped at 4 neurology programs. Local faculty and ABPN examiners graded the encounters using 2 different evaluation forms: an ABPN form and one with a contracted rating scale. Some NEX encounters were purposely failed by residents. Cohen’s kappa and intraclass correlation coefficients (ICC) were calculated for local vs ABPN examiners. Results: Ninety-eight videotaped NEX encounters of 32 residents were evaluated by 20 local faculty evaluators and 18 ABPN examiners. The interrater reliability for a determination of pass vs fail for each encounter was poor (kappa 0.32; 95% confidence interval [CI] = 0.11, 0.53). ICC between local faculty and ABPN examiners for each performance rating on the ABPN NEX form was poor to moderate (ICC range 0.14-0.44), and did not improve with the contracted rating form (ICC range 0.09-0.36). ABPN examiners were more likely than local examiners to fail residents. Conclusions: There is poor interrater reliability between local faculty and American Board of Psychiatry and Neurology examiners. A bias was detected for favorable assessment locally, which is concerning for the validity of the examination. Further study is needed to assess whether training can improve interrater reliability and offset bias. GLOSSARY ABIM = American Board of Internal Medicine; ABPN = American Board of Psychiatry and Neurology; CI = confidence interval; HFH = Henry Ford Hospital; ICC = intraclass correlation coefficients; IM = internal medicine; mini-CEX = Mini-Clinical Evaluation Exercise; NEX = Neurology Clinical

  11. Inter-rater reliability of three standardized functional tests in patients with low back pain

    Science.gov (United States)

    Tidstrand, Johan; Horneij, Eva

    2009-01-01

    Background Of all patients with low back pain, 85% are diagnosed as "non-specific lumbar pain". Lumbar instability has been described as one specific diagnosis which several authors have described as delayed muscular responses, impaired postural control as well as impaired muscular coordination among these patients. This has mostly been measured and evaluated in a laboratory setting. There are few standardized and evaluated functional tests, examining functional muscular coordination which are also applicable in the non-laboratory setting. In ordinary clinical work, tests of functional muscular coordination should be easy to apply. The aim of this present study was to therefore standardize and examine the inter-rater reliability of three functional tests of muscular functional coordination of the lumbar spine in patients with low back pain. Methods Nineteen consecutive individuals, ten men and nine women were included. (Mean age 42 years, SD ± 12 yrs). Two independent examiners assessed three tests: "single limb stance", "sitting on a Bobath ball with one leg lifted" and "unilateral pelvic lift" on the same occasion. The standardization procedure took altered positions of the spine or pelvis and compensatory movements of the free extremities into account. The inter-rater reliability was analyzed by Cohen's kappa coefficient (κ) and by percentage agreement. Results The inter-rater reliability for the right and the left leg respectively was: for the single limb stance very good (κ: 0.88–1.0), for sitting on a Bobath ball good (κ: 0.79) and very good (κ: 0.88) and for the unilateral pelvic lift: good (κ: 0.61) and moderate (κ: 0.47). Conclusion The present study showed good to very good inter-rater reliability for two standardized tests, that is, the single-limb stance and sitting on a Bobath-ball with one leg lifted. Inter-rater reliability for the unilateral pelvic lift test was moderate to good. Validation of the tests in their ability to evaluate lumbar

  12. Inter-rater reliability of three standardized functional tests in patients with low back pain

    Directory of Open Access Journals (Sweden)

    Tidstrand Johan

    2009-06-01

    Background Of all patients with low back pain, 85% are diagnosed as "non-specific lumbar pain". Lumbar instability has been described as one specific diagnosis which several authors have described as delayed muscular responses, impaired postural control as well as impaired muscular coordination among these patients. This has mostly been measured and evaluated in a laboratory setting. There are few standardized and evaluated functional tests, examining functional muscular coordination which are also applicable in the non-laboratory setting. In ordinary clinical work, tests of functional muscular coordination should be easy to apply. The aim of this present study was to therefore standardize and examine the inter-rater reliability of three functional tests of muscular functional coordination of the lumbar spine in patients with low back pain. Methods Nineteen consecutive individuals, ten men and nine women were included. (Mean age 42 years, SD ± 12 yrs). Two independent examiners assessed three tests: "single limb stance", "sitting on a Bobath ball with one leg lifted" and "unilateral pelvic lift" on the same occasion. The standardization procedure took altered positions of the spine or pelvis and compensatory movements of the free extremities into account. The inter-rater reliability was analyzed by Cohen's kappa coefficient (κ) and by percentage agreement. Results The inter-rater reliability for the right and the left leg respectively was: for the single limb stance very good (κ: 0.88–1.0), for sitting on a Bobath ball good (κ: 0.79) and very good (κ: 0.88) and for the unilateral pelvic lift: good (κ: 0.61) and moderate (κ: 0.47). Conclusion The present study showed good to very good inter-rater reliability for two standardized tests, that is, the single-limb stance and sitting on a Bobath-ball with one leg lifted. Inter-rater reliability for the unilateral pelvic lift test was moderate to good. Validation of the tests in their

  13. Reliability and Validity of Prototype Diagnosis for Adolescent Psychopathology.

    Science.gov (United States)

    Haggerty, Greg; Zodan, Jennifer; Mehra, Ashwin; Zubair, Ayyan; Ghosh, Krishnendu; Siefert, Caleb J; Sinclair, Samuel J; DeFife, Jared

    2016-04-01

    The current study investigated the interrater reliability and validity of prototype ratings of 5 common adolescent psychiatric disorders: attention-deficit/hyperactivity disorder, conduct disorder, major depressive disorder, generalized anxiety disorder, and posttraumatic stress disorder. One hundred fifty-seven adolescent inpatient participants consented to participate in this study. We compared ratings from 2 inpatient clinicians, blinded to each other's ratings and patient measures, after their separate initial diagnostic interview to assess interrater reliability. Prototype ratings completed by clinicians after their initial diagnostic interview with adolescent inpatients and outpatients were compared with patient-reported behavior problems and parents' report of their child's behavioral problems. Prototype ratings demonstrated good interrater reliability. Clinicians' prototype ratings showed predicted relationships with patient-reported behavior problems and parent-reported behavior problems. Prototype matching seems to be a possible alternative for psychiatric diagnosis. Prototype ratings showed good interrater reliability based on clinicians unique experiences with the patient (as opposed to video-/audio-recorded material) with no training.

  14. Reevaluating Interrater Reliability in Offender Risk Assessment

    NARCIS (Netherlands)

    van der Knaap, L.M.; Leenarts, L.E.W.; Born, M.P.; Oosterveld, P.

    2012-01-01

    Offender risk and needs assessment, one of the pillars of the risk-need-responsivity model of offender rehabilitation, usually depends on raters assessing offender risk and needs. The few available studies of interrater reliability in offender risk assessment are, however, limited in the

  15. Exploring Differences in Measurement and Reporting of Classroom Observation Inter-Rater Reliability

    Science.gov (United States)

    Wilhelm, Anne Garrison; Gillespie Rouse, Amy; Jones, Francesca

    2018-01-01

    Although inter-rater reliability is an important aspect of using observational instruments, it has received little theoretical attention. In this article, we offer some guidance for practitioners and consumers of classroom observations so that they can make decisions about inter-rater reliability, both for study design and in the reporting of data…

  16. Corrections for criterion reliability in validity generalization: The consistency of Hermes, the utility of Midas

    Directory of Open Access Journals (Sweden)

    Jesús F. Salgado

    2016-04-01

    There is criticism in the literature about the use of interrater coefficients to correct for criterion reliability in validity generalization (VG) studies and disputing whether .52 is an accurate and non-dubious estimate of interrater reliability of overall job performance (OJP) ratings. We present a second-order meta-analysis of three independent meta-analytic studies of the interrater reliability of job performance ratings and make a number of comments and reflections on LeBreton et al.'s paper. The results of our meta-analysis indicate that the interrater reliability for a single rater is .52 (k = 66, N = 18,582, SD = .105). Our main conclusions are: (a) the value of .52 is an accurate estimate of the interrater reliability of overall job performance for a single rater; (b) it is not reasonable to conclude that past VG studies that used .52 as the criterion reliability value have a less than secure statistical foundation; (c) based on interrater reliability, test-retest reliability, and coefficient alpha, supervisor ratings are a useful and appropriate measure of job performance and can be confidently used as a criterion; (d) validity correction for criterion unreliability has been unanimously recommended by "classical" psychometricians and I/O psychologists as the proper way to estimate predictor validity, and is still recommended at present; (e) the substantive contribution of VG procedures to inform HRM practices in organizations should not be lost in these technical points of debate.
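The correction for criterion unreliability discussed above is usually written as the classical disattenuation formula, corrected r = observed r / √(criterion reliability). The sketch below applies it with the .52 interrater estimate from the abstract. The observed validity of .30 is a hypothetical input chosen for illustration, and `correct_for_criterion_unreliability` is a hypothetical helper name.

```python
import math

def correct_for_criterion_unreliability(r_observed: float, r_yy: float = 0.52) -> float:
    """Classical correction for attenuation due to criterion unreliability.

    r_observed: observed predictor-criterion correlation.
    r_yy: criterion reliability (default .52, the single-rater interrater
          estimate reported in the abstract).
    """
    return r_observed / math.sqrt(r_yy)

# Hypothetical observed validity coefficient, for illustration only.
r_obs = 0.30
print(f"observed r = {r_obs:.2f}, corrected r = {correct_for_criterion_unreliability(r_obs):.3f}")
```

With these inputs the corrected coefficient is about .416, which shows why the choice of reliability estimate matters: a lower assumed reliability yields a larger corrected validity.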

  17. Inter-Rater Reliability of Cyclotorsion Measurements Using Fundus Photography.

    Science.gov (United States)

    Dysli, Muriel; Kanku, Madeleine; Traber, Ghislaine L

    2018-04-01

    The foveo-papillary angle (FPA) on fundus photographs is the accepted standard for the measurement of ocular cyclotorsion. We assessed the inter-rater reliability of this method in healthy subjects and in patients with trochlear nerve palsies. In this methodological study, fundus photographs of healthy subjects and of patients with trochlear nerve palsies were made with a fundus camera (Zeiss Fundus Camera FF 450 plus, Jena, Germany). Three independent observers measured the FPA on the fundus photographs of all subjects in synedra View (synedra View 16, Version 16.0.0.11, Innsbruck, Austria). One hundred and four eyes of 52 subjects (26 healthy controls and 26 patients) were assessed. The mean FPA of the healthy controls was 5.80 degrees (°) [± 0.44 standard error of the mean (SEM)] compared to 11.55° (± 0.80 SEM) for patients with trochlear nerve palsies. The inter-rater reliability of all measured FPAs showed an intraclass correlation coefficient (ICC) of 0.98 (95% CI 0.97 - 0.98). The inter-rater reliability of objective cyclotorsion measurements using fundus photographs was very high. Georg Thieme Verlag KG Stuttgart · New York.

  18. The timed "up and go" test : Reliability and validity in persons with unilateral lower limb amputation

    NARCIS (Netherlands)

    Schoppen, Tanneke; Boonstra, Antje; Groothoff, JW; de Vries, J; Goeken, LNH; Eisma, Willem

    Objective: To determine the interrater and intrarater reliability and the validity of the Timed "up and go" test as a measure for physical mobility in elderly patients with an amputation of the lower extremity. Design: To test interrater reliability, the test was performed for two observers at

  19. Environmental education curriculum evaluation questionnaire: A reliability and validity study

    Science.gov (United States)

    Minner, Daphne Diane

    The intention of this research project was to bridge the gap between social science research and application to the environmental domain through the development of a theoretically derived instrument designed to give educators a template by which to evaluate environmental education curricula. The theoretical base for instrument development was provided by several developmental theories such as Piaget's theory of cognitive development, Developmental Systems Theory, Life-span Perspective, as well as curriculum research within the area of environmental education. This theoretical base fueled the generation of a list of components which were then translated into a questionnaire with specific questions relevant to the environmental education domain. The specific research question for this project is: Can a valid assessment instrument based largely on human development and education theory be developed that reliably discriminates high, moderate, and low quality in environmental education curricula? The types of analyses conducted to answer this question were interrater reliability (percent agreement, Cohen's Kappa coefficient, Pearson's Product-Moment correlation coefficient), test-retest reliability (percent agreement, correlation), and criterion-related validity (correlation). Face validity and content validity were also assessed through thorough reviews. Overall results indicate that 29% of the questions on the questionnaire demonstrated a high level of interrater reliability and 43% of the questions demonstrated a moderate level of interrater reliability. Seventy-one percent of the questions demonstrated a high test-retest reliability and 5% a moderate level. Fifty-five percent of the questions on the questionnaire were reliable (high or moderate) both across time and raters. Only eight questions (8%) did not show either interrater or test-retest reliability. The global overall rating of high, medium, or low quality was reliable across both coders and time, indicating

  20. Reevaluating Interrater Reliability in Offender Risk Assessment

    Science.gov (United States)

    van der Knaap, Leontien M.; Leenarts, Laura E. W.; Born, Marise Ph.; Oosterveld, Paul

    2012-01-01

    Offender risk and needs assessment, one of the pillars of the risk-need-responsivity model of offender rehabilitation, usually depends on raters assessing offender risk and needs. The few available studies of interrater reliability in offender risk assessment are, however, limited in the generalizability of their results. The present study…

  1. Inter-rater reliability of kinesthetic measurements with the KINARM robotic exoskeleton.

    Science.gov (United States)

    Semrau, Jennifer A; Herter, Troy M; Scott, Stephen H; Dukelow, Sean P

    2017-05-22

    Kinesthesia (sense of limb movement) has been extremely difficult to measure objectively, especially in individuals who have survived a stroke. The development of valid and reliable measurements for proprioception is important to developing a better understanding of proprioceptive impairments after stroke and their impact on the ability to perform daily activities. We recently developed a robotic task to evaluate kinesthetic deficits after stroke and found that the majority (~60%) of stroke survivors exhibit significant deficits in kinesthesia within the first 10 days post-stroke. Here we aim to determine the inter-rater reliability of this robotic kinesthetic matching task. Twenty-five neurologically intact control subjects and 15 individuals with first-time stroke were evaluated on a robotic kinesthetic matching task (KIN). Subjects sat in a robotic exoskeleton with their arms supported against gravity. In the KIN task, the robot moved the subjects' stroke-affected arm at a preset speed, direction and distance. As soon as subjects felt the robot begin to move their affected arm, they matched the robot movement with the unaffected arm. Subjects were tested in two sessions on the KIN task: initial session and then a second session (within an average of 18.2 ± 13.8 h of the initial session for stroke subjects), which were supervised by different technicians. The task was performed both with and without the use of vision in both sessions. We evaluated intra-class correlations of spatial and temporal parameters derived from the KIN task to determine the reliability of the robotic task. We evaluated 8 spatial and temporal parameters that quantify kinesthetic behavior. We found that the parameters exhibited moderate to high intra-class correlations between the initial and retest conditions (Range, r-value = [0.53-0.97]). The robotic KIN task exhibited good inter-rater reliability. This validates the KIN task as a reliable, objective method for quantifying

  2. Interrater reliability of videotaped observational gait-analysis assessments.

    Science.gov (United States)

    Eastlack, M E; Arvidson, J; Snyder-Mackler, L; Danoff, J V; McGarvey, C L

    1991-06-01

    The purpose of this study was to determine the interrater reliability of videotaped observational gait-analysis (VOGA) assessments. Fifty-four licensed physical therapists with varying amounts of clinical experience served as raters. Three patients with rheumatoid arthritis who demonstrated an abnormal gait pattern served as subjects for the videotape. The raters analyzed each patient's most severely involved knee during the four subphases of stance for the kinematic variables of knee flexion and genu valgum. Raters were asked to determine whether these variables were inadequate, normal, or excessive. The temporospatial variables analyzed throughout the entire gait cycle were cadence, step length, stride length, stance time, and step width. Generalized kappa coefficients ranged from .11 to .52. Intraclass correlation coefficients (2,1) and (3,1) were slightly higher. Our results indicate that physical therapists' VOGA assessments are only slightly to moderately reliable and that improved interrater reliability of the assessments of physical therapists utilizing this technique is needed. Our data suggest that there is a need for greater standardization of gait-analysis training.

  3. "A Comparison of Consensus, Consistency, and Measurement Approaches to Estimating Interrater Reliability"

    OpenAIRE

    Steven E. Stemler

    2004-01-01

    This article argues that the general practice of describing interrater reliability as a single, unified concept is at best imprecise, and at worst potentially misleading. Rather than representing a single concept, different statistical methods for computing interrater reliability can be more accurately classified into one of three categories based upon the underlying goals of analysis. The three general categories introduced and described in this paper are: 1) consensus estimates, 2) cons...

  4. Assessment of the nursing care product (APROCENF): a reliability and construct validity study

    Directory of Open Access Journals (Sweden)

    Danielle Fabiana Cucolo

    Full Text Available ABSTRACT Objectives: to verify the reliability and construct validity estimates of the "Assessment of nursing care product" scale (APROCENF) and its applicability. Methods: this validation study included a sample of 40 (inter-rater reliability) and 172 (construct validity) assessments performed by nurses at the end of the work shift at nine inpatient services of a teaching hospital in the Brazilian Southeast. The data were collected between February and September 2014, with interruptions. Cronbach's alpha and Spearman's correlation coefficients were calculated, as well as the intraclass correlation and the weighted kappa index (inter-rater reliability). Exploratory factor analysis was used with principal component extraction and varimax rotation (construct validity). Results: the internal consistency revealed an alpha coefficient of 0.85, item-item correlation ranging between 0.13 and 0.61 and item-total correlation between 0.43 and 0.69. Inter-rater equivalence was obtained and all items evidenced significant factor loadings. Conclusion: this research evidenced the reliability and construct validity of the scale to assess the nursing care product. Its application in nursing practice permits identifying improvements needed in the production process, contributing to management and care decisions.

  5. Validity and reliability of balance assessment software using the Nintendo Wii balance board: usability and validation.

    Science.gov (United States)

    Park, Dae-Sung; Lee, GyuChang

    2014-06-10

    A balance test provides important information such as the standard to judge an individual's functional recovery or make the prediction of falls. The development of a tool for a balance test that is inexpensive and widely available is needed, especially in clinical settings. The Wii Balance Board (WBB) is designed to test balance, but there is little software used in balance tests, and there are few studies on reliability and validity. Thus, we developed a balance assessment software using the Nintendo Wii Balance Board, investigated its reliability and validity, and compared it with a laboratory-grade force platform. Twenty healthy adults participated in our study. The participants participated in the test for inter-rater reliability, intra-rater reliability, and concurrent validity. The tests were performed with balance assessment software using the Nintendo Wii balance board and a laboratory-grade force platform. Data such as Center of Pressure (COP) path length and COP velocity were acquired from the assessment systems. The inter-rater reliability, the intra-rater reliability, and concurrent validity were analyzed by an intraclass correlation coefficient (ICC) value and a standard error of measurement (SEM). The inter-rater reliability (ICC: 0.89-0.79, SEM in path length: 7.14-1.90, SEM in velocity: 0.74-0.07), intra-rater reliability (ICC: 0.92-0.70, SEM in path length: 7.59-2.04, SEM in velocity: 0.80-0.07), and concurrent validity (ICC: 0.87-0.73, SEM in path length: 5.94-0.32, SEM in velocity: 0.62-0.08) were high in terms of COP path length and COP velocity. The balance assessment software incorporating the Nintendo Wii balance board was used in our study and was found to be a reliable assessment device. In clinical settings, the device can be remarkably inexpensive, portable, and convenient for the balance assessment.

  6. Content Validity Index and Intra- and Inter-Rater Reliability of a New Muscle Strength/Endurance Test Battery for Swedish Soldiers.

    Directory of Open Access Journals (Sweden)

    Helena Larsson

    Full Text Available The objective of this study was to examine the content validity of commonly used muscle performance tests in military personnel and to investigate the reliability of a proposed test battery. For the content validity investigation, thirty selected tests were those described in the literature and/or commonly used in the Nordic and North Atlantic Treaty Organization (NATO) countries. Nine selected experts rated, on a four-point Likert scale, the relevance of these tests in relation to five different work tasks: lifting, carrying equipment on the body or in the hands, climbing, and digging. Thereafter, a content validity index (CVI) was calculated for each work task. The result showed excellent CVI (≥0.78) for sixteen tests, which comprised one or more of the military work tasks. Three of the tests, the functional lower-limb loading test (the Ranger test), dead-lift with kettlebells, and back extension, showed excellent content validity for four of the work tasks. For the development of a new muscle strength/endurance test battery, these three tests were further supplemented with two other tests, namely, the chins and side-bridge test. The inter-rater reliability was high (intraclass correlation coefficient, ICC2,1 0.99) for all five tests. The intra-rater reliability was good to high (ICC3,1 0.82-0.96) with an acceptable standard error of mean (SEM), except for the side-bridge test (SEM%>15). Thus, the final suggested test battery for a valid and reliable evaluation of soldiers' muscle performance comprised the following four tests: the Ranger test, dead-lift with kettlebells, chins, and back extension test. The criterion-related validity of the test battery should be further evaluated for soldiers exposed to varying physical workload.
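
    The abstract above does not say how its content validity index was computed. A minimal sketch, assuming the commonly used item-level definition (the share of experts who rate an item 3 or 4 on a four-point relevance scale), is given below; the function name and the ratings are hypothetical and are not taken from the study.

```python
def content_validity_index(ratings, relevant_from=3):
    """Share of expert ratings at or above `relevant_from`.

    Assumes a four-point relevance scale where 3 and 4 count as
    "relevant"; this is a common convention, not one the abstract states.
    """
    if not ratings:
        raise ValueError("at least one expert rating is required")
    relevant = sum(1 for r in ratings if r >= relevant_from)
    return relevant / len(ratings)


# Hypothetical ratings from nine experts for one test and one work task.
example_ratings = [4, 4, 3, 3, 4, 2, 3, 4, 1]
cvi = content_validity_index(example_ratings)
print(f"CVI = {cvi:.2f}")
```

    With nine experts, seven "relevant" ratings give 7/9, which rounds to 0.78, the cut-off quoted in the abstract.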

  7. A Comparison of Three Methods for the Analysis of Skin Flap Viability: Reliability and Validity.

    Science.gov (United States)

    Tim, Carla Roberta; Martignago, Cintia Cristina Santi; da Silva, Viviane Ribeiro; Dos Santos, Estefany Camila Bonfim; Vieira, Fabiana Nascimento; Parizotto, Nivaldo Antonio; Liebano, Richard Eloin

    2018-05-01

    Objective: Technological advances have provided new alternatives to the analysis of skin flap viability in animal models; however, the interrater validity and reliability of these techniques have yet to be analyzed. The present study aimed to evaluate the interrater validity and reliability of three different methods: weight of paper template (WPT), paper template area (PTA), and photographic analysis. Approach: Sixteen male Wistar rats had their cranially based dorsal skin flap elevated. On the seventh postoperative day, the viable tissue area and the necrotic area of the skin flap were recorded using the paper template method and photo image. The evaluation of the percentage of viable tissue was performed using three methods, simultaneously and independently by two raters. The analysis of interrater reliability and viability was performed using the intraclass correlation coefficient and Bland Altman Plot Analysis was used to visualize the presence or absence of systematic bias in the evaluations of data validity. Results: The results showed that interrater reliability for WPT, measurement of PTA, and photographic analysis were 0.995, 0.990, and 0.982, respectively. For data validity, a correlation >0.90 was observed for all comparisons made between the three methods. In addition, Bland Altman Plot Analysis showed agreement between the comparisons of the methods and the presence of systematic bias was not observed. Innovation: Digital methods are an excellent choice for assessing skin flap viability; moreover, they make data use and storage easier. Conclusion: Independently from the method used, the interrater reliability and validity proved to be excellent for the analysis of skin flaps' viability.

  8. Inter-rater reliability of healthcare professional skills' portfolio assessments: The Andalusian Agency for Healthcare Quality model

    Directory of Open Access Journals (Sweden)

    Antonio Almuedo-Paz

    2014-07-01

    Full Text Available This study aims to determine the reliability of assessment criteria used for a portfolio at the Andalusian Agency for Healthcare Quality (ACSA). Data: all competences certification processes, regardless of their discipline. Period: 2010-2011. Three types of tests are used: 368 certificates, 17,895 reports and 22,642 clinical practice reports (N = 3,010 candidates). The tests were evaluated in pairs by the ACSA team of raters using two categories: valid and invalid. Results: The percentage agreement in assessments of certificates was 89.9%, while for the reports of clinical practice it was 85.1% and for clinical practice reports it was 81.7%. The inter-rater agreement coefficients (kappa) ranged from 0.468 to 0.711. Discussion: The results of this study show that the inter-rater reliability of assessments varies from fair to good. Compared with other similar studies, the results put the reliability of the model in a comfortable position. Among the improvements incorporated, progressive automation of evaluations must be highlighted.
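
    Percent agreement and kappa, as reported in the abstract above, can differ considerably when one category dominates. The sketch below illustrates the two statistics for a pair of raters using the standard Cohen's kappa formula; the function names and the ten ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which the two raters gave the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty item set")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement from each rater's marginal category proportions.
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # kappa is undefined when both raters use one category
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings of ten portfolio items by two raters.
a = ["valid"] * 7 + ["invalid"] * 3
b = ["valid"] * 6 + ["invalid", "valid"] + ["invalid"] * 2
print(percent_agreement(a, b), cohens_kappa(a, b))
```

    In this made-up example the raters agree on 8 of 10 items, yet kappa is only about 0.52, because most of that agreement would be expected by chance given how often both raters choose "valid".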

  9. Inter-rater and intra-rater reliability of a clinical protocol for measuring turnout in collegiate dancers.

    Science.gov (United States)

    Greene, Amanda; Lasner, Andrea; Deu, Rajwinder; Oliphant, Seth; Johnson, Kenneth

    2018-02-02

    Reliable methods of measuring turnout in dancers and comparing active turnout (used in class) with functional (uncompensated) turnout are needed. Authors have suggested measurement techniques but there is no clinically useful, easily reproducible technique with established inter-rater and intra-rater reliability. We adapted a technique based on previous research, which is easily reproducible. We hypothesized excellent inter-rater and intra-rater reliability between experienced physical therapists (PTs) and a briefly trained faculty member from a university's department of dance. Thirty-two participants were recruited from the same dance department. Dancers' active and functional turnout was measured by each rater. We found that our technique for measuring active and functional turnout has excellent inter-rater and intra-rater reliability when performed by two experienced PTs and by one briefly trained university-level dance faculty member. For active turnout, inter-rater reliability was 0.78 among all raters and 0.82 among only the PT raters; intra-rater reliability was 0.82 among all raters and 0.85 among only the PT raters. For functional turnout, inter-rater reliability was 0.86 among all raters and 0.88 among only the PT raters; intra-rater reliability was 0.87 among all raters and 0.88 among only the PT raters. The measurement technique described provides a standardized protocol with excellent inter-rater and intra-rater reliability when performed by experienced PTs or by a briefly trained university-level dance faculty member.

  10. Reliability and Validity of the Activity Participation Assessment for School-age Children in Korea

    Directory of Open Access Journals (Sweden)

    Se-Yun Kim

    2016-12-01

    Conclusion: The APA shows good internal reliability, test–retest reliability, discriminant validity, and construct validity. However, evidence of psychometric properties was limited by a small sample size. Psychometric properties such as interrater reliability as well as concurrent validity and construct validity need to be tested using a larger sample size with representative demographics.

  11. Reliability and validity of the de Morton Mobility Index in individuals with sub-acute stroke.

    Science.gov (United States)

    Braun, Tobias; Marks, Detlef; Thiel, Christian; Grüneberg, Christian

    2018-02-04

    To establish the validity and reliability of the de Morton Mobility Index (DEMMI) in patients with sub-acute stroke. This cross-sectional study was performed in a neurological rehabilitation hospital. We assessed unidimensionality, construct validity, internal consistency reliability, inter-rater reliability, minimal detectable change and possible floor and ceiling effects of the DEMMI in adult patients with sub-acute stroke. The study included a total sample of 121 patients with sub-acute stroke. We analysed validity (n = 109) and reliability (n = 51) in two sub-samples. Rasch analysis indicated unidimensionality with an overall fit to the model (chi-square = 12.37, p = 0.577). All hypotheses on construct validity were confirmed. Internal consistency reliability (Cronbach's alpha = 0.94) and inter-rater reliability (intraclass correlation coefficient = 0.95; 95% confidence interval: 0.92-0.97) were excellent. The minimal detectable change with 90% confidence was 13 points. No floor or ceiling effects were evident. These results indicate unidimensionality, sufficient internal consistency reliability, inter-rater reliability, and construct validity of the DEMMI in patients with a sub-acute stroke. Advantages of the DEMMI in clinical application are the short administration time, no need for special equipment and interval level data. The de Morton Mobility Index, therefore, may be a useful performance-based bedside test to measure mobility in individuals with a sub-acute stroke across the whole mobility spectrum. Implications for Rehabilitation The de Morton Mobility Index (DEMMI) is an unidimensional measurement instrument of mobility in individuals with sub-acute stroke. The DEMMI has excellent internal consistency and inter-rater reliability, and sufficient construct validity. The minimal detectable change of the DEMMI with 90% confidence in stroke rehabilitation is 13 points. The lack of any floor or ceiling effects on hospital admission indicates

  12. Rater reliability and concurrent validity of the Keyboard Personal Computer Style instrument (K-PeCS).

    Science.gov (United States)

    Baker, Nancy A; Cook, James R; Redfern, Mark S

    2009-01-01

    This paper describes the inter-rater and intra-rater reliability, and the concurrent validity of an observational instrument, the Keyboard Personal Computer Style instrument (K-PeCS), which assesses stereotypical postures and movements associated with computer keyboard use. Three trained raters independently rated the video clips of 45 computer keyboard users to ascertain inter-rater reliability, and then re-rated a sub-sample of 15 video clips to ascertain intra-rater reliability. Concurrent validity was assessed by comparing the ratings obtained using the K-PeCS to scores developed from a 3D motion analysis system. The overall K-PeCS had excellent reliability [inter-rater: intra-class correlation coefficients (ICC)=.90; intra-rater: ICC=.92]. Most individual items on the K-PeCS had from good to excellent reliability, although six items fell below ICC=.75. Those K-PeCS items that were assessed for concurrent validity compared favorably to the motion analysis data for all but two items. These results suggest that most items on the K-PeCS can be used to reliably document computer keyboarding style.

  13. Interrater and Intrarater Reliability of the Balance Computerized Adaptive Test in Patients With Stroke.

    Science.gov (United States)

    Chiang, Hsin-Yu; Lu, Wen-Shian; Yu, Wan-Hui; Hsueh, I-Ping; Hsieh, Ching-Lin

    2018-04-11

    To examine the interrater and intrarater reliability of the Balance Computerized Adaptive Test (Balance CAT) in patients with chronic stroke having a wide range of balance functions. Repeated assessments design (1wk apart). Seven teaching hospitals. A pooled sample (N=102) including 2 independent groups of outpatients (n=50 for the interrater reliability study; n=52 for the intrarater reliability study) with chronic stroke. Not applicable. Balance CAT. For the interrater reliability study, the values of intraclass correlation coefficient, minimal detectable change (MDC), and percentage of MDC (MDC%) for the Balance CAT were .84, 1.90, and 31.0%, respectively. For the intrarater reliability study, the values of intraclass correlation coefficient, MDC, and MDC% ranged from .89 to .91, from 1.14 to 1.26, and from 17.1% to 18.6%, respectively. The Balance CAT showed sufficient intrarater reliability in patients with chronic stroke having balance functions ranging from sitting with support to independent walking. Although the Balance CAT may have good interrater reliability, we found substantial random measurement error between different raters. Accordingly, if the Balance CAT is used as an outcome measure in clinical or research settings, same raters are suggested over different time points to ensure reliable assessments. Copyright © 2018 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
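
    The abstract above reports MDC and MDC% without giving formulas. A minimal sketch of the conventional calculation follows, assuming SEM = SD × sqrt(1 − ICC), MDC = 1.96 × sqrt(2) × SEM for 95% confidence, and MDC% taken relative to the mean score; these are common conventions rather than the authors' stated method, and the function names and the standard deviation are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a score SD and a reliability ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem, z=1.96):
    """MDC at the confidence level implied by z (1.96 for 95%)."""
    return z * math.sqrt(2.0) * sem


def mdc_percent(mdc, mean_score):
    """MDC expressed as a percentage of the mean score (assumed denominator)."""
    return 100.0 * mdc / mean_score


# Hypothetical score SD of 2.0 combined with the interrater ICC of .84.
sem = sem_from_icc(sd=2.0, icc=0.84)
mdc = minimal_detectable_change(sem)
print(f"SEM = {sem:.2f}, MDC95 = {mdc:.2f}")
```

    Under the mean-score assumption, the reported interrater values (MDC 1.90, MDC% 31.0%) would imply a denominator of roughly 6.1 points; the abstract itself does not confirm which denominator was used.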

  14. Interrater reliability of the Volume-Viscosity Swallow Test; screening for dysphagia among hospitalized elderly medical patients.

    Science.gov (United States)

    Jørgensen, Lise Walther; Søndergaard, Kasper; Melgaard, Dorte; Warming, Susan

    2017-12-01

    Oropharyngeal dysphagia (OD) is prevalent among medical and geriatric patients admitted due to acute illness and it is associated with malnutrition, increased length of stay and increased mortality. A valid and reliable bedside screening test for patients at risk of OD is essential in order to detect patients in need of further assessment. The Volume-Viscosity Swallow Test (V-VST) has been shown to be a valid screening test for OD in mixed outpatient populations. However, as reliability of the test has yet to be investigated in a population of medical and geriatric patients admitted due to acute illness, we aimed to determine the interrater reliability of the V-VST in this clinical setting. Reporting in this study is in accordance with proposed guidelines for the reporting of reliability and agreement studies (GRRAS). In three Danish hospitals (CRD-BFH, CRD-GH, NDR-H) 11 skilled occupational therapists examined an unselected group of 110 patients admitted to geriatric or medical wards. In an overall agreement phase raters reached ≥80% agreement before data collection phase was commenced. The V-VST was applied to patients twice within maximum one hour by raters who administrated the test in an order based on randomization, blinded to each other's results. Agreement, Kappa values, weighed Kappa values and Kappa adjusted for bias and prevalence are reported. The interrater reliability of V-VST as screening test for OD in patients admitted to geriatric or medical wards was substantial with an overall Kappa value of 0.77 (95% CI 0.65-0.89) however interrater reliability varied among hospitals ranging from 0.37 (95% CI -0.01 to 0.41) to 0.85 (95% CI 0.75-1.00). Interrater reliability of the accompanying recommendations of volume and viscosity was moderate with a weighted kappa value of 0.55 (95% CI 0.37-0.73) for viscosity and 0.53 (95% CI 0.36-0.7) for volume. The overall prevalence of OD was 34.5%, ranging from 8% to 53.6% across hospitals. The prevalence and bias

  15. Introducing a new definition of a near fall: intra-rater and inter-rater reliability.

    Science.gov (United States)

    Maidan, I; Freedman, T; Tzemah, R; Giladi, N; Mirelman, A; Hausdorff, J M

    2014-01-01

    Near falls (NFs) are more frequent than falls, and may occur before falls, potentially predicting fall risk. As such, identification of a NF is important. We aimed to assess intra and inter-rater reliability of the traditional definition of a NF and to demonstrate the potential utility of a new definition. To this end, 10 older adults, 10 idiopathic elderly fallers, and 10 patients with Parkinson's disease (PD) walked in an obstacle course while wearing a safety harness. All walks were videotaped. Forty-nine video segments were extracted to create 2 clips each of 8.48 min. Four raters scored each event using the traditional definition and, two weeks later, using the new definition. A fifth rater used only the new definition. Intra-rater reliability was determined using Kappa (K) statistics and inter-rater reliability was determined using ICC. Using the traditional definition, three raters had poor intra-rater reliability (K…0.137) and one rater had moderate intra-rater reliability (K=0.624, p… definition, inter-rater reliability between the four raters was moderate (ICC=0.667, p… definition showed high intra-rater (K>0.601, p… definition of NF is required. The results of the present study suggest that the proposed new definition increases intra and inter-rater reliability, a critical step for using NFs to quantify fall risk. Copyright © 2013 Elsevier B.V. All rights reserved.

  16. Inter-rater reliability of measures to characterize the tobacco retail environment in Mexico

    Directory of Open Access Journals (Sweden)

    Marissa G Hall

    2015-11-01

    Full Text Available Objective. To evaluate the inter-rater reliability of a data collection instrument to assess the tobacco retail environment in Mexico, after major marketing regulations were implemented. Materials and methods. In 2013, two data collectors independently evaluated 21 stores in two census tracts, through a data collection instrument that assessed the presence of price promotions, whether single cigarettes were sold, the number of visible advertisements, the presence of signage prohibiting the sale of cigarettes to minors, and characteristics of cigarette pack displays. We evaluated the inter-rater reliability of the collected data, through the calculation of metrics such as intraclass correlation coefficient, percent agreement, Cohen’s kappa and Krippendorff’s alpha. Results. Most measures demonstrated substantial or perfect inter-rater reliability. Conclusions. Our results indicate the potential utility of the data collection instrument for future point-of-sale research.

  17. Analyses of inter-rater reliability between professionals, medical students and trained school children as assessors of basic life support skills.

    Science.gov (United States)

    Beck, Stefanie; Ruhnke, Bjarne; Issleib, Malte; Daubmann, Anne; Harendza, Sigrid; Zöllner, Christian

    2016-10-07

    Training of lay-rescuers is essential to improve survival-rates after cardiac arrest. Multiple campaigns emphasise the importance of basic life support (BLS) training for school children. Trainings require a valid assessment to give feedback to school children and to compare the outcomes of different training formats. Considering these requirements, we developed an assessment of BLS skills using MiniAnne and tested the inter-rater reliability between professionals, medical students and trained school children as assessors. Fifteen professional assessors, 10 medical students and 111 trained school children (peers) assessed 1087 school children at the end of a CPR-training event using the new assessment format. Analyses of inter-rater reliability (intraclass correlation coefficient; ICC) were performed. Overall inter-rater reliability of the summative assessment was high (ICC = 0.84, 95% CI: 0.84 to 0.86, n = 889). The number of comparisons between peer-peer assessors (n = 303), peer-professional assessors (n = 339), and peer-student assessors (n = 191) was adequate to demonstrate high inter-rater reliability between peer- and professional-assessors (ICC: 0.76), peer- and student-assessors (ICC: 0.88) and peer- and other peer-assessors (ICC: 0.91). Systematic variation in rating of specific items was observed for three items between professional- and peer-assessors. Using this assessment and integrating peers and medical students as assessors gives the opportunity to assess hands-on skills of school children with high reliability.

  18. Interrater Reliability of mHealth App Rating Measures: Analysis of Top Depression and Smoking Cessation Apps.

    Science.gov (United States)

    Powell, Adam C; Torous, John; Chan, Steven; Raynor, Geoffrey Stephen; Shwarts, Erik; Shanahan, Meghan; Landman, Adam B

    2016-02-10

    There are over 165,000 mHealth apps currently available to patients, but few have undergone an external quality review. Furthermore, no standardized review method exists, and little has been done to examine the consistency of the evaluation systems themselves. We sought to determine which measures for evaluating the quality of mHealth apps have the greatest interrater reliability. We identified 22 measures for evaluating the quality of apps from the literature. A panel of 6 reviewers reviewed the top 10 depression apps and 10 smoking cessation apps from the Apple iTunes App Store on these measures. Krippendorff's alpha was calculated for each of the measures and reported by app category and in aggregate. The measure for interactiveness and feedback was found to have the greatest overall interrater reliability (alpha=.69). Presence of password protection (alpha=.65), whether the app was uploaded by a health care agency (alpha=.63), the number of consumer ratings (alpha=.59), and several other measures had moderate interrater reliability (alphas>.5). There was the least agreement over whether apps had errors or performance issues (alpha=.15), stated advertising policies (alpha=.16), and were easy to use (alpha=.18). There were substantial differences in the interrater reliabilities of a number of measures when they were applied to depression versus smoking apps. We found wide variation in the interrater reliability of measures used to evaluate apps, and some measures are more robust across categories of apps than others. The measures with the highest degree of interrater reliability tended to be those that involved the least rater discretion. Clinical quality measures such as effectiveness, ease of use, and performance had relatively poor interrater reliability. Subsequent research is needed to determine consistent means for evaluating the performance of apps. Patients and clinicians should consider conducting their own assessments of apps, in conjunction with

  19. Construct validity and inter-rater reliability of the Dutch activity measure for post-acute care "6-clicks" basic mobility form to assess the mobility of hospitalized patients.

    Science.gov (United States)

    Geelen, Sven Jacobus Gertruda; Valkenet, Karin; Veenhof, Cindy

    2018-05-12

    To evaluate the construct validity and the inter-rater reliability of the Dutch Activity Measure for Post-Acute Care "6-clicks" Basic Mobility short form measuring the patient's mobility in Dutch hospital care. First, the "6-clicks" was translated by using a forward-backward translation protocol. Next, 64 patients were assessed by the physiotherapist to determine the validity while being admitted to the Internal Medicine wards of a university medical center. Six hypotheses were tested regarding the construct "mobility" which showed that: Better "6-clicks" scores were related to less restrictive pre-admission living situations (p = 0.011), less restrictive discharge locations (p = 0.001), more independence in activities of daily living (p = 0.001) and fewer physiotherapy visits (p… Dutch "6-clicks" shows a good construct validity and moderate-to-excellent inter-rater reliability when used to assess the mobility of hospitalized patients. Implications for Rehabilitation Even though various measurement tools have been developed, it appears the majority of physiotherapists working in a hospital currently do not use these tools as a standard part of their care. The Activity Measure for Post-Acute Care "6-clicks" Basic Mobility is the only tool which is designed to be short, easy to use within usual care and has been validated in the entire hospital population. This study shows that the Dutch version of the Activity Measure for Post-Acute Care "6-clicks" Basic Mobility form is a valid, easy to use, quick tool to assess the basic mobility of Dutch hospitalized patients.

  20. Inter-rater reliability of diagnostic criteria for sacroiliac joint-, disc- and facet joint pain.

    Science.gov (United States)

    van Tilburg, Cornelis W J; Groeneweg, Johannes G; Stronks, Dirk L; Huygen, Frank J P M

    2017-01-01

    Several diagnostic criteria sets are described in the literature to identify low back pain subtypes, but very little is known about the inter-rater reliability of these criteria. We conducted a study to determine the reliability of diagnostic tests that point towards SI joint-, disc- or facet joint pain. Inter-rater reliability study alongside three randomized clinical trials. Multidisciplinary pain center of general hospital. Patients aged 18 or more with medical history and physical examination suggestive of sacroiliac joint-, disc- and facet joint pain on lumbar level. Using the diagnostic criteria most commonly used at present, a physical examination was performed independently by three physicians (two pain physicians and one orthopedic surgeon). Inter-rater reliability (Kappa (κ) measure of agreement) and significance (p) between raters are presented. Strengths of agreement, indicated with κ values above 0.20, are presented in order of agreement. One hundred patients were included. None of the parameters from the physical investigation had κ values of more than 0.21 (fair) in all pairs of raters. Between two raters (C and D), there was an almost perfect agreement on three parameters, more specifically "Abnormal sensory and motor examination, hyperactive or diminished reflexes", "Sitting exam shows no reflex, motor or sensory signs in the legs" and "Straight leg raising (Lasègue) negative between 30 and 70 degrees of flexion". The "Drop test positive" parameters had moderate strength of agreement between raters A and D and fair strength between raters A and B. The "Digital interspinous pressure test positive" had moderate strength of agreement between raters C and D and fair strength of agreement between raters A and B as well as raters B and C. Three other parameters had a fair strength of agreement between two raters; all other parameters had a slight or poor strength of agreement. Inter-rater reliability, confidence intervals and significance of

  1. Reliability and validity of a nutrition and physical activity environmental self-assessment for child care

    Directory of Open Access Journals (Sweden)

    Ammerman Alice S

    2007-07-01

    Background: Few assessment instruments have examined the nutrition and physical activity environments in child care, and none are self-administered. Given the emerging focus on child care settings as a target for intervention, a valid and reliable measure of the nutrition and physical activity environment is needed. Methods: To measure inter-rater reliability, 59 child care center directors and 109 staff completed the self-assessment concurrently, but independently. Three weeks later, a repeat self-assessment was completed by a sub-sample of 38 directors to assess test-retest reliability. To assess criterion validity, a researcher-administered environmental assessment was conducted at 69 centers and was compared to a self-assessment completed by the director. A weighted kappa test statistic and percent agreement were calculated to assess agreement for each question on the self-assessment. Results: For inter-rater reliability, kappa statistics ranged from 0.20 to 1.00 across all questions. Test-retest reliability of the self-assessment yielded kappa statistics that ranged from 0.07 to 1.00. The inter-quartile kappa statistic ranges for inter-rater and test-retest reliability were 0.45 to 0.63 and 0.27 to 0.45, respectively. When percent agreement was calculated, questions ranged from 52.6% to 100% for inter-rater reliability and 34.3% to 100% for test-retest reliability. Kappa statistics for validity ranged from -0.01 to 0.79, with an inter-quartile range of 0.08 to 0.34. Percent agreement for validity ranged from 12.9% to 93.7%. Conclusion: This study provides estimates of criterion validity, inter-rater reliability and test-retest reliability for an environmental nutrition and physical activity self-assessment instrument for child care. Results indicate that the self-assessment is a stable and reasonably accurate instrument for use with child care interventions. We therefore recommend the Nutrition and Physical Activity Self-Assessment for

  2. Grant Peer Review: Improving Inter-Rater Reliability with Training.

    Science.gov (United States)

    Sattler, David N; McKnight, Patrick E; Naney, Linda; Mathis, Randy

    2015-01-01

    This study developed and evaluated a brief training program for grant reviewers that aimed to increase inter-rater reliability, rating scale knowledge, and effort to read the grant review criteria. Enhancing reviewer training may improve the reliability and accuracy of research grant proposal scoring and funding recommendations. Seventy-five Public Health professors from U.S. research universities watched the training video we produced and assigned scores to the National Institutes of Health scoring criteria proposal summary descriptions. For both novice and experienced reviewers, the training video increased scoring accuracy (the percentage of scores that reflect the true rating scale values), inter-rater reliability, and the amount of time reading the review criteria compared to the no video condition. The increase in reliability for experienced reviewers is notable because it is commonly assumed that reviewers--especially those with experience--have good understanding of the grant review rating scale. The findings suggest that both experienced and novice reviewers who had not received the type of training developed in this study may not have appropriate understanding of the definitions and meaning for each value of the rating scale and that experienced reviewers may overestimate their knowledge of the rating scale. The results underscore the benefits of and need for specialized peer reviewer training.

  3. Orthopaedic nurses' knowledge and interrater reliability of neurovascular assessments with 2-point discrimination test.

    Science.gov (United States)

    Turney, Jennifer; Raley Noble, Deana; Kim, Son Chae

    2013-01-01

    This study was conducted to evaluate the effects of education on knowledge and interrater reliability of neurovascular assessments with 2-point discrimination (2-PD) test among pediatric orthopaedic nurses. A pre- and posttest study was done among 60 nurses attending 2-hour educational sessions. Neurovascular assessments with 2-PD test were performed on 64 casted pediatric patients by the nurses and 5 nurse experts before and after the educational sessions. The mean neurovascular assessment knowledge score improved at posteducation compared with preeducation (p < .001). The 2-PD test interrater reliability also improved, from a Cohen's kappa value of 0.24 to 0.48 at posteducation. The 2-hour educational session may be effective in improving nurses' knowledge and the interrater reliability of neurovascular assessment with 2-PD test.

  4. Inter-rater reliability of AMSTAR is dependent on the pair of reviewers.

    Science.gov (United States)

    Pieper, Dawid; Jacobs, Anja; Weikert, Beate; Fishta, Alba; Wegewitz, Uta

    2017-07-11

    Inter-rater reliability (IRR) is mainly assessed based on only two reviewers of unknown expertise. The aim of this paper is to examine differences in the IRR of the Assessment of Multiple Systematic Reviews (AMSTAR) and R(evised)-AMSTAR depending on the pair of reviewers. Five reviewers independently applied AMSTAR and R-AMSTAR to 16 systematic reviews (eight Cochrane reviews and eight non-Cochrane reviews) from the field of occupational health. Responses were dichotomized and reliability measures were calculated by applying Holsti's method (r) and Cohen's kappa (κ) to all potential pairs of reviewers. Given that five reviewers participated in the study, there were ten possible pairs of reviewers. Inter-rater reliability varied for AMSTAR between r = 0.82 and r = 0.98 (median r = 0.88) using Holsti's method and κ = 0.41 and κ = 0.69 (median κ = 0.52) using Cohen's kappa and for R-AMSTAR between r = 0.77 and r = 0.89 (median r = 0.82) and κ = 0.32 and κ = 0.67 (median κ = 0.45) depending on the pair of reviewers. The same pair of reviewers yielded the highest IRR for both instruments. Pairwise Cohen's kappa reliability measures showed a moderate correlation between AMSTAR and R-AMSTAR (Spearman's ρ =0.50). The mean inter-rater reliability for AMSTAR was highest for item 1 (κ = 1.00) and item 5 (κ = 0.78), while lowest values were found for items 3, 8, 9 and 11, which showed only fair agreement. Inter-rater reliability varies widely depending on the pair of reviewers. There may be some shortcomings associated with conducting reliability studies with only two reviewers. Further studies should include additional reviewers and should probably also take account of their level of expertise.
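
    The pairwise design described above (five reviewers, hence ten pairs, each scored with Holsti's method and Cohen's kappa on dichotomized responses) can be sketched as follows. This is a minimal illustration, not the authors' code: the reviewer labels and the 0/1 ratings are hypothetical, and the helper names `holsti` and `cohen_kappa` are assumptions introduced here.

```python
from itertools import combinations
from statistics import median


def holsti(a, b):
    """Holsti's coefficient: 2M / (N1 + N2), M = number of identical decisions."""
    agree = sum(x == y for x, y in zip(a, b))
    return 2 * agree / (len(a) + len(b))


def cohen_kappa(a, b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    # Chance agreement from each rater's marginal category proportions.
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when both raters use one category
    return (p_o - p_e) / (1 - p_e)


# Hypothetical dichotomized ratings (1 = criterion met) of eight items
# by five reviewers; these numbers are NOT taken from the study.
ratings = {
    "R1": [1, 1, 0, 1, 0, 1, 1, 0],
    "R2": [1, 1, 0, 1, 1, 1, 0, 0],
    "R3": [1, 0, 0, 1, 0, 1, 1, 0],
    "R4": [1, 1, 1, 1, 0, 1, 1, 0],
    "R5": [0, 1, 0, 1, 0, 1, 1, 1],
}

pairs = list(combinations(ratings, 2))  # 5 reviewers give 10 pairs
kappas = [cohen_kappa(ratings[p], ratings[q]) for p, q in pairs]
agreements = [holsti(ratings[p], ratings[q]) for p, q in pairs]

for (p, q), r, k in zip(pairs, agreements, kappas):
    print(f"{p}-{q}: Holsti r = {r:.2f}, kappa = {k:.2f}")
print(f"median r = {median(agreements):.2f}, median kappa = {median(kappas):.2f}")
```

    Reporting the range and median over all pairs, rather than a single pair, is what exposes the pair-dependence the abstract describes. Kappa is typically lower than raw agreement because it discounts agreement expected by chance.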

  5. INTER-RATER RELIABILITY FOR MOVEMENT PATTERN ANALYSIS (MPA): MEASURING PATTERNING OF BEHAVIORS VERSUS DISCRETE BEHAVIOR COUNTS AS INDICATORS OF DECISION-MAKING STYLE

    Directory of Open Access Journals (Sweden)

    Brenda L Connors

    2014-06-01

    The unique yield of collecting observational data on human movement has received increasing attention in a number of domains, including the study of decision-making style. As such, interest has grown in the nuances of core methodological issues, including the best ways of assessing inter-rater reliability. In this paper we focus on one key topic – the distinction between establishing reliability for the patterning of behaviors as opposed to the computation of raw counts – and suggest that reliability for each be compared empirically rather than determined a priori. We illustrate by assessing inter-rater reliability for key outcome measures derived from Movement Pattern Analysis (MPA), an observational methodology that records body movements as indicators of decision-making style with demonstrated predictive validity. While reliability ranged from moderate to good for raw counts of behaviors reflecting each of two Overall Factors generated within MPA (Assertion and Perspective), inter-rater reliability for patterning (proportional indicators of each factor) was significantly higher and excellent (ICC = .89). Furthermore, patterning, as compared to raw counts, provided better prediction of observable decision-making process assessed in the laboratory. These analyses support the utility of using an empirical approach to inform the consideration of measuring discrete behavioral counts versus patterning of behaviors when determining inter-rater reliability of observable behavior. They also speak to the substantial reliability that may be achieved via application of theoretically grounded observational systems such as MPA that reveal thinking and action motivations via visible movement patterns.

  6. Inter-rater reliability of the South African Triage Scale: Assessing two different cadres of health care workers in a real time environment

    Directory of Open Access Journals (Sweden)

    Michèle Twomey

    2011-09-01

    Conclusion: The inter-rater reliability of SATS ratings is excellent within individual HCWs, but significantly lower between different HCWs. This confirms previous reliability studies of the SATS using vignettes and if validated by larger studies would support the feasibility of further implementation of the SATS in primary health care settings across the Western Cape.

  7. Interrater and Intrarater Reliability of the Tuck Jump Assessment by Health Professionals of Varied Educational Backgrounds

    Directory of Open Access Journals (Sweden)

    Lisa A. Dudley

    2013-01-01

    Objective. The Tuck Jump Assessment (TJA), a clinical plyometric assessment, identifies 10 jumping and landing technique flaws. The study objective was to investigate TJA interrater and intrarater reliability with raters of different educational and clinical backgrounds. Methods. 40 participants were video recorded performing the TJA using published protocol and instructions. Five raters of varied educational and clinical backgrounds scored the TJA. Each score of the 10 technique flaws was summed for the total TJA score. Approximately one month later, 3 raters scored the videos again. Intraclass correlation coefficients determined interrater (5 and 3 raters for first and second session, resp.) and intrarater (3 raters) reliability. Results. Interrater reliability with 5 raters was poor (ICC = 0.47; 95% confidence interval (CI) 0.33–0.62). Interrater reliability between 3 raters who completed 2 scoring sessions improved from 0.52 (95% CI 0.35–0.68) for session one to 0.69 (95% CI 0.55–0.81) for session two. Intrarater reliability was poor to moderate, ranging from 0.44 (95% CI 0.22–0.68) to 0.72 (95% CI 0.55–0.84). Conclusion. Published protocol and training of raters were insufficient to allow consistent TJA scoring. There may be a learned effect with the TJA since interrater reliability improved with repetition. TJA instructions and training should be modified and enhanced before clinical implementation.
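
    As a rough illustration of how an intraclass correlation coefficient of this kind is obtained from a subjects-by-raters table of total scores, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater) from the ANOVA mean squares. The abstract does not state which ICC model was used, so the choice of ICC(2,1) is an assumption, as are the function name `icc_2_1` and the small illustrative score table, which is not data from the study.

```python
def icc_2_1(scores):
    """ICC(2,1) for a complete table: one row per subject, one column per rater."""
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subjects mean square
    msc = ss_cols / (k - 1)              # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Illustrative ratings only: 6 subjects scored by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(f"ICC(2,1) = {icc_2_1(example):.2f}")
```

    Because the absolute-agreement form keeps the between-raters mean square in the denominator, systematic differences in rater leniency lower the coefficient even when raters rank subjects similarly.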

  8. Intra- and interrater reliability of three different MRI grading and classification systems after acute hamstring injuries.

    Science.gov (United States)

    Wangensteen, Arnlaug; Tol, Johannes L; Roemer, Frank W; Bahr, Roald; Dijkstra, H Paul; Crema, Michel D; Farooq, Abdulaziz; Guermazi, Ali

    2017-04-01

    To assess and compare the intra- and interrater reliability of three different MRI grading and classification systems after acute hamstring injury. Male athletes (n=40) with clinical diagnosis of acute hamstring injury and MRI ≤5days were selected from a prospective cohort. Two radiologists independently evaluated the MRIs using standardised scoring form including the modified Peetrons grading system, the Chan acute muscle strain injury classification and the British Athletics Muscle Injury Classification. Intra-and interrater reliability was assessed with linear weighted kappa (κ) or unweighted Cohen's κ and percentage agreement was calculated. We observed 'substantial' to 'almost perfect' intra- (κ range 0.65-1.00) and interrater reliability (κ range 0.77-1.00) with percentage agreement 83-100% and 88-100%, respectively, for severity gradings, overall anatomical sites and overall classifications for the three MRI systems. We observed substantial variability (κ range -0.05 to 1.00) for subcategories within the Chan classification and the British Athletics Muscle Injury Classification, however, the prevalence of positive scorings was low for some subcategories. The modified Peetrons grading system, overall Chan classification and overall British Athletics Muscle Injury Classification demonstrated 'substantial' to 'almost perfect' intra- and interrater reliability when scored by experienced radiologists. The intra- and interrater reliability for the anatomical subcategories within the classifications remains unclear. Copyright © 2017 Elsevier B.V. All rights reserved.
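
    For ordered grades such as injury severity, a weighted kappa gives partial credit to near-misses, which is why the abstract distinguishes linear weighted from unweighted κ. The following is a minimal sketch under stated assumptions: grades are coded as integers 0 to k-1, the function name `weighted_kappa` is introduced here for illustration, and the two rating lists are hypothetical rather than taken from the study.

```python
def weighted_kappa(a, b, k, weights="linear"):
    """Weighted kappa for two raters on k ordered categories coded 0..k-1.

    Uses disagreement weights w_ij = (|i - j| / (k - 1)) ** p with p = 1
    (linear) or p = 2 (quadratic): kappa_w = 1 - sum(w*O) / sum(w*E).
    """
    n = len(a)
    power = 1 if weights == "linear" else 2

    # Observed joint proportions of (rater A grade, rater B grade).
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[x][y] += 1 / n

    row = [sum(obs[i]) for i in range(k)]                        # rater A marginals
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]   # rater B marginals

    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power
            observed += w * obs[i][j]
            expected += w * row[i] * col[j]
    return 1 - observed / expected


# Hypothetical severity grades (0-3) from two readers for ten scans.
reader_1 = [0, 1, 1, 2, 2, 3, 0, 1, 2, 3]
reader_2 = [0, 1, 2, 2, 2, 3, 0, 1, 1, 3]
print(f"linear:    {weighted_kappa(reader_1, reader_2, k=4):.2f}")
print(f"quadratic: {weighted_kappa(reader_1, reader_2, k=4, weights='quadratic'):.2f}")
```

    The function divides by the chance-expected disagreement, so it is undefined when both raters place every case in the same single category. That degenerate situation is related to the low-prevalence caveat raised for some subcategories.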

  9. Binge Eating Disorder: Reliability and Validity of a New Diagnostic Category.

    Science.gov (United States)

    Brody, Michelle L.; And Others

    1994-01-01

    Examined reliability and validity of binge eating disorder (BED), proposed for inclusion in Diagnostic and Statistical Manual of Mental Disorders (DSM), fourth edition. Interrater reliability of BED diagnosis compared favorably with that of most diagnoses in DSM revised third edition. Study comparing obese individuals with and without BED and…

  10. Intra and Inter-Rater Reliability of Screening for Movement Impairments: Movement Control Tests from The Foundation Matrix

    Science.gov (United States)

    Mischiati, Carolina R.; Comerford, Mark; Gosford, Emma; Swart, Jacqueline; Ewings, Sean; Botha, Nadine; Stokes, Maria; Mottram, Sarah L.

    2015-01-01

    Pre-season screening is well established within the sporting arena, and aims to enhance performance and reduce injury risk. With the increasing need to identify potential injury with greater accuracy, a new risk assessment process has been produced; The Performance Matrix (battery of movement control tests). As with any new method of objective testing, it is fundamental to establish whether the same results can be reproduced between examiners and by the same examiner on consecutive occasions. This study aimed to determine the intra-rater test re-test and inter-rater reliability of tests from a component of The Performance Matrix, The Foundation Matrix. Twenty participants were screened by two experienced musculoskeletal therapists using nine tests to assess the ability to control movement during specific tasks. Movement evaluation criteria for each test were rated as pass or fail. The therapists observed participants real-time and tests were recorded on video to enable repeated ratings four months later to examine intra-rater reliability (videos rated two weeks apart). Overall test percentage agreement was 87% for inter-rater reliability; 98% Rater 1, 94% Rater 2 for test re-test reliability; and 75% for real-time versus video. Intraclass-correlation coefficients (ICCs) were excellent between raters (0.81) and within raters (Rater 1, 0.96; Rater 2, 0.88) but poor for real-time versus video (0.23). Reliability for individual components of each test was more variable: inter-rater, 68-100%; intra-rater, 88-100% Rater 1, 75-100% Rater 2; and real-time versus video 31-100%. Cohen’s Kappa values for inter-rater reliability were 0.0-1.0; intra-rater 0.6-1.0 for Rater 1; -0.1-1.0 for Rater 2; and -0.1-1 for real-time versus video. It is concluded that both inter and intra-rater reliability of tests in The Foundation Matrix are acceptable when rated by experienced therapists. Recommendations are made for modifying some of the criteria to improve reliability where

  11. Rater reliability and construct validity of a mobile application for posture analysis.

    Science.gov (United States)

    Szucs, Kimberly A; Brown, Elena V Donoso

    2018-01-01

    [Purpose] Measurement of posture is important for those with a clinical diagnosis as well as researchers aiming to understand the impact of faulty postures on the development of musculoskeletal disorders. A reliable, cost-effective and low tech posture measure may be beneficial for research and clinical applications. The purpose of this study was to determine rater reliability and construct validity of a posture screening mobile application in healthy young adults. [Subjects and Methods] Pictures of subjects were taken in three standing positions. Two raters independently digitized the static standing posture image twice. The app calculated posture variables, including sagittal and coronal plane translations and angulations. Intra- and inter-rater reliability were calculated using the appropriate ICC models for complete agreement. Construct validity was determined through comparison of known groups using repeated measures ANOVA. [Results] Intra-rater reliability ranged from 0.71 to 0.99. Inter-rater reliability was good to excellent for all translations. ICCs were stronger for translations versus angulations. The construct validity analysis found that the app was able to detect the change in the four variables selected. [Conclusion] The posture mobile application has demonstrated strong rater reliability and preliminary evidence of construct validity. This application may have utility in clinical and research settings.

  12. Validity and Reliability Study of the Korean Tinetti Mobility Test for Parkinson's Disease.

    Science.gov (United States)

    Park, Jinse; Koh, Seong-Beom; Kim, Hee Jin; Oh, Eungseok; Kim, Joong-Seok; Yun, Ji Young; Kwon, Do-Young; Kim, Younsoo; Kim, Ji Seon; Kwon, Kyum-Yil; Park, Jeong-Ho; Youn, Jinyoung; Jang, Wooyoung

    2018-01-01

    Postural instability and gait disturbance are the cardinal symptoms associated with falling among patients with Parkinson's disease (PD). The Tinetti mobility test (TMT) is a well-established measurement tool used to predict falls among elderly people. However, the TMT has not been established or widely used among PD patients in Korea. The purpose of this study was to evaluate the reliability and validity of the Korean version of the TMT for PD patients. Twenty-four patients diagnosed with PD were enrolled in this study. For the interrater reliability test, thirteen clinicians scored the TMT after watching a video clip. We also used the test-retest method to determine intrarater reliability. For concurrent validation, the unified Parkinson's disease rating scale, Hoehn and Yahr staging, Berg Balance Scale, Timed-Up and Go test, 10-m walk test, and gait analysis by three-dimensional motion capture were also used. We analyzed receiver operating characteristic curve to predict falling. The interrater reliability and intrarater reliability of the Korean Tinetti balance scale were 0.97 and 0.98, respectively. The interrater reliability and intra-rater reliability of the Korean Tinetti gait scale were 0.94 and 0.96, respectively. The Korean TMT scores were significantly correlated with the other clinical scales and three-dimensional motion capture. The cutoff values for predicting falling were 14 points (balance subscale) and 10 points (gait subscale). We found that the Korean version of the TMT showed excellent validity and reliability for gait and balance and had high sensitivity and specificity for predicting falls among patients with PD.

  13. Inter-rater and intrarater reliability of the South African Triage Scale in low-resource settings of Haiti and Afghanistan.

    Science.gov (United States)

    Dalwai, Mohammed; Tayler-Smith, Katie; Twomey, Michèle; Nasim, Masood; Popal, Abdul Qayum; Haqdost, Waliul Haq; Gayraud, Olivia; Cheréstal, Sophia; Wallis, Lee; Valles, Pola

    2018-03-16

    The South African Triage Scale (SATS) has demonstrated good validity in the EDs of Médecins Sans Frontières (MSF)-supported sites in Afghanistan and Haiti; however, corresponding reliability in these settings has not yet been reported on. This study set out to assess the inter-rater and intrarater reliability of the SATS in four MSF-supported EDs in Afghanistan and Haiti (two trauma-only EDs and two mixed (including both medical and trauma cases) EDs). Under classroom conditions between December 2013 and February 2014, ED nurses at each site assigned triage ratings to a set of context-specific vignettes (written case reports of ED patients). Inter-rater reliability was assessed by comparing triage ratings among nurses; intrarater reliability was assessed by asking the nurses to retriage 10 random vignettes from the original set and comparing these duplicate ratings. Inter-rater reliability was calculated using the unweighted kappa, linearly weighted kappa and quadratically weighted kappa (QWK) statistics, and the intraclass correlation coefficient (ICC). Intrarater reliability was calculated according to the percentage of exact agreement and the percentage of agreement allowing for one level of discrepancy in triage ratings. The correlation between years of nursing experience and reliability of the SATS was assessed based on comparison of ICCs and the respective 95% CIs. A total of 67 nurses agreed to participate in the study: In Afghanistan there were 19 nurses from Kunduz Trauma Centre and nine from Ahmed Shah Baba; in Haiti, there were 20 nurses from Martissant Emergency Centre and 19 from Tabarre Surgical and Trauma Centre. Inter-rater agreement was moderate across all sites (ICC range: 0.50-0.60; QWK range: 0.50-0.59) apart from the trauma ED in Haiti where it was moderate to substantial (ICC: 0.58; QWK: 0.61). Intrarater agreement was similar across the four sites (68%-74% exact agreement); when allowing for a one-level discrepancy in triage ratings

  14. Rating scales for dystonia in cerebral palsy: reliability and validity.

    Science.gov (United States)

    Monbaliu, E; Ortibus, E; Roelens, F; Desloovere, K; Deklerck, J; Prinzie, P; de Cock, P; Feys, H

    2010-06-01

    This study investigated the reliability and validity of the Barry-Albright Dystonia Scale (BADS), the Burke-Fahn-Marsden Movement Scale (BFMMS), and the Unified Dystonia Rating Scale (UDRS) in patients with bilateral dystonic cerebral palsy (CP). Three raters independently scored videotapes of 10 patients (five males, five females; mean age 13 y 3 mo, SD 5 y 2 mo, range 5-22 y). One patient each was classified at levels I-IV in the Gross Motor Function Classification System and six patients were classified at level V. Reliability was measured by (1) intraclass correlation coefficient (ICC) for interrater reliability, (2) standard error of measurement (SEM) and smallest detectable difference (SDD), and (3) Cronbach's alpha for internal consistency. Validity was assessed by Pearson's correlations among the three scales used and by content analysis. Moderate to good interrater reliability was found for total scores of the three scales (ICC: BADS=0.87; BFMMS=0.86; UDRS=0.79). However, many subitems showed low reliability, in particular for the UDRS. SEM and SDD were respectively 6.36% and 17.72% for the BADS, 9.88% and 27.39% for the BFMMS, and 8.89% and 24.63% for the UDRS. High internal consistency was found. Pearson's correlations were high. Content validity showed insufficient accordance with the new CP definition and classification. Our results support the internal consistency and concurrent validity of the scales; however, taking into consideration the limitations in reliability, including the large SDD values and the content validity, further research on methods of assessment of dystonia is warranted.

  15. Reliability and validity of the Wolfram Unified Rating Scale (WURS)

    Directory of Open Access Journals (Sweden)

    Nguyen Chau

    2012-11-01

    Background: Wolfram syndrome (WFS) is a rare, neurodegenerative disease that typically presents with childhood onset insulin dependent diabetes mellitus, followed by optic atrophy, diabetes insipidus, deafness, and neurological and psychiatric dysfunction. There is no cure for the disease, but recent advances in research have improved understanding of the disease course. Measuring disease severity and progression with reliable and validated tools is a prerequisite for clinical trials of any new intervention for neurodegenerative conditions. To this end, we developed the Wolfram Unified Rating Scale (WURS) to measure the severity and individual variability of WFS symptoms. The aim of this study is to develop and test the reliability and validity of the Wolfram Unified Rating Scale (WURS). Methods: A rating scale of disease severity in WFS was developed by modifying a standardized assessment for another neurodegenerative condition (Batten disease). WFS experts scored the representativeness of WURS items for the disease. The WURS was administered to 13 individuals with WFS (6-25 years of age). Motor, balance, mood and quality of life were also evaluated with standard instruments. Inter-rater reliability, internal consistency reliability, concurrent, predictive and content validity of the WURS were calculated. Results: The WURS had high inter-rater reliability (ICCs>.93), moderate to high internal consistency reliability (Cronbach’s α = 0.78-0.91) and demonstrated good concurrent and predictive validity. There were significant correlations between the WURS Physical Assessment and motor and balance tests (rs>.67, ps>.76, ps=-.86, p=.001). The WURS demonstrated acceptable content validity (Scale-Content Validity Index=0.83). Conclusions: These preliminary findings demonstrate that the WURS has acceptable reliability and validity and captures individual differences in disease severity in children and young adults with WFS.

  16. Intra- and interrater reliability of three different MRI grading and classification systems after acute hamstring injuries

    Energy Technology Data Exchange (ETDEWEB)

    Wangensteen, Arnlaug, E-mail: arnlaug.wangensteen@nih.no [Aspetar, Orthopaedic and Sports Medicine Hospital, Doha (Qatar); Oslo Sports Trauma Research Center, Department of Sports Medicine, Norwegian School of Sport Sciences, Oslo (Norway); Tol, Johannes L., E-mail: johannes.tol@aspetar.com [Aspetar, Orthopaedic and Sports Medicine Hospital, Doha (Qatar); Amsterdam Center for Evidence Sports Medicine, Academic Medical Center (Netherlands); The Sports Physician Group, OLVG, Amsterdam (Netherlands); Roemer, Frank W. [Quantitative Imaging Center, Department of Radiology, Boston University School of Medicine, Boston, MA (United States); Department of Radiology, University of Erlangen-Nuremberg, Erlangen (Germany); Bahr, Roald [Aspetar, Orthopaedic and Sports Medicine Hospital, Doha (Qatar); Oslo Sports Trauma Research Center, Department of Sports Medicine, Norwegian School of Sport Sciences, Oslo (Norway); Dijkstra, H. Paul [Aspetar, Orthopaedic and Sports Medicine Hospital, Doha (Qatar); Crema, Michel D. [Quantitative Imaging Center, Department of Radiology, Boston University School of Medicine, Boston, MA (United States); Department of Radiology, Saint-Antoine Hospital, University Paris VI, Paris (France); Farooq, Abdulaziz [Aspetar, Orthopaedic and Sports Medicine Hospital, Doha (Qatar); Guermazi, Ali [Quantitative Imaging Center, Department of Radiology, Boston University School of Medicine, Boston, MA (United States)

    2017-04-15

    Highlights: • Three different MRI grading and classification systems for acute hamstring injuries are overall reliable. • Reliability for the subcategories within these MRI grading and classification systems remains, however, unclear. - Abstract: Objective: To assess and compare the intra- and interrater reliability of three different MRI grading and classification systems after acute hamstring injury. Methods: Male athletes (n = 40) with clinical diagnosis of acute hamstring injury and MRI ≤5 days were selected from a prospective cohort. Two radiologists independently evaluated the MRIs using standardised scoring form including the modified Peetrons grading system, the Chan acute muscle strain injury classification and the British Athletics Muscle Injury Classification. Intra- and interrater reliability was assessed with linear weighted kappa (κ) or unweighted Cohen's κ and percentage agreement was calculated. Results: We observed ‘substantial’ to ‘almost perfect’ intra- (κ range 0.65–1.00) and interrater reliability (κ range 0.77–1.00) with percentage agreement 83–100% and 88–100%, respectively, for severity gradings, overall anatomical sites and overall classifications for the three MRI systems. We observed substantial variability (κ range −0.05 to 1.00) for subcategories within the Chan classification and the British Athletics Muscle Injury Classification; however, the prevalence of positive scorings was low for some subcategories. Conclusions: The modified Peetrons grading system, overall Chan classification and overall British Athletics Muscle Injury Classification demonstrated ‘substantial’ to ‘almost perfect’ intra- and interrater reliability when scored by experienced radiologists. The intra- and interrater reliability for the anatomical subcategories within the classifications remains unclear.

  17. Intra- and interrater reliability of three different MRI grading and classification systems after acute hamstring injuries

    International Nuclear Information System (INIS)

    Wangensteen, Arnlaug; Tol, Johannes L.; Roemer, Frank W.; Bahr, Roald; Dijkstra, H. Paul; Crema, Michel D.; Farooq, Abdulaziz; Guermazi, Ali

    2017-01-01

    Highlights: • Three different MRI grading and classification systems for acute hamstring injuries are overall reliable. • Reliability for the subcategories within these MRI grading and classification systems remains, however, unclear. - Abstract: Objective: To assess and compare the intra- and interrater reliability of three different MRI grading and classification systems after acute hamstring injury. Methods: Male athletes (n = 40) with clinical diagnosis of acute hamstring injury and MRI ≤5 days were selected from a prospective cohort. Two radiologists independently evaluated the MRIs using standardised scoring form including the modified Peetrons grading system, the Chan acute muscle strain injury classification and the British Athletics Muscle Injury Classification. Intra- and interrater reliability was assessed with linear weighted kappa (κ) or unweighted Cohen's κ and percentage agreement was calculated. Results: We observed ‘substantial’ to ‘almost perfect’ intra- (κ range 0.65–1.00) and interrater reliability (κ range 0.77–1.00) with percentage agreement 83–100% and 88–100%, respectively, for severity gradings, overall anatomical sites and overall classifications for the three MRI systems. We observed substantial variability (κ range −0.05 to 1.00) for subcategories within the Chan classification and the British Athletics Muscle Injury Classification; however, the prevalence of positive scorings was low for some subcategories. Conclusions: The modified Peetrons grading system, overall Chan classification and overall British Athletics Muscle Injury Classification demonstrated ‘substantial’ to ‘almost perfect’ intra- and interrater reliability when scored by experienced radiologists. The intra- and interrater reliability for the anatomical subcategories within the classifications remains unclear.

  18. BurnCase 3D software validation study: Burn size measurement accuracy and inter-rater reliability.

    Science.gov (United States)

    Parvizi, Daryousch; Giretzlehner, Michael; Wurzer, Paul; Klein, Limor Dinur; Shoham, Yaron; Bohanon, Fredrick J; Haller, Herbert L; Tuca, Alexandru; Branski, Ludwik K; Lumenta, David B; Herndon, David N; Kamolz, Lars-P

    2016-03-01

    The aim of this study was to compare the accuracy of burn size estimation using the computer-assisted software BurnCase 3D (RISC Software GmbH, Hagenberg, Austria) with that using a 2D scan, considered to be the actual burn size. Thirty artificial burn areas were preplanned and prepared on three mannequins (one child, one female, and one male). Five trained physicians (raters) were asked to assess the size of all wound areas using BurnCase 3D software. The results were then compared with the real wound areas, as determined by 2D planimetry imaging. To examine inter-rater reliability, we performed an intraclass correlation analysis with a 95% confidence interval. The mean wound area estimations of the five raters using BurnCase 3D were in total 20.7±0.9% for the child, 27.2±1.5% for the female and 16.5±0.1% for the male mannequin. Our analysis showed relative overestimations of 0.4%, 2.8% and 1.5% for the child, female and male mannequins respectively, compared to the 2D scan. The intraclass correlation between the single raters for mean percentage of the artificial burn areas was 98.6%. There was also a high intraclass correlation between the single raters and the 2D scan. BurnCase 3D is a valid and reliable tool for the determination of total body surface area burned in standard models. Further clinical studies including different pediatric and overweight adult mannequins are warranted. Copyright © 2016 Elsevier Ltd and ISBI. All rights reserved.

  19. Intra and inter-rater reliability study of pelvic floor muscle dynamometric measurements

    Directory of Open Access Journals (Sweden)

    Natalia M. Martinho

    2015-04-01

    OBJECTIVE: The aim of this study was to evaluate the intra- and inter-rater reliability of pelvic floor muscle (PFM) dynamometric measurements for maximum and average strengths, as well as endurance. METHOD: A convenience sample of 18 nulliparous women, without any urogynecological complaints, aged between 19 and 31 (mean age of 25.4±3.9) participated in this study. They were evaluated using a pelvic floor dynamometer based on load cell technology. The dynamometric evaluations were repeated in three successive sessions: two on the same day with a rest period of 30 minutes between them, and the third on the following day. All participants were evaluated twice in each session; first by examiner 1 followed by examiner 2. The vaginal dynamometry data were analyzed using three parameters: maximum strength, average strength, and endurance. The Intraclass Correlation Coefficient (ICC) was applied to estimate the PFM dynamometric measurement reliability, considering a good level as being above 0.75. RESULTS: The intra- and inter-rater analyses showed good reliability for maximum strength (ICCintra-rater1=0.96, ICCintra-rater2=0.95, and ICCinter-rater=0.96), average strength (ICCintra-rater1=0.96, ICCintra-rater2=0.94, and ICCinter-rater=0.97), and endurance (ICCintra-rater1=0.88, ICCintra-rater2=0.86, and ICCinter-rater=0.92) dynamometric measurements. CONCLUSIONS: The PFM dynamometric measurements showed good intra- and inter-rater reliability for maximum strength, average strength and endurance, which demonstrates that this is a reliable device that can be used in clinical practice.

  20. Simulated patient training: Using inter-rater reliability to evaluate simulated patient consistency in nursing education.

    Science.gov (United States)

    MacLean, Sharon; Geddes, Fiona; Kelly, Michelle; Della, Phillip

    2018-03-01

    Simulated patients (SPs) are frequently used for training nursing students in communication skills. An acknowledged benefit of using SPs is the opportunity to provide a standardized approach by which participants can demonstrate and develop communication skills. However, relatively little evidence is available on how to best facilitate and evaluate the reliability and accuracy of SPs' performances. The aim of this study is to investigate the effectiveness of an evidence-based SP training framework to ensure standardization of SPs. The training framework was employed to improve inter-rater reliability of SPs. A quasi-experimental study was employed to assess SP post-training understanding of simulation scenario parameters using inter-rater reliability agreement indices. Two phases of data collection took place. Initially, in a trial phase, audio-visual (AV) recordings of two undergraduate nursing students completing a simulation scenario were rated by eight SPs using the Interpersonal Communication Assessments Scale (ICAS) and Quality of Discharge Teaching Scale (QDTS). In phase 2, eight SP raters and four nursing faculty raters independently evaluated students' (N=42) communication practices using the QDTS. Intraclass correlation coefficients (ICC) were >0.80 for both stages of the study in clinical communication skills. The results support the premise that if trained appropriately, SPs have a high degree of reliability and validity to both facilitate and evaluate student performance in nurse education. Crown Copyright © 2018. Published by Elsevier Ltd. All rights reserved.

  1. Validity and Reliability of the Clinical Competency Evaluation Instrument for Use among Physiotherapy Students: Pilot study.

    Science.gov (United States)

    Muhamad, Zailani; Ramli, Ayiesah; Amat, Salleh

    2015-05-01

    The aim of this study was to determine the content validity, internal consistency, test-retest reliability and inter-rater reliability of the Clinical Competency Evaluation Instrument (CCEVI) in assessing the clinical performance of physiotherapy students. This study was carried out between June and September 2013 at University Kebangsaan Malaysia (UKM), Kuala Lumpur, Malaysia. A panel of 10 experts was identified to establish content validity by evaluating and rating each of the items used in the CCEVI with regard to their relevance in measuring students' clinical competency. A total of 50 UKM undergraduate physiotherapy students were assessed throughout their clinical placement to determine the construct validity of these items. The instrument's reliability was determined through a cross-sectional study involving a clinical performance assessment of 14 final-year undergraduate physiotherapy students. The content validity index of the entire CCEVI was 0.91, while the proportion of agreement on the content validity indices ranged from 0.83-1.00. The CCEVI construct validity was established with factor loading of ≥0.6, while internal consistency (Cronbach's alpha) overall was 0.97. Test-retest reliability of the CCEVI was confirmed with a Pearson's correlation range of 0.91-0.97 and an intraclass correlation coefficient range of 0.95-0.98. Inter-rater reliability of the CCEVI domains ranged from 0.59 to 0.97 on initial and subsequent assessments. This pilot study confirmed the content validity of the CCEVI. It showed high internal consistency, thereby providing evidence that the CCEVI has moderate to excellent inter-rater reliability. However, additional refinement in the wording of the CCEVI items, particularly in the domains of safety and documentation, is recommended to further improve the validity and reliability of the instrument.

  2. Palliative sedation: reliability and validity of sedation scales.

    Science.gov (United States)

    Arevalo, Jimmy J; Brinkkemper, Tijn; van der Heide, Agnes; Rietjens, Judith A; Ribbe, Miel; Deliens, Luc; Loer, Stephan A; Zuurmond, Wouter W A; Perez, Roberto S G M

    2012-11-01

    Observer-based sedation scales have been used to provide a measurable estimate of the comfort of nonalert patients in palliative sedation. However, their usefulness and appropriateness in this setting has not been demonstrated. To study the reliability and validity of observer-based sedation scales in palliative sedation. A prospective evaluation of 54 patients under intermittent or continuous sedation with four sedation scales was performed by 52 nurses. Included scales were the Minnesota Sedation Assessment Tool (MSAT), Richmond Agitation-Sedation Scale (RASS), Vancouver Interaction and Calmness Scale (VICS), and a sedation score proposed in the Guideline for Palliative Sedation of the Royal Dutch Medical Association (KNMG). Inter-rater reliability was tested with the intraclass correlation coefficient (ICC) and Cohen's kappa coefficient. Correlations between the scales using Spearman's rho tested concurrent validity. We also examined construct, discriminative, and evaluative validity. In addition, nurses completed a user-friendliness survey. Overall moderate to high inter-rater reliability was found for the VICS interaction subscale (ICC = 0.85), RASS (ICC = 0.73), and KNMG (ICC = 0.71). The largest correlation between scales was found for the RASS and KNMG (rho = 0.836). All scales showed discriminative and evaluative validity, except for the MSAT motor subscale and VICS calmness subscale. Finally, the RASS was less time consuming, clearer, and easier to use than the MSAT and VICS. The RASS and KNMG scales stand as the most reliable and valid among the evaluated scales. In addition, the RASS was less time consuming, clearer, and easier to use than the MSAT and VICS. Further research is needed to evaluate the impact of the scales on better symptom control and patient comfort. Copyright © 2012 U.S. Cancer Pain Relief Committee. Published by Elsevier Inc. All rights reserved.

  3. Feasibility and Inter-Rater Reliability of Physical Performance Measures in Acutely Admitted Older Medical Patients

    DEFF Research Database (Denmark)

    Bodilsen, Ann Christine; Juul-Larsen, Helle Gybel; Petersen, Janne

    2015-01-01

    OBJECTIVE: Physical performance measures can be used to predict functional decline and increased dependency in older persons. However, few studies have assessed the feasibility or reliability of such measures in hospitalized older patients. Here we assessed the feasibility and inter-rater reliability of four simple measures of physical performance in acutely admitted older medical patients. DESIGN: During the first 24 hours of hospitalization, the following were assessed twice by different raters in 52 (≥ 65 years) patients admitted for acute medical illness: isometric hand grip strength, 4......, and 30-s chair stand were 8%, 7%, and 18%, and the SRD95% values were 22%, 17%, and 49%. CONCLUSION: In acutely admitted older medical patients, grip strength, gait speed, and the Cumulated Ambulation Score measurements were feasible and showed high inter-rater reliability when administered by different...

  4. Reliability and validity of a tool to assess airway management skills in anesthesia trainees

    Directory of Open Access Journals (Sweden)

    Aliya Ahmed

    2016-01-01

    Conclusion: The tool designed to assess bag-mask ventilation and tracheal intubation skills in anesthesia trainees demonstrated excellent inter-rater reliability, fair test-retest reliability, and good construct validity. The authors recommend its use for formative and summative assessment of junior anesthesia trainees.

  5. Photographic assessment of burn size and depth: reliability and validity

    NARCIS (Netherlands)

    Hop, M.; Moues, C.; Bogomolova, K.; Nieuwenhuis, M.; Oen, I.; Middelkoop, E.; Breederveld, R.; de Baar, M.

    2014-01-01

    Objective: The aim of this study was to examine the reliability and validity of using photographs of burns to assess both burn size and depth. Method: Fifty randomly selected photographs taken on day 0-1 post burn were assessed by seven burn experts and eight referring physicians. Inter-rater

  6. The reliability and validity of the Turkish version of Fullerton Advanced Balance (FAB-T) scale.

    Science.gov (United States)

    Iyigun, Gozde; Kirmizigil, Berkiye; Angin, Ender; Oksuz, Sevim; Can, Filiz; Eker, Levent; Rose, Debra J

    2018-06-04

    The aim of this study was to evaluate the reliability and validity of the Turkish version of the FAB (FAB-T) scale in older Turkish adults. The reliability and validity of the scale were tested on 200 community-dwelling older adults. The FAB-T scale was scored by different physiotherapists on different days to evaluate inter-rater and intra-rater reliability. The Berg Balance Scale (BBS) was used for the evaluation of convergent validity, and the content validity of the FAB-T scale was investigated. The FAB-T scale showed very high inter- and intra-rater reliability. For inter-rater agreement, the ICC values for the individual test items and total score were 0.92 (95% CI; 0.90-0.94) and 0.96 (95% CI; 0.95-0.97), respectively. For intra-rater agreement, the ICC values for the individual test items and total score were 0.93 (95% CI; 0.91-0.95) and 0.96 (95% CI; 0.95-0.97), respectively. There was a good agreement between the FAB-T and BBS scales. A high correlation was found between the BBS and FAB-T scales [rho = 0.70 (95% CI; 0.62-0.76)], indicating good convergent validity. Considering the content validity of the FAB-T scale, no floor (floor score: 0%) or ceiling (ceiling score: 6.5%) effect was detected. The FAB-T scale was successfully translated from the original English version (FAB) and demonstrated strong psychometric features. It was found that the FAB-T scale has very high inter-rater and intra-rater reliability. Considering the convergent validity, the scale has high correlation with the BBS. The FAB-T has no floor and ceiling effect. Copyright © 2018 Elsevier B.V. All rights reserved.

  7. Validity and Reliability of the Arabic Version of the Positive and Negative Syndrome Scale.

    Science.gov (United States)

    Yehya, Arij; Ghuloum, Suhaila; Mahfoud, Ziyad; Opler, Mark; Khan, Anzalee; Hammoudeh, Samer; Abdulhakam, Abdulmoneim; Al-Mujalli, Azza; Hani, Yahya; Elsherbiny, Reem; Al-Amin, Hassen

    The Positive and Negative Syndrome Scale (PANSS) is widely used for patients with schizophrenia. This scale is reliable and valid. The PANSS was translated and validated in several languages. The aim of this study was to translate and validate the PANSS in the Arab population. The PANSS was translated into formal Arabic language using the back-translation method. 101 Arab patients with schizophrenia and 98 Arabs with no diagnosis of any mental disorder were recruited. The Arabic version of the Mini International Neuropsychiatric Interview (MINI-6) was used as a diagnostic tool to confirm the diagnosis of schizophrenia or rule out any diagnosis for the healthy control group. Reliability of the scale was assessed by calculating internal consistency, interrater reliability and test-retest reliability. Construct validity was assessed using the Arabic version of the MINI-6. PANSS total scores were correlated with the Clinical Global Impression-Severity scale. Our findings showed that the internal consistency was good (0.92). Scores on the PANSS of the patients were much higher than those of the healthy controls. The PANSS showed good interrater reliability and test-retest reliability (0.92 and 0.75, respectively). In comparison with the MINI-6, the PANSS showed good sensitivity and specificity, which implies good construct validity of this version. In conclusion, the Arabic version of the PANSS is a reliable and valid instrument for the assessment of patients with schizophrenia in the Arab population. © 2016 S. Karger AG, Basel.

  8. Development and interrater reliability testing of a telephone interview training programme for Australian nurse interviewers.

    Science.gov (United States)

    Ahern, Tracey; Gardner, Anne; Gardner, Glenn; Middleton, Sandy; Della, Phillip

    2013-05-01

    The final phase of a three phase study analysing the implementation and impact of the nurse practitioner role in Australia (the Australian Nurse Practitioner Project or AUSPRAC) was undertaken in 2009, requiring nurse telephone interviewers to gather information about health outcomes directly from patients and their treating nurse practitioners. A team of several registered nurses was recruited and trained as telephone interviewers. The aim of this paper is to report on development and evaluation of the training process for telephone interviewers. The training process involved planning the content and methods to be used in the training session; delivering the session; testing skills and understanding of interviewers post-training; collecting and analysing data to determine the degree to which the training process was successful in meeting objectives and post-training follow-up. All aspects of the training process were informed by established educational principles. Interrater reliability between interviewers was high for well-validated sections of the survey instrument resulting in 100% agreement between interviewers. Other sections with unvalidated questions showed lower agreement (between 75% and 90%). Overall the agreement between interviewers was 92%. Each interviewer was also measured against a specifically developed master script or gold standard and for this each interviewer achieved a percentage of correct answers of 94.7% or better. This equated to a Kappa value of 0.92 or better. The telephone interviewer training process was very effective and achieved high interrater reliability. We argue that the high reliability was due to the use of well validated instruments and the carefully planned programme based on established educational principles. There is limited published literature on how to successfully operationalise educational principles and tailor them for specific research studies; this report addresses this knowledge gap. 
    Copyright © 2012 Elsevier

  9. Intra- and interrater reliability of the 'lumbar-locked thoracic rotation test' in competitive swimmers ages 10 through 18 years.

    Science.gov (United States)

    Feijen, Stef; Kuppens, Kevin; Tate, Angela; Baert, Isabel; Struyf, Thomas; Struyf, Filip

    2018-04-17

    Measuring thoracic spine mobility can be of interest to competitive swimmers as it has been associated with shoulder girdle function and scapular position in subjects with and without shoulder pain. At present, no reliability data of thoracic spine mobility measurements are available in the swimming population. This study aims to evaluate the within-session intra- and interrater reliability of the "lumbar-locked rotation test" for thoracic spine rotation in competitive swimmers aged 10 to 18 years. This reliability study is part of a larger prospective cohort study investigating potential risk factors for the development of shoulder pain in competitive swimmers. Within-session, intra- and inter-rater reliability. Competitive swimming clubs in Belgium. 21 competitive swimmers. Intra- and inter-rater reliability of the lumbar-locked thoracic rotation test. Intraclass correlation coefficients (ICCs) ranged from 0.91 (95% CI 0.78 to 0.96) to 0.96 (0.89-0.98) for intra-rater reliability. Inter-rater reliability was 0.89 (0.72-0.95) for right and 0.86 (0.65-0.94) for left thoracic rotation. Results suggest good to excellent reliability of the lumbar-locked thoracic rotation test, indicating this test can be used reliably in clinical practice. Copyright © 2018 Elsevier Ltd. All rights reserved.

  10. Inter-rater reliability of the Greek version of CAARMS among two groups of mental health professionals.

    Science.gov (United States)

    Kollias, C; Kontaxakis, V; Havaki-Kontaxaki, B; Simmons, M B; Stefanis, N; Papageorgiou, C

    2015-01-01

    There is increasing interest within the Greek psychiatric community in the early detection and prevention of psychotic disorders. To support this, there is a need for a valid and reliable tool to identify young people that may be at risk of developing a psychotic disorder. Our team has previously translated the Comprehensive Assessment of At-Risk Mental States (CAARMS). The validity of the CAARMS was ensured by the procedure of translation and the aim of the current study was to estimate the interrater reliability of the CAARMS Greek translation among residents in psychiatry and specialized mental health professionals. 43 mental health workers (27 residents in psychiatry and 16 specialized mental health professionals, i.e. 11 psychiatrists and 5 psychologists) participated in two seminars that covered theoretical information about the ultra high risk concept and training in the CAARMS. During the seminars, 10 vignettes with psychiatric history cases were presented, including healthy, ultra high risk and first episode psychosis. The mean correlated percentage of agreement with the correct answers regarding diagnosis of the presented history cases among all our subjects was 81.42, among specialized mental health professionals 77.88, and among residents 84.46. Intraclass correlation coefficients were 0.994 for specialized mental health professionals and 0.997 for residents. The translated Greek version of CAARMS shows satisfactory interrater reliability when used by both residents and specialized mental health professionals. Residents showed even higher intraclass correlation coefficients and mean correlated percentage of agreement than specialized mental health professionals, which indicates that residents are capable of using the CAARMS in early intervention units.

  11. Interrater reliability of schizoaffective disorder compared with schizophrenia, bipolar disorder, and unipolar depression - A systematic review and meta-analysis.

    Science.gov (United States)

    Santelmann, Hanno; Franklin, Jeremy; Bußhoff, Jana; Baethge, Christopher

    2016-10-01

    Schizoaffective disorder is a common diagnosis in clinical practice but its nosological status has been subject to debate ever since it was conceptualized. Although it is key that diagnostic reliability is sufficient, schizoaffective disorder has been reported to have low interrater reliability. Evidence based on systematic review and meta-analysis methods, however, is lacking. Using a highly sensitive literature search in Medline, Embase, and PsycInfo we identified studies measuring the interrater reliability of schizoaffective disorder in comparison to schizophrenia, bipolar disorder, and unipolar disorder. Out of 4126 records screened we included 25 studies reporting on 7912 patients diagnosed by different raters. The interrater reliability of schizoaffective disorder was moderate (meta-analytic estimate of Cohen's kappa 0.57 [95% CI: 0.41-0.73]), and substantially lower than that of its main differential diagnoses (difference in kappa between 0.22 and 0.19). Although there was considerable heterogeneity, analyses revealed that the interrater reliability of schizoaffective disorder was consistently lower in the overwhelming majority of studies. The results remained robust in subgroup and sensitivity analyses (e.g., diagnostic manual used) as well as in meta-regressions (e.g., publication year) and analyses of publication bias. Clinically, the results highlight the particular importance of diagnostic re-evaluation in patients diagnosed with schizoaffective disorder. They also quantify a widely held clinical impression of lower interrater reliability and agree with earlier meta-analysis reporting low test-retest reliability. Copyright © 2016. Published by Elsevier B.V.

  12. The Berg Balance Scale has high intra- and inter-rater reliability but absolute reliability varies across the scale: a systematic review.

    Science.gov (United States)

    Downs, Stephen; Marquez, Jodie; Chiarelli, Pauline

    2013-06-01

    What is the intra-rater and inter-rater relative reliability of the Berg Balance Scale? What is the absolute reliability of the Berg Balance Scale? Does the absolute reliability of the Berg Balance Scale vary across the scale? Systematic review with meta-analysis of reliability studies. Any clinical population that has undergone assessment with the Berg Balance Scale. Relative intra-rater reliability, relative inter-rater reliability, and absolute reliability. Eleven studies involving 668 participants were included in the review. The relative intrarater reliability of the Berg Balance Scale was high, with a pooled estimate of 0.98 (95% CI 0.97 to 0.99). Relative inter-rater reliability was also high, with a pooled estimate of 0.97 (95% CI 0.96 to 0.98). A ceiling effect of the Berg Balance Scale was evident for some participants. In the analysis of absolute reliability, all of the relevant studies had an average score of 20 or above on the 0 to 56 point Berg Balance Scale. The absolute reliability across this part of the scale, as measured by the minimal detectable change with 95% confidence, varied between 2.8 points and 6.6 points. The Berg Balance Scale has a higher absolute reliability when close to 56 points due to the ceiling effect. We identified no data that estimated the absolute reliability of the Berg Balance Scale among participants with a mean score below 20 out of 56. The Berg Balance Scale has acceptable reliability, although it might not detect modest, clinically important changes in balance in individual subjects. The review was only able to comment on the absolute reliability of the Berg Balance Scale among people with moderately poor to normal balance. Copyright © 2013 Australian Physiotherapy Association. Published by .. All rights reserved.
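    The review above reports absolute reliability as the minimal detectable change with 95% confidence (MDC95). A common way to derive it from a relative reliability coefficient is via the standard error of measurement, SEM = SD × sqrt(1 − ICC), and MDC95 = 1.96 × sqrt(2) × SEM. The sketch below assumes that formulation; the abstract does not state how the included studies computed their values, and the between-subject SD of 8 points is a hypothetical input chosen only for illustration.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change_95(sem):
    """Smallest change exceeding measurement error with 95% confidence.

    The sqrt(2) factor accounts for error in both the baseline and
    the follow-up measurement.
    """
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical SD (8 points on the 0-56 scale) combined with the
# pooled inter-rater estimate of 0.97 quoted in the abstract.
sem = standard_error_of_measurement(sd=8.0, icc=0.97)
mdc95 = minimal_detectable_change_95(sem)
print(round(sem, 2), round(mdc95, 2))
```

    Under these assumptions, a very high ICC still leaves a detectable-change threshold of several scale points, which is consistent with the review's caution about modest changes in individual subjects.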

  13. Reliability and Validity of Autism Diagnostic Interview-Revised, Japanese Version

    Science.gov (United States)

    Tsuchiya, Kenji J.; Matsumoto, Kaori; Yagi, Atsuko; Inada, Naoko; Kuroda, Miho; Inokuchi, Eiko; Koyama, Tomonori; Kamio, Yoko; Tsujii, Masatsugu; Sakai, Saeko; Mohri, Ikuko; Taniike, Masako; Iwanaga, Ryoichiro; Ogasahara, Kei; Miyachi, Taishi; Nakajima, Shunji; Tani, Iori; Ohnishi, Masafumi; Inoue, Masahiko; Nomura, Kazuyo; Hagiwara, Taku; Uchiyama, Tokio; Ichikawa, Hironobu; Kobayashi, Shuji; Miyamoto, Ken; Nakamura, Kazuhiko; Suzuki, Katsuaki; Mori, Norio; Takei, Nori

    2013-01-01

    To examine the inter-rater reliability of Autism Diagnostic Interview-Revised, Japanese Version (ADI-R-JV), the authors recruited 51 individuals aged 3-19 years, interviewed by two independent raters. Subsequently, to assess the discriminant and diagnostic validity of ADI-R-JV, the authors investigated 317 individuals aged 2-19 years, who were…

  14. Interrater and intrarater reliability of the Knosp scale for pituitary adenoma grading.

    Science.gov (United States)

    Mooney, Michael A; Hardesty, Douglas A; Sheehy, John P; Bird, Robert; Chapple, Kristina; White, William L; Little, Andrew S

    2017-05-01

    OBJECTIVE The goal of this study was to determine the interrater and intrarater reliability of the Knosp grading scale for predicting pituitary adenoma cavernous sinus (CS) involvement. METHODS Six independent raters (3 neurosurgery residents, 2 pituitary surgeons, and 1 neuroradiologist) participated in the study. Each rater scored 50 unique pituitary MRI scans (with contrast) of biopsy-proven pituitary adenoma. Reliabilities for the full scale were determined 3 ways: 1) using all 50 scans, 2) using scans with midrange scores versus end scores, and 3) using a dichotomized scale that reflects common clinical practice. The performance of resident raters was compared with that of faculty raters to assess the influence of training level on reliability. RESULTS Overall, the interrater reliability of the Knosp scale was "strong" (0.73, 95% CI 0.56-0.84). However, the percent agreement for all 6 reviewers was only 10% (26% for faculty members, 30% for residents). The reliability of the middle scores (i.e., average rated Knosp Grades 1 and 2) was "very weak" (0.18, 95% CI -0.27 to 0.56) and the percent agreement for all reviewers was only 5%. When the scale was dichotomized into tumors unlikely to have intraoperative CS involvement (Grades 0, 1, and 2) and those likely to have CS involvement (Grades 3 and 4), the reliability was "strong" (0.60, 95% CI 0.39-0.75) and the percent agreement for all raters improved to 60%. There was no significant difference in reliability between residents and faculty (residents 0.72, 95% CI 0.55-0.83 vs faculty 0.73, 95% CI 0.56-0.84). Intrarater reliability was moderate to strong and increased with the level of experience. CONCLUSIONS Although these findings suggest that the Knosp grading scale has acceptable interrater reliability overall, it raises important questions about the "very weak" reliability of the scale's middle grades. By dichotomizing the scale into clinically useful groups, the authors were able to address the poor

  15. The reliability and validity of the Turkish version of the Neuropsychiatric Inventory-Clinician.

    Science.gov (United States)

    Sahin Cankurtaran, Eylem; Danişman, Mustafa; Tutar, Hasan; Ulusoy Kaymak, Semra

    2015-01-01

    The Neuropsychiatric Inventory-Clinician (NPI-C) scale is one of the best-known scales for evaluating the behavioral and psychological symptoms of dementia. This study aimed to assess the reliability and validity of the Turkish version of the NPI-C scale in patients with Alzheimer disease (AD). The NPI-C scale was administered to 125 patients with AD. For reliability, both Cronbach's α and interrater reliability were analyzed. The Behavioral Pathology in Alzheimer's Disease (BEHAVE-AD) scale was applied for validity and, in addition, the Mini Mental State Examination (MMSE), Instrumental Activities of Daily Living (IADL) scale, and Disability Assessment of Dementia (DAD) scale were completed. The Turkish version of the NPI-C scale showed high internal consistency (Cronbach's α = 0.75) and mostly good interrater reliability. Assessments of validity showed that the NPI-C and corresponding BEHAVE-AD domains were found to be significantly correlated, between 0.925 and 0.195. Moreover, the correlations between NPI-C and MMSE were significant for all domains except the dysphoria, anxiety, and elation/euphoria domains. When we conducted a correlation analysis of NPI-C with IADL, all domains were statistically significantly correlated except aggression, anxiety, elation/euphoria, and dysphoria. The Turkish version of the NPI-C scale was found to be a reliable and valid instrument to assess neuropsychiatric symptoms in Turkish elderly subjects with AD.

  16. Intra-rater and inter-rater reliability of a medical record abstraction study on transition of care after childhood cancer.

    Directory of Open Access Journals (Sweden)

    Micòl E Gianinazzi

    The abstraction of data from medical records is a widespread practice in epidemiological research. However, studies using this means of data collection rarely report reliability. Within the Transition after Childhood Cancer Study (TaCC), which is based on a medical record abstraction, we conducted a second independent abstraction of data with the aim to assess a) intra-rater reliability of one rater at two time points; b) the possible learning effects between these two time points compared to a gold-standard; and c) inter-rater reliability. Within the TaCC study we conducted a systematic medical record abstraction in the 9 Swiss clinics with pediatric oncology wards. In a second phase we selected a subsample of medical records in 3 clinics to conduct a second independent abstraction. We then assessed intra-rater reliability at two time points, the learning effect over time (comparing each rater at two time-points with a gold-standard) and the inter-rater reliability of a selected number of variables. We calculated percentage agreement and Cohen's kappa. For the assessment of the intra-rater reliability we included 154 records (80 for rater 1; 74 for rater 2). For the inter-rater reliability we could include 70 records. Intra-rater reliability was substantial to excellent (Cohen's kappa 0.6-0.8) with an observed percentage agreement of 75%-95%. In all variables learning effects were observed. Inter-rater reliability was substantial to excellent (Cohen's kappa 0.70-0.83) with high agreement ranging from 86% to 100%. Our study showed that data abstracted from medical records are reliable. Investigating intra-rater and inter-rater reliability can give confidence to draw conclusions from the abstracted data and increase data quality by minimizing systematic errors.

  17. Interrater reliability assessment using the Test of Gross Motor Development-2.

    Science.gov (United States)

    Barnett, Lisa M; Minto, Christine; Lander, Natalie; Hardy, Louise L

    2014-11-01

    The aim was to examine interrater reliability of the object control subtest from the Test of Gross Motor Development-2 by live observation in a school field setting. Reliability Study--cross sectional. Raters were rated on their ability to agree on (1) the raw total for the six object control skills; (2) each skill performance and (3) the skill components. Agreement for the object control subtest and the individual skills was assessed by an intraclass correlation (ICC) and a kappa statistic assessed for skill component agreement. A total of 37 children (65% girls) aged 4-8 years (M = 6.2, SD = 0.8) were assessed in six skills by two raters; equating to 222 skill tests. Interrater reliability was excellent for the object control subtest (ICC = 0.93), and for individual skills, highest for the dribble (ICC = 0.94) followed by strike (ICC = 0.85), overhand throw (ICC = 0.84), underhand roll (ICC = 0.82), kick (ICC = 0.80) and the catch (ICC = 0.71). The strike and the throw had more components with less agreement. Even though the overall subtest score and individual skill agreement was good, some skill components had lower agreement, suggesting these may be more problematic to assess. This may mean some skill components need to be specified differently in order to improve component reliability. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.
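
    As a rough illustration of how an intraclass correlation is obtained from a subjects-by-raters table, the sketch below computes one common form, ICC(2,1) (two-way random effects, absolute agreement, single rater), from the ANOVA mean squares. The abstract does not state which ICC form was used, so the choice of form, the function name and the ratings are all assumptions.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per subject and one column per rater."""
    n = len(ratings)          # subjects
    k = len(ratings[0])       # raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with >= 2 subjects and >= 2 raters")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative ratings only (6 subjects scored by 4 raters), not study data.
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
example_icc = icc_2_1(example_ratings)
print(f"ICC(2,1) = {example_icc:.2f}")
```

    Because this form penalises systematic differences between raters, a rater who scores consistently lower than the others pulls the coefficient down even when the rank order of subjects is preserved.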

  18. Examination of anomalous self-experience in first-episode psychosis: interrater reliability

    DEFF Research Database (Denmark)

    Møller, Paul; Haug, Elisabeth; Raballo, Andrea

    2011-01-01

    -rater correlation above 0.80 (Spearman's rho, p values at an item level were very good in 9 items, good in 20 items, moderate in 11 items and fair in 4 items. Conclusion: The EASE provides a reliable and internally......) is a phenomenologically inspired checklist, specifically designed to support the comprehensive assessment of these characteristic subjective experiences. Aim: To assess the interrater reliability of the EASE. Sampling and Methods: Twenty-five first-episode psychosis (FEP) patients were interviewed with the EASE...

  19. Inter-rater reliability of an observation-based ergonomics assessment checklist for office workers.

    Science.gov (United States)

    Pereira, Michelle Jessica; Straker, Leon Melville; Comans, Tracy Anne; Johnston, Venerina

    2016-12-01

    To establish the inter-rater reliability of an observation-based ergonomics assessment checklist for computer workers. A 37-item (38-item if a laptop was part of the workstation) comprehensive observational ergonomics assessment checklist comparable to government guidelines and up to date with empirical evidence was developed. Two trained practitioners assessed full-time office workers performing their usual computer-based work and evaluated the suitability of workstations used. Practitioners assessed each participant consecutively. The order of assessors was randomised, and the second assessor was blinded to the findings of the first. Unadjusted kappa coefficients between the raters were obtained for the overall checklist and subsections that were formed from question-items relevant to specific workstation equipment. Twenty-seven office workers were recruited. The inter-rater reliability between two trained practitioners achieved moderate to good reliability for all except one checklist component. This checklist has mostly moderate to good reliability between two trained practitioners. Practitioner Summary: This reliable ergonomics assessment checklist for computer workers was designed using accessible government guidelines and supplemented with up-to-date evidence. Employers in Queensland (Australia) can fulfil legislative requirements by using this reliable checklist to identify and subsequently address potential risk factors for work-related injury to provide a safe working environment.

  20. Inter-rater reliability of nursing home quality indicators in the U.S

    Directory of Open Access Journals (Sweden)

    Roy Jason

    2003-11-01

    Background: In the US, Quality Indicators (QI's) profiling and comparing the performance of hospitals, health plans, nursing homes and physicians are routinely published for consumer review. We report the results of the largest study of inter-rater reliability done on nursing home assessments which generate the data used to derive publicly reported nursing home quality indicators. Methods: We sampled nursing homes in 6 states, selecting up to 30 residents per facility who were observed and assessed by research nurses on 100 clinical assessment elements contained in the Minimum Data Set (MDS) and compared these with the most recent assessment in the record done by facility nurses. Kappa statistics were generated for all data items and derived for 22 QI's over the entire sample and for each facility. Finally, facilities with many QI's with poor Kappa levels were compared to those with many QI's with excellent Kappa levels on selected characteristics. Results: A total of 462 facilities in 6 states were approached and 219 agreed to participate, yielding a response rate of 47.4%. A total of 5758 residents were included in the inter-rater reliability analyses, around 27.5 per facility. Patients resembled the traditional nursing home resident; only 43.9% were continent of urine and only 25.2% were rated as likely to be discharged within the next 30 days. Results of resident level comparative analyses reveal high inter-rater reliability levels (most items >.75). Using the research nurses as the "gold standard", we compared composite quality indicators based on their ratings with those based on facility nurses. All but two QI's have adequate Kappa levels and 4 QI's have average Kappa values in excess of .80. We found that 16% of participating facilities performed poorly (Kappa .75 on 12 or more QI's. No facility characteristics were related to reliability of the data on which QI's are based. Conclusion: While a few QI's being used for public reporting

  1. Blinded evaluation of interrater reliability of an operative competency assessment tool for direct laryngoscopy and rigid bronchoscopy.

    Science.gov (United States)

    Ishman, Stacey L; Benke, James R; Johnson, Kaalan Erik; Zur, Karen B; Jacobs, Ian N; Thorne, Marc C; Brown, David J; Lin, Sandra Y; Bhatti, Nasir; Deutsch, Ellen S

    2012-10-01

    OBJECTIVES To confirm interrater reliability using blinded evaluation of a skills-assessment instrument to assess the surgical performance of resident and fellow trainees performing pediatric direct laryngoscopy and rigid bronchoscopy in simulated models. DESIGN Prospective, paired, blinded observational validation study. SUBJECTS Paired observers from multiple institutions simultaneously evaluated residents and fellows who were performing surgery in an animal laboratory or using high-fidelity manikins. The evaluators had no previous affiliation with the residents and fellows and did not know their year of training. INTERVENTIONS One- and 2-page versions of an objective structured assessment of technical skills (OSATS) assessment instrument composed of global and task-specific surgical items were used to evaluate surgical performance. RESULTS Fifty-two evaluations were completed by 17 attending evaluators. The instrument agreement for the 2-page assessment was 71.4% when measured as a binary variable (ie, competent vs not competent) (κ = 0.38; P = .08). Evaluation as a continuous variable revealed a 42.9% percentage agreement (κ = 0.18; P = .14). The intraclass correlation was 0.53, considered substantial/good interrater reliability (69% reliable). For the 1-page instrument, agreement was 77.4% when measured as a binary variable (κ = 0.53, P = .0015). Agreement when evaluated as a continuous measure was 71.0% (κ = 0.54, P formative feedback on operational competency.

  2. Intrarater and interrater reliability of pulse examination in traditional Indian Ayurvedic medicine.

    Science.gov (United States)

    Kurande, Vrinda; Waagepetersen, Rasmus; Toft, Egon; Prasad, Ramjee

    2013-09-01

    In Ayurveda, pulse examination (nadipariksha) is an important tool to assess the status of three doshas: vata, pitta, and kapha. Long historical use has been seen as a documentation of its efficacy; however, there is a lack of a quantitative measure of the reliability of the pulse examination method. The objective of this study was to test the intrarater and interrater reliability of pulse examination in Ayurveda. Fifteen registered Ayurvedic doctors with 3-15 years of experience examined the pulse of 20 healthy volunteers twice, for a total of 600 examinations. The examinations were performed blind and in a random order. Only the current status of dosha-specific methods of pulse examination were considered. Cohen's weighted κ statistic was used as a measure of intrarater and interrater reliability, and a hypothesis of homogeneous diagnosis (random rating) was tested. Following this, we tested whether proportions of ratings were equal between doctors. According to the Landis and Koch scale, the level of reliability ranged from poor to moderate. It was observed that the doctors more frequently diagnosed a combination of two doshas than a single dosha. The κ values were generally larger for experienced doctors (p = 0.04). Experience and proper training have important roles in pulse examination.
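
    A weighted κ gives partial credit when two ratings differ but are close. The sketch below uses linear disagreement weights over an ordered list of categories; the abstract does not say which weighting scheme or category ordering the authors applied, so both are assumptions here, as are the function name and the example ratings.

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Cohen's weighted kappa with linear disagreement weights |i - j| / (k - 1)."""
    k = len(categories)
    n = len(rater_a)
    if k < 2 or n == 0 or n != len(rater_b):
        raise ValueError("need >= 2 categories and two equal-length rating lists")

    index = {c: i for i, c in enumerate(categories)}
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1

    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed_disagreement += weight * observed[i][j] / n
            expected_disagreement += weight * row_totals[i] * col_totals[j] / (n * n)

    if expected_disagreement == 0:
        raise ValueError("weighted kappa is undefined when no disagreement is expected")
    return 1 - observed_disagreement / expected_disagreement


# Hypothetical ordinal grades (1 = low, 3 = high) from two examinations.
grades_first = [1, 1, 2, 2, 3, 3]
grades_second = [1, 2, 2, 2, 3, 2]
kappa_w = weighted_kappa(grades_first, grades_second, categories=[1, 2, 3])
print(f"weighted kappa = {kappa_w:.3f}")
```

    With purely nominal categories there is no natural distance between ratings, so the weights would need to be defined by the analyst rather than taken from the category order.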

  3. Interrater reliability of Violence Risk Appraisal Guide scores provided in Canadian criminal proceedings.

    Science.gov (United States)

    Edens, John F; Penson, Brittany N; Ruchensky, Jared R; Cox, Jennifer; Smith, Shannon Toney

    2016-12-01

    Published research suggests that most violence risk assessment tools have relatively high levels of interrater reliability, but recent evidence of inconsistent scores among forensic examiners in adversarial settings raises concerns about the "field reliability" of such measures. This study specifically examined the reliability of Violence Risk Appraisal Guide (VRAG) scores in Canadian criminal cases identified in the legal database, LexisNexis. Over 250 reported cases were located that made mention of the VRAG, with 42 of these cases containing 2 or more scores that could be submitted to interrater reliability analyses. Overall, scores were skewed toward higher risk categories. The intraclass correlation, ICC(A,1), was .66, with pairs of forensic examiners placing defendants into the same VRAG risk "bin" in 68% of the cases. For categorical risk statements (i.e., low, moderate, high), examiners provided converging assessment results in most instances (86%). In terms of potential predictors of rater disagreement, there was no evidence for adversarial allegiance in our sample. Rater disagreement in the scoring of 1 VRAG item (Psychopathy Checklist-Revised; Hare, 2003), however, strongly predicted rater disagreement in the scoring of the VRAG (r = .58). (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  4. Development and inter-rater reliability of a standardized verbal instruction manual for the Chinese Geriatric Depression Scale-short form.

    Science.gov (United States)

    Wong, M T P; Ho, T P; Ho, M Y; Yu, C S; Wong, Y H; Lee, S Y

    2002-05-01

    The Geriatric Depression Scale (GDS) is a common screening tool for elderly depression in Hong Kong. This study aimed at (1) developing a standardized manual for the verbal administration and scoring of the GDS-SF, and (2) comparing the inter-rater reliability between the standardized and non-standardized verbal administration of GDS-SF. Two studies were reported. In Study 1, the process of developing the manual was described. In Study 2, we compared the inter-rater reliabilities of GDS-SF scores using the standardized verbal instructions and the traditional non-standardized administration. Results of Study 2 indicated that the standardized procedure in verbal administration and scoring improved the inter-rater reliabilities of GDS-SF. Copyright 2002 John Wiley & Sons, Ltd.

  5. Validity and Reliability of 2 Goniometric Mobile Apps: Device, Application, and Examiner Factors.

    Science.gov (United States)

    Wellmon, Robert H; Gulick, Dawn T; Paterson, Mark L; Gulick, Colleen N

    2016-12-01

    Smartphones are being used in a variety of practice settings to measure joint range of motion (ROM). A number of factors can affect the validity of the measurements generated. However, there are no studies examining smartphone-based goniometer applications focusing on measurement variability and error arising from the electromechanical properties of the device being used. To examine the concurrent validity and interrater reliability of 2 goniometric mobile applications (Goniometer Records, Goniometer Pro), an inclinometer, and a universal goniometer (UG). Nonexperimental, descriptive validation study. University laboratory. 3 physical therapists having an average of 25 y of experience. Three standardized angles (acute, right, obtuse) were constructed to replicate the movement of a hinge joint in the human body. Angular changes were measured and compared across 3 raters who used 3 different devices (UG, inclinometer, and 2 goniometric apps installed on 3 different smartphones: Apple iPhone 5, LG Android, and Samsung SIII Android). Intraclass correlation coefficients (ICCs) and Bland-Altman plots were used to examine interrater reliability and concurrent validity. Interrater reliability for each of the smartphone apps, inclinometer and UG was excellent (ICC = .995-1.000). Concurrent validity was also good (ICC = .998-.999). Based on the Bland-Altman plots, the means of the differences between the devices were low (range = -0.4° to 1.2°). This study identifies the error inherent in measurement that is independent of patient factors and due to the smartphone, the installed apps, and examiner skill. Less than 2° of measurement variability was attributable to those factors alone. The data suggest that 3 smartphones with the 2 installed apps are a viable substitute for using a UG or an inclinometer when measuring angular changes that typically occur when examining ROM and demonstrate the capacity of multiple examiners to accurately use smartphone-based goniometers.
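
    The quantities read off a Bland-Altman plot, the mean of the paired differences (bias) and the 95% limits of agreement, can be computed as in the sketch below. The angle readings are hypothetical and the function name is an assumption; the sketch only illustrates the standard calculation, not the authors' analysis.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Limits of agreement are bias +/- 1.96 sample standard deviations
    of the paired differences.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)
    return bias, bias - spread, bias + spread


# Hypothetical angle readings in degrees: smartphone app vs universal goniometer.
app_degrees = [30.5, 90.2, 120.9, 45.0]
goniometer_degrees = [30.0, 90.0, 120.0, 45.4]

bias, lower, upper = bland_altman(app_degrees, goniometer_degrees)
print(f"bias = {bias:.2f} deg, limits of agreement = [{lower:.2f}, {upper:.2f}] deg")
```

    A bias near zero with narrow limits suggests the two devices can be used interchangeably for the range of angles measured.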

  6. Inter-rater reliability of assessment of levator ani muscle strength and attachment to the pubic bone in nulliparous women.

    Science.gov (United States)

    van Delft, K; Schwertner-Tiepelmann, N; Thakar, R; Sultan, A H

    2013-09-01

    The modified Oxford scale (MOS) has been found previously to have poor inter-rater reliability, whereas digital assessment of levator ani muscle (LAM) attachment to the pubic bone has been shown to have acceptable reliability. Our aim was to evaluate inter-rater reliability of the validated MOS and to develop a reliable classification system for digital assessment of LAM attachment, correlating this to findings on transperineal ultrasound (TPUS) examination. Evaluation of the MOS by palpation was performed in nulliparous women by two investigators. LAM attachment was evaluated using digital palpation, for which a novel classification system was developed with four grades based on the position of the attachment and presence of discernible muscle. Findings were compared with those on TPUS examination. Inter-rater reliability was assessed using Cohen's kappa statistic. Twenty-five nulliparous women were examined. There was agreement in MOS scores between the investigators in 64% of women (n = 16), with a kappa of 0.66 (indicating substantial agreement). There was agreement in palpation of LAM attachment using the new grading system in 96% of women (n = 24), with a kappa of 0.90 (indicating almost perfect agreement). TPUS examination did not show LAM avulsion in any woman, with the exception of one with a partial avulsion. In this group of nulliparous patients, there was substantial agreement between the two investigators in evaluation of the MOS and there was good agreement between grades of LAM attachment using the new classification system, which correlated with findings on TPUS examination. It therefore appears that these results are reproducible in nulliparous women and the techniques can be readily learned and reliably incorporated into clinical practice and research after appropriate training. Further research is required to establish clinical utility of the grading system for LAM attachment in postpartum women and in women with symptomatic pelvic organ

  7. Validity and reliability of three definitions of hip osteoarthritis: cross sectional and longitudinal approach

    OpenAIRE

    Reijman, Max; Hazes, Mieke; Pols, Huib; Bernsen, Roos; Koes, Bart; Bierma-Zeinstra, Sita

    2004-01-01

    OBJECTIVES: To compare the reliability and validity in a large open population of three frequently used radiological definitions of hip osteoarthritis (OA): Kellgren and Lawrence grade, minimal joint space (MJS), and Croft grade; and to investigate whether the validity of the three definitions of hip OA is sex dependent. METHODS: Subjects from the Rotterdam study (aged ≥ 55 years, n = 3585) were evaluated. The inter-rater reliability was tested in a random set of 148 x-rays. ...

  8. Translation, adaptation and inter-rater reliability of the administration manual for the Fugl-Meyer assessment.

    Science.gov (United States)

    Michaelsen, Stella M; Rocha, André S; Knabben, Rodrigo J; Rodrigues, Luciano P; Fernandes, Claudia G C

    2011-01-01

    Recently, the reliability of the Brazilian version of the Fugl-Meyer Assessment (FMA) was assessed through the scoring given according to observations made by a single evaluator who applied the test. When different raters apply the scale, the reliability may depend on the interpretation given to the assessment sheet. In such cases, a clear administration manual is essential for ensuring homogeneity of application. To translate and adapt the French Canadian version of the FMA administration manual into Brazilian Portuguese and to evaluate the inter-rater reliability when different evaluators apply the FMA on the basis of the information contained in the manual. Eighteen adults (59±10 years) with chronic hemiparesis (38±35 months after a stroke) took part in this study. Eight patients participated in the first part of the study and 10 in the second part. Based on analyzing the results from part 1, an adapted version was developed, in which information and photos were added to illustrate the positions of the patient and evaluator. The inter-rater reliability was assessed using the intraclass correlation coefficient (ICC). The reliability of the FMA based on the adapted version of the manual was excellent for the total motor scores for the upper limbs (ICC=0.98) and lower limbs (ICC=0.90), as well as for movement sense (ICC=0.98) and upper and lower-limb passive range of motion (ICC=0.84 and 0.90, respectively). The reliability was moderate for tactile sensitivity (0.75). The joint pain assessment presented low reliability. The results showed that, except for pain assessment, application of the FMA based on the adapted version of the application manual for Brazilian Portuguese presented adequate inter-rater reliability.

  9. Inter-rater reliability in the classification of supraspinatus tendon tears using 3D ultrasound – a question of experience?

    Directory of Open Access Journals (Sweden)

    Giorgio Tamborrini

    2016-09-01

    Background: Three-dimensional (3D) ultrasound of the shoulder is characterized by a comparable accuracy to two-dimensional (2D) ultrasound. No studies investigating 2D versus 3D inter-rater reliability in the detection of supraspinatus tendon tears taking into account the level of experience of the raters have been carried out so far. Objectives: The aim of this study was to determine the inter-rater reliability in the analysis of 3D ultrasound image sets of the supraspinatus tendon between sonographers with different levels of experience. Patients and methods: Non-interventional, prospective, observational pilot study of 2309 images of 127 adult patients suffering from unilateral shoulder pain. 3D ultrasound image sets were scored by three raters independently. The intra- and inter-rater reliabilities were calculated. Results: There was an excellent intra-rater reliability of rater A in the overall classification of supraspinatus tendon tears (2D vs 3D κ = 0.892, pairwise reliability 93.81%; 3D scoring round 1 vs 3D scoring round 2 κ = 0.875, pairwise reliability 92.857%). The inter-rater reliability was only moderate compared to rater B on 3D (κ = 0.497, pairwise reliability 70.95%) and fair compared to rater C (κ = 0.238, pairwise reliability 42.38%). Conclusions: The reliability of 3D ultrasound of the supraspinatus tendon depends on the level of experience of the sonographer. Experience in 2D ultrasound does not seem to be sufficient for the analysis of 3D ultrasound imaging sets. Therefore, for a 3D ultrasound analysis new diagnostic criteria have to be established and taught even to experienced 2D sonographers to improve reproducibility.

  10. The interrater and intrarater reliability of the Philpott-Javer staging system based on level of training.

    Science.gov (United States)

    Parhar, Harman S; Thamboo, Andrew; Habib, Al-Rahim; Chang, Brent; Gan, Eng Cern; Javer, Amin R

    2014-04-01

    The Philpott-Javer postoperative endoscopic mucosal staging system for allergic fungal rhinosinusitis has previously demonstrated acceptable interrater reliability among rhinologists. There are, however, numerous learners involved in patient care at tertiary centers. This study aims to analyze the interrater and intrarater reliability of this system among learners in otolaryngology at different stages in training. A prospective analysis of retrospectively collected endoscopic photographs. A tertiary care teaching hospital (January 2013). Fifty patients undergoing routine follow-up. Three photographs from each of 50 patients undergoing routine postsurgical nasoendoscopy were reviewed. Images were played twice, 1 week apart, in 2 differently randomized cycles and scored according to Philpott-Javer criteria by a rhinologist, a rhinology fellow, a senior otolaryngology resident, a junior otolaryngology resident, and a medical student. Interobserver reliability was assessed using the intraclass correlation coefficient, while intrarater reliability was assessed by Shrout-Fleiss κ values. Agreement between each learner and the rhinologist was also assessed using κ values. The intraclass correlation among the 5 raters was 0.7600 (95% confidence interval, 0.6917-0.8161) for the Philpott-Javer scoring system, suggesting substantial reliability. Intrarater data showed substantial to almost-perfect reliability (κ values between 0.668 and 0.815) among all raters using this system. There was also moderate to substantial agreement between the learners and the rhinologist (κ values between 0.534 and 0.710). Results suggest that the Philpott-Javer staging system has acceptable intrarater and interrater reliability among learners of differing levels of clinical experience and is suitable for evaluating progress following surgery.

  11. Validity and reliability of the European Portuguese version of neuropsychiatric inventory in an institutionalized sample.

    Science.gov (United States)

    Ferreira, Ana Rita; Martins, Sonia; Ribeiro, Orquidea; Fernandes, Lia

    2015-01-01

    Neuropsychiatric symptoms are very common in dementia and have been associated with patient and caregiver distress, increased risk of institutionalization and higher costs of care. In this context, the neuropsychiatric inventory (NPI) is the most widely used comprehensive tool designed to measure neuropsychiatric symptoms in geriatric patients with dementia. The aim of this study was to present the validity and reliability of the European Portuguese version of NPI. A cross-sectional study was carried out with a convenience sample of institutionalized patients (≥ 50 years old) in three nursing homes in Portugal. All patients were also assessed with mini-mental state examination (MMSE) (cognition), geriatric depression scale (GDS) (depression) and adults and older adults functional assessment inventory (IAFAI) (functionality). NPI was administered to a formal caregiver, usually from the clinical staff. Inter-rater and test-retest reliability were assessed in a subsample of 25 randomly selected subjects. The sample included 166 elderly, with a mean age of 80.9 (standard deviation: 10.2) years. Three out of the NPI behavioral items had negative correlations with MMSE: delusions (rs = -0.177, P = 0.024), disinhibition (rs = -0.174, P = 0.026) and aberrant motor activity (rs = -0.182, P = 0.020). The NPI subsection of depression/dysphoria correlated positively with GDS total score (rs = 0.166, P = 0.038). NPI showed good internal consistency (overall α = 0.766; frequency α = 0.737; severity α = 0.734). The inter-rater reliability was excellent (intraclass correlation coefficient (ICC): 1.00, 95% confidence interval (CI) 1.00 - 1.00), as well as test-retest reliability (ICC: 0.91, 95% CI 0.80 - 0.96). The results found for convergent validity, inter-rater and test-retest reliability, showed that this version appears to be a valid and reliable instrument for evaluation of neuropsychiatric symptoms in institutionalized elderly.

  12. Validity and inter-rater reliability of medio-lateral knee motion observed during a single-limb mini squat

    Directory of Open Access Journals (Sweden)

    Simic Milena

    2010-11-01

    Background: Muscle function may influence the risk of knee injury and outcomes following injury. Clinical tests, such as a single-limb mini squat, resemble conditions of daily life and are easy to administer. Fewer squats per 30 seconds indicate poorer function. However, the quality of movement, such as the medio-lateral knee motion may also be important. The aim was to validate an observational clinical test of assessing the medio-lateral knee motion, using a three-dimensional (3-D) motion analysis system. In addition, the inter-rater reliability was evaluated. Methods: Twenty-five (17 women) non-injured participants (mean age 25.6 years, range 18-37) were included. Visual analysis of the medio-lateral knee motion, scored as knee-over-foot or knee-medial-to-foot by two raters, and 3-D kinematic data were collected simultaneously during a single-limb mini squat. Frontal plane 2-D peak tibial, thigh, and knee varus-valgus angles, and 3-D peak hip internal-external rotation, and knee varus-valgus angles were calculated. Results: Ten subjects were scored as having a knee-medial-to-foot position and 15 subjects a knee-over-foot position assessed by visual inspection. In 2-D, the peak tibial angle (mean 89.0 (SE 0.7) vs mean 86.3 (SE 0.4) degrees, p = 0.001) and peak thigh angle (mean 77.4 (SE 1.0) vs mean 81.2 (SE 0.5) degrees, p = 0.001) with respect to the horizontal, indicated that the knee was more medially placed than the ankle and thigh, respectively. Thus, the knee was in more valgus (mean 11.6 (SE 1.5) vs 5.0 (SE 0.8) degrees, p 0.90 and 96 between raters. Conclusions: Medio-lateral motion of the knee can reliably be assessed during a single-leg mini-squat. The test is valid in 2-D, while the actual movement, in 3-D, is mainly exhibited as increased internal hip rotation. The single-limb mini squat is feasible and easy to administer in the clinical setting and in research to address lower extremity movement quality.

  13. THE INTRA- AND INTER-RATER RELIABILITY OF THE SOCCER INJURY MOVEMENT SCREEN (SIMS).

    Science.gov (United States)

    McCunn, Robert; Aus der Fünten, Karen; Govus, Andrew; Julian, Ross; Schimpchen, Jan; Meyer, Tim

    2017-02-01

    The growing volume of movement screening research reveals a belief among practitioners and researchers alike that movement quality may have an association with injury risk. However, existing movement screening tools have not considered the sport-specific movement and injury patterns relevant to soccer. The present study introduces the Soccer Injury Movement Screen (SIMS), which has been designed specifically for use within soccer. Furthermore, the purpose of the present study was to assess the intra- and inter-rater reliability of the SIMS and determine its suitability for use in further research. The study utilized a test-retest design to discern reliability. Twenty-five (11 males, 14 females) healthy, recreationally active university students (age 25.5 ± 4.0 years, height 171 ± 9 cm, weight 64.7 ± 12.6 kg) agreed to participate. The SIMS contains five sub-tests: the anterior reach, single-leg deadlift, in-line lunge, single-leg hop for distance and tuck jump. Each movement was scored out of 10 points and summed to produce a composite score out of 50. The anterior reach and single-leg hop for distance were scored in real-time while the remaining tests were filmed and scored retrospectively. Three raters conducted the SIMS with each participant on three occasions separated by an average of three and a half days (minimum one day, maximum seven days). Rater 1 re-scored the filmed movements for all participants on all occasions six months later to establish the 'pure' intra-rater (intra-occasion) reliability for those movements. Intraclass correlation coefficient (ICC) values for intra- and inter-rater composite score reliability ranged from 0.66-0.72 and 0.79-0.86 respectively. Weighted kappa values representing the intra- and inter-rater reliability of the individual sub-tests ranged from 0.35-0.91 indicating fair to almost perfect agreement. Establishing the reliability of the SIMS is a prerequisite for further research seeking to investigate

  14. Reliability and Validity of the Dyadic Observed Communication Scale (DOCS).

    Science.gov (United States)

    Hadley, Wendy; Stewart, Angela; Hunter, Heather L; Affleck, Katelyn; Donenberg, Geri; Diclemente, Ralph; Brown, Larry K

    2013-02-01

    We evaluated the reliability and validity of the Dyadic Observed Communication Scale (DOCS) coding scheme, which was developed to capture a range of communication components between parents and adolescents. Adolescents and their caregivers were recruited from mental health facilities for participation in a large, multi-site family-based HIV prevention intervention study. Seventy-one dyads were randomly selected from the larger study sample and coded using the DOCS at baseline. Preliminary validity and reliability of the DOCS was examined using various methods, such as comparing results to self-report measures and examining interrater reliability. Results suggest that the DOCS is a reliable and valid measure of observed communication among parent-adolescent dyads that captures both verbal and nonverbal communication behaviors that are typical intervention targets. The DOCS is a viable coding scheme for use by researchers and clinicians examining parent-adolescent communication. Coders can be trained to reliably capture individual and dyadic components of communication for parents and adolescents and this complex information can be obtained relatively quickly.

  15. An initial reliability and validity study of the Interaction, Communication, and Literacy Skills Audit.

    Science.gov (United States)

    El-Choueifati, Nisrine; Purcell, Alison; McCabe, Patricia; Heard, Robert; Munro, Natalie

    2014-06-01

    Early childhood educators (ECEs) have an important role in promoting positive outcomes for children's language and literacy development. This paper reports the development of a new tool, The Interaction Communication and Literacy (ICL) Skills Audit, and pilots its reliability and validity. Intra- and inter-rater reliability was examined by three speech-language pathologists (SLPs). Five skill areas relating to ECE language and literacy practice were rated. The face and content validity of the ICL Skills Audit was examined by expert SLPs (n = 8) and expert ECEs (n = 4) via questionnaire. The overall intra-rater reliability for the ICL Skills Audit was excellent with percentage close agreement (PCA) of 91-94. Inter-rater agreement was PCA 68-80. Expert SLPs and ECEs agreed that the content was comprehensive and practical. Based on this preliminary study, the ICL Skills Audit appears to be a promising tool that can be used by SLPs and ECEs in collaboration to measure the skills of ECEs in the areas of language and literacy support. Future psychometric and outcome research on the revised ICL Skills Audit is warranted.

  16. Reliability and validity of a Chinese version of the Diagnostic Interview for Borderlines-Revised.

    Science.gov (United States)

    Wang, Lanlan; Yuan, Chenmei; Qiu, Jianying; Gunderson, John; Zhang, Min; Jiang, Kaida; Leung, Freedom; Zhong, Jie; Xiao, Zeping

    2014-09-01

    Borderline personality disorder (BPD) is the most studied of the axis II disorders. One of the most widely used diagnostic instruments is the Diagnostic Interview for Borderline Patients-Revised (DIB-R). The aim of this study was to test the reliability and validity of DIB-R for use in the Chinese culture. The reliability and validity of the DIB-R Chinese version were assessed in a sample of 236 outpatients with a probable BPD diagnosis. The Structured Clinical Interview for DSM-IV Personality Disorders (SCID-II) was used as a standard. Test-retest reliability was tested six months later with 20 patients, and inter-rater reliability was tested on 32 patients. The Chinese version of the DIB-R showed good internal global consistency (Cronbach's α of 0.916), good test-retest reliability (Pearson correlation of 0.704), good inter-rater reliability (intra-class correlation coefficient of 0.892 and kappa of 0.861). When compared with the DSM-IV diagnosis as measured by the SCID-II, the DIB-R showed relatively good sensitivity (0.768) and specificity (0.891) at the cutoff of 7, moderate diagnostic convergence (kappa of 0.631), as well as good discriminating validity. The Chinese version of the DIB-R has good psychometric properties, which renders it a valuable method for examining the presence, the severity, and component phenotypes of BPD in Chinese samples. © 2013 Wiley Publishing Asia Pty Ltd.
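
    The sensitivity and specificity reported above come from comparing a thresholded scale score with a reference diagnosis. A minimal Python sketch of that calculation follows. The function name, the toy scores and the rule that a score at or above the cutoff counts as positive are assumptions made for illustration; none of them is taken from the study.

```python
def sensitivity_specificity(scores, reference, cutoff):
    """Return (sensitivity, specificity) of `score >= cutoff` against a reference.

    scores    -- numeric scale totals, one per patient
    reference -- booleans, True when the reference standard says "case"
    cutoff    -- assumed rule: a score at or above this value is test-positive
    """
    tp = fn = tn = fp = 0
    for score, is_case in zip(scores, reference):
        positive = score >= cutoff
        if is_case and positive:
            tp += 1          # case correctly flagged
        elif is_case:
            fn += 1          # case missed
        elif positive:
            fp += 1          # non-case wrongly flagged
        else:
            tn += 1          # non-case correctly cleared
    return tp / (tp + fn), tn / (tn + fp)


# Made-up illustrative data, not values from any study in this listing.
toy_scores = [8, 9, 5, 7, 3, 6]
toy_reference = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(toy_scores, toy_reference, cutoff=7)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

    Raising the cutoff generally trades sensitivity for specificity, which is why a reported pair of values is only meaningful together with the cutoff used.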

  17. Intra- and interrater reliability and agreement of the Danish version of the Dynamic Gait Index in older people with balance impairments

    DEFF Research Database (Denmark)

    Jønsson, Line R; Kristensen, Morten; Tibaek, Sigrid

    2011-01-01

    To examine the intrarater and interrater reliability and agreement of the Danish version of the Dynamic Gait Index (DGI) in hospitalized and community-dwelling older people with balance impairments...

  18. Test–re-test reliability and inter-rater reliability of a digital pelvic inclinometer in young, healthy males and females

    OpenAIRE

    Chris Beardsley; Tim Egerton; Brendon Skinner

    2016-01-01

    Objective. The purpose of this study was to investigate the reliability of a digital pelvic inclinometer (DPI) for measuring sagittal plane pelvic tilt in 18 young, healthy males and females. Method. The inter-rater reliability and test–re-test reliabilities of the DPI for measuring pelvic tilt in standing on both the right and left sides of the pelvis were measured by two raters carrying out two rating sessions of the same subjects, three weeks apart. Results. For measuring pel...

  19. Reliability and Validity of the Dutch Physical Activity Questionnaires for Children (PAQ-C) and Adolescents (PAQ-A).

    Science.gov (United States)

    Bervoets, Liene; Van Noten, Caroline; Van Roosbroeck, Sofie; Hansen, Dominique; Van Hoorenbeeck, Kim; Verheyen, Els; Van Hal, Guido; Vankerckhoven, Vanessa

    2014-01-01

    This study was designed to validate the Dutch Physical Activity Questionnaires for Children (PAQ-C) and Adolescents (PAQ-A). After adjustment of the original Canadian PAQ-C and PAQ-A (i.e. translation/back-translation and evaluation by expert committee), content validity of both PAQs was assessed and calculated using item-level (I-CVI) and scale-level (S-CVI) content validity indexes. Inter-item and inter-rater reliability of 196 PAQ-C and 95 PAQ-A questionnaires, each filled in by both the child or adolescent and a parent, were evaluated. Inter-item reliability was calculated by Cronbach's alpha (α) and inter-rater reliability was examined by percent observed agreement and weighted kappa (κ). Concurrent validity of PAQ-A was examined in a subsample of 28 obese and 16 normal-weight children by comparing it with concurrently measured physical activity using a maximal cardiopulmonary exercise test for the assessment of peak oxygen uptake (VO2 peak). For both PAQs, I-CVI ranged from 0.67 to 1.00. S-CVI was 0.89 for PAQ-C and 0.90 for PAQ-A. A total of 192 PAQ-C and 94 PAQ-A were fully completed by both child and parent. Cronbach's α was 0.777 for PAQ-C and 0.758 for PAQ-A. Percent agreement ranged from 59.9% to 74.0% for PAQ-C and from 51.1% to 77.7% for PAQ-A, and weighted κ ranged from 0.48 to 0.69 for PAQ-C and from 0.51 to 0.68 for PAQ-A. The correlation between total PAQ-A score and VO2 peak, corrected for age, gender, height and weight, was 0.516 (p = 0.001). Both PAQs have excellent content validity, acceptable inter-item reliability and moderate to good inter-rater agreement. In addition, total PAQ-A score showed a moderate positive correlation with VO2 peak. Both PAQs have acceptable to good reliability and validity; however, further validity testing is recommended to provide a more complete assessment of both PAQs.

  20. Assessment of the severity of dementia: validity and reliability of the Chinese (Cantonese) version of the Hierarchic Dementia Scale (CV-HDS).

    Science.gov (United States)

    Poon, Vickie Wan-kei; Lam, Linda Chiu-wa; Wong, Samuel Yeung-shan

    2008-09-01

    With the rapid growth of the older population, early detection of cognitive deficits is crucial in slowing down functional deterioration of elderly persons. The aim was to examine the validity and reliability of the Chinese (Cantonese) version of the Hierarchic Dementia Scale (CV-HDS) for Chinese older persons in Hong Kong. The HDS was translated into Cantonese Chinese. The content and cultural validity were evaluated by six expert panel members. Sixty-two participants with a diagnosis of dementia were recruited for evaluation. Inter-rater reliability, test-retest reliability, internal consistency and concurrent validity were examined. The CV-HDS demonstrated satisfactory psychometric properties. Inter-rater reliability and test-retest reliability were high (alpha=0.89 and alpha=0.94 respectively). A high value of Cronbach's alpha (alpha=0.94) demonstrated good internal consistency. The concurrent validity of the CV-HDS, through correlation of its scores with those of the Chinese version of the Mini Mental Status Examination, was established (ranged from r=0.58 to r=0.78, p ... Cantonese speaking Chinese people with dementia. It facilitates treatment planning to optimize the effects of functional training and rehabilitation.

  1. Quality of the Critical Incident Technique in practice: Interrater reliability and users' acceptance under real conditions

    Directory of Open Access Journals (Sweden)

    ANNA KOCH

    2009-03-01

    The Critical Incident Technique (CIT) is a widely used task analysis method in personnel psychology. While studies on psychometric properties of the CIT so far primarily took into account relevance ratings of task-lists or attributes, and hence, only a smaller or adapted part of the CIT, little is known about the psychometric properties of the complete CIT in its most meaningful and fruitful way. Therefore, the aim of the present study was to assess interrater reliability and the participants’ view of the CIT under real conditions and especially to provide data for the key step of the CIT: the classification of behavior descriptions into requirements. Additionally, the cost-benefit-ratio and practicability were rated from the participants’ views as an important indicator for the acceptance of the task analysis approach in practice. Instructors of German Institutions for Statutory Accidents Insurance and Prevention as well as their supervisors took part in a job analysis with the CIT. Moderate interrater reliability for the relevance rating was found while the classification step yielded unexpectedly low coefficients for interrater reliability. The cost-benefit-ratio and practicability of the complete CIT were rated very positively. The results are discussed in relation to determinants that facilitate or impede the application of task analysis procedures.

  2. Exploration of the (Interrater) Reliability and Latent Factor Structure of the Alcohol Use Disorders Identification Test (AUDIT) and the Drug Use Disorders Identification Test (DUDIT) in a Sample of Dutch Probationers.

    Science.gov (United States)

    Hildebrand, Martin; Noteborn, Mirthe G C

    2015-01-01

    The use of brief, reliable, valid, and practical measures of substance use is critical for conducting individual (risk and need) assessments in probation practice. In this exploratory study, the basic psychometric properties of the Alcohol Use Disorders Identification Test (AUDIT) and the Drug Use Disorders Identification Test (DUDIT) are evaluated. The instruments were administered as an oral interview instead of a self-report questionnaire. The sample comprised 383 offenders (339 men, 44 women). A subset of 56 offenders (49 men, 7 women) participated in the interrater reliability study. Data collection took place between September 2011 and November 2012. Overall, both instruments have acceptable levels of interrater reliability for total scores and acceptable to good interrater reliabilities for most of the individual items. Confirmatory factor analyses (CFA) indicated that the a priori one-, two- and three-factor solutions for the AUDIT did not fit the observed data very well. Principal axis factoring (PAF) supported a two-factor solution for the AUDIT that included a level of alcohol consumption/consequences factor (Factor 1) and a dependence factor (Factor 2), with both factors explaining substantial variance in AUDIT scores. For the DUDIT, CFA and PAF suggest that a one-factor solution is the preferred model (accounting for 62.61% of total variance). The Dutch language versions of the AUDIT and the DUDIT are reliable screening instruments for use with probationers and both instruments can be reliably administered by probation officers in probation practice. However, future research on concurrent and predictive validity is warranted.

  3. Using the eating disorder examination in the assessment of bulimia and anorexia: issues of reliability and validity.

    Science.gov (United States)

    Guest, T

    2000-01-01

    The Eating Disorder Examination will be assessed according to its reliability and validity in the assessment of anorexia nervosa and bulimia nervosa. A thorough review of the literature was conducted to judge the reliability and validity of the Eating Disorder Examination and its subscales. The review shows that the EDE and its subscales have good interrater reliability and internal consistency reliability. Similarly, high levels of discriminant validity, construct validity, and treatment validity in the assessment of eating disorders were also found. A summary of each study concerning the various types of reliability and validity will be provided. The EDE is considered to be the "gold standard" by which to identify eating disorders, so this tool used in conjunction with other behavioral measures will be imperative for clinical social work practice.

  4. The inter-rater reliability and prognostic value of coma scales in Nepali children with acute encephalitis syndrome.

    Science.gov (United States)

    Ray, Stephen; Rayamajhi, Ajit; Bonnett, Laura J; Solomon, Tom; Kneen, Rachel; Griffiths, Michael J

    2018-02-01

    Background Acute encephalitis syndrome (AES) is a common cause of coma in Nepali children. The Glasgow coma scale (GCS) is used to assess the level of coma in these patients and predict outcome. Alternative coma scales may have better inter-rater reliability and prognostic value in encephalitis in Nepali children, but this has not been studied. The Adelaide coma scale (ACS), Blantyre coma scale (BCS) and the Alert, Verbal, Pain, Unresponsive scale (AVPU) are alternatives to the GCS which can be used. Methods Children aged 1-14 years who presented to Kanti Children's Hospital, Kathmandu with AES between September 2010 and November 2011 were recruited. All four coma scales (GCS, ACS, BCS and AVPU) were applied on admission, 48 h later and on discharge. Inter-rater reliability (unweighted kappa) was measured for each. Correlation and agreement between total coma score and outcome (Liverpool outcome score) were measured by Spearman's rank and Bland-Altman plot. The prognostic value of coma scales alone and in combination with physiological variables was investigated in a subgroup (n = 22). A multivariable logistic regression model was fitted by backward stepwise selection. Results Fifty children were recruited. Inter-rater reliability using the various scales was fair to moderate. However, the scales poorly predicted clinical outcome. Combining the scales with physiological parameters such as systolic blood pressure improved outcome prediction. Conclusion This is the first study to compare four coma scales in Nepali children with AES. The scales exhibited fair to moderate inter-rater reliability. However, the study is inadequately powered to answer the question on the relationship between coma scales and outcome. Further larger studies are required.

  5. Inter-rater reliability of a modified version of Delitto et al.’s classification-based system for low back pain : a pilot study

    NARCIS (Netherlands)

    Apeldoorn, Adri T.; van Helvoirt, Hans; Ostelo, Raymond W.; Meihuizen, Hanneke; Kamper, Steven J.; van Tulder, Maurits W.; de Vet, Henrica C W

    2016-01-01

    Study design: Observational inter-rater reliability study. Objectives: To examine: (1) the inter-rater reliability of a modified version of Delitto et al.’s classification-based algorithm for patients with low back pain; (2) the influence of different levels of familiarity with the system; and (3)

  6. An Assessment of Reliability and Validity of a Rubric for Grading APA-Style Introductions

    Science.gov (United States)

    Stellmack, Mark A.; Konheim-Kalkstein, Yasmine L.; Manor, Julia E.; Massey, Abigail R.; Schmitz, Julie Ann P.

    2009-01-01

    This article describes the empirical evaluation of the reliability and validity of a grading rubric for grading APA-style introductions of undergraduate students. Levels of interrater agreement and intrarater agreement were not extremely high but were similar to values reported in the literature for comparably structured rubrics. Rank-order…

  7. IRR (Inter-Rater Reliability) of a COP (Classroom Observation Protocol)--A Critical Appraisal

    Science.gov (United States)

    Rui, Ning; Feldman, Jill M.

    2012-01-01

    Notwithstanding broad utility of COPs (classroom observation protocols), there has been limited documentation of the psychometric properties of even the most popular COPs. This study attempted to fill this void by closely examining the item and domain-level IRR (inter-rater reliability) of a COP that was used in a federally funded striving readers…

  8. Publishing nutrition research: validity, reliability, and diagnostic test assessment in nutrition-related research.

    Science.gov (United States)

    Gleason, Philip M; Harris, Jeffrey; Sheean, Patricia M; Boushey, Carol J; Bruemmer, Barbara

    2010-03-01

    This is the sixth in a series of monographs on research design and analysis. The purpose of this article is to describe and discuss several concepts related to the measurement of nutrition-related characteristics and outcomes, including validity, reliability, and diagnostic tests. The article reviews the methodologic issues related to capturing the various aspects of a given nutrition measure's reliability, including test-retest, inter-item, and interobserver or inter-rater reliability. Similarly, it covers content validity, indicators of absolute vs relative validity, and internal vs external validity. With respect to diagnostic assessment, the article summarizes the concepts of sensitivity and specificity. The hope is that dietetics practitioners will be able to both use high-quality measures of nutrition concepts in their research and recognize these measures in research completed by others. Copyright 2010 American Dietetic Association. Published by Elsevier Inc. All rights reserved.

  9. Chest computed tomography-based scoring of thoracic sarcoidosis: Inter-rater reliability of CT abnormalities

    Energy Technology Data Exchange (ETDEWEB)

    Heuvel, D.A.V. den; Es, H.W. van; Heesewijk, J.P. van; Spee, M. [St. Antonius Hospital Nieuwegein, Department of Radiology, Nieuwegein (Netherlands); Jong, P.A. de [University Medical Center Utrecht, Department of Radiology, Utrecht (Netherlands); Zanen, P.; Grutters, J.C. [University Medical Center Utrecht, Division Heart and Lungs, Utrecht (Netherlands); St. Antonius Hospital Nieuwegein, Center of Interstitial Lung Diseases, Department of Pulmonology, Nieuwegein (Netherlands)

    2015-09-15

    To determine inter-rater reliability of sarcoidosis-related computed tomography (CT) findings that can be used for scoring of thoracic sarcoidosis. CT images of 51 patients with sarcoidosis were scored by five chest radiologists for various abnormal CT findings (22 in total) encountered in thoracic sarcoidosis. Using intra-class correlation coefficient (ICC) analysis, inter-rater reliability was analysed and reported according to the Guidelines for Reporting Reliability and Agreement Studies (GRRAS) criteria. A pre-specified sub-analysis was performed to investigate the effect of training. Scoring was trained in a distinct set of 15 scans in which all abnormal CT findings were represented. Median age of the 51 patients (36 men, 70 %) was 43 years (range 26 - 64 years). All radiographic stages were present in this group. ICC ranged from 0.91 for honeycombing to 0.11 for nodular margin (sharp versus ill-defined). The ICC was above 0.60 in 13 of the 22 abnormal findings. Sub-analysis for the best-trained observers demonstrated an ICC improvement for all abnormal findings and values above 0.60 for 16 of the 22 abnormalities. In our cohort, reliability between raters was acceptable for 16 thoracic sarcoidosis-related abnormal CT findings. (orig.)

  10. Chest computed tomography-based scoring of thoracic sarcoidosis: Inter-rater reliability of CT abnormalities

    International Nuclear Information System (INIS)

    Heuvel, D.A.V. den; Es, H.W. van; Heesewijk, J.P. van; Spee, M.; Jong, P.A. de; Zanen, P.; Grutters, J.C.

    2015-01-01

    To determine inter-rater reliability of sarcoidosis-related computed tomography (CT) findings that can be used for scoring of thoracic sarcoidosis. CT images of 51 patients with sarcoidosis were scored by five chest radiologists for various abnormal CT findings (22 in total) encountered in thoracic sarcoidosis. Using intra-class correlation coefficient (ICC) analysis, inter-rater reliability was analysed and reported according to the Guidelines for Reporting Reliability and Agreement Studies (GRRAS) criteria. A pre-specified sub-analysis was performed to investigate the effect of training. Scoring was trained in a distinct set of 15 scans in which all abnormal CT findings were represented. Median age of the 51 patients (36 men, 70 %) was 43 years (range 26 - 64 years). All radiographic stages were present in this group. ICC ranged from 0.91 for honeycombing to 0.11 for nodular margin (sharp versus ill-defined). The ICC was above 0.60 in 13 of the 22 abnormal findings. Sub-analysis for the best-trained observers demonstrated an ICC improvement for all abnormal findings and values above 0.60 for 16 of the 22 abnormalities. In our cohort, reliability between raters was acceptable for 16 thoracic sarcoidosis-related abnormal CT findings. (orig.)

  11. Validity and reliability of three definitions of hip osteoarthritis: cross sectional and longitudinal approach.

    Science.gov (United States)

    Reijman, M; Hazes, J M W; Pols, H A P; Bernsen, R M D; Koes, B W; Bierma-Zeinstra, S M A

    2004-11-01

    To compare the reliability and validity in a large open population of three frequently used radiological definitions of hip osteoarthritis (OA): Kellgren and Lawrence grade, minimal joint space (MJS), and Croft grade; and to investigate whether the validity of the three definitions of hip OA is sex dependent. Subjects from the Rotterdam study (aged ≥ 55 years, n = 3585) were evaluated. The inter-rater reliability was tested in a random set of 148 x-rays. The validity was expressed as the ability to identify patients who show clinical symptoms of hip OA (construct validity) and as the ability to predict total hip replacement (THR) at follow up (predictive validity). Inter-rater reliability was similar for the Kellgren and Lawrence grade and MJS (kappa statistics 0.68 and 0.62, respectively) but lower for Croft's grade (kappa statistic, 0.51). The Kellgren and Lawrence grade and MJS showed the strongest associations with clinical symptoms of hip OA. Sex appeared to be an effect modifier for Kellgren and Lawrence and MJS definitions, women showing a stronger association between grading and symptoms than men. However, the sex dependency was attributed to differences in height between women and men. The Kellgren and Lawrence grade showed the highest predictive value for THR at follow up. Based on these findings, Kellgren and Lawrence still appears to be a useful OA definition for epidemiological studies focusing on the presence of hip OA.

  12. Assessing the environmental characteristics of cycling routes to school: a study on the reliability and validity of a Google Street View-based audit.

    Science.gov (United States)

    Vanwolleghem, Griet; Van Dyck, Delfien; Ducheyne, Fabian; De Bourdeaudhuij, Ilse; Cardon, Greet

    2014-06-10

    Google Street View provides a valuable and efficient alternative to on-site fieldwork for observing the physical environment. However, studies on the use, reliability and validity of Google Street View in a cycling-to-school context are lacking. We aimed to study the intra- and inter-rater reliability and criterion validity of EGA-Cycling (Environmental Google Street View Based Audit - Cycling to school), a newly developed audit using Google Street View to assess the physical environment along cycling routes to school. Parents (n = 52) of 11-to-12-year old Flemish children, who mostly cycled to school, completed a questionnaire and identified their child's cycling route to school on a street map. Fifty cycling routes of 11-to-12-year olds were identified and physical environmental characteristics along the identified routes were rated with EGA-Cycling (5 subscales; 37 items), based on Google Street View. To assess reliability, two researchers performed the audit. Criterion validity of the audit was examined by comparing the ratings based on Google Street View with ratings through on-site assessments. Intra-rater reliability was high (kappa range 0.47-1.00). Large variations in the inter-rater reliability (kappa range -0.03-1.00) and criterion validity scores (kappa range -0.06-1.00) were reported, with acceptable inter-rater reliability values for 43% of all items and acceptable criterion validity for 54% of all items. EGA-Cycling can be used to assess physical environmental characteristics along cycling routes to school. However, to assess the micro-environment specifically related to cycling, on-site assessments have to be added.

  13. How do cognitively impaired elderly patients define "testament": reliability and validity of the testament definition scale.

    Science.gov (United States)

    Heinik, J; Werner, P; Lin, R

    1999-01-01

    The testament definition scale (TDS) is a specifically designed six-item scale aimed at measuring the respondent's capacity to define "testament." We assessed the reliability and validity of this new short scale in 31 community-dwelling cognitively impaired elderly patients. Interrater reliability for the six items ranged from .87 to .97. The interrater reliability for the total score was .77. Significant correlations were found between the TDS score and the Mini-Mental State Examination (MMSE) and the Cambridge Cognitive Examination scores (r = .71 and .72 respectively, p = .001). Criterion validity yielded significantly different means for subjects with MMSE scores of 24-30 and 0-23: mean 3.9 and 1.6 respectively (t(20) = 4.7, p = .001). Using a cutoff point of 0-2 vs. 3+, 79% of the subjects were correctly classified as severely cognitively impaired, with only 8.3% false positives, and a positive predictive value of 94%. Thus, TDS was found both reliable and valid. This scale, however, is not synonymous with testamentary capacity. The discussion deals with the methodological limitations of this study, and highlights the practical as well as the theoretical relevance of TDS. Future studies are warranted to elucidate the relationships between TDS and existing legal requirements of testamentary capacity.

  14. Interrater Reliability of the Power Mobility Road Test in the Virtual Reality-Based Simulator-2.

    Science.gov (United States)

    Kamaraj, Deepan C; Dicianno, Brad E; Mahajan, Harshal P; Buhari, Alhaji M; Cooper, Rory A

    2016-07-01

    To assess interrater reliability of the Power Mobility Road Test (PMRT) when administered through the Virtual Reality-based SIMulator-version 2 (VRSIM-2). Within-subjects repeated-measures design. Participants interacted with VRSIM-2 through 2 display options (desktop monitor vs immersive virtual reality screens) using 2 control interfaces (roller system vs conventional movement-sensing joystick), providing 4 different driving scenarios (driving conditions 1-4). Participants performed 3 virtual driving sessions for each of the 2 display screens and 1 session through a real-world driving course (driving condition 5). The virtual PMRT was conducted in a simulated indoor office space, and an equivalent course was charted in an open space for the real-world assessment. After every change in driving condition, participants completed a self-reported workload assessment questionnaire, the Task Load Index, developed by the National Aeronautics and Space Administration. A convenience sample of electric-powered wheelchair (EPW) athletes (N=21) recruited at the 31st National Veterans Wheelchair Games. Not applicable. Total composite PMRT score. The PMRT had high interrater reliability (intraclass correlation coefficient [ICC]>.75) between the 2 raters in all 5 driving conditions. Post hoc analyses revealed that the reliability analyses had >80% power to detect high ICCs in driving conditions 1 and 4. The PMRT has high interrater reliability in conditions 1 and 4 and could be used to assess EPW driving performance virtually in VRSIM-2. However, further psychometric assessment is necessary to assess the feasibility of administering the PMRT using the different interfaces of VRSIM-2. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.

  15. Ultrasound assessment for grading structural tendon changes in supraspinatus tendinopathy: an inter-rater reliability study

    DEFF Research Database (Denmark)

    Ingwersen, Kim Gordon; Hjarbæk, John; Eshøj, Henrik

    2016-01-01

    Aim To evaluate the inter-rater reliability of measuring structural changes in the tendon of patients, clinically diagnosed with supraspinatus tendinopathy (cases) and healthy participants (controls), on ultrasound (US) images captured by standardised procedures. Methods A total of 40 participant...

  16. Quality of nursing intensity data: inter-rater reliability of the patient classification after two decades in clinical use.

    Science.gov (United States)

    Liljamo, Pia; Kinnunen, Ulla-Mari; Ohtonen, Pasi; Saranto, Kaija

    2017-09-01

    The aim of this study was to measure the inter-rater reliability of the Oulu Patient Classification and to discuss existing methods of reliability testing. The Oulu Patient Classification, part of the RAFAELA® System, has been developed to assist nursing managers with the proper allocation of nursing resources. Due to the increased intensity of inpatient care during recent years, there is a need for the reliability testing of the classification, which has been in clinical use for 20 years. Retrospective statistical study. To test inter-rater reliability, a pair of nurses classified the same patients, without knowledge of each other's ratings, as a part of annually conducted standardization. Data on the parallel classifications (n = 19,997) were obtained from inpatient units (n = 32) with different specialties at a university hospital in Finland during 2010-2015. Parallel classification practices were also analysed. The reliability of the overall classification and its subareas was calculated using suitable statistical coefficients. Inter-rater reliability coefficients were a reliable or almost perfect means of considering the nursing intensity category and various practices, but there were detectable differences between subareas. The lowest agreement levels occurred in the subareas 'Planning and Coordination of Nursing Care' and 'Guiding of Care/Continued Care and Emotional Support'. There is a need to develop the descriptions of subareas and to clarify the related concepts. Precise nursing documentation can promote a high level of agreement and reliable results. The traditional overall proportion of agreement does not provide an adequate picture of reliability; weighted kappa coefficients should be used instead. © 2017 John Wiley & Sons Ltd.
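
    The contrast drawn above between the overall proportion of agreement and weighted kappa can be illustrated with a short Python sketch for two raters and ordered categories. The abstract does not say which weighting scheme was used, so linear disagreement weights are assumed here; the function names and the four toy ratings are hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Share of items on which both raters chose exactly the same category."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def weighted_kappa(rater_a, rater_b, categories):
    """Linear-weighted Cohen's kappa; `categories` lists the levels in order."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Disagreement weight grows linearly with the distance between categories.
    def weight(i, j):
        return abs(i - j) / (k - 1)

    d_obs = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - d_obs / d_exp


# Made-up ratings on a four-level ordinal scale, for illustration only.
nurse_1 = [1, 2, 3, 4]
nurse_2 = [1, 2, 3, 3]
print(percent_agreement(nurse_1, nurse_2))
print(weighted_kappa(nurse_1, nurse_2, categories=[1, 2, 3, 4]))
```

    Unlike raw agreement, kappa subtracts the agreement expected by chance from the raters' marginal distributions, and the weights give partial credit to near misses on an ordinal scale.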

  17. Interrater reliability of quantitative ultrasound using force feedback among examiners with varied levels of experience

    Directory of Open Access Journals (Sweden)

    Michael O. Harris-Love

    2016-06-01

    Background. Quantitative ultrasound measures are influenced by multiple external factors including examiner scanning force. Force feedback may foster the acquisition of reliable morphometry measures under a variety of scanning conditions. The purpose of this study was to determine the reliability of force-feedback image acquisition and morphometry over a range of examiner-generated forces using a muscle tissue-mimicking ultrasound phantom. Methods. Sixty material thickness measures were acquired from a muscle tissue-mimicking phantom using B-mode ultrasound scanning by six examiners with varied experience levels (i.e., experienced, intermediate, and novice). Estimates of interrater reliability and measurement error with force feedback scanning were determined for the examiners. In addition, criterion-based reliability was determined using material deformation values across a range of examiner scanning forces (1–10 Newtons) via automated and manually acquired image capture methods using force feedback. Results. All examiners demonstrated acceptable interrater reliability (intraclass correlation coefficient, ICC = .98, p ... .90, p < .001), independent of their level of experience. The measurement error among all examiners was 1.5%–2.9% across all applied stress conditions. Conclusion. Manual image capture with force feedback may aid the reliability of morphometry measures across a range of examiner scanning forces, and allow for consistent performance among examiners with differing levels of experience.

  18. Method of Quantifying Size of Retinal Hemorrhages in Eyes with Branch Retinal Vein Occlusion Using 14-Square Grid: Interrater and Intrarater Reliability

    Directory of Open Access Journals (Sweden)

    Yuko Takashima

    2016-01-01

    Purpose. To describe a method of quantifying the size of the retinal hemorrhages in branch retinal vein occlusion (BRVO) and to determine the interrater and intrarater reliabilities of these measurements. Methods. Thirty-five fundus photographs from 35 consecutive eyes with BRVO were studied. The fundus images were analyzed with Power-Point® software, and a grid of 14 squares was laid over the fundus image. Raters were asked to judge the percentage of each of the 14 squares that was covered by the hemorrhages, and the average of the 14 squares was taken to be the relative size of the retinal hemorrhage. Results. Interrater reliability between three raters was higher when a grid with 14 squares was used (intraclass correlation coefficient [ICC], 0.96) than that when a box with no grid was used (ICC, 0.78). Intrarater reliability, which was calculated by the retinal hemorrhage area measured on two different days, was also higher (ICC, 0.97) than that with no grid (ICC, 0.86). Interrater reliability for five fundus pictures with poor image quality was also good when a grid with 14 squares was used (ICC, 0.88). Conclusions. Although our method is subjective, excellent interrater and intrarater reliabilities indicate that this method can be adapted for clinical use.
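
    For readers unfamiliar with how an intraclass correlation coefficient is obtained from a subjects-by-raters table, a minimal Python sketch follows. The abstract does not state which ICC form was used; the sketch assumes the two-way random-effects, single-rater, absolute-agreement form, ICC(2,1), in the Shrout and Fleiss notation. The function name is hypothetical, and the ratings matrix is the small textbook example from Shrout and Fleiss (1979), not data from this study.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a list of rows, one row per subject, one column per rater."""
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                # between-subjects mean square
    jms = ss_cols / (k - 1)                # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Six subjects rated by four judges (textbook example, not study data).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))
```

    Because this form penalizes systematic differences between raters as well as random error, it gives a lower value than consistency-type ICCs when one rater scores uniformly higher than another.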

  19. Intra- and inter-rater reliability of the Knee Society Knee Score when used by two physiotherapists in patients post total knee arthroplasty

    Directory of Open Access Journals (Sweden)

    S. Gopal

    2010-01-01

    Full Text Available Background and Purpose: It has yet to be shown whether routine physiotherapy plays a role in the rehabilitation of patients post total knee arthroplasty (Rajan et al 2004). Physiotherapists should be using valid outcome measures to provide evidence of the benefit of their intervention. The aim of this study was to establish the intra- and inter-rater reliability of the Knee Society Knee Score, a scoring system developed by Insall et al (1989). The Knee Society Knee Score can be used to assess the integrity of the knee joint of patients undergoing total knee arthroplasty. Since the score involves clinical testing, the intra-rater reliability of the clinician should be established prior to using the scores as data in clinical research. Where multiple clinicians are involved, inter-rater reliability should also be established. Design: This was a correlation study. Subjects: A sample of thirty patients post total knee arthroplasty attending the arthroplasty clinic at Johannesburg Hospital between six weeks and twelve months postoperatively. Method: Recruited patients were evaluated twice with a time interval of one hour between each assessment. Statistical Analysis: The intra- and inter-rater reliability were estimated using the Intraclass Correlation Coefficient (ICC). Results: The intra-rater reliability showed excellent reliability (h = 0.95) for Examiner A and good reliability (h = 0.71) for Examiner B. The inter-rater reliability showed moderate reliability (h = 0.67 during test one and h = 0.66 during test two). Conclusion: The KSKS has good intra-rater reliability when tested within a period of one hour. The KSKS demonstrated moderate agreement for inter-rater reliability.

  20. Validity and reliability of a novel immunosuppressive adverse effects scoring system in renal transplant recipients.

    Science.gov (United States)

    Meaney, Calvin J; Arabi, Ziad; Venuto, Rocco C; Consiglio, Joseph D; Wilding, Gregory E; Tornatore, Kathleen M

    2014-06-12

    After renal transplantation, many patients experience adverse effects from maintenance immunosuppressive drugs. When these adverse effects occur, patient adherence with immunosuppression may be reduced and impact allograft survival. If these adverse effects could be prospectively monitored in an objective manner and possibly prevented, adherence to immunosuppressive regimens could be optimized and allograft survival improved. Prospective, standardized clinical approaches to assess immunosuppressive adverse effects by health care providers are limited. Therefore, we developed and evaluated the application, reliability and validity of a novel adverse effects scoring system in renal transplant recipients receiving calcineurin inhibitor (cyclosporine or tacrolimus) and mycophenolic acid based immunosuppressive therapy. The scoring system included 18 non-renal adverse effects organized into gastrointestinal, central nervous system and aesthetic domains developed by a multidisciplinary physician group. Nephrologists employed this standardized adverse effect evaluation in stable renal transplant patients using physical exam, review of systems, recent laboratory results, and medication adherence assessment during a clinic visit. Stable renal transplant recipients in two clinical studies were evaluated and received immunosuppressive regimens comprised of either cyclosporine or tacrolimus with mycophenolic acid. Face, content, and construct validity were assessed to document these adverse effect evaluations. Inter-rater reliability was determined using the Kappa statistic and intra-class correlation. A total of 58 renal transplant recipients were assessed using the adverse effects scoring system confirming face validity. Nephrologists (subject matter experts) rated the 18 adverse effects as: 3.1 ± 0.75 out of 4 (maximum) regarding clinical importance to verify content validity. The adverse effects scoring system distinguished 1.75-fold increased gastrointestinal adverse

  1. Measurement of the Inter-Rater Reliability Rate Is Mandatory for Improving the Quality of a Medical Database: Experience with the Paulista Lung Cancer Registry.

    Science.gov (United States)

    Lauricella, Leticia L; Costa, Priscila B; Salati, Michele; Pego-Fernandes, Paulo M; Terra, Ricardo M

    2018-06-01

    Database quality measurement should be considered a mandatory step to ensure an adequate level of confidence in data used for research and quality improvement. Several metrics have been described in the literature, but no standardized approach has been established. We aimed to describe a methodological approach applied to measure the quality and inter-rater reliability of a regional multicentric thoracic surgical database (Paulista Lung Cancer Registry). Data from the first 3 years of the Paulista Lung Cancer Registry underwent an audit process with 3 metrics: completeness, consistency, and inter-rater reliability. The first 2 methods were applied to the whole data set, and the last method was calculated using 100 cases randomized for direct auditing. Inter-rater reliability was evaluated using percentage of agreement between the data collector and auditor and through calculation of Cohen's κ and intraclass correlation. The overall completeness per section ranged from 0.88 to 1.00, and the overall consistency was 0.96. Inter-rater reliability showed many variables with high disagreement (>10%). For numerical variables, intraclass correlation was a better metric than inter-rater reliability. Cohen's κ showed that most variables had moderate to substantial agreement. The methodological approach applied to the Paulista Lung Cancer Registry showed that completeness and consistency metrics did not sufficiently reflect the real quality status of a database. The inter-rater reliability associated with κ and intraclass correlation was a better quality metric than completeness and consistency metrics because it could determine the reliability of specific variables used in research or benchmark reports. This report can be a paradigm for future studies of data quality measurement. Copyright © 2018 American College of Surgeons. Published by Elsevier Inc. All rights reserved.

  2. Validity and Reliability of Assessing Body Composition Using a Mobile Application.

    Science.gov (United States)

    Macdonald, Elizabeth Z; Vehrs, Pat R; Fellingham, Gilbert W; Eggett, Dennis; George, James D; Hager, Ronald

    2017-12-01

    The purpose of this study was to determine the validity and reliability of the LeanScreen (LS) mobile application that estimates percent body fat (%BF) using estimates of circumferences from photographs. The %BF of 148 weight-stable adults was estimated once using dual-energy x-ray absorptiometry (DXA). Each of two administrators assessed the %BF of each subject twice using the LS app and manually measured circumferences. A mixed-model ANOVA and Bland-Altman analyses were used to compare the estimates of %BF obtained from each method. Interrater and intrarater reliability values were determined using multiple measurements taken by each of the two administrators. The LS app and manually measured circumferences significantly underestimated (P < 0.05) the %BF determined using DXA by an average of -3.26 and -4.82 %BF, respectively. The LS app (6.99 %BF) and manually measured circumferences (6.76 %BF) had large limits of agreement. All interrater and intrarater reliability coefficients of estimates of %BF using the LS app and manually measured circumferences exceeded 0.99. The estimates of %BF from manually measured circumferences and the LS app were highly reliable. However, these field measures are not currently recommended for the assessment of body composition because of significant bias and large limits of agreement.
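
    The record above shows why high reliability coefficients do not guarantee validity: a method can be very repeatable and still biased. A minimal sketch of a Bland-Altman summary, assuming the usual definition (bias is the mean difference and the 95% limits of agreement are bias ± 1.96 standard deviations of the differences), is given here. The function name `bland_altman` and the %BF values are hypothetical and are not the study's data.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Assumes the conventional 95% limits of agreement: bias +/- 1.96 * SD
    of the paired differences (sample SD, n - 1 in the denominator).
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical %BF values: a field method versus a reference method.
field = [20.0, 25.0, 30.0, 35.0]
reference = [24.0, 27.0, 34.0, 37.0]

bias, lower, upper = bland_altman(field, reference)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

    A negative bias in this sketch means the field method reads lower than the reference on average, which is the direction of the underestimation described in the abstract.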

  3. The inter-rater reliability of the incontinence-associated dermatitis intervention tool-D (IADIT-D) between two independent registered nurses of nursing home residents in long-term care facilities.

    Science.gov (United States)

    Braunschmidt, Brigitte; Müller, Gerhard; Jukic-Puntigam, Margareta; Steininger, Alfred

    2013-01-01

    Incontinence-associated dermatitis (IAD) is the clinical manifestation of moisture related skin damage (Beeckman, Woodward, & Gray, 2011). Valid assessment instruments are needed for risk assessment and classification of IAD. The aim of this quantitative-descriptive cross-sectional study was to determine the inter-rater reliability of the item scores of the German Incontinence Associated Dermatitis Intervention Tool (IADIT-D) between two independent assessors of nursing home residents (n = 381) in long-term care facilities. The 19 pairs of assessors consisted of registered nurses. The data analysis first calculated the total percentage of agreement. Because this value is not adjusted for chance agreement, kappa coefficients and the AC1 statistic were calculated as well. The total percentage of the inter-rater agreement was 84% (n = 319). In a second step of analysis, the calculation across all items yielded high (kappa = .70) and very high (AC1 = .83) agreement levels, respectively. For the risk assessment (kappa = .82; AC1 = .94), the values amounted to very high agreement levels and for the classification (kappa(w) = .70; AC1 = .76) to high agreement levels. The high to very high agreement values of IADIT-D demonstrate that the items can be regarded as stable with regard to inter-rater reliability for use in long-term care facilities. In addition, further validation studies are needed.
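
    The three agreement measures named in this record differ only in how they correct for chance. A minimal two-rater sketch, assuming unweighted Cohen's kappa and Gwet's AC1 in their standard two-rater forms, is given here. The function name `agreement_stats` and the ratings are hypothetical; the weighted kappa reported for the classification item is not covered.

```python
from collections import Counter


def agreement_stats(rater_a, rater_b):
    """Return (percent agreement, Cohen's kappa, Gwet's AC1) for two raters."""
    n = len(rater_a)
    categories = sorted(set(rater_a) | set(rater_b))
    q = len(categories)
    if q < 2:
        raise ValueError("need at least two observed categories")

    # Observed agreement: share of subjects with identical ratings.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    count_a = Counter(rater_a)
    count_b = Counter(rater_b)

    # Kappa: chance agreement from the product of each rater's marginals.
    pe_kappa = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)

    # AC1: chance agreement from the average marginal pi_c of each category.
    pi = {c: (count_a[c] + count_b[c]) / (2 * n) for c in categories}
    pe_ac1 = sum(pi[c] * (1 - pi[c]) for c in categories) / (q - 1)

    kappa = (p_obs - pe_kappa) / (1 - pe_kappa)
    ac1 = (p_obs - pe_ac1) / (1 - pe_ac1)
    return p_obs, kappa, ac1


# Hypothetical ratings with one dominant category (1 = at risk, 0 = not).
nurse_1 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
nurse_2 = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]

p_obs, kappa, ac1 = agreement_stats(nurse_1, nurse_2)
print(f"agreement={p_obs:.2f}, kappa={kappa:.3f}, AC1={ac1:.3f}")
```

    When one category dominates, kappa's chance term is large and kappa drops well below the raw agreement, while AC1 stays closer to it. That is one common explanation for AC1 values exceeding kappa, as they do in the abstract.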

  4. Monitoring sedation status over time in ICU patients: reliability and validity of the Richmond Agitation-Sedation Scale (RASS).

    Science.gov (United States)

    Ely, E Wesley; Truman, Brenda; Shintani, Ayumi; Thomason, Jason W W; Wheeler, Arthur P; Gordon, Sharon; Francis, Joseph; Speroff, Theodore; Gautam, Shiva; Margolin, Richard; Sessler, Curtis N; Dittus, Robert S; Bernard, Gordon R

    2003-06-11

    Goal-directed delivery of sedative and analgesic medications is recommended as standard care in intensive care units (ICUs) because of the impact these medications have on ventilator weaning and ICU length of stay, but few of the available sedation scales have been appropriately tested for reliability and validity. To test the reliability and validity of the Richmond Agitation-Sedation Scale (RASS). Prospective cohort study. Adult medical and coronary ICUs of a university-based medical center. Thirty-eight medical ICU patients enrolled for reliability testing (46% receiving mechanical ventilation) from July 21, 1999, to September 7, 1999, and an independent cohort of 275 patients receiving mechanical ventilation were enrolled for validity testing from February 1, 2000, to May 3, 2001. Interrater reliability of the RASS, Glasgow Coma Scale (GCS), and Ramsay Scale (RS); validity of the RASS correlated with reference standard ratings, assessments of content of consciousness, GCS scores, doses of sedatives and analgesics, and bispectral electroencephalography. In 290-paired observations by nurses, results of both the RASS and RS demonstrated excellent interrater reliability (weighted kappa, 0.91 and 0.94, respectively), which were both superior to the GCS (weighted kappa, 0.64; P<.001 for both comparisons). Criterion validity was tested in 411-paired observations in the first 96 patients of the validation cohort, in whom the RASS showed significant differences between levels of consciousness (P<.001 for all) and correctly identified fluctuations within patients over time (P<.001). In addition, 5 methods were used to test the construct validity of the RASS, including correlation with an attention screening examination (r = 0.78, P<.001), GCS scores (r = 0.91, P<.001), quantity of different psychoactive medication dosages 8 hours prior to assessment (eg, lorazepam: r = - 0.31, P<.001), successful extubation (P =.07), and bispectral electroencephalography (r = 0.63, P

  5. Validity and reliability of a Malay version of the Lawton instrumental activities of daily living scale among the Malay speaking elderly in Malaysia.

    Science.gov (United States)

    Kadar, Masne; Ibrahim, Suhaili; Razaob, Nor Afifi; Chai, Siaw Chui; Harun, Dzalani

    2018-02-01

    The Lawton Instrumental Activities of Daily Living Scale is a tool often used to assess independence among elderly at home. Its suitability to be used with the elderly population in Malaysia has not been validated. This current study aimed to assess the validity and reliability of the Lawton Instrumental Activities of Daily Living Scale - Malay Version to Malay speaking elderly in Malaysia. This study was divided into three phases: (1) translation and linguistic validity involving both forward and backward translations; (2) establishment of face validity and content validity; and (3) establishment of reliability involving inter-rater, test-retest and internal consistency analyses. Data used for these analyses were obtained by interviewing 65 elderly respondents. Percentages of Content Validity Index for 4 criteria were from 88.89 to 100.0. The Cronbach α coefficient for internal consistency was 0.838. Intra-class Correlation Coefficient of inter-rater reliability and test-retest reliability was 0.957 and 0.950 respectively. The result shows that the Lawton Instrumental Activities of Daily Living Scale - Malay Version has excellent reliability and validity for use with the Malay speaking elderly people in Malaysia. This scale could be used by professionals to assess functional ability of elderly who live independently in community. © 2018 Occupational Therapy Australia.

  6. Inter-Rater Reliability of Provider Interpretations of Irritable Bowel Syndrome Food and Symptom Journals.

    Science.gov (United States)

    Zia, Jasmine; Chung, Chia-Fang; Xu, Kaiyuan; Dong, Yi; Schenk, Jeanette M; Cain, Kevin; Munson, Sean; Heitkemper, Margaret M

    2017-11-04

    There are currently no standardized methods for identifying trigger food(s) from irritable bowel syndrome (IBS) food and symptom journals. The primary aim of this study was to assess the inter-rater reliability of providers' interpretations of IBS journals. A second aim was to describe whether these interpretations varied for each patient. Eight providers reviewed 17 IBS journals and rated how likely key food groups (fermentable oligo-di-monosaccharides and polyols, high-calorie, gluten, caffeine, high-fiber) were to trigger IBS symptoms for each patient. Agreement of trigger food ratings was calculated using Krippendorff's α-reliability estimate. Providers were also asked to write down recommendations they would give to each patient. Estimates of agreement of trigger food likelihood ratings were poor (average α = 0.07). Most providers gave similar trigger food likelihood ratings for over half the food groups. Four providers gave the exact same written recommendation(s) (range 3-7) to over half the patients. Inter-rater reliability of provider interpretations of IBS food and symptom journals was poor. Providers favored certain trigger food likelihood ratings and written recommendations. This supports the need for a more standardized method for interpreting these journals and/or more rigorous techniques to accurately identify personalized IBS food triggers.

  7. Inter-Rater Reliability of Provider Interpretations of Irritable Bowel Syndrome Food and Symptom Journals

    Directory of Open Access Journals (Sweden)

    Jasmine Zia

    2017-11-01

    Full Text Available There are currently no standardized methods for identifying trigger food(s) from irritable bowel syndrome (IBS) food and symptom journals. The primary aim of this study was to assess the inter-rater reliability of providers’ interpretations of IBS journals. A second aim was to describe whether these interpretations varied for each patient. Eight providers reviewed 17 IBS journals and rated how likely key food groups (fermentable oligo-di-monosaccharides and polyols, high-calorie, gluten, caffeine, high-fiber) were to trigger IBS symptoms for each patient. Agreement of trigger food ratings was calculated using Krippendorff’s α-reliability estimate. Providers were also asked to write down recommendations they would give to each patient. Estimates of agreement of trigger food likelihood ratings were poor (average α = 0.07). Most providers gave similar trigger food likelihood ratings for over half the food groups. Four providers gave the exact same written recommendation(s) (range 3–7) to over half the patients. Inter-rater reliability of provider interpretations of IBS food and symptom journals was poor. Providers favored certain trigger food likelihood ratings and written recommendations. This supports the need for a more standardized method for interpreting these journals and/or more rigorous techniques to accurately identify personalized IBS food triggers.

  8. Examining Design and Inter-Rater Reliability of a Rubric Measuring Research Quality across Multiple Disciplines

    Directory of Open Access Journals (Sweden)

    Marilee J. Bresciani

    2009-05-01

    Full Text Available The paper presents a rubric to help evaluate the quality of research projects. The rubric was applied in a competition across a variety of disciplines during a two-day research symposium at one institution in the southwest region of the United States of America. It was collaboratively designed by a faculty committee at the institution and was administered to 204 undergraduate, master's, and doctoral oral presentations by approximately 167 different evaluators. No training or norming of the rubric was given to 147 of the evaluators prior to the competition. The findings of the inter-rater reliability analysis reveal substantial agreement among the judges, which contradicts literature asserting that formal norming must occur before substantial levels of inter-rater reliability are seen. By presenting the rubric along with the methodology used in its design and evaluation, it is hoped that others will find this to be a useful tool for evaluating documents and for teaching research methods.

  9. Inter-rater reliability and stability of diagnoses of autism spectrum disorder in children identified through screening at a very young age

    NARCIS (Netherlands)

    van Daalen, Emma; Kemner, Chantal; Dietz, Claudine; Swinkels, Sophie H. N.; Buitelaar, Jan K.; van Engeland, Herman

    2009-01-01

    To examine the inter-rater reliability and stability of autism spectrum disorder (ASD) diagnoses made at a very early age in children identified through a screening procedure around 14 months of age. In a prospective design, preschoolers were recruited from a screening study for ASD. The inter-rater

  10. Inter-rater reliability and stability of diagnoses of autism spectrum disorder in children identified through screening at a very young age.

    NARCIS (Netherlands)

    Daalen, E. van; Kemner, C.; Dietz, C.; Swinkels, S.H.N.; Buitelaar, J.K.; Engeland, H.M. van

    2009-01-01

    To examine the inter-rater reliability and stability of autism spectrum disorder (ASD) diagnoses made at a very early age in children identified through a screening procedure around 14 months of age. In a prospective design, preschoolers were recruited from a screening study for ASD. The inter-rater

  11. The Effect of Instrument-Specific Rater Training on Interrater Reliability and Counseling Skills Performance Differentiation

    Science.gov (United States)

    Meacham, Paul Douglas, Jr.

    2013-01-01

    The purpose of this study was to explore the effect of instrument-specific rater training on interrater reliability (IRR) and counseling skills performance differentiation. Strong IRR is of primary concern to effective program evaluation (McCullough, Kuhn, Andrews, Valen, Hatch, & Osimo, 2003; Schanche, Nielsen, McCullough, Valen, &…

  12. Reliability and Validity of Korean Version of Apraxia Screen of TULIA (K-AST).

    Science.gov (United States)

    Kim, Soo Jin; Yang, You-Na; Lee, Jong Won; Lee, Jin-Youn; Jeong, Eunhwa; Kim, Bo-Ram; Lee, Jongmin

    2016-10-01

    To evaluate the reliability and validity of the Korean version of AST (K-AST) as a bedside screening test of apraxia in patients with stroke for early and reliable detection. AST was translated into Korean, and the translated version received authorization from the author of AST. The performances of K-AST in 26 patients (21 males, 5 females; mean age 65.42±17.31 years) with stroke (23 ischemic, 3 hemorrhagic) were videotaped. To test the reliability and validity of K-AST, the recorded performances were assessed by two physiatrists and two occupational therapists twice at a 1-week interval. The patient performances at admission in the Korean version of the Mini-Mental State Examination (K-MMSE), self-care and transfer categories of the Functional Independence Measure (FIM), and motor praxis area of the Loewenstein Occupational Therapy Cognitive Assessment, second edition (LOTCA-II) were also evaluated. Scores of the motor praxis area of LOTCA-II were used to assess the validity of K-AST. Inter-rater reliabilities were 0.983 (preliable and valid test for bedside screening of apraxia.

  13. Development of a valid and reliable test to assess trauma radiograph interpretation performance

    International Nuclear Information System (INIS)

    Neep, M.J.; Steffens, T.; Riley, V.; Eastgate, P.; McPhail, S.M.

    2017-01-01

    Objectives: The purpose of this investigation was to develop and examine the preliminary validity and reliability among radiographers of a test to assess trauma radiograph interpretation performance suitable for use among health professionals. Methods: Stage 1 examined 14,159 consecutive appendicular and axial examinations from a hospital emergency department over a 12 month period to quantify a typical anatomical region case-mix of trauma radiographs. A sample of radiographic cases representative of affected anatomical regions was then developed into the Image Interpretation Test (IIT). Stage 2 involved prospective investigations of the IIT's reliability (inter-rater, intra-rater, internal consistency) and validity (concurrent) among 41 radiographers. Results: The IIT included 60 cases. The median (interquartile range) clinical experience of participants was 5 (2–10) years. Case scores were internally consistent (Cronbach's alpha = 0.90). Favourable inter-rater reliability (kappa > 0.70 for 58/60 cases, Intra-class correlation coefficient (ICC) > 0.99 for total score) and intra-rater reliability (kappa > 0.90 for 60/60 cases, ICC > 0.99 for total score) was observed. There was a positive association between radiographers' confidence in image interpretation and IIT score (coefficient = 1.52, r-squared = 0.60, p < 0.001). Conclusions: The IIT developed during this investigation included a selection of radiographic cases consistent with anatomical regions represented in an adult trauma case-mix. This study has also provided foundational preliminary evidence to support the reliability and validity of the IIT among radiographers. The findings suggest that it is possible to assess image interpretation performance of adult trauma radiographs with this test. - Highlights: • Development of an Image Interpretation Test (IIT). • Cases consistent with anatomical regions represented in a typical adult trauma case-mix. • Development of a

  14. The reliability, validity, and applicability of an English language version of the Mini-ICF-APP.

    Science.gov (United States)

    Molodynski, Andrew; Linden, Michael; Juckel, George; Yeeles, Ksenija; Anderson, Catriona; Vazquez-Montes, Maria; Burns, Tom

    2013-08-01

    This study aimed at establishing the validity and reliability of an English language version of the Mini-ICF-APP. One hundred and five patients under the care of secondary mental health care services were assessed using the Mini-ICF-APP and several well-established measures of functioning and symptom severity. 47 (45%) patients were interviewed on two occasions to ascertain test-retest reliability and 50 (48%) were interviewed by two researchers simultaneously to determine the instrument's inter-rater reliability. Occupational and sick leave status were also recorded to assess construct validity. The Mini-ICF-APP was found to have substantial internal consistency (Cronbach's α 0.869-0.912) and all 13 items correlated highly with the total score. Analysis also showed that the Mini-ICF-APP had good test-retest (ICC 0.832) and inter-rater (ICC 0.886) reliability. No statistically significant association with length of sick leave was found, but the unemployed scored higher on the Mini-ICF-APP than those in employment (mean 18.4, SD 9.1 vs. 9.4, SD 6.4, p Mini-ICF-APP correlated highly with the other measures of illness severity and functioning considered in the study. The English version of the Mini-ICF-APP is a reliable and valid measure of disorders of capacity as defined by the International Classification of Functioning. Further work is necessary to establish whether the scale could be divided into sub scales which would allow the instrument to more sensitively measure an individual's specific impairments.

  15. Validity and reliability of using photography for measuring knee range of motion: a methodological study

    Directory of Open Access Journals (Sweden)

    Adie Sam

    2011-04-01

    Full Text Available Abstract. Background: The clinimetric properties of knee goniometry are essential to appreciate in light of its extensive use in the orthopaedic and rehabilitative communities. Intra-observer reliability is thought to be satisfactory, but the validity and inter-rater reliability of knee goniometry often demonstrate unacceptable levels of variation. This study tests the validity and reliability of measuring knee range of motion using goniometry and photographic records. Methods: Design: Methodology study assessing the validity and reliability of one method ('Marker Method'), which uses a skin marker over the greater trochanter, and another method ('Line of Femur Method'), which requires estimation of the line of femur. Setting: Radiology and orthopaedic departments of two teaching hospitals. Participants: 31 volunteers (13 arthritic and 18 healthy subjects). Knee range of motion was measured radiographically and photographically using a goniometer. Three assessors were assessed for reliability and validity. Main outcomes: Agreement between methods and within raters was assessed using concordance correlation coefficients (CCCs). Agreement between raters was assessed using intra-class correlation coefficients (ICCs). 95% limits of agreement for the mean difference for all paired comparisons were computed. Results: Validity (referenced to radiographs): Each method for all 3 raters yielded very high CCCs for flexion (0.975 to 0.988), and moderate to substantial CCCs for extension angles (0.478 to 0.678). The mean differences and 95% limits of agreement were narrower for flexion than they were for extension. Intra-rater reliability: For flexion and extension, very high CCCs were attained for all 3 raters for both methods with slightly greater CCCs seen for flexion (CCCs varied from 0.981 to 0.998). Inter-rater reliability: For both methods, very high ICCs (min to max: 0.891 to 0.995) were obtained for flexion and extension. Slightly higher coefficients were obtained

  16. An examination of the interrater reliability between practitioners and researchers on the static-99.

    Science.gov (United States)

    Quesada, Stephen P; Calkins, Cynthia; Jeglic, Elizabeth L

    2014-11-01

    Many studies have validated the psychometric properties of the Static-99, the most widely used measure of sexual offender recidivism risk. However, much of this research relied on instrument coding completed by well-trained researchers. This study is the first to examine the interrater reliability (IRR) of the Static-99 between practitioners in the field and researchers. Using archival data from a sample of 1,973 formerly incarcerated sex offenders, field raters' scores on the Static-99 were compared with those of researchers. Overall, clinicians and researchers had excellent IRR on Static-99 total scores, with IRR coefficients ranging from "substantial" to "outstanding" for the individual 10 items of the scale. The most common causes of discrepancies were coding manual errors, followed by item subjectivity, inaccurate item scoring, and calculation errors. These results offer important data with regard to the frequency and perceived nature of scoring errors. © The Author(s) 2013.

  17. Reliability and validity of food portion size estimation from images using manual flexible digital virtual meshes

    Science.gov (United States)

    The eButton takes frontal images at 4 second intervals throughout the day. A three-dimensional (3D) manually administered wire mesh procedure has been developed to quantify portion sizes from the two-dimensional (2D) images. This paper reports a test of the interrater reliability and validity of use...

  18. Reliability, validity and minimal detectable change of the Mini-BESTest in Greek participants with chronic stroke.

    Science.gov (United States)

    Lampropoulou, Sofia I; Billis, Evdokia; Gedikoglou, Ingrid A; Michailidou, Christina; Nowicky, Alexander V; Skrinou, Dimitra; Michailidi, Fotini; Chandrinou, Danae; Meligkoni, Margarita

    2018-02-23

    This study aimed to investigate the psychometric characteristics of reliability, validity and ability to detect change of a newly developed balance assessment tool, the Mini-BESTest, in Greek patients with stroke. A prospective, observational design study with test-retest measures was conducted. A convenience sample of 21 Greek patients with chronic stroke (14 male, 7 female; age of 63 ± 16 years) was recruited. Two independent examiners administered the scale, for the inter-rater reliability, twice within 10 days for the test-retest reliability. Bland-Altman analysis for repeated measures assessed the absolute reliability, and the Standard Error of Measurement (SEM) and the Minimum Detectable Change at 95% confidence interval (MDC95%) were established. The Greek Mini-BESTest (Mini-BESTest GR) was correlated with the Greek Berg Balance Scale (BBS GR) for assessing the concurrent validity and with the Timed Up and Go (TUG), the Functional Reach Test (FRT) and the Greek Falls Efficacy Scale-International (FES-I GR) for the convergent validity. The Mini-BESTest GR demonstrated excellent inter-rater reliability (ICC (95%CI) = 0.997 (0.995-0.999), SEM = 0.46) with the scores of two raters within the limits of agreement (mean diff = -0.143 ± 0.727, p > 0.05) and test-retest reliability (ICC (95%CI) = 0.966 (0.926-0.988), SEM = 1.53). Additionally, the Mini-BESTest GR yielded very strong to moderate correlations with BBS GR (r = 0.924, p reliability and the equally good validity of the Mini-BESTest GR, strongly support its utility in Greek people with chronic stroke. Its ability to identify clinically meaningful changes and falls risk needs further investigation.
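
    The record above names the SEM and MDC95% but, as extracted here, does not show how they relate. A minimal sketch, assuming the commonly used formulas SEM = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEM, is given here. The function names are hypothetical, and the MDC95 printed is plain arithmetic on the reported test-retest SEM of 1.53, not a value quoted from the study.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a sample SD and a reliability ICC."""
    return sd * math.sqrt(1 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% level for a test-retest design.

    The sqrt(2) factor reflects measurement error in both of the two
    measurements being compared.
    """
    return 1.96 * math.sqrt(2) * sem


# Arithmetic on the reported test-retest SEM (assumed formula, see lead-in).
print(round(mdc95(1.53), 2))

# Hypothetical SD and ICC, only to show the first formula in use.
print(round(sem_from_icc(10.0, 0.75), 2))
```

    Under these assumptions a test-retest SEM of 1.53 points corresponds to an MDC95 of roughly 4.2 points, meaning smaller changes in an individual's score could not be distinguished from measurement error.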

  19. Toward feasible, valid, and reliable video-based assessments of technical surgical skills in the operating room

    DEFF Research Database (Denmark)

    Aggarwal, R.; Grantcharov, T.; Moorthy, K.

    2008-01-01

    Objective: To determine the feasibility, validity, inter-rater, and intertest reliability of 4 previously published video-based rating scales, for technical skills assessment on a benchmark laparoscopic procedure. Summary Background Data: Assessment of technical skills is crucial to the demonstration and maintenance of competent healthcare practitioners. Traditional assessment methods are prone to subjectivity through a lack of proven validity and reliability. Methods: Nineteen surgeons (6 novice and 13 experienced) performed a median of 2 laparoscopic cholecystectomies each (range 1-5) on 53... .72). Conclusions: Video-based technical skills evaluation in the operating room is feasible, valid and reliable. Global rating scales hold promise for summative assessment, though further work is necessary to elucidate the value of procedural rating scales. Publication date: 2008/2

  20. Inter-rater reliability and stability of diagnoses of autism spectrum disorder in children identified through screening at a very young age.

    Science.gov (United States)

    van Daalen, Emma; Kemner, Chantal; Dietz, Claudine; Swinkels, Sophie H N; Buitelaar, Jan K; van Engeland, Herman

    2009-11-01

    To examine the inter-rater reliability and stability of autism spectrum disorder (ASD) diagnoses made at a very early age in children identified through a screening procedure around 14 months of age. In a prospective design, preschoolers were recruited from a screening study for ASD. The inter-rater reliability of the diagnosis of ASD was measured through an independent assessment of a randomly selected subsample of 38 patients by two other psychiatrists. The diagnoses at 23 months and 42 months of 131 patients, based on the clinical assessment and the diagnostic classifications of standardised instruments, were compared to evaluate stability of the diagnosis of ASD. Inter-rater reliability on a diagnosis of ASD versus non-ASD at 23 months was 87% with a weighted kappa of 0.74 (SE 0.11). The stability of the different diagnoses in the autism spectrum was 63% for autistic disorder, 54% for pervasive developmental disorder, not otherwise specified (PDD-NOS), and 91% for the whole category of ASD. Most diagnostic changes at 42 months were within the autism spectrum from autistic disorder to PDD-NOS and were mainly due to diminished symptom severity. Children who moved outside the ASD category at 42 months made significantly larger gains in cognitive and language skills than children with a stable ASD diagnosis. In conclusion, the inter-rater reliability and stability of the diagnoses of ASD established at 23 months in this population-based sample of very young children are good.

  1. Inter-rater and test-retest reliability of quality assessments by novice student raters using the Jadad and Newcastle-Ottawa Scales.

    Science.gov (United States)

    Oremus, Mark; Oremus, Carolina; Hall, Geoffrey B C; McKinnon, Margaret C

    2012-01-01

    Quality assessment of included studies is an important component of systematic reviews. The authors investigated inter-rater and test-retest reliability for quality assessments conducted by inexperienced student raters. Student raters received a training session on quality assessment using the Jadad Scale for randomised controlled trials and the Newcastle-Ottawa Scale (NOS) for observational studies. Raters were randomly assigned into five pairs and they each independently rated the quality of 13-20 articles. These articles were drawn from a pool of 78 papers examining cognitive impairment following electroconvulsive therapy to treat major depressive disorder. The articles were randomly distributed to the raters. Two months later, each rater re-assessed the quality of half of their assigned articles. Setting: McMaster Integrative Neuroscience Discovery and Study Program. Participants: 10 students taking McMaster Integrative Neuroscience Discovery and Study Program courses. The authors measured inter-rater reliability using κ and the intraclass correlation coefficient type 2,1 or ICC(2,1). The authors measured test-retest reliability using ICC(2,1). Inter-rater reliability varied by scale question. For the six-item Jadad Scale, question-specific κs ranged from 0.13 (95% CI -0.11 to 0.37) to 0.56 (95% CI 0.29 to 0.83). The ranges were -0.14 (95% CI -0.28 to 0.00) to 0.39 (95% CI -0.02 to 0.81) for the NOS cohort and -0.20 (95% CI -0.49 to 0.09) to 1.00 (95% CI 1.00 to 1.00) for the NOS case-control. For overall scores on the six-item Jadad Scale, ICC(2,1)s for inter-rater and test-retest reliability (accounting for systematic differences between raters) were 0.32 (95% CI 0.08 to 0.52) and 0.55 (95% CI 0.41 to 0.67), respectively. Corresponding ICC(2,1)s for the NOS cohort were -0.19 (95% CI -0.67 to 0.35) and 0.62 (95% CI 0.25 to 0.83), and for the NOS case-control, the ICC(2,1)s were 0.46 (95% CI -0.13 to 0.92) and 0.83 (95% CI 0.48 to 0.95). Inter-rater reliability was generally poor
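
    For readers unfamiliar with ICC(2,1), the following is a minimal sketch of the two-way random-effects, absolute-agreement, single-rater coefficient in the Shrout and Fleiss formulation. The function name and the example scores are hypothetical; this is not the authors' analysis code, and the confidence intervals quoted above are not reproduced here.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per subject and one column per rater."""
    n = len(ratings)       # number of rated subjects (e.g. articles)
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)               # between-subjects mean square
    jms = ss_cols / (k - 1)               # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

# Hypothetical quality scores: five articles, each rated by two raters.
example = [[3, 2], [5, 5], [1, 2], [4, 3], [2, 2]]
print(round(icc_2_1(example), 2))
```

    Because the rater mean square appears in the denominator, a systematic offset between raters lowers this coefficient, which is what "accounting for systematic differences between raters" refers to.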

  2. [Reliability and Validity of the Behavioral Check List for Preschool Children to Measure Attention Deficit Hyperactivity Behaviors].

    Science.gov (United States)

    Tsuno, Kanami; Yoshimasu, Kouichi; Hayashi, Takashi; Tatsuta, Nozomi; Ito, Yuki; Kamijima, Michihiro; Nakai, Kunihiko

    2018-01-01

    Nowadays, attention deficit hyperactivity (ADH) problems are observed commonly among school-age children. However, questionnaires specific to ADH behaviors among preschool children are very few. The aim of this study was to investigate the reliability and validity of the 25-item Behavioral Check List (BCL), which was developed from interviews of parents with children who were diagnosed as having attention-deficit/hyperactivity disorder (ADHD) and which measures ADH behaviors in preschool-age children. We recruited 22 teachers from 10 nurseries/kindergartens in Miyagi Prefecture, Japan. A total of 138 preschool children were assessed using the BCL. To investigate inter-rater reliability, two teachers from each facility assessed seven to twenty children in their class, and intraclass correlation coefficients (ICCs) were calculated. The teachers additionally answered questions in the 1½-5 Caregiver-Teacher Report Form (C-TRF) to investigate the criterion validity of the BCL. To investigate structural validity, exploratory factor analysis with promax rotation and confirmatory factor analysis were performed. The internal consistency reliability of the BCL was good (α = 0.92) and correlation analyses also confirmed its excellent criterion validity. Although exploratory factor analysis for the BCL yielded a five-factor model whose factor structure differed from the original one, the results were similar to the original six factors. The ICCs of the BCL were 0.38-0.99, which was not high enough for inter-rater reliability in some facilities. However, it may be possible to improve inter-rater reliability by giving raters adequate explanations when using the BCL. The present study showed acceptable levels of reliability and validity of the BCL among Japanese preschool children.

  3. [Reliability and validity of warning signs checklist for screening psychological, behavioral and developmental problems of children].

    Science.gov (United States)

    Huang, X N; Zhang, Y; Feng, W W; Wang, H S; Cao, B; Zhang, B; Yang, Y F; Wang, H M; Zheng, Y; Jin, X M; Jia, M X; Zou, X B; Zhao, C X; Robert, J; Jing, Jin

    2017-06-02

    Objective: To evaluate the reliability and validity of the warning signs checklist developed by the National Health and Family Planning Commission of the People's Republic of China (NHFPC), so as to determine the screening effectiveness of warning signs on developmental problems of early childhood. Method: Stratified random sampling was used to assess the reliability and validity of the warning signs checklist; 2,110 children 0 to 6 years of age (1,513 low-risk subjects and 597 high-risk subjects) were recruited from 11 provinces of China. The reliability evaluation for the warning signs included the test-retest reliability and interrater reliability. With the use of the Ages and Stages Questionnaire (ASQ) and the Gesell Development Diagnosis Scale (GESELL) as the criterion scales, criterion validity was assessed by determining the correlation and consistency between the screening results of warning signs and the criterion scales. Result: In terms of the warning signs, the screening positive rates at different ages ranged from 10.8% (21/141) to 26.2% (51/137). The median (interquartile) testing time for each subject was 1 (0.6) minute. Both the test-retest reliability and interrater reliability of warning signs reached 0.7 or above, indicating that the stability was good. In terms of validity assessment, there was remarkable consistency between ASQ and warning signs, with a Kappa value of 0.63. With the use of GESELL as criterion, it was determined that the sensitivity of warning signs in children with suspected developmental delay was 82.2%, and the specificity was 77.7%. The overall Youden index was 0.6. Conclusion: The reliability and validity of the warning signs checklist for screening early childhood developmental problems have met the basic requirements of psychological screening scales, with the characteristics of short testing time and easy operation. Thus, this warning signs checklist can be used for screening psychological and behavioral problems of early childhood.
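
    The Youden index quoted in this abstract follows directly from the reported sensitivity and specificity (J = sensitivity + specificity - 1). The sketch below only checks that arithmetic; the function name is hypothetical.

```python
def youden_index(sensitivity, specificity):
    """Youden's J: 0 means no better than chance, 1 means a perfect test."""
    return sensitivity + specificity - 1.0

# Sensitivity and specificity as reported against the GESELL criterion.
j = youden_index(0.822, 0.777)
print(round(j, 3))  # 0.599, consistent with the reported 0.6
```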

  4. Reliable and valid assessment of Lichtenstein hernia repair skills.

    Science.gov (United States)

    Carlsen, C G; Lindorff-Larsen, K; Funch-Jensen, P; Lund, L; Charles, P; Konge, L

    2014-08-01

    Lichtenstein hernia repair is a common surgical procedure and one of the first procedures performed by a surgical trainee. However, formal assessment tools developed for this procedure are few and sparsely validated. The aim of this study was to determine the reliability and validity of an assessment tool designed to measure surgical skills in Lichtenstein hernia repair. Key issues were identified through a focus group interview. On this basis, an assessment tool with eight items was designed. Ten surgeons and surgical trainees were video recorded while performing Lichtenstein hernia repair, (four experts, three intermediates, and three novices). The videos were blindly and individually assessed by three raters (surgical consultants) using the assessment tool. Based on these assessments, validity and reliability were explored. The internal consistency of the items was high (Cronbach's alpha = 0.97). The inter-rater reliability was very good with an intra-class correlation coefficient (ICC) = 0.93. Generalizability analysis showed a coefficient above 0.8 even with one rater. The coefficient improved to 0.92 if three raters were used. One-way analysis of variance found a significant difference between the three groups which indicates construct validity, p fashion with the new procedure-specific assessment tool. We recommend this tool for future assessment of trainees performing Lichtenstein hernia repair to ensure that the objectives of competency-based surgical training are met.
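
    The gain from one rater to three raters reported above can be approximated with the Spearman-Brown formula, which gives the reliability of the mean of m raters from the single-rater coefficient. This is a sketch under the assumption of a simple fully crossed design, where a generalizability (D-study) projection reduces to this formula; the abstract does not describe the exact model, and the function name is hypothetical.

```python
def spearman_brown(single_rater_reliability, n_raters):
    """Projected reliability of the average score of n_raters raters."""
    r = single_rater_reliability
    return n_raters * r / (1.0 + (n_raters - 1) * r)

# Taking the reported single-rater coefficient as 0.8 (the abstract says "above 0.8").
print(round(spearman_brown(0.8, 3), 2))  # 0.92
```

    With a single-rater value of 0.8 the projection for three raters is about 0.92, which is in line with the figure given in the abstract.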

  5. Reliability, validity and description of timed performance of the Jebsen-Taylor Test in patients with muscular dystrophies.

    Science.gov (United States)

    Artilheiro, Mariana Cunha; Fávero, Francis Meire; Caromano, Fátima Aparecida; Oliveira, Acary de Souza Bulle; Carvas, Nelson; Voos, Mariana Callil; Sá, Cristina Dos Santos Cardoso de

    2017-12-08

    The Jebsen-Taylor Test evaluates upper limb function by measuring timed performance on everyday activities. The test is used to assess and monitor the progression of patients with Parkinson disease, cerebral palsy, stroke and brain injury. The aims were to analyze the reliability, internal consistency and validity of the Jebsen-Taylor Test in people with Muscular Dystrophy and to describe and classify upper limb timed performance of people with Muscular Dystrophy. Fifty patients with Muscular Dystrophy were assessed. Non-dominant and dominant upper limb performances on the Jebsen-Taylor Test were filmed. Two raters evaluated timed performance for inter-rater reliability analysis. Test-retest reliability was investigated by using intraclass correlation coefficients. Internal consistency was assessed using the Cronbach alpha. Construct validity was assessed by comparing the Jebsen-Taylor Test with the Performance of Upper Limb. The internal consistency of the Jebsen-Taylor Test was good (Cronbach's α=0.98). Inter-rater reliability was very high (0.903-0.999), except for writing, with an intraclass correlation coefficient of 0.772-1.000. Strong correlations between the Jebsen-Taylor Test and the Performance of Upper Limb Module were found (rho=-0.712). The Jebsen-Taylor Test is a reliable and valid measure of timed performance for people with Muscular Dystrophy. Copyright © 2017 Associação Brasileira de Pesquisa e Pós-Graduação em Fisioterapia. Published by Elsevier Editora Ltda. All rights reserved.
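
    Internal consistency in this and several other records is summarized with Cronbach's alpha. A minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), is shown below; the function name and the timing data are hypothetical and unrelated to the study's data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores: one row per participant, one column per test item."""
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)

# Hypothetical timed scores (seconds) for four participants on three subtests.
times = [[10, 12, 11], [20, 19, 22], [15, 16, 14], [30, 28, 31]]
print(round(cronbach_alpha(times), 2))
```

    Alpha approaches 1 when the items rise and fall together across participants, so a value as high as 0.98 indicates that the subtests rank patients almost identically.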

  6. Intra-rater and inter-rater reliability of the standardized ultrasound protocol for assessing subacromial structures

    DEFF Research Database (Denmark)

    Hougs Kjær, Birgitte; Ellegaard, Karen; Wieland, Ina

    2017-01-01

    BACKGROUND: US-examinations related to shoulder impingement (SI) often vary due to methodological differences, examiner positions, transducers, and recording parameters. Reliable US protocols for examination of different structures related to shoulder impingement are therefore needed. OBJECTIVES...... of the supraspinatus tendon (SUPRA) and subacromial subdeltoid (SASD) bursa in two imaging positions, and the acromial humeral distance (AHD) in one position. Additionally, agreement on dynamic impingement (DI) examination was performed. The intra- and inter-rater reliability was carried out on the same day...

  7. [Reliability and validity of the standardized Mini Mental State Examination in the diagnosis of mild dementia in Turkish population].

    Science.gov (United States)

    Güngen, Can; Ertan, Turan; Eker, Engin; Yaşar, Resmiye; Engin, Funda

    2002-01-01

    This study examined the reliability and validity of the Mini Mental State Examination in differentiating mild dementia from normal controls in a Turkish population. The Standardized Mini Mental State Examination (SMMSE) and its instructions were translated into Turkish. A total of 212 subjects with a mean age of 77 ± 6 were recruited for the study; 71 were diagnosed with dementia and 141 were evaluated as normal controls. The scale total score was analysed for discriminant validity using Student's t-test. Sensitivity, specificity, positive and negative predictive values and kappa score were calculated for all of the scores between 18 and 29. A kappa value was calculated for the comparison of the dementia diagnosis between the two investigators using the best cut-off score obtained in the analysis above. Statistical analysis revealed that the Turkish version of the SMMSE has high discriminant validity and interrater reliability in the diagnosis of mild dementia. The cut-off score 23/24 was found to have the highest sensitivity (0.91), specificity (0.95), positive and negative predictive values (0.90 and 0.95) and kappa score (0.86). Interrater reliability analysis showed high correlation (r = 0.99) and kappa value (0.92). The results of this study showed that the Turkish version of the SMMSE has high reliability and validity for the diagnosis of mild dementia in a Turkish population.
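
    The four accuracy measures listed in this abstract are all derived from the same 2x2 table of test result against diagnosis at a given cut-off. The sketch below shows those definitions; the function name and the counts are hypothetical and are not the study's data.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Accuracy measures from true/false positives and negatives at one cut-off."""
    return {
        "sensitivity": tp / (tp + fn),  # cases correctly flagged
        "specificity": tn / (tn + fp),  # controls correctly cleared
        "ppv": tp / (tp + fp),          # flagged subjects who are cases
        "npv": tn / (tn + fn),          # cleared subjects who are controls
    }

# Hypothetical counts for illustration only.
for name, value in diagnostic_summary(tp=45, fp=5, fn=5, tn=95).items():
    print(f"{name}: {value:.2f}")
```

    Repeating this calculation for each candidate cut-off, as the abstract describes for scores between 18 and 29, is how a best cut-off such as 23/24 would be selected.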

  8. Inter-rater reliability and agreement of the 6-minute walk test in females with hip fractures

    DEFF Research Database (Denmark)

    Overgaard, Jan; Larsen, Camilla Marie; Tange Kristensen, Morten

    physiotherapy students independently examined (randomized order) a convenient sample of 20 participants; their assessments were separated by two days, and testing followed instructions from the American Thoracic Society. Hip pain was assessed with the Verbal Ranking Scale. Participants (all women) with a mean...... (SD) age of 78.1 ± 5.9 years performed the test within a mean of 31.5 ± 5.8 days post-surgery; 10 had a cervical and 10 a trochanteric fracture. Excellent inter-rater reliability; ICC2.1 = 0.92 (95% CI, 0.81 - 0.97) was found, and the standard error of measurement (SEM) and smallest real difference.......6 meters longer, at the second trial (P = 0.002). Participants with moderate hip fracture-related pain walked a shorter distance than those with no or light pain during the first test (P = 0.04), while this was not the case during the second (P = 0.25). Excellent inter-rater reliability was found...

  9. Developing a contributing factor classification scheme for Rasmussen's AcciMap: Reliability and validity evaluation.

    Science.gov (United States)

    Goode, N; Salmon, P M; Taylor, N Z; Lenné, M G; Finch, C F

    2017-10-01

    One factor potentially limiting the uptake of Rasmussen's (1997) Accimap method by practitioners is the lack of a contributing factor classification scheme to guide accident analyses. This article evaluates the intra- and inter-rater reliability and criterion-referenced validity of a classification scheme developed to support the use of Accimap by led outdoor activity (LOA) practitioners. The classification scheme has two levels: the system level describes the actors, artefacts and activity context in terms of 14 codes; the descriptor level breaks the system level codes down into 107 specific contributing factors. The study involved 11 LOA practitioners using the scheme on two separate occasions to code a pre-determined list of contributing factors identified from four incident reports. Criterion-referenced validity was assessed by comparing the codes selected by LOA practitioners to those selected by the method creators. Mean intra-rater reliability scores at the system (M = 83.6%) and descriptor (M = 74%) levels were acceptable. Mean inter-rater reliability scores were not consistently acceptable for both coding attempts at the system level (M_T1 = 68.8%; M_T2 = 73.9%), and were poor at the descriptor level (M_T1 = 58.5%; M_T2 = 64.1%). Mean criterion-referenced validity scores at the system level were acceptable (M_T1 = 73.9%; M_T2 = 75.3%). However, they were not consistently acceptable at the descriptor level (M_T1 = 67.6%; M_T2 = 70.8%). Overall, the results indicate that the classification scheme does not currently satisfy reliability and validity requirements, and that further work is required. The implications for the design and development of contributing factors classification schemes are discussed. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Investigating the reliability and validity of the waterlow risk assessment scale: a literature review.

    LENUS (Irish Health Repository)

    Walsh, Breda

    2012-02-01

    The aim of this review was to examine health literature on the reliability and validity of the Waterlow pressure sore assessment scale. A systematic review of published studies relating to the topic was conducted and literature was examined for its relevancy to the topic under investigation. Findings suggest that despite the availability of over 40 assessment tools, the Waterlow assessment scale is the most frequently used by health care staff. Research suggests that the Waterlow Scale is an unreliable method of assessing individuals at risk of pressure sore development with all studies indicating a poor interrater reliability status. Its validity has also been criticized because of its high-sensitivity but low-specificity levels.

  11. Investigating the reliability and validity of the waterlow risk assessment scale: a literature review.

    LENUS (Irish Health Repository)

    Walsh, Breda

    2011-05-01

    The aim of this review was to examine health literature on the reliability and validity of the Waterlow pressure sore assessment scale. A systematic review of published studies relating to the topic was conducted and literature was examined for its relevancy to the topic under investigation. Findings suggest that despite the availability of over 40 assessment tools, the Waterlow assessment scale is the most frequently used by health care staff. Research suggests that the Waterlow Scale is an unreliable method of assessing individuals at risk of pressure sore development with all studies indicating a poor interrater reliability status. Its validity has also been criticized because of its high-sensitivity but low-specificity levels.

  12. Inter-Rater and Test-Retest (Between-Sessions) Reliability of the 4-Skills Scan for Dutch Elementary School Children

    Science.gov (United States)

    van Kernebeek, Willem G.; de Schipper, Antoine W.; Savelsbergh, Geert J. P.; Toussaint, Huub M.

    2018-01-01

    In The Netherlands, the 4-Skills Scan is an instrument for physical education teachers to assess gross motor skills of elementary school children. Little is known about its reliability. Therefore, in this study the test-retest and inter-rater reliability was determined. Respectively, 624 and 557 Dutch 6- to 12-year-old children were analyzed for…

  13. Reliability, validity, and minimal detectable change of the push-off test scores in assessing upper extremity weight-bearing ability.

    Science.gov (United States)

    Mehta, Saurabh P; George, Hannah R; Goering, Christian A; Shafer, Danielle R; Koester, Alan; Novotny, Steven

    2017-11-01

    Clinical measurement study. The push-off test (POT) was recently conceived and found to be reliable and valid for assessing weight bearing through an injured wrist or elbow. However, further research with a larger sample can lend credence to the preliminary findings supporting the use of the POT. This study examined the interrater reliability, construct validity, and measurement error for the POT in patients with wrist conditions. Participants with musculoskeletal (MSK) wrist conditions were recruited. Performance on the POT, grip strength, and isometric strength of the wrist extensors were assessed. The shortened version of the Disabilities of the Arm, Shoulder and Hand and a numeric pain rating scale were completed. The intraclass correlation coefficient assessed interrater reliability of the POT. Pearson correlation coefficients (r) examined the concurrent relationships between the POT and other measures. The standard error of measurement and the minimal detectable change at 90% confidence interval were assessed as measurement error and index of true change for the POT. A total of 50 participants with different elbow or wrist conditions (age: 48.1 ± 16.6 years) were included in this study. The results of this study strongly supported the interrater reliability (intraclass correlation coefficient: 0.96 and 0.93 for the affected and unaffected sides, respectively) of the POT in patients with wrist MSK conditions. The POT showed convergent relationships with the grip strength on the injured side (r = 0.89) and the wrist extensor strength (r = 0.7). The POT showed a small standard error of measurement (1.9 kg). The minimal detectable change at 90% confidence interval for the POT was 4.4 kg for the sample. This study provides additional evidence to support the reliability and validity of the POT. This is the first study that provides the values for the measurement error and true change on the POT scores in patients with wrist MSK conditions. Further research should examine the
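
    The relationship between the standard error of measurement (SEM) and the minimal detectable change (MDC) reported above can be sketched with the commonly used formulas SEM = SD * sqrt(1 - ICC) and MDC = z * sqrt(2) * SEM. Whether the authors used exactly these formulas is an assumption, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def sem_from_icc(sd, icc):
    """Standard error of measurement from the sample SD and a reliability coefficient."""
    return sd * sqrt(1.0 - icc)

def minimal_detectable_change(sem, confidence=0.90):
    """Smallest change exceeding measurement error at the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.645 for 90%
    return z * sqrt(2.0) * sem

# SEM of 1.9 kg as reported in the abstract.
mdc90 = minimal_detectable_change(1.9, confidence=0.90)
print(f"MDC90 = {mdc90:.1f} kg")  # MDC90 = 4.4 kg
```

    The factor sqrt(2) reflects that a change score combines the measurement error of two test occasions; with the reported SEM the result agrees with the 4.4 kg given in the abstract.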

  14. Measuring the quality of life in mild to very severe dementia: testing the inter-rater and intra-rater reliability of the German version of the QUALIDEM.

    Science.gov (United States)

    Dichter, Martin Nikolaus; Schwab, Christian G G; Meyer, Gabriele; Bartholomeyczik, Sabine; Dortmann, Olga; Halek, Margareta

    2014-05-01

    Quality of life (Qol) is an increasingly used outcome measure in dementia research. The QUALIDEM is a dementia-specific and proxy-rated Qol instrument. We aimed to determine the inter-rater and intra-rater reliability in residents with dementia in German nursing homes. The QUALIDEM consists of nine subscales that were applied to a sample of 108 people with mild to severe dementia and six consecutive subscales that were applied to a sample of 53 people with very severe dementia. The proxy raters were 49 registered nurses and nursing assistants. Inter-rater and intra-rater reliability scores were calculated on the subscale and item level. None of the QUALIDEM subscales showed strong inter-rater reliability based on the single-measure Intra-Class Correlation Coefficient (ICC) for absolute agreement ≥ 0.70. Based on the average-measure ICC for four raters, eight subscales for people with mild to severe dementia (care relationship, positive affect, negative affect, restless tense behavior, social relations, social isolation, feeling at home and having something to do) and five subscales for very severe dementia (care relationship, negative affect, restless tense behavior, social relations and social isolation) yielded a strong inter-rater agreement (ICC: 0.72-0.86). All of the QUALIDEM subscales, regardless of dementia severity, showed strong intra-rater agreement. The ICC values ranged between 0.70 and 0.79 for people with mild to severe dementia and between 0.75 and 0.87 for people with very severe dementia. This study demonstrated insufficient inter-rater reliability and sufficient intra-rater reliability for all subscales of both versions of the German QUALIDEM. The degree of inter-rater reliability can be improved by collaborative Qol rating by more than one nurse. The development of a measurement manual with accurate item definitions and a standardized education program for proxy raters is recommended.

  15. Reliability and Validity of 3 Methods of Assessing Orthopedic Resident Skill in Shoulder Surgery.

    Science.gov (United States)

    Bernard, Johnathan A; Dattilo, Jonathan R; Srikumaran, Uma; Zikria, Bashir A; Jain, Amit; LaPorte, Dawn M

    Traditional measures for evaluating resident surgical technical skills (e.g., case logs) assess operative volume but not level of surgical proficiency. Our goal was to compare the reliability and validity of 3 tools for measuring surgical skill among orthopedic residents when performing 3 open surgical approaches to the shoulder. A total of 23 residents at different stages of their surgical training were tested for technical skill pertaining to 3 shoulder surgical approaches using the following measures: Objective Structured Assessment of Technical Skills (OSATS) checklists, the Global Rating Scale (GRS), and a final pass/fail assessment determined by 3 upper extremity surgeons. Adverse events were recorded. The Cronbach α coefficient was used to assess reliability of the OSATS checklists and GRS scores. Interrater reliability was calculated with intraclass correlation coefficients. Correlations among OSATS checklist scores, GRS scores, and pass/fail assessment were calculated with Spearman ρ. Validity of OSATS checklists was determined using analysis of variance with postgraduate year (PGY) as a between-subjects factor. Significance was set at p shoulder approaches. Checklist scores showed superior interrater reliability compared with GRS and subjective pass/fail measurements. GRS scores were positively correlated across training years. The incidence of adverse events was significantly higher among PGY-1 and PGY-2 residents compared with more experienced residents. OSATS checklists are a valid and reliable assessment of technical skills across 3 surgical shoulder approaches. However, checklist scores do not measure quality of technique. Documenting adverse events is necessary to assess quality of technique and ultimate pass/fail status. Multiple methods of assessing surgical skill should be considered when evaluating orthopedic resident surgical performance. Copyright © 2016 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights

  16. Toward a Common Language for Measuring Patient Mobility in the Hospital: Reliability and Construct Validity of Interprofessional Mobility Measures.

    Science.gov (United States)

    Hoyer, Erik H; Young, Daniel L; Klein, Lisa M; Kreif, Julie; Shumock, Kara; Hiser, Stephanie; Friedman, Michael; Lavezza, Annette; Jette, Alan; Chan, Kitty S; Needham, Dale M

    2018-02-01

    The lack of common language among interprofessional inpatient clinical teams is an important barrier to achieving inpatient mobilization. In The Johns Hopkins Hospital, the Activity Measure for Post-Acute Care (AM-PAC) Inpatient Mobility Short Form (IMSF), also called "6-Clicks," and the Johns Hopkins Highest Level of Mobility (JH-HLM) are part of routine clinical practice. The measurement characteristics of these tools when used by both nurses and physical therapists for interprofessional communication or assessment are unknown. The purposes of this study were to evaluate the reliability and minimal detectable change of AM-PAC IMSF and JH-HLM when completed by nurses and physical therapists and to evaluate the construct validity of both measures when used by nurses. A prospective evaluation of a convenience sample was used. The test-retest reliability and the interrater reliability of AM-PAC IMSF and JH-HLM for inpatients in the neuroscience department (n = 118) of an academic medical center were evaluated. Each participant was independently scored twice by a team of 2 nurses and 1 physical therapist; a total of 4 physical therapists and 8 nurses participated in reliability testing. In a separate inpatient study protocol (n = 69), construct validity was evaluated via an assessment of convergent validity with other measures of function (grip strength, Katz Activities of Daily Living Scale, 2-minute walk test, 5-times sit-to-stand test) used by 5 nurses. The test-retest reliability values (intraclass correlation coefficients) for physical therapists and nurses were 0.91 and 0.97, respectively, for AM-PAC IMSF and 0.94 and 0.95, respectively, for JH-HLM. The interrater reliability values (intraclass correlation coefficients) between physical therapists and nurses were 0.96 for AM-PAC IMSF and 0.99 for JH-HLM. 
    Construct validity (Spearman correlations) ranged from 0.25 between JH-HLM and right-hand grip strength to 0.80 between AM-PAC IMSF and the Katz Activities of

  17. Can Physicians Identify Inappropriate Nuclear Stress Tests? An Examination of Inter-rater Reliability for the 2009 Appropriate Use Criteria for Radionuclide Imaging

    Science.gov (United States)

    Ye, Siqin; Rabbani, LeRoy E.; Kelly, Christopher R.; Kelly, Maureen R.; Lewis, Matthew; Paz, Yehuda; Peck, Clara L.; Rao, Shaline; Bokhari, Sabahat; Weiner, Shepard D.; Einstein, Andrew J.

    2014-01-01

    Background: We sought to determine inter-rater reliability of the 2009 Appropriate Use Criteria (AUC) for radionuclide imaging (RNI) and whether physicians at various levels of training can effectively identify nuclear stress tests with inappropriate indications. Methods and Results: Four hundred patients were randomly selected from a consecutive cohort of patients undergoing nuclear stress testing at an academic medical center. Raters with different levels of training (including cardiology attending physicians, cardiology fellows, internal medicine hospitalists, and internal medicine interns) classified individual nuclear stress tests using the 2009 AUC. Consensus classification by two cardiologists was considered the operational gold standard, and sensitivity and specificity of individual raters for identifying inappropriate tests were calculated. Inter-rater reliability of the AUC was assessed using Cohen’s kappa statistics for pairs of different raters. The mean age of patients was 61.5 years; 214 (54%) were female. The cardiologists rated 256 (64%) of 400 NSTs as appropriate, 68 (18%) as uncertain, 55 (14%) as inappropriate; 21 (5%) tests were unable to be classified. Inter-rater reliability for non-cardiologist raters was modest (unweighted Cohen’s kappa, 0.51, 95% confidence interval, 0.45 to 0.55). Sensitivity of individual raters for identifying inappropriate tests ranged from 47% to 82%, while specificity ranged from 85% to 97%. Conclusions: Inter-rater reliability for the 2009 AUC for RNI is modest, and there is considerable variation in the ability of raters at different levels of training to identify inappropriate tests. PMID:25563660
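
    Unweighted Cohen's kappa, the statistic used for rater pairs in this study, compares observed agreement with the agreement expected by chance from each rater's category frequencies. The sketch below is a minimal illustration with hypothetical labels and a hypothetical function name; it is not the study's analysis code.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two raters classifying the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Chance agreement: product of the two raters' marginal proportions per category.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    # Undefined (division by zero) if both raters use a single identical category.
    return (observed - expected) / (1.0 - expected)

# Hypothetical classifications of six stress tests by two raters.
rater_1 = ["appropriate", "appropriate", "uncertain", "inappropriate", "appropriate", "inappropriate"]
rater_2 = ["appropriate", "uncertain", "uncertain", "inappropriate", "appropriate", "appropriate"]
print(round(cohens_kappa(rater_1, rater_2), 2))
```

    Because kappa discounts chance agreement, it can be modest even when raw percentage agreement looks high, which is relevant when one category (here, "appropriate") dominates.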

  18. Interrater reliability of the Melbourne Assessment of Unilateral Upper Limb Function for children with hemiplegic cerebral palsy.

    LENUS (Irish Health Repository)

    Spirtos, Michelle

    2012-02-01

    OBJECTIVE: We examined the interrater reliability of the Melbourne Assessment of Unilateral Upper Limb Function. METHOD: Three occupational therapists independently scored 34 videotaped assessments of children with hemiplegic cerebral palsy aged 6 yr, 1 mo, to 14 yr, 5 mo. Intraclass correlation coefficients (ICCs) at a 95% confidence interval were calculated for total scores, category scores, and item scores. RESULTS: The correlation between raters' total scores was high (ICC = .961). The highest correlation for test components between raters was found for fluency (ICC = .902), followed by range of movement (ICC = .866), and the lowest correlation was found for quality of movement (ICC = .683). The ICCs for individual test item scores varied and ranged from .368 to .899. CONCLUSION: This study demonstrated high interrater reliability for total scores, with scoring of some individual components and items requiring further consideration from both a clinical and a research perspective.

  19. Inter-rater and test–retest reliability of quality assessments by novice student raters using the Jadad and Newcastle–Ottawa Scales

    Science.gov (United States)

    Oremus, Carolina; Hall, Geoffrey B C; McKinnon, Margaret C

    2012-01-01

    Introduction Quality assessment of included studies is an important component of systematic reviews. Objective The authors investigated inter-rater and test–retest reliability for quality assessments conducted by inexperienced student raters. Design Student raters received a training session on quality assessment using the Jadad Scale for randomised controlled trials and the Newcastle–Ottawa Scale (NOS) for observational studies. Raters were randomly assigned into five pairs and they each independently rated the quality of 13–20 articles. These articles were drawn from a pool of 78 papers examining cognitive impairment following electroconvulsive therapy to treat major depressive disorder. The articles were randomly distributed to the raters. Two months later, each rater re-assessed the quality of half of their assigned articles. Setting McMaster Integrative Neuroscience Discovery and Study Program. Participants 10 students taking McMaster Integrative Neuroscience Discovery and Study Program courses. Main outcome measures The authors measured inter-rater reliability using κ and the intraclass correlation coefficient type 2,1 or ICC(2,1). The authors measured test–retest reliability using ICC(2,1). Results Inter-rater reliability varied by scale question. For the six-item Jadad Scale, question-specific κs ranged from 0.13 (95% CI −0.11 to 0.37) to 0.56 (95% CI 0.29 to 0.83). The ranges were −0.14 (95% CI −0.28 to 0.00) to 0.39 (95% CI −0.02 to 0.81) for the NOS cohort and −0.20 (95% CI −0.49 to 0.09) to 1.00 (95% CI 1.00 to 1.00) for the NOS case–control. For overall scores on the six-item Jadad Scale, ICC(2,1)s for inter-rater and test–retest reliability (accounting for systematic differences between raters) were 0.32 (95% CI 0.08 to 0.52) and 0.55 (95% CI 0.41 to 0.67), respectively. Corresponding ICC(2,1)s for the NOS cohort were −0.19 (95% CI −0.67 to 0.35) and 0.62 (95% CI 0.25 to 0.83), and for the NOS case–control, the ICC(2
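    The record above reports ICC(2,1), the two-way random-effects, absolute-agreement, single-rater form of the intraclass correlation. As a rough sketch of how that coefficient is obtained from the usual ANOVA mean squares (assuming a complete subjects-by-raters table with no missing ratings), the function name `icc_2_1` and the toy ratings below are hypothetical and are not taken from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a complete table: one row per subject, one column per rater."""
    n = len(ratings)      # number of subjects (e.g. articles)
    k = len(ratings[0])   # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    denominator = msr + (k - 1) * mse + k * (msc - mse) / n
    if denominator == 0:
        raise ValueError("ICC(2,1) is undefined when all ratings are identical")
    return (msr - mse) / denominator


# Toy data: rater 2 always scores one point higher than rater 1.
offset_ratings = [[1, 2], [2, 3], [3, 4]]
icc_offset = icc_2_1(offset_ratings)
```

    In the toy table the two raters rank the subjects identically but differ by a constant offset; because ICC(2,1) penalises systematic differences between raters, the coefficient comes out below 1 even though the rankings agree.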

  20. Reliability and Validity of the Dutch Physical Activity Questionnaires for Children (PAQ-C) and Adolescents (PAQ-A)

    OpenAIRE

    Bervoets, Liene; Van Noten, Caroline; Van Roosbroeck, Sofie; Hansen, Dominique; Van Hoorenbeeck, Kim; Verheyen, Els; Van Hal, Guido; Vankerckhoven, Vanessa

    2014-01-01

    Background This study was designed to validate the Dutch Physical Activity Questionnaires for Children (PAQ-C) and Adolescents (PAQ-A). Methods After adjustment of the original Canadian PAQ-C and PAQ-A (i.e. translation/back-translation and evaluation by expert committee), content validity of both PAQs was assessed and calculated using item-level (I-CVI) and scale-level (S-CVI) content validity indexes. Inter-item and inter-rater reliability of 196 PAQ-C and 95 PAQ-A filled in by both childre...

  1. Evaluation of the Validity and Reliability of the Waterlow Pressure Ulcer Risk Assessment Scale.

    Science.gov (United States)

    Charalambous, Charalambos; Koulori, Agoritsa; Vasilopoulos, Aristidis; Roupa, Zoe

    2018-04-01

    Prevention is the ideal strategy to tackle the problem of pressure ulcers. Pressure ulcer risk assessment scales are one of the most pivotal measures applied to tackle the problem, but much criticism has been raised regarding the validity and reliability of these scales. The objective was to investigate the validity and reliability of the Waterlow pressure ulcer risk assessment scale. The methodology used is a narrative literature review; the bibliography was reviewed through Cinahl, Pubmed, EBSCO, Medline and Google Scholar, and 26 scientific articles were identified. The articles were chosen due to their direct correlation with the objective under study and their scientific relevance. The construct and face validity of the Waterlow appear adequate, but with regard to content validity, changes in the categories age and gender could be beneficial. The concurrent validity cannot be assessed. The predictive validity of the Waterlow is characterized by high specificity and low sensitivity. The inter-rater reliability has been demonstrated to be inadequate; this may be due to a lack of clear definitions within the categories and differing levels of knowledge among users. Due to the limitations presented regarding the validity and reliability of the Waterlow pressure ulcer risk assessment scale, the scale should be used in conjunction with clinical assessment to provide optimum results.

  2. Evaluation of the Validity and Reliability of the Waterlow Pressure Ulcer Risk Assessment Scale

    Science.gov (United States)

    Charalambous, Charalambos; Koulori, Agoritsa; Vasilopoulos, Aristidis; Roupa, Zoe

    2018-01-01

    Introduction Prevention is the ideal strategy to tackle the problem of pressure ulcers. Pressure ulcer risk assessment scales are one of the most pivotal measures applied to tackle the problem, but much criticism has been raised regarding the validity and reliability of these scales. Objective To investigate the validity and reliability of the Waterlow pressure ulcer risk assessment scale. Method The methodology used is a narrative literature review; the bibliography was reviewed through Cinahl, Pubmed, EBSCO, Medline and Google Scholar, and 26 scientific articles were identified. The articles were chosen due to their direct correlation with the objective under study and their scientific relevance. Results The construct and face validity of the Waterlow appear adequate, but with regard to content validity, changes in the categories age and gender could be beneficial. The concurrent validity cannot be assessed. The predictive validity of the Waterlow is characterized by high specificity and low sensitivity. The inter-rater reliability has been demonstrated to be inadequate; this may be due to a lack of clear definitions within the categories and differing levels of knowledge among users. Conclusion Due to the limitations presented regarding the validity and reliability of the Waterlow pressure ulcer risk assessment scale, the scale should be used in conjunction with clinical assessment to provide optimum results. PMID:29736104

  3. The reliability, minimal detectable change and concurrent validity of a gravity-based bubble inclinometer and iphone application for measuring standing lumbar lordosis.

    Science.gov (United States)

    Salamh, Paul A; Kolber, Morey

    2014-01-01

    To investigate the reliability, minimal detectable change (MDC90) and concurrent validity of a gravity-based bubble inclinometer (inclinometer) and iPhone® application for measuring standing lumbar lordosis. Two investigators used both an inclinometer and an iPhone® with an inclinometer application to measure lumbar lordosis of 30 asymptomatic participants. ICC models 3,k and 2,k were used for the intrarater and interrater analysis, respectively. Good interrater and intrarater reliability was present for the inclinometer with Intraclass Correlation Coefficients (ICC) of 0.90 and 0.85, respectively and the iPhone® application with ICC values of 0.96 and 0.81. The minimal detectable change (MDC90) indicates that a change greater than or equal to 7° and 6° is needed to exceed the threshold of error using the iPhone® and inclinometer, respectively. The concurrent validity between the two instruments was good with a Pearson product-moment coefficient of correlation (r) of 0.86 for both raters. Ninety-five percent limits of agreement identified differences ranging from 9° greater in regards to the iPhone® to 8° less regarding the inclinometer. Both the inclinometer and iPhone® application possess good interrater reliability, intrarater reliability and concurrent validity for measuring standing lumbar lordosis. This investigation provides preliminary evidence to suggest that smart phone applications may offer clinical utility comparable to inclinometry for quantifying standing lumbar lordosis. Clinicians should recognize potential individual differences when using these devices interchangeably.
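    The minimal detectable change reported above is conventionally derived from the standard error of measurement: SEM = SD × √(1 − ICC), and MDC90 = 1.645 × SEM × √2. The sketch below applies that conventional formula; the abstract does not state the exact computation or the sample standard deviation, so the function name `mdc90` and the input values are assumptions for illustration only.

```python
import math

Z_90 = 1.645  # z-score for a two-sided 90% confidence level


def mdc90(sd, icc):
    """Minimal detectable change at 90% confidence.

    sd  -- standard deviation of the measurements (same units as the result)
    icc -- reliability coefficient, expected to lie in [0, 1]
    """
    if sd < 0 or not 0.0 <= icc <= 1.0:
        raise ValueError("sd must be non-negative and icc must lie in [0, 1]")
    sem = sd * math.sqrt(1.0 - icc)   # standard error of measurement
    return Z_90 * sem * math.sqrt(2)  # sqrt(2): error enters at both measurements


# Hypothetical inputs: a 10-degree SD and an ICC of 0.96.
example_mdc = mdc90(10.0, 0.96)
```

    The formula makes the trade-off explicit: for a fixed spread of measurements, a higher ICC shrinks the SEM and therefore the change that must be exceeded before a difference can be attributed to something other than measurement error.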

  4. The reliability and concurrent validity of measurements used to quantify lumbar spine mobility: an analysis of an iphone® application and gravity based inclinometry.

    Science.gov (United States)

    Kolber, Morey J; Pizzini, Matias; Robinson, Ashley; Yanez, Dania; Hanney, William J

    2013-04-01

    PURPOSE/AIM: The purpose of this study was to investigate the reliability, minimal detectable change (MDC), and concurrent validity of active spinal mobility measurements using a gravity-based bubble inclinometer and iPhone® application. MATERIALS/METHODS: Two investigators each used a bubble inclinometer and an iPhone® with inclinometer application to measure total thoracolumbo-pelvic flexion, isolated lumbar flexion, total thoracolumbo-pelvic extension, and thoracolumbar lateral flexion in 30 asymptomatic participants using a blinded repeated measures design. The procedures used in this investigation for measuring spinal mobility yielded good intrarater and interrater reliability with Intraclass Correlation Coefficients (ICC) for bubble inclinometry ≥ 0.81 and the iPhone® ≥ 0.80. The MDC90 for the interrater analysis ranged from 4° to 9°. The concurrent validity between bubble inclinometry and the iPhone® application was good with ICC values of ≥ 0.86. The 95% level of agreement indicates that although these measuring instruments are equivalent, individual differences of up to 18° may exist when using these devices interchangeably. The bubble inclinometer and iPhone® possess good intrarater and interrater reliability as well as concurrent validity when strict measurement procedures are adhered to. This study provides preliminary evidence to suggest that smart phone applications may offer clinical utility comparable to inclinometry for quantifying spinal mobility. Clinicians should be aware of the potential disagreement when using these devices interchangeably. Level of evidence: 2b (observational study of reliability).

  5. Interrater reliability of the Saint-Anne Dargassies Scale in assessing the neurological patterns of healthy preterm newborns

    Directory of Open Access Journals (Sweden)

    Carla Ismirna Santos Alves

    Abstract Objectives: to assess the interrater reliability of the Saint-Anne Dargassies Scale in assessing neurological patterns of healthy preterm newborns. Methods: twenty preterm newborns met the inclusion criteria for participation in this prospective study. The neurologic examination was performed using the Saint-Anne Dargassies Scale in newborns with normal serial cranial ultrasound examinations. In order to test the reliability, the study was structured as follows: group I (rater 1/physiotherapist; rater 2/neonatologist), group II (rater 3/physiotherapist; rater 4/child neurologist) and the gold standard (expert and professor in pediatric neurology). Results: high interrater agreement was observed between groups I - II compared with the gold standard in assessing postural pattern (p<0.01). Regarding the assessment of primitive reflexes, greater agreement was observed in the evaluation of palmar grasp reflex and Moro reflex (p<0.01) for group I compared with the gold standard. An analysis of tone demonstrated heterogeneous agreement, without compromising the reliability of the scale. The probability of equality between measurements of head circumference in the two groups, compared with the gold standard, was observed. Conclusions: the Saint-Anne Dargassies Scale demonstrated high reliability and homogeneity with significant power of reproducibility and may be capable of identifying preterm newborns suspected of having neurological deficits.

  6. Content validity and reliability of test of gross motor development in Chilean children

    Directory of Open Access Journals (Sweden)

    Marcelo Cano-Cappellacci

    2015-01-01

    ABSTRACT OBJECTIVE To validate a Spanish version of the Test of Gross Motor Development (TGMD-2) for the Chilean population. METHODS Descriptive, transversal, non-experimental validity and reliability study. Four translators, three experts and 92 Chilean children aged five to 10 years, students at a primary school in Santiago, Chile, participated. The Committee of Experts carried out translation, back-translation and revision processes to determine the translinguistic equivalence and content validity of the test, using the content validity index in 2013. In addition, a pilot implementation was carried out to determine test reliability in Spanish, using the intraclass correlation coefficient and the Bland-Altman method. We used a t-test to evaluate whether replacing the bat with a racket produced significant differences in the results. RESULTS We obtained a content validity index higher than 0.80 for language clarity and relevance of the TGMD-2 for children. There were significant differences in the object control subtest when comparing the results with bat and racket. The intraclass correlation coefficient for inter-rater, intra-rater and test-retest reliability was greater than 0.80 in all cases. CONCLUSIONS The TGMD-2 has appropriate content validity to be applied in the Chilean population. The reliability of this test is within the appropriate parameters and its use could be recommended in this population after the establishment of normative data, setting a further precedent for the validation in other Latin American countries.

  7. Inter-rater reliability of the evaluation of muscular chains associated with posture alterations in scoliosis

    Directory of Open Access Journals (Sweden)

    Fortin Carole

    2012-05-01

    Abstract Background In the Global postural re-education (GPR) evaluation, posture alterations are associated with anterior or posterior muscular chain impairments. Our goal was to assess the reliability of the GPR muscular chain evaluation. Methods Design: Inter-rater reliability study. Fifty physical therapists (PTs) and two experts trained in GPR assessed the standing posture from photographs of five youths with idiopathic scoliosis using a posture analysis grid with 23 posture indices (PI). The PTs and experts indicated the muscular chain associated with posture alterations. The PTs were also divided into three groups according to their experience in GPR. Experts’ results (after consensus) were used to verify agreement between PTs and experts for muscular chain and posture assessments. We used Kappa coefficients (K) and the percentage of agreement (%A) to assess inter-rater reliability and intra-class coefficients (ICC) for determining agreement between PTs and experts. Results For the muscular chain evaluation, reliability was moderate to substantial for 12 PI for the PTs (%A: 56 to 82; K: 0.42 to 0.76) and perfect for 19 PI for the experts. For posture assessment, reliability was moderate to substantial for 12 PI for the PTs (%A > 60%; K: 0.42 to 0.75) and moderate to perfect for 18 PI for the experts (%A: 80 to 100; K: 0.55 to 1.00). The agreement between PTs and experts was good for most muscular chain evaluations (18 PI; ICC: 0.82 to 0.99) and PI (19 PI; ICC: 0.78 to 1.00). Conclusions The GPR muscular chain evaluation has good reliability for most posture indices. GPR evaluation should help guide physical therapists in targeting affected muscles for treatment of abnormal posture patterns.

  8. 4-Meter Gait Speed Test in Chronic Obstructive Pulmonary Disease: INTERRATER RELIABILITY USING A STOPWATCH.

    Science.gov (United States)

    Bisca, Gianna Waldrich; Fava, Lucas Rodrigues; Morita, Andrea Akemi; Machado, Felipe Vilaça Cavallari; Pitta, Fabio; Hernandes, Nidia Aparecida

    2017-12-14

    4-meter gait speed (4MGS) is increasingly used to assess functional performance in patients with chronic obstructive pulmonary disease. However, the current literature lacks information regarding some technical standards for this test. Therefore, the purpose of this study was to compare and to evaluate the interrater reliability between a stopwatch and video recording used as timing systems for the 4MGS in patients with chronic obstructive pulmonary disease, as well as to verify the interrater reliability between 2 observers measuring the 4MGS time using a manual stopwatch. Fifty-one patients performed the 4MGS using 4 different protocols (random order): walking at the usual and maximum speed in a 4-meter course and walking at the same 2 speeds on an 8-m course using a 2-m acceleration zone, a 4-meter timing area, and a 2-m deceleration zone. Gait speed was measured simultaneously using a stopwatch and a video recording. In a subanalysis (n = 24), 2 independent observers timed the 4MGS using a stopwatch. There was no significant difference in comparison between the 2 timing methods (P > .05 for all), and the reliability between video recording and stopwatch was excellent in all 4MGS studied protocols (intraclass correlation coefficient ≥ 0.91). Moreover, when comparing gait speed measured by 2 observers using a stopwatch, no significant difference was found among all proposed protocols (P > .05 for all), and there was also excellent reliability between the 2 independent observers (intraclass correlation coefficient ≥ 0.94). The stopwatch, a low-cost and feasible tool, is reliable as a timing device for the 4MGS in patients with chronic obstructive pulmonary disease.

  9. An Investigation Into The Interrater Reliability Of The Modified Ashworth Scale In The Assessment Of Muscle Spasticity In Hemiplegic Patients

    Directory of Open Access Journals (Sweden)

    N. Nokhostin-Ansari

    2006-06-01

    Background and Aim: Spasticity is a velocity-dependent increase in tonic stretch reflexes (muscle tone) with exaggerated tendon jerks, resulting from hyperexcitability of the stretch reflex. The measurement of spasticity is necessary to determine the effect of treatments. The Modified Ashworth Scale is the most widely used method for assessing muscle spasticity in clinical practice and research. The purpose of this study was to investigate the interrater reliability of the Modified Ashworth Scale in hemiplegic patients. Materials and Methods: Thirty subjects (16 males, 14 females) with a mean age of 59.40 (SD = 14.013) were recruited. The shoulder adductor, elbow flexor, wrist dorsiflexor, hip adductor, knee extensor and ankle plantarflexor on the hemiplegic side were tested by two physiotherapists. Results: In the upper limb, the interrater reliability for shoulder adductor and elbow flexor muscles was fair (0.372 and 0.369, respectively). The reliability for the wrist flexors was good (0.612). The difference in Kappa value for the proximal muscle (shoulder adductor; 0.372) and the distal muscle (wrist flexor; 0.612) was significant (χ²=33.87, df=1, p0.05. The mean value for the upper limb (0.505) and the lower limb (0.516) was not significantly different (χ²=0.1407, df=1, p>0.05). Conclusion: The interrater reliability of the Modified Ashworth Scale was not good. The limb, upper or lower, had no significant effect on the reliability. In the upper limb, the reliability for the proximal and distal muscle was significantly different. However, the difference in the lower limb was not significant. When using the scale, one should consider its limitations.

  10. Is the Bayley Scales of Infant and Toddler Developmental Screening Test, Valid and Reliable for Persian Speaking Children?

    Science.gov (United States)

    Soleimani, Farin; Azari, Nadia; Vameghi, Roshanak; Sajedi, Firoozeh; Shahshahani, Soheila; Karimi, Hossein; Kraskian, Adis; Shahrokhi, Amin; Teymouri, Robab; Gharib, Masoud

    2016-10-01

    Advances in perinatal and neonatal care have substantially improved the survival of at-risk infants over the past two decades. The purpose of this study was to assess the reliability and validity of the Bayley Scales of Infant and Toddler Development Screening Test in Persian-speaking children. This was a cross-sectional prospective study of 403 children aged 1 to 42 months. The Bayley scales screening instrument, which consists of five domains (cognitive, receptive, and expressive communication and fine and gross motor items), was used to measure infants' and toddlers' development. The psychometric properties examined included the face and content validity of the scale, in addition to cultural and linguistic modifications to the scale and its test-retest and inter-rater reliability. An expert team changed some of the test items relating to cultural and linguistic issues. In almost all the age groups, cultural or linguistic changes were made to items in the communication domains. According to Cronbach's alpha for internal consistency, the reliability of the cognitive scale was r = 0.79, and the reliability of the receptive scale was r = 0.76. The reliability for expressive communication, fine motor, and gross motor scales was r = 0.81, r = 0.80, and r = 0.81, respectively. The construct validity of the tests was confirmed using a factor analysis and comparison of the mean scores of the age groups. The intra- and inter-rater reliabilities of the Bayley Scales were good-to-excellent. The results indicated that the Bayley Scales had a high level of reliability in the present study. Thus, the scale can be used in a Persian population.

  11. The impact of revised DSM-5 criteria on the relative distribution and inter-rater reliability of eating disorder diagnoses in a residential treatment setting.

    Science.gov (United States)

    Thomas, Jennifer J; Eddy, Kamryn T; Murray, Helen B; Tromp, Marilou D P; Hartmann, Andrea S; Stone, Melissa T; Levendusky, Philip G; Becker, Anne E

    2015-09-30

    This study evaluated the relative distribution and inter-rater reliability of revised DSM-5 criteria for eating disorders in a residential treatment program. Consecutive adolescent and young adult females (N=150) admitted to a residential eating disorder treatment facility were assigned both DSM-IV and DSM-5 diagnoses by a clinician (n=14) via routine clinical interview and a research assessor (n=4) via structured interview. We compared the frequency of diagnostic assignments under each taxonomy and by type of assessor. We evaluated concordance between clinician and researcher assignment through inter-rater reliability kappa and percent agreement. Significantly fewer patients received either clinician or researcher diagnoses of a residual eating disorder under DSM-5 (clinician-12.0%; researcher-31.3%) versus DSM-IV (clinician-28.7%; researcher-59.3%), with the majority of reassigned DSM-IV residual cases reclassified as DSM-5 anorexia nervosa. Researcher and clinician diagnoses showed moderate inter-rater reliability under DSM-IV (κ=.48) and DSM-5 (κ=.57), though agreement for specific DSM-5 other specified feeding or eating disorder (OSFED) presentations was poor (κ=.05). DSM-5 revisions were associated with significantly less frequent residual eating disorder diagnoses, but not with reduced inter-rater reliability. Findings support specific dimensions of clinical utility for revised DSM-5 criteria for eating disorders. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  12. The Communication Function Classification System: cultural adaptation, validity, and reliability of the Farsi version for patients with cerebral palsy.

    Science.gov (United States)

    Soleymani, Zahra; Joveini, Ghodsiye; Baghestani, Ahmad Reza

    2015-03-01

    This study developed a Farsi language Communication Function Classification System and then tested its reliability and validity. Communication Function Classification System is designed to classify the communication functions of individuals with cerebral palsy. Up until now, there has been no instrument for assessment of this communication function in Iran. The English Communication Function Classification System was translated into Farsi and cross-culturally modified by a panel of experts. Professionals and parents then assessed the content validity of the modified version. A backtranslation of the Farsi version was confirmed by the developer of the English Communication Function Classification System. Face validity was assessed by therapists and parents of 10 patients. The Farsi Communication Function Classification System was administered to 152 individuals with cerebral palsy (age, 2 to 18 years; median age, 10 years; mean age, 9.9 years; standard deviation, 4.3 years). Inter-rater reliability was analyzed between parents, occupational therapists, and speech and language pathologists. The test-retest reliability was assessed for 75 patients with a 14 day interval between tests. The inter-rater reliability of the Communication Function Classification System was 0.81 between speech and language pathologists and occupational therapists, 0.74 between parents and occupational therapists, and 0.88 between parents and speech and language pathologists. The test-retest reliability was 0.96 for occupational therapists, 0.98 for speech and language pathologists, and 0.94 for parents. The findings suggest that the Farsi version of Communication Function Classification System is a reliable and valid measure that can be used in clinical settings to assess communication function in patients with cerebral palsy. Copyright © 2015 Elsevier Inc. All rights reserved.

  13. Assessing physiotherapists' communication skills for promoting patient autonomy for self-management: reliability and validity of the communication evaluation in rehabilitation tool.

    Science.gov (United States)

    Murray, Aileen; Hall, Amanda; Williams, Geoffrey C; McDonough, Suzanne M; Ntoumanis, Nikos; Taylor, Ian; Jackson, Ben; Copsey, Bethan; Hurley, Deirdre A; Matthews, James

    2018-02-27

    To assess the inter-rater reliability and concurrent validity of the Communication Evaluation in Rehabilitation Tool, which aims to externally assess physiotherapists' competency in using Self-Determination Theory-based communication strategies in practice. Audio recordings of initial consultations between 24 physiotherapists and 24 patients with chronic low back pain in four hospitals in Ireland were obtained as part of a larger randomised controlled trial. Three raters, all of whom had PhDs in psychology and expertise in motivation and physical activity, independently listened to the 24 audio recordings and completed the 18-item Communication Evaluation in Rehabilitation Tool. Inter-rater reliability between all three raters was assessed using intraclass correlation coefficients. Concurrent validity was assessed using Pearson's r correlations with a reference standard, the Health Care Climate Questionnaire. The total score for the Communication Evaluation in Rehabilitation Tool is an average of all 18 items. Total scores demonstrated good inter-rater reliability (Intraclass Correlation Coefficient (ICC) = 0.8) and concurrent validity with the Health Care Climate Questionnaire total score (range: r = 0.7-0.88). Item-level scores of the Communication Evaluation in Rehabilitation Tool identified five items that need improvement. Results provide preliminary evidence to support future use and testing of the Communication Evaluation in Rehabilitation Tool. Implications for Rehabilitation Promoting patient autonomy is a learned skill and while interventions exist to train clinicians in these skills there are no tools to assess how well clinicians use these skills when interacting with a patient. The lack of robust assessment has severe implications regarding both the fidelity of clinician training packages and resulting outcomes for promoting patient autonomy. This study has developed a novel measurement tool Communication Evaluation in Rehabilitation Tool and a

  14. Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial

    Directory of Open Access Journals (Sweden)

    Kevin A. Hallgren

    2012-02-01

    Many research designs require the assessment of inter-rater reliability (IRR) to demonstrate consistency among observational ratings provided by multiple coders. However, many studies use incorrect statistical procedures, fail to fully report the information necessary to interpret their results, or do not address how IRR affects the power of their subsequent analyses for hypothesis testing. This paper provides an overview of methodological issues related to the assessment of IRR with a focus on study design, selection of appropriate statistics, and the computation, interpretation, and reporting of some commonly-used IRR statistics. Computational examples include SPSS and R syntax for computing Cohen’s kappa and intra-class correlations to assess IRR.
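    The tutorial itself gives SPSS and R syntax; for readers working in another environment, unweighted Cohen's kappa for two raters can be sketched in a few lines of plain Python. This is an illustration rather than the paper's code: the function name `cohens_kappa` and the sample codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters coding the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")

    n = len(rater_a)
    # Observed agreement: share of items given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal proportions,
    # summed over the categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Illustrative codes only.
coder_1 = ["yes", "yes", "no", "no"]
coder_2 = ["yes", "no", "no", "no"]
kappa = cohens_kappa(coder_1, coder_2)
```

    In the illustrative data the coders agree on three of four items (75%), but half of that agreement would be expected by chance given their marginal distributions, which is why kappa is noticeably lower than the raw percent agreement.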

  15. Reliability and validity of the German version of the Structured Interview of Personality Organization (STIPO)

    Science.gov (United States)

    2013-01-01

    Background The assessment of personality organization and its observable behavioral manifestations, i.e. personality functioning, has a long tradition in psychodynamic psychiatry. Recently, the DSM-5 Levels of Personality Functioning Scale has moved it into the focus of psychiatric diagnostics. Based on Kernberg’s concept of personality organization the Structured Interview of Personality Organization (STIPO) was developed for diagnosing personality functioning. The STIPO covers seven dimensions: (1) identity, (2) object relations, (3) primitive defenses, (4) coping/rigidity, (5) aggression, (6) moral values, and (7) reality testing and perceptual distortions. The English version of the STIPO has previously revealed satisfying psychometric properties. Methods Validity and reliability of the German version of the 100-item instrument have been evaluated in 122 psychiatric patients. All patients were diagnosed according to the Diagnostic and Statistical Manual for Mental Disorders (DSM-IV) and were assessed by means of the STIPO. Moreover, all patients completed eight questionnaires that served as criteria for external validity of the STIPO. Results Interrater reliability varied between intraclass correlations of .89 and 1.0, Cronbach’s α for the seven dimensions was .69 to .93. All a priori selected questionnaire scales correlated significantly with the corresponding STIPO dimensions. Patients with personality disorder (PD) revealed significantly higher STIPO scores (i.e. worse personality functioning) than patients without PD; patients with cluster B PD showed significantly higher STIPO scores than patients with cluster C PD. Conclusions Interrater reliability, Cronbach’s α, concurrent validity, and differential validity of the STIPO are satisfying. The STIPO represents an appropriate instrument for the assessment of personality functioning in clinical and research settings. PMID:23941404

  16. Validity and reliability of a low-cost digital dynamometer for measuring isometric strength of lower limb.

    Science.gov (United States)

    Romero-Franco, Natalia; Jiménez-Reyes, Pedro; Montaño-Munuera, Juan A

    2017-11-01

    Lower limb isometric strength is a key parameter to monitor the training process or recognise muscle weakness and injury risk. However, valid and reliable methods to evaluate it often require high-cost tools. The aim of this study was to analyse the concurrent validity and reliability of a low-cost digital dynamometer for measuring isometric strength in lower limb. Eleven physically active and healthy participants performed maximal isometric strength for: flexion and extension of ankle, flexion and extension of knee, flexion, extension, adduction, abduction, internal and external rotation of hip. Data obtained by the digital dynamometer were compared with the isokinetic dynamometer to examine its concurrent validity. Data obtained by the digital dynamometer from 2 different evaluators and 2 different sessions were compared to examine its inter-rater and intra-rater reliability. Intra-class correlation (ICC) for validity was excellent in every movement (ICC > 0.9). Intra and inter-tester reliability was excellent for all the movements assessed (ICC > 0.75). The low-cost digital dynamometer demonstrated strong concurrent validity and excellent intra and inter-tester reliability for assessing isometric strength in the main lower limb movements.

  17. A Structured Clinical Interview for Kleptomania (SCI-K): preliminary validity and reliability testing.

    Science.gov (United States)

    Grant, Jon E; Kim, Suck Won; McCabe, James S

    2006-06-01

    Kleptomania presents difficulties in diagnosis for clinicians. This study aimed to develop and test a DSM-IV-based diagnostic instrument for kleptomania. To assess for current kleptomania, the Structured Clinical Interview for Kleptomania (SCI-K) was administered to 112 consecutive subjects requesting psychiatric outpatient treatment for a variety of disorders. Reliability and validity were determined. Classification accuracy was examined using the longitudinal course of illness. The SCI-K demonstrated excellent test-retest (phi coefficient = 0.956 (95% CI = 0.937, 0.970)) and inter-rater reliability (phi coefficient = 0.718 (95% CI = 0.506, 0.848)) in the diagnosis of kleptomania. Concurrent validity was observed with a self-report measure using DSM-IV kleptomania criteria (phi coefficient = 0.769 (95% CI = 0.653, 0.850)). Discriminant validity was observed with a measure of depression (point biserial coefficient = -0.020 (95% CI = -0.205, 0.166)). The SCI-K demonstrated both high sensitivity and specificity based on longitudinal assessment. The SCI-K demonstrated excellent reliability and validity in diagnosing kleptomania in subjects presenting with various psychiatric problems. These findings require replication in larger groups, including non-psychiatric populations, to examine their generalizability. Copyright (c) 2006 John Wiley & Sons, Ltd.
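
    The phi coefficients quoted in this record summarise agreement between two binary classifications (for example, diagnosis present or absent according to two raters). As a hedged illustration only, the sketch below computes phi from the four cells of a 2x2 table; the function name `phi_coefficient` and the counts are hypothetical and unrelated to the SCI-K data.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table.

    a = both positive, b = first positive / second negative,
    c = first negative / second positive, d = both negative.
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom


# Hypothetical counts for two raters judging 50 cases.
print(round(phi_coefficient(20, 5, 10, 15), 3))  # 0.408
```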

  18. Qualitative soil moisture assessment in semi-arid Africa - the role of experience and training on inter-rater reliability

    Science.gov (United States)

    Rinderer, M.; Komakech, H. C.; Müller, D.; Wiesenberg, G. L. B.; Seibert, J.

    2015-08-01

    Soil and water management is particularly relevant in semi-arid regions to enhance agricultural productivity. During periods of water scarcity, soil moisture differences are important indicators of the soil water deficit and are traditionally used for allocating water resources among farmers of a village community. Here we present a simple, inexpensive soil wetness classification scheme based on qualitative indicators which one can see or touch on the soil surface. It incorporates the local farmers' knowledge on the best soil moisture conditions for seeding and brick making in the semi-arid environment of the study site near Arusha, Tanzania. The scheme was tested twice in 2014 with farmers, students and experts (April: 40 persons, June: 25 persons) for inter-rater reliability, bias of individuals and functional relation between qualitative and quantitative soil moisture values. During the test in April farmers assigned the same wetness class in 46 % of all cases, while students and experts agreed on about 60 % of all cases. Students who had been trained in how to apply the method gained higher inter-rater reliability than their colleagues with only a basic introduction. When repeating the test in June, participants were given improved instructions, organized in small subgroups, which resulted in a higher inter-rater reliability among farmers. In 66 % of all classifications, farmers assigned the same wetness class and the spread of class assignments was smaller. This study demonstrates that a wetness classification scheme based on qualitative indicators is a robust tool and can be applied successfully regardless of experience in crop growing and education level when an in-depth introduction and training is provided. The use of a simple and clear layout of the assessment form is important for reliable wetness class assignments.

  19. Qualitative soil moisture assessment in semi-arid Africa: the role of experience and training on inter-rater reliability

    Science.gov (United States)

    Rinderer, M.; Komakech, H.; Müller, D.; Seibert, J.

    2015-03-01

    Soil and water management is particularly relevant in semi-arid regions to enhance agricultural productivity. During periods of water scarcity soil moisture differences are important indicators of the soil water deficit and are traditionally used for allocating water resources among farmers of a village community. Here we present a simple, inexpensive soil wetness classification scheme based on qualitative indicators which one can see or touch on the soil surface. It incorporates the local farmers' knowledge on the best soil moisture conditions for seeding and brick making in the semi-arid environment of the study site near Arusha, Tanzania. The scheme was tested twice in 2014 with farmers, students and experts (April: 40 persons, June: 25 persons) for inter-rater reliability, bias of individuals and functional relation between qualitative and quantitative soil moisture values. During the test in April farmers assigned the same wetness class in 46% of all cases while students and experts agreed in about 60% of all cases. Students who had been trained in how to apply the method gained higher inter-rater reliability than their colleagues with only a basic introduction. When repeating the test in June, participants were given improved instructions, organized in small sub-groups, which resulted in a higher inter-rater reliability among farmers. In 66% of all classifications farmers assigned the same wetness class and the spread of class assignments was smaller. This study demonstrates that a wetness classification scheme based on qualitative indicators is a robust tool and can be applied successfully regardless of experience in crop growing and education level when an in-depth introduction and training is provided. The use of a simple and clear layout of the assessment form is important for reliable wetness class assignments.

  20. Interrater and Test-Retest Reliability and Minimal Detectable Change of the Balance Evaluation Systems Test (BESTest) and Subsystems With Community-Dwelling Older Adults.

    Science.gov (United States)

    Wang-Hsu, Elizabeth; Smith, Susan S

    2017-01-10

    Falls are a common cause of injuries and hospital admissions in older adults. Balance limitation is a potentially modifiable factor contributing to falls. The Balance Evaluation Systems Test (BESTest), a clinical balance measure, categorizes balance into 6 underlying subsystems. Each of the subsystems is scored individually and summed to obtain a total score. The reliability of the BESTest and its individual subsystems has been reported in patients with various neurological disorders and cancer survivors. However, the reliability and minimal detectable change (MDC) of the BESTest with community-dwelling older adults have not been reported. The purposes of our study were to (1) determine the interrater and test-retest reliability of the BESTest total and subsystem scores; and (2) estimate the MDC of the BESTest and its individual subsystem scores with community-dwelling older adults. We used a prospective cohort methodological design. Community-dwelling older adults (N = 70; aged 70-94 years; mean = 85.0 [5.5] years) were recruited from a senior independent living community. Trained testers (N = 3) administered the BESTest. All participants were tested with the BESTest by the same tester initially and then retested 7 to 14 days later. With 32 of the participants, a second tester concurrently scored the retest for interrater reliability. Testers were blinded to each other's scores. Intraclass correlation coefficients [ICC(2,1)] were used to determine the interrater and test-retest reliability. Test-retest reliability was also analyzed using method error and the associated coefficients of variation (CVME). MDC was calculated using standard error of measurement. Interrater reliability (N = 32) of the BESTest total score was ICC(2, 1) = 0.97 (95% confidence interval [CI], 0.94-0.99). The ICCs for the individual subsystem scores ranged from 0.85 to 0.94. Test-retest reliability (N = 70) of the BESTest total score was ICC(2,1) = 0.93 (95% CI, 0.89-0.96). ICCs for the
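
    The ICC(2,1) named in this record is the two-way random-effects, single-rater, absolute-agreement form of the intraclass correlation. The sketch below shows one conventional way to obtain it from the mean squares of a subjects-by-raters table. The function name `icc_2_1` and the small rating matrix are illustrative assumptions, not BESTest data.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a list of subjects, each a list of k rater scores."""
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(col) / n for col in zip(*ratings)]
    # Sums of squares for subjects, raters, and the residual.
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_subjects - ss_raters
    bms = ss_subjects / (n - 1)            # between-subjects mean square
    jms = ss_raters / (k - 1)              # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Illustrative matrix: 6 subjects (rows) scored by 4 raters (columns).
toy = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(toy), 2))  # 0.29
```

    The low value in the toy example comes from large systematic differences between raters, which the absolute-agreement form penalises.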

  1. Establishment of the reliability and validity of the Stress Index for Children or Adolescents with Tourette Syndrome (SICATS).

    Science.gov (United States)

    Chao, Kuo-Yu; Wang, Huei-Shyong; Chang, Hsueh-Ling; Wang, Yi-Wen; See, Lai-Chu

    2010-02-01

    The aim of this study was to evaluate the validity and reliability of the stress index for 10- to 18-year-old children or adolescents with Tourette syndrome. Tourette syndrome is a chronic tic disorder that begins in childhood. Children with Tourette syndrome exhibit sudden and unexpected vocalizations or movements that may affect their daily activities and create barriers to social interaction. Therefore, a self-report stress index is needed so that children with Tourette syndrome can quickly measure the stress they experience. Eight experts rated appropriateness, comprehensiveness and relevance of the questionnaire to establish content validity. A total of 116 paediatric patients filled out the stress index for 10- to 18-year-old children or adolescents with Tourette syndrome to evaluate its construct validity using exploratory factor analysis and internal consistency. Data from 90 pairs of paediatric patients and their caregivers were used to evaluate the inter-rater reliability. The criterion validity index ranged from 80% to 98%. One item was deleted because of a small item-to-total correlation. Therefore, 26 items made up the final stress index for 10- to 18-year-old children or adolescents with Tourette syndrome. In exploratory factor analysis, four factors (unfairly treated, psychological, symptom control and future concern) were extracted and accounted for 52.3% of the total variance. Cronbach's alpha of the stress index for 10- to 18-year-old children or adolescents with Tourette syndrome was 0.89. The inter-rater reliability of the stress index for 10- to 18-year-old children or adolescents with Tourette syndrome (Pearson correlation coefficient between patients and their caregivers) was 0.56. The stress index for 10- to 18-year-old children or adolescents with Tourette syndrome is a self-administered tool to assess the stress of children or adolescents with Tourette syndrome.
Validity (content and construct) and reliability (internal consistency and inter-rater

  2. Interrater Reliability of the Categorization of Late Radiographic Changes After Lung Stereotactic Body Radiation Therapy

    Energy Technology Data Exchange (ETDEWEB)

    Faruqi, Salman [Department of Radiation Oncology, Princess Margaret Cancer Centre, Toronto, ON (Canada)]; Giuliani, Meredith E., E-mail: meredith.giuliani@rmp.uhn.on.ca [Department of Radiation Oncology, Princess Margaret Cancer Centre, Toronto, ON (Canada)]; Raziee, Hamid; Yap, Mei Ling [Department of Radiation Oncology, Princess Margaret Cancer Centre, Toronto, ON (Canada)]; Roberts, Heidi [Department of Radiology, University Health Network, Toronto, Ontario (Canada)]; Le, Lisa W. [Department of Biostatistics, Princess Margaret Cancer Centre, Toronto, Ontario (Canada)]; Brade, Anthony; Cho, John; Sun, Alexander; Bezjak, Andrea; Hope, Andrew J. [Department of Radiation Oncology, Princess Margaret Cancer Centre, Toronto, ON (Canada)]

    2014-08-01

    Purpose: Radiographic changes after lung stereotactic body radiation therapy (SBRT) have been categorized into 4 groups: modified conventional pattern (A), mass-like fibrosis (B), scar-like fibrosis (C), and no evidence of increased density (D). The purpose of this study was to assess the interrater reliability of this categorization system in patients with early-stage non-small cell lung cancer (NSCLC). Methods and Materials: Seventy-seven patients were included in this study, all treated with SBRT for early-stage (T1/2) NSCLC at a single institution, with a minimum follow-up of 6 months. Six experienced clinicians familiar with post-SBRT radiographic changes scored the serial posttreatment CT images independently in a blinded fashion. The proportion of patients categorized as A, B, C, or D at each interval was determined. Krippendorff's alpha (KA), multirater kappa (M-kappa), and Gwet's AC1 (AC1) scores were used to establish interrater reliability. A leave-one-out analysis was performed to demonstrate the variability among raters. Interrater agreement of the first and last 20 patients scored was calculated to explore whether a training effect existed. Results: The number of ratings ranged from 450 at 6 months to 84 at 48 months of follow-up. The proportion of patients in each category was as follows: A, 45%; B, 16%; C, 13%; and D, 26%. KA and M-kappa ranged from 0.17 to 0.34. The AC1 measure ranged from 0.22 to 0.48. KA increased from 0.24 to 0.36 at 12 months with training. The percent agreement for pattern A peaked at 12 months, with a 54% chance of having >50% of raters in agreement, and decreased over time, whereas that for patterns B and C increased over time to a maximum of 20% and 22%, respectively. Conclusion: This post-SBRT radiographic change categorization system has modest interrater agreement, and there is a suggestion of a training effect. Patterns of fibrosis evolve after SBRT and alternative categorization systems should be evaluated.
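
    The abstract does not say which multirater kappa was used. As an assumption for illustration, the sketch below implements Fleiss' kappa, one common multirater generalisation, for a table of category counts per subject. It requires the same number of raters for every subject, which may not match the study's design. The function name `fleiss_kappa` and the counts are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa.

    counts[i][j] = number of raters who put subject i into category j.
    Every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    # Mean per-subject agreement: share of rater pairs that agree.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    # Chance agreement from the overall category proportions.
    totals = [sum(col) for col in zip(*counts)]
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals)
    if p_e == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical table: 10 subjects, 14 raters, 5 categories.
toy_counts = [
    [0, 0, 0, 0, 14],
    [0, 2, 6, 4, 2],
    [0, 0, 3, 5, 6],
    [0, 3, 9, 2, 0],
    [2, 2, 8, 1, 1],
    [7, 7, 0, 0, 0],
    [3, 2, 6, 3, 0],
    [2, 5, 3, 2, 2],
    [6, 5, 2, 1, 0],
    [0, 2, 2, 3, 7],
]
print(round(fleiss_kappa(toy_counts), 2))  # 0.21
```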

  3. Test of gross motor development-2 for Filipino children with intellectual disability: validity and reliability.

    Science.gov (United States)

    Capio, Catherine M; Eguia, Kathlynne F; Simons, Johan

    2016-01-01

    This study aimed to examine aspects of validity and reliability of the Test of Gross Motor Development-2 (TGMD-2) in Filipino children with intellectual disability. Content and construct validity were verified, as well as inter-rater and intra-rater reliability. Two paediatric physiotherapists tested 81 children with intellectual disability (mean age = 9.29 ± 2.71 years) on locomotor and object control skills. Analysis of covariance, confirmatory factor analysis and analysis of variance were used to test validity, while Cronbach's alpha, intra-class correlation coefficients (ICC) and Bland-Altman plots were used to examine reliability. Age was a significant predictor of locomotor and object control scores (P = 0.004). The data fit the hypothesised two-factor model with fit indices as follows: χ² = 33.525, df = 34, P = 0.491, χ²/df = 0.986. As hypothesised, gender was a significant predictor for object control skills (P = 0.038). Participants' mean scores were significantly below mastery (locomotor, P intellectual disability.

  4. Assessment of Lower Limb Muscle Strength and Power Using Hand-Held and Fixed Dynamometry: A Reliability and Validity Study

    Science.gov (United States)

    Perraton, Luke G.; Bower, Kelly J.; Adair, Brooke; Pua, Yong-Hao; Williams, Gavin P.; McGaw, Rebekah

    2015-01-01

    Introduction: Hand-held dynamometry (HHD) has never previously been used to examine isometric muscle power. Rate of force development (RFD) is often used for muscle power assessment; however, no consensus currently exists on the most appropriate method of calculation. The aim of this study was to examine the reliability of different algorithms for RFD calculation and to examine the intra-rater, inter-rater, and inter-device reliability of HHD as well as the concurrent validity of HHD for the assessment of isometric lower limb muscle strength and power. Methods: 30 healthy young adults (age: 23 ± 5 years, male: 15) were assessed in two sessions. Isometric muscle strength and power were measured using peak force and RFD, respectively, using two HHDs (Lafayette Model-01165 and Hoggan microFET2) and a criterion-reference KinCom dynamometer. Statistical analysis of reliability and validity comprised intraclass correlation coefficients (ICC), Pearson correlations, concordance correlations, standard error of measurement, and minimal detectable change. Results: Comparison of RFD methods revealed that a peak 200 ms moving window algorithm provided optimal reliability results. Intra-rater, inter-rater, and inter-device reliability analysis of peak force and RFD revealed mostly good to excellent reliability (coefficients ≥ 0.70) for all muscle groups. Concurrent validity analysis showed moderate to excellent relationships between HHD and fixed dynamometry for the hip and knee (ICCs ≥ 0.70) for both peak force and RFD, with mostly poor to good results shown for the ankle muscles (ICCs = 0.31–0.79). Conclusions: Hand-held dynamometry has good to excellent reliability and validity for most measures of isometric lower limb strength and power in a healthy population, particularly for proximal muscle groups. To aid implementation we have created freely available software to extract these variables from data stored on the Lafayette device. Future research should examine the reliability
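
    The abstract names a "peak 200 ms moving window" RFD algorithm without defining it. One plausible reading, sketched below purely as an assumption, slides a 200 ms window along the force trace and takes the largest end-to-start force difference divided by the window duration. The function name `peak_rfd`, the sampling rate, and the synthetic ramp are hypothetical and do not reproduce the authors' freely available software.

```python
def peak_rfd(force, sample_rate_hz, window_s=0.2):
    """Largest rate of force development (N/s) over any window of window_s.

    force: evenly sampled force values in newtons.
    Assumption: RFD in a window = (force at end - force at start) / duration.
    """
    window = round(window_s * sample_rate_hz)  # window length in samples
    if window < 1 or len(force) <= window:
        raise ValueError("force trace is shorter than the analysis window")
    duration = window / sample_rate_hz
    return max(
        (force[i + window] - force[i]) / duration
        for i in range(len(force) - window)
    )


# Synthetic trace at 100 Hz: rest, a 500 N/s ramp to 200 N, then a plateau.
trace = [0.0] * 10 + [5.0 * i for i in range(1, 41)] + [200.0] * 10
print(round(peak_rfd(trace, 100)))  # 500
```

    Other definitions, such as fitting a regression slope within each window, would give different values on noisy data; the abstract does not say which was used.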

  5. Reliability and validity of the Performance Recorder 1 for measuring isometric knee flexor and extensor strength.

    Science.gov (United States)

    Neil, Sarah E; Myring, Alec; Peeters, Mon Jef; Pirie, Ian; Jacobs, Rachel; Hunt, Michael A; Garland, S Jayne; Campbell, Kristin L

    2013-11-01

    Muscular strength is a key parameter of rehabilitation programs and a strong predictor of functional capacity. Traditional methods to measure strength, such as manual muscle testing (MMT) and hand-held dynamometry (HHD), are limited by the strength and experience of the tester. The Performance Recorder 1 (PR1) is a strength assessment tool attached to resistance training equipment and may be a time- and cost-effective tool to measure strength in clinical practice that overcomes some limitations of MMT and HHD. However, reliability and validity of the PR1 have not been reported. Test-retest and inter-rater reliability was assessed using the PR1 in healthy adults (n = 15) during isometric knee flexion and extension. Criterion-related validity was assessed through comparison of values obtained from the PR1 and Biodex® isokinetic dynamometer. Test-retest reliability was excellent for peak knee flexion (intra-class correlation coefficient [ICC] of 0.96, 95% CI: 0.85, 0.99) and knee extension (ICC = 0.96, 95% CI: 0.87, 0.99). Inter-rater reliability was also excellent for peak knee flexion (ICC = 0.95, 95% CI: 0.85, 0.99) and peak knee extension (ICC = 0.97, 95% CI: 0.91, 0.99). Validity was moderate for peak knee flexion (ICC = 0.75, 95% CI: 0.38, 0.92) but poor for peak knee extension (ICC = 0.37, 95% CI: 0, 0.73). The PR1 provides a reliable measure of isometric knee flexor and extensor strength in healthy adults that could be used in the clinical setting, but absolute values may not be comparable to strength assessment by gold-standard measures.

  6. The Reliability and Predictive Validity of the Stalking Risk Profile.

    Science.gov (United States)

    McEwan, Troy E; Shea, Daniel E; Daffern, Michael; MacKenzie, Rachel D; Ogloff, James R P; Mullen, Paul E

    2018-03-01

    This study assessed the reliability and validity of the Stalking Risk Profile (SRP), a structured measure for assessing stalking risks. The SRP was administered at the point of assessment or retrospectively from file review for 241 adult stalkers (91% male) referred to a community-based forensic mental health service. Interrater reliability was high for stalker type, and moderate-to-substantial for risk judgments and domain scores. Evidence for predictive validity and discrimination between stalking recidivists and nonrecidivists for risk judgments depended on follow-up duration. Discrimination was moderate (area under the curve = 0.66-0.68) and positive and negative predictive values good over the full follow-up period (Mdn = 170.43 weeks). At 6 months, discrimination was better than chance only for judgments related to stalking of new victims (area under the curve = 0.75); however, high-risk stalkers still reoffended against their original victim(s) 2 to 4 times as often as low-risk stalkers. Implications for the clinical utility and refinement of the SRP are discussed.
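
    The area under the curve used here as a discrimination measure can be read as the probability that a randomly chosen recidivist received a higher risk rating than a randomly chosen nonrecidivist, with ties counted as one half. The sketch below computes it directly from that definition. The function name `auc` and the ratings are hypothetical, not SRP data.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition."""
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0   # positive case ranked above negative case
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical risk ratings (1 = low, 2 = moderate, 3 = high).
recidivists = [3, 2, 3, 1]
nonrecidivists = [1, 2, 1, 2]
print(auc(recidivists, nonrecidivists))  # 0.75
```

    A value of 0.5 corresponds to chance-level discrimination, which is the benchmark the abstract refers to.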

  7. Inter-rater reliability of the Sødring Motor Evaluation of Stroke patients (SMES).

    Science.gov (United States)

    Halsaa, K E; Sødring, K M; Bjelland, E; Finsrud, K; Bautz-Holter, E

    1999-12-01

    The Sødring Motor Evaluation of Stroke patients is an instrument for physiotherapists to evaluate motor function and activities in stroke patients. The rating reflects quality as well as quantity of the patient's unassisted performance within three domains: leg, arm and gross function. The inter-rater reliability of the method was studied in a sample of 30 patients admitted to a stroke rehabilitation unit. Three therapists were involved in the study; two therapists assessed the same patient on two consecutive days in a balanced design. Cohen's weighted kappa and McNemar's test of symmetry were used as measures of item reliability, and the intraclass correlation coefficient was used to express the reliability of the sumscores. For 24 out of 32 items the weighted kappa statistic was excellent (0.75-0.98), while 7 items had a kappa statistic within the range 0.53-0.74 (fair to good). The reliability of one item was poor (0.13). The intraclass correlation coefficient for the three sumscores was 0.97, 0.91 and 0.97. We conclude that the Sødring Motor Evaluation of Stroke patients is a reliable measure of motor function in stroke patients undergoing rehabilitation.
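
    Cohen's weighted kappa, used in this record for item-level reliability, gives partial credit when two raters choose neighbouring ordinal categories. The sketch below is a minimal implementation with linear or quadratic disagreement weights. The abstract does not state which weighting was applied, so the default is an assumption, and the function name `weighted_kappa` and the ratings are hypothetical.

```python
from collections import Counter


def weighted_kappa(r1, r2, weights="linear"):
    """Cohen's weighted kappa for two raters' integer-coded ordinal ratings."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    power = 1 if weights == "linear" else 2
    n = len(r1)
    # Mean observed disagreement between the paired ratings.
    observed = sum(abs(a - b) ** power for a, b in zip(r1, r2)) / n
    # Disagreement expected if the raters' marginals were independent.
    m1, m2 = Counter(r1), Counter(r2)
    expected = sum(
        c1 * c2 * abs(i - j) ** power
        for i, c1 in m1.items()
        for j, c2 in m2.items()
    ) / n ** 2
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1 - observed / expected


# Hypothetical item scores (0-2) from two therapists for six patients.
rater_a = [0, 1, 2, 2, 1, 0]
rater_b = [0, 1, 2, 1, 1, 1]
print(round(weighted_kappa(rater_a, rater_b), 3))  # 0.571
```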

  8. Inter-rater agreement on PIVC-associated phlebitis signs, symptoms and scales.

    Science.gov (United States)

    Marsh, Nicole; Mihala, Gabor; Ray-Barruel, Gillian; Webster, Joan; Wallis, Marianne C; Rickard, Claire M

    2015-10-01

    Many peripheral intravenous catheter (PIVC) infusion phlebitis scales and definitions are used internationally, although no existing scale has demonstrated comprehensive reliability and validity. We examined inter-rater agreement between registered nurses on signs, symptoms and scales commonly used in phlebitis assessment. Seven PIVC-associated phlebitis signs/symptoms (pain, tenderness, swelling, erythema, palpable venous cord, purulent discharge and warmth) were observed daily by two raters (a research nurse and registered nurse). These data were modelled into phlebitis scores using 10 different tools. Proportions of agreement (e.g. positive, negative), observed and expected agreements, Cohen's kappa, the maximum achievable kappa, prevalence- and bias-adjusted kappa were calculated. Two hundred ten patients were recruited across three hospitals, with 247 sets of paired observations undertaken. The second rater was blinded to the first's findings. The Catney and Rittenberg scales were the most sensitive (phlebitis in >20% of observations), whereas the Curran, Lanbeck and Rickard scales were the most restrictive (≤2% phlebitis). Only tenderness and the Catney (one of pain, tenderness, erythema or palpable cord) and Rittenberg scales (one of erythema, swelling, tenderness or pain) had acceptable (more than two-thirds, 66.7%) levels of inter-rater agreement. Inter-rater agreement for phlebitis assessment signs/symptoms and scales is low. This likely contributes to the high degree of variability in phlebitis rates in literature. We recommend further research into assessment of infrequent signs/symptoms and the Catney or Rittenberg scales. New approaches to evaluating vein irritation that are valid, reliable and based on their ability to predict complications need exploration. © 2015 John Wiley & Sons, Ltd.
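
    Several of the statistics listed in this record can be derived from a single 2x2 table of paired binary ratings. The sketch below computes observed and expected agreement, Cohen's kappa, the maximum achievable kappa given the raters' marginals, the prevalence- and bias-adjusted kappa (PABAK), and positive and negative agreement. The function name `agreement_stats` and the counts are illustrative assumptions, not the study's data.

```python
def agreement_stats(a, b, c, d):
    """Agreement statistics for two raters and a binary sign.

    a = both raters say present, b = only rater 1, c = only rater 2,
    d = both raters say absent.
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("table is empty")
    po = (a + d) / n
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    if pe == 1:
        raise ValueError("kappa is undefined when all ratings are identical")
    # Best observed agreement the two sets of marginal totals allow.
    po_max = (min(a + b, a + c) + min(c + d, b + d)) / n

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "observed": po,
        "expected": pe,
        "kappa": (po - pe) / (1 - pe),
        "kappa_max": (po_max - pe) / (1 - pe),
        "pabak": 2 * po - 1,
        "positive_agreement": ratio(2 * a, 2 * a + b + c),
        "negative_agreement": ratio(2 * d, 2 * d + b + c),
    }


# Hypothetical rare sign: present in about 5% of 100 paired observations.
stats = agreement_stats(2, 3, 3, 92)
print(round(stats["kappa"], 3), round(stats["pabak"], 3))  # 0.368 0.88
```

    With a rare sign, raw agreement can be high (94% here) while kappa stays low. This is one reason infrequent signs are hard to assess with kappa alone.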

  9. Reliability and validity of CODA motion analysis system for measuring cervical range of motion in patients with cervical spondylosis and anterior cervical fusion.

    Science.gov (United States)

    Gao, Zhongyang; Song, Hui; Ren, Fenggang; Li, Yuhuan; Wang, Dong; He, Xijing

    2017-12-01

    The aim of the present study was to evaluate the reliability of the Cartesian Optoelectronic Dynamic Anthropometer (CODA) motion system in measuring the cervical range of motion (ROM) and verify the construct validity of the CODA motion system. A total of 26 patients with cervical spondylosis and 22 patients with anterior cervical fusion were enrolled and the CODA motion analysis system was used to measure the three-dimensional cervical ROM. Intra- and inter-rater reliability was assessed by intraclass correlation coefficients (ICCs), standard error of measurement (SEm), limits of agreement (LOA) and minimal detectable change (MDC). Independent samples t-tests were performed to examine the differences of cervical ROM between cervical spondylosis and anterior cervical fusion patients. The results revealed that in the cervical spondylosis group, the reliability was almost perfect (intra-rater reliability: ICC, 0.87-0.95; LOA, -12.86 to 13.70; SEm, 2.97-4.58; inter-rater reliability: ICC, 0.84-0.95; LOA, -13.09 to 13.48; SEm, 3.13-4.32). In the anterior cervical fusion group, the reliability was high (intra-rater reliability: ICC, 0.88-0.97; LOA, -10.65 to 11.08; SEm, 2.10-3.77; inter-rater reliability: ICC, 0.86-0.96; LOA, -10.91 to 13.66; SEm, 2.20-4.45). The cervical ROM in the cervical spondylosis group was significantly higher than that in the anterior cervical fusion group in all directions except for left rotation. In conclusion, the CODA motion analysis system is highly reliable in measuring cervical ROM and the construct validity was verified, as the system was sufficiently sensitive to distinguish between the cervical spondylosis and anterior cervical fusion groups based on their ROM.
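
    The SEm, MDC and LOA values in this record are commonly obtained from the following textbook formulas: SEm = SD × √(1 − ICC), MDC95 = 1.96 × √2 × SEm, and Bland-Altman limits of agreement = mean difference ± 1.96 × SD of the differences. The abstract does not state which variants the authors used, so the sketch below is an assumption. The function names and numbers are hypothetical.

```python
import math
from statistics import mean, stdev


def sem_from_icc(sd, icc):
    """Standard error of measurement from a sample SD and an ICC."""
    return sd * math.sqrt(1 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * math.sqrt(2) * sem


def limits_of_agreement(x, y):
    """Bland-Altman 95% limits of agreement for paired measurements."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias - spread, bias + spread


# Hypothetical range-of-motion data in degrees.
sem = sem_from_icc(sd=10.0, icc=0.91)
print(round(sem, 2), round(mdc95(sem), 2))  # 3.0 8.32
print(limits_of_agreement([50, 60, 55, 65], [52, 58, 57, 63]))
```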

  10. The Irvine, Beatties, and Bresnahan (IBB) Forelimb Recovery Scale: An Assessment of Reliability and Validity

    Science.gov (United States)

    Irvine, Karen-Amanda; Ferguson, Adam R.; Mitchell, Kathleen D.; Beattie, Stephanie B.; Lin, Amity; Stuck, Ellen D.; Huie, J. Russell; Nielson, Jessica L.; Talbott, Jason F.; Inoue, Tomoo; Beattie, Michael S.; Bresnahan, Jacqueline C.

    2014-01-01

    The IBB scale is a recently developed forelimb scale for the assessment of fine control of the forelimb and digits after cervical spinal cord injury [SCI; (1)]. The present paper describes the assessment of inter-rater reliability and face, concurrent and construct validity of this scale following SCI. It demonstrates that the IBB is a reliable and valid scale that is sensitive to severity of SCI and to recovery over time. In addition, the IBB correlates with other outcome measures and is highly predictive of biological measures of tissue pathology. Multivariate analysis using principal component analysis (PCA) demonstrates that the IBB is highly predictive of the syndromic outcome after SCI (2), and is among the best predictors of bio-behavioral function, based on strong construct validity. Altogether, the data suggest that the IBB, especially in concert with other measures, is a reliable and valid tool for assessing neurological deficits in fine motor control of the distal forelimb, and represents a powerful addition to multivariate outcome batteries aimed at documenting recovery of function after cervical SCI in rats. PMID:25071704

  11. Assessing movement quality in persons with severe mental illness - Reliability and validity of the Body Awareness Scale Movement Quality and Experience.

    Science.gov (United States)

    Hedlund, Lena; Gyllensten, Amanda Lundvik; Waldegren, Tomas; Hansson, Lars

    2016-05-01

    Motor disturbances and disturbed self-recognition are common features that affect mobility in persons with schizophrenia spectrum disorder and bipolar disorder. Physiotherapists in Scandinavia assess and treat movement difficulties in persons with severe mental illness. The Body Awareness Scale Movement Quality and Experience (BAS MQ-E) is a new and shortened version of the commonly used Body Awareness Scale-Health (BAS-H). The purpose of this study was to investigate the inter-rater reliability and the concurrent validity of BAS MQ-E in persons with severe mental illness. The concurrent validity was examined by investigating the relationships between neurological soft signs, alexithymia, fatigue, anxiety, and mastery. Sixty-two persons with severe mental illness participated in the study. The results showed a satisfactory inter-rater reliability (n = 53) and a concurrent validity (n = 62) with neurological soft signs, especially cognitive and perceptual based signs. There was also a concurrent validity linked to physical fatigue and aspects of alexithymia. The scores of BAS MQ-E were in general higher for persons with schizophrenia compared to persons with other diagnoses within the schizophrenia spectrum disorders and bipolar disorder. The clinical implications are presented in the discussion.

  12. Computerized back postural assessment in physiotherapy practice: Intra-rater and inter-rater reliability of the MIDAS system.

    Science.gov (United States)

    McAlpine, R T; Bettany-Saltikov, J A; Warren, J G

    2009-01-01

    Assessment of spinal posture during physiotherapy practice is routine, yet few objective measures exist to this end. The Middlesbrough Integrated Digital Assessment System (MIDAS) is a low cost portable system able to record 3D information on posture. The purpose of this study was to assess both the intra-rater and inter-rater reliability of the MIDAS system. Twenty-five healthy subjects were recruited. A repeated measures design was used to record fifteen pre-palpated landmarks on the back of each subject. To limit the sources of variability, the principal researcher palpated the landmarks for each subject. Each of three raters took two measurements on each subject in a standardized upright posture. X (medio-lateral), Y (antero-posterior) and Z (height) landmark positions were recorded via a computer interface. Both intra-rater agreement (mean ICCs - rater 1 r=0.970, rater 2 r=0.965 and rater 3 r=0.965, pMIDAS demonstrated both high inter-rater and intra-rater reliability and provides an objective method for the assessment of posture in physiotherapy practice.

  13. Approaches to describing inter-rater reliability of the overall clinical appearance of febrile infants and toddlers in the emergency department

    Directory of Open Access Journals (Sweden)

    Paul Walsh

    2014-11-01

    Objectives. To measure inter-rater agreement of overall clinical appearance of febrile children aged less than 24 months and to compare methods for doing so. Study Design and Setting. We performed an observational study of inter-rater reliability of the assessment of febrile children in a county hospital emergency department serving a mixed urban and rural population. Two emergency medicine healthcare providers independently evaluated the overall clinical appearance of children less than 24 months of age who had presented for fever. They recorded the initial ‘gestalt’ assessment of whether or not the child was ill appearing or if they were unsure. They then repeated this assessment after examining the child. Each rater was blinded to the other’s assessment. Our primary analysis was graphical. We also calculated Cohen’s κ, Gwet’s agreement coefficient and other measures of agreement and weighted variants of these. We examined the effect of time between exams and patient and provider characteristics on inter-rater agreement. Results. We analyzed 159 of the 173 patients enrolled. Median age was 9.5 months (lower and upper quartiles 4.9–14.6), 99/159 (62%) were boys and 22/159 (14%) were admitted. Overall 118/159 (74%) and 119/159 (75%) were classified as well appearing on initial ‘gestalt’ impression by both examiners. Summary statistics varied from 0.223 for weighted κ to 0.635 for Gwet’s AC2. Inter-rater agreement was affected by the time interval between the evaluations and the age of the child but not by the experience levels of the rater pairs. Classifications of ‘not ill appearing’ were more reliable than others. Conclusion. The inter-rater reliability of emergency providers’ assessment of overall clinical appearance was adequate when described graphically and by Gwet’s AC. Different summary statistics yield different results for the same dataset.
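
    To illustrate why different summary statistics can disagree on the same data, the sketch below computes Gwet's first-order agreement coefficient (AC1) for two raters. AC1 is the unweighted relative of the AC2 reported in this record; the weighted AC2 is not implemented here. The function name `gwet_ac1`, the category labels, and the ratings are hypothetical.

```python
from collections import Counter


def gwet_ac1(r1, r2, categories):
    """Gwet's AC1 for two raters over a fixed list of nominal categories."""
    n, k = len(r1), len(categories)
    if n == 0 or n != len(r2) or k < 2:
        raise ValueError("need paired ratings and at least two categories")
    # Observed proportion of exact agreement.
    pa = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement from the average marginal share of each category.
    shares = [(c1[c] + c2[c]) / (2 * n) for c in categories]
    pe = sum(p * (1 - p) for p in shares) / (k - 1)
    return (pa - pe) / (1 - pe)


# Hypothetical ratings: "ill" is rare, and the raters agree on 94 of 100.
rater_1 = ["ill"] * 5 + ["well"] * 95
rater_2 = ["ill"] * 2 + ["well"] * 3 + ["ill"] * 3 + ["well"] * 92
print(round(gwet_ac1(rater_1, rater_2, ["ill", "well"]), 3))  # 0.934
```

    For the same toy ratings Cohen's κ would be about 0.37, so the two coefficients tell quite different stories when one category dominates.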

  14. Factor validity and reliability of the aberrant behavior checklist-community (ABC-C) in an Indian population with intellectual disability.

    Science.gov (United States)

    Lehotkay, R; Saraswathi Devi, T; Raju, M V R; Bada, P K; Nuti, S; Kempf, N; Carminati, G Galli

    2015-03-01

    In this study realised in collaboration with the department of psychology and parapsychology of Andhra University, validation of the Aberrant Behavior Checklist-Community (ABC-C) in Telugu, the official language of Andhra Pradesh, one of India's 28 states, was carried out. To assess the factor validity and reliability of this Telugu version, 120 participants with moderate to profound intellectual disability (94 men and 26 women, mean age 25.2, SD 7.1) were rated by the staff of the Lebenshilfe Institution for Mentally Handicapped in Visakhapatnam, Andhra Pradesh, India. Rating data were analysed with a confirmatory factor analysis. The internal consistency was estimated by Cronbach's alpha. To confirm the test-retest reliability, 50 participants were rated twice with an interval of 4 weeks, and 50 were rated by pairs of raters to assess inter-rater reliability. Confirmatory factor analysis revealed that the root mean square error of approximation (RMSEA) was equal to 0.06, the comparative fit index (CFI) was equal to 0.77, and the Tucker Lewis index (TLI) was equal to 0.77, which indicated that the model with five correlated factors had a good fit. Coefficient alpha ranged from 0.85 to 0.92 across the five subscales. Spearman's rank correlation coefficients for inter-rater reliability tests ranged from 0.65 to 0.75, and the correlations for test-retest reliability ranged from 0.58 to 0.76. All reliability coefficients were statistically significant (P reliability of Telugu version of the ABC-C evidenced factor validity and reliability comparable to the original English version and appears to be useful for assessing behaviour disorders in Indian people with intellectual disabilities. © 2014 MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.

  15. Developing and validating the Communication Function Classification System for individuals with cerebral palsy

    Science.gov (United States)

    HIDECKER, MARY JO COOLEY; PANETH, NIGEL; ROSENBAUM, PETER L; KENT, RAYMOND D; LILLIE, JANET; EULENBERG, JOHN B; CHESTER, KEN; JOHNSON, BRENDA; MICHALSEN, LAUREN; EVATT, MORGAN; TAYLOR, KARA

    2011-01-01

    Aim The purpose of this study was to create and validate a Communication Function Classification System (CFCS) for children with cerebral palsy (CP) that can be used by a wide variety of individuals who are interested in CP. This paper reports the content validity, interrater reliability, and test–retest reliability of the CFCS for children with CP. Method An 11-member development team created comprehensive descriptions of the CFCS levels, and four nominal groups comprising 27 participants critiqued these levels. Within a Delphi survey, 112 participants commented on the clarity and usefulness of the CFCS. Interrater reliability was completed by 61 professionals and 68 parents/relatives who classified 69 children with CP aged 2 to 18 years. Test–retest reliability was completed by 48 professionals who allowed at least 2 weeks between classifications. The participants who assessed the CFCS were all relevant stakeholders: adults with CP, parents of children with CP, educators, occupational therapists, physical therapists, physicians, and speech–language pathologists. Results The interrater reliability of the CFCS was 0.66 between two professionals and 0.49 between a parent and a professional. Professional interrater reliability improved to 0.77 for classification of children older than 4 years. The test–retest reliability was 0.82. Interpretation The CFCS demonstrates content validity and shows very good test–retest reliability, good professional interrater reliability, and moderate parent–professional interrater reliability. Combining the CFCS with the Gross Motor Function Classification System and the Manual Ability Classification System contributes to a functional performance view of daily life for individuals with CP, in accordance with the World Health Organization’s International Classification of Functioning, Disability and Health. PMID:21707596

  16. The interrater and test-retest reliability of the Home Falls and Accidents Screening Tool (HOME FAST) in Malaysia: Using raters with a range of professional backgrounds.

    Science.gov (United States)

    Romli, Muhammad Hibatullah; Mackenzie, Lynette; Lovarini, Meryl; Tan, Maw Pin; Clemson, Lindy

    2017-06-01

    Falls can be a devastating issue for older people living in the community, including those living in Malaysia. Health professionals and community members have a responsibility to ensure that older people have a safe home environment to reduce the risk of falls. Using a standardised screening tool is beneficial to intervene early with this group. The Home Falls and Accidents Screening Tool (HOME FAST) should be considered for this purpose; however, its use in Malaysia has not been studied. Therefore, the aim of this study was to evaluate the interrater and test-retest reliability of the HOME FAST with multiple professionals in the Malaysian context. A cross-sectional design was used to evaluate interrater reliability where the HOME FAST was used simultaneously in the homes of older people by 2 raters and a prospective design was used to evaluate test-retest reliability with a separate group of older people at different times in their homes. Both studies took place in an urban area of Kuala Lumpur. Professionals from 9 professional backgrounds participated as raters in this study, and a group of 51 community older people were recruited for the interrater reliability study and another group of 30 for the test-retest reliability study. The overall agreement was moderate for interrater reliability and good for test-retest reliability. The HOME FAST was consistently rated by different professionals, and no bias was found among the multiple raters. The HOME FAST can be used with confidence by a variety of professionals across different settings. The HOME FAST can become a universal tool to screen for home hazards related to falls. © 2017 John Wiley & Sons, Ltd.

  17. Interrater reliability and accuracy of clinicians and trained research assistants performing prospective data collection in emergency department patients with potential acute coronary syndrome.

    Science.gov (United States)

    Cruz, Carlos O; Meshberg, Emily B; Shofer, Frances S; McCusker, Christine M; Chang, Anna Marie; Hollander, Judd E

    2009-07-01

    Clinical research requires high-quality data collection. Data collected at the emergency department evaluation is generally considered more precise than data collected through chart abstraction but is cumbersome and time consuming. We test whether trained research assistants without a medical background can obtain clinical research data as accurately as physicians. We hypothesize that they would be at least as accurate because they would not be distracted by clinical requirements. We conducted a prospective comparative study of 33 trained research assistants and 39 physicians (35 residents) to assess interrater reliability with respect to guideline-recommended clinical research data. Immediately after the research assistant and clinician evaluation, the data were compared by a tiebreaker third person who forced the patient to choose one of the 2 answers as the correct one when responses were discordant. Crude percentage agreement and interrater reliability were assessed (kappa statistic). One hundred forty-three patients were recruited (mean age 50.7 years; 47% female patients). Overall, the median agreement was 81% (interquartile range [IQR] 73% to 92%) and interrater reliability was fair (kappa value 0.36 [IQR 0.26 to 0.52]) but varied across categories of data: cardiac risk factors (median 86% [IQR 81% to 93%]; median 0.69 [IQR 0.62 to 0.83]), other cardiac history (median 93% [IQR 79% to 95%]; median 0.56 [IQR 0.29 to 0.77]), pain location (median 92% [IQR 86% to 94%]; median 0.37 [IQR 0.25 to 0.29]), radiation (median 86% [IQR 85% to 87%]; median 0.37 [IQR 0.26 to 0.42]), quality (median 85% [IQR 75% to 94%]; median 0.29 [IQR 0.23 to 0.40]), and associated symptoms (median 74% [IQR 65% to 78%]; median 0.28 [IQR 0.20 to 0.40]). When discordant information was obtained, the research assistant was more often correct (median 64% [IQR 53% to 72%]). The relatively fair interrater reliability observed in our study is consistent with previous studies evaluating

  18. Reliability and cross-cultural validation of the Turkish version of Manual Ability Classification System (MACS) for children with cerebral palsy.

    Science.gov (United States)

    Akpinar, Pinar; Tezel, Canan G; Eliasson, Ann-Christin; Icagasioglu, Afitap

    2010-01-01

    To determine the reliability and cross-cultural validation of the Turkish translation of the Manual Ability Classification System (MACS) for children with cerebral palsy (CP) and to investigate the relation to gross motor function and other comorbidities. After the forward and backward translation procedures, inter-rater and test-retest reliability was assessed between parents, physiotherapists and physicians using the intra-class correlation coefficient (ICC). Children (N = 118, 4 to 18 years, mean age 9 years 4 months; 68 boys, 50 girls) with various types of CP were classified. Additional data on the Gross Motor Function Classification System (GMFCS), intellectual delay, visual acuity, and epilepsy were collected. The inter-rater reliability was high; the ICC ranged from 0.89 to 0.96 among different professionals and parents. Between two persons of the same profession it ranged from 0.97 to 0.98. For the test-retest reliability it ranged from 0.91 to 0.98. Total agreement between the GMFCS and the MACS occurred in only 45% of the children. The level of the MACS was found to correlate with the accompanying comorbidities, namely intellectual delay and epilepsy. The Turkish version of the MACS is found to be valid and reliable, and is suggested to be appropriate for the assessment of manual ability within the Turkish population.

  19. Intra- and interrater reliability of the Chicago Classification of achalasia subtypes in pediatric high-resolution esophageal manometry (HRM) recordings.

    Science.gov (United States)

    Singendonk, M M J; Rosen, R; Oors, J; Rommel, N; van Wijk, M P; Benninga, M A; Nurko, S; Omari, T I

    2017-11-01

    Subtyping achalasia by high-resolution manometry (HRM) is clinically relevant as response to therapy and prognosis have shown to vary accordingly. The aim of this study was to assess inter- and intrarater reliability of diagnosing achalasia and achalasia subtyping in children using the Chicago Classification (CC) V3.0. Six observers analyzed 40 pediatric HRM recordings (22 achalasia and 18 non-achalasia) twice by using dedicated analysis software (ManoView 3.0, Given Imaging, Los Angeles, CA, USA). Integrated relaxation pressure (IRP4s), distal contractile integral (DCI), intrabolus pressurization pattern (IBP), and distal latency (DL) were extracted and analyzed hierarchically. Cohen's κ (2 raters) and Fleiss' κ (>2 raters) and the intraclass correlation coefficient (ICC) were used for categorical and ordinal data, respectively. Based on the results of dedicated analysis software only, intra- and interrater reliability was excellent and moderate (κ=0.89 and κ=0.52, respectively) for differentiating achalasia from non-achalasia. For subtyping achalasia, reliability decreased to substantial and fair (κ=0.72 and κ=0.28, respectively). When observers were allowed to change the software-driven diagnosis according to their own interpretation of the manometric patterns, intra- and interrater reliability increased for diagnosing achalasia (κ=0.98 and κ=0.92, respectively) and for subtyping achalasia (κ=0.79 and κ=0.58, respectively). Intra- and interrater agreement for diagnosing achalasia when using HRM and the CC was very good to excellent when results of automated analysis software were interpreted by experienced observers. More variability was seen when relying solely on the software-driven diagnosis and for subtyping achalasia. Therefore, diagnosing and subtyping achalasia should be performed in pediatric motility centers with significant expertise. © 2017 John Wiley & Sons Ltd.
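
    For readers unfamiliar with the multi-rater statistic named above, the following is a minimal sketch of Fleiss' κ, assuming every recording is classified by the same number of observers. The function name and the count table are hypothetical and are not taken from the study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned subject i to category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_cats = len(counts[0])
    total = n_subjects * n_raters

    # Overall share of all ratings that fell into each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_cats)]

    # Per-subject agreement: fraction of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_i) / n_subjects      # mean observed agreement
    p_exp = sum(p * p for p in p_j)    # agreement expected by chance
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_exp) / (1 - p_exp)


# Hypothetical table: 5 recordings, 6 observers, 3 diagnostic categories.
example = [
    [6, 0, 0],
    [4, 2, 0],
    [0, 5, 1],
    [1, 1, 4],
    [0, 0, 6],
]
print(round(fleiss_kappa(example), 3))
```

    For this made-up table the result is 5/9, roughly 0.56. Fleiss' κ treats categories as unordered, which is why a separate coefficient such as the ICC is used for ordinal data.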

  20. Telepsychiatry clinical decision support system used by non-psychiatrists in remote areas: Validity & reliability of diagnostic module

    Science.gov (United States)

    Malhotra, Savita; Chakrabarti, Subho; Shah, Ruchita; Sharma, Minali; Sharma, Kanu Priya; Malhotra, Akanksha; Upadhyaya, Suneet K.; Margoob, Mushtaq A.; Maqbool, Dar; Jassal, Gopal D.

    2017-01-01

    Background & objectives: A knowledge-based, logically-linked online telepsychiatric decision support system for diagnosis and treatment of mental disorders was developed and validated. We evaluated diagnostic accuracy and reliability of the application at remote sites when used by non-psychiatrists who underwent a brief training in its use through video-conferencing. Methods: The study was conducted at a nodal telepsychiatry centre, and three geographically remote peripheral centres. The diagnostic tool of application had a screening followed by detailed criteria-wise diagnostic modules for 18 psychiatric disorders. A total of 100 consecutive consenting adult outpatients attending remote telepsychiatry centres were included. To assess inter-rater reliability, patients were interviewed face to face by non-specialists at remote sites using the application (active interviewer) and simultaneously on online application via video-conferencing by a passive assessor at nodal centre. Another interviewer at the nodal centre rated the patient using Mini-International Neuropsychiatric Interview (MINI) for diagnostic validation. Results: Screening sub-module had high sensitivity (80-100%), low positive predictive values (PPV) (0.10-0.71) but high negative predictive value (NPV) (0.97-1) for most disorders. For the diagnostic sub-modules, Cohen's kappa was >0.4 for all disorders, with kappa of 0.7-1.0 for most disorders. PPV and NPV were high for most disorders. Inter-rater agreement analysis revealed kappa >0.6 for all disorders. Interpretation & conclusions: Diagnostic tool showed acceptable to good validity and reliability when used by non-specialists at remote sites. Our findings show that diagnostic tool of the telepsychiatry application has potential to empower non-psychiatrist doctors and paramedics to diagnose psychiatric disorders accurately and reliably in remote sites. PMID:29265020
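
    The pattern reported for the screening sub-module (high sensitivity, low PPV, high NPV) is what a 2x2 table produces when a disorder is uncommon among those screened. The sketch below shows the arithmetic; the function name and the counts are hypothetical and do not come from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 confusion counts.

    tp/fn: cases with the disorder that screened positive/negative.
    fp/tn: cases without the disorder that screened positive/negative.
    """
    return {
        "sensitivity": tp / (tp + fn),   # detected share of true cases
        "specificity": tn / (tn + fp),   # cleared share of non-cases
        "ppv": tp / (tp + fp),           # share of positives that are true
        "npv": tn / (tn + fn),           # share of negatives that are true
    }


# Hypothetical screen of 100 patients, 5 of whom have the disorder.
# Every case is flagged, but so are 20 patients without the disorder.
metrics = diagnostic_metrics(tp=5, fp=20, fn=0, tn=75)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

    In this made-up example sensitivity and NPV are both 1.00 while PPV is only 0.20, because the 20 false positives outnumber the 5 true cases.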

  1. Calf-raise senior: a new test for assessment of plantar flexor muscle strength in older adults: protocol, validity, and reliability.

    Science.gov (United States)

    André, Helô-Isa; Carnide, Filomena; Borja, Edgar; Ramalho, Fátima; Santos-Rocha, Rita; Veloso, António P

    2016-01-01

    This study aimed to develop a new field test protocol with a standardized measurement of strength and power in plantar flexor muscles targeted to functionally independent older adults, the calf-raise senior (CRS) test, and also evaluate its reliability and validity. Forty-one subjects aged 65 years and older of both sexes participated in five different cross-sectional studies: 1) pilot (n=12); 2) inter- and intrarater agreement (n=12); 3) construct (n=41); 4) criterion validity (n=33); and 5) test-retest reliability (n=41). Different motion parameters were compared in order to define a specifically designed protocol for seniors. Two raters evaluated each participant twice, and the results of the same individual were compared between raters and participants to assess the interrater and intrarater agreement. The validity and reliability studies involved three testing sessions that lasted 2 weeks, including a battery of functional fitness tests, CRS test on two occasions, accelerometry, and strength assessments in an isokinetic dynamometer. The CRS test presented an excellent test-retest reliability (intraclass correlation coefficient [ICC] =0.90, standard error of measurement =2.0) and interrater reliability (ICC =0.93-0.96), as well as a good intrarater agreement (ICC =0.79-0.84). Participants with better results in the CRS test were younger and presented higher levels of physical activity and functional fitness. A significant association between test results and all strength parameters (isometric, r =0.87, r² =0.75; isokinetic, r =0.86, r² =0.74; and rate of force development, r =0.77, r² =0.59) was shown. This study was successful in demonstrating that the CRS test can meet the scientific criteria of validity and reliability. The test can be a good indicator of ankle strength in older adults and proved to discriminate significantly between individuals with improved functionality and levels of physical activity.

  2. [Reliability and validity of the Braden Scale for predicting pressure sore risk].

    Science.gov (United States)

    Boes, C

    2000-12-01

    Various risk assessment tools have been developed, mainly in the USA and Great Britain, to make pressure sore risk assessment more accurate and objective. The Braden Scale for Predicting Pressure Sore Risk is one such example. Through a literature analysis of German and English texts referring to the Braden Scale, this paper examines the scientific criteria of reliability and validity and sets out the consequences for applying the scale in Germany. Analysis of 4 reliability studies shows an exclusive focus on interrater reliability. Further, although the 19 validity studies were conducted in many different settings, they are limited to the criteria of sensitivity and specificity (accuracy). Sensitivity and specificity range from 35% to 100%. The recommended cut-off points range from 10 to 19 points. The studies are not comparable with each other. Furthermore, the studies contain distortions that affect the accuracy of the scale. The results of the analysis presented here show insufficient proof of reliability and validity in the American studies. In Germany, the Braden Scale has not yet been tested under scientific criteria. Such testing is needed before using the scale in different German settings. The design and procedures of the American studies can serve as a basis for such testing, as can the problems identified in this analysis.

  3. Reliable and valid assessment of competence in endoscopic ultrasonography and fine-needle aspiration for mediastinal staging of non-small cell lung cancer.

    Science.gov (United States)

    Konge, L; Vilmann, P; Clementsen, P; Annema, J T; Ringsted, C

    2012-10-01

    Fine-needle aspiration (FNA) guided by endoscopic ultrasonography (EUS) is important in mediastinal staging of non-small cell lung cancer (NSCLC). Training standards and implementation strategies of this technique are currently under discussion. The aim of this study was to explore the reliability and validity of a newly developed EUS Assessment Tool (EUSAT) designed to measure competence in EUS - FNA for mediastinal staging of NSCLC. A total of 30 patients with proven or suspected NSCLC underwent EUS - FNA for mediastinal staging by three trainees and three experienced physicians. Their performances were assessed prospectively by three experts in EUS under direct observation and again 2 months later in a blinded fashion using digital video-recordings. Based on the assessments, intra-rater reliability, inter-rater reliability, and construct validity were explored. The intra-rater reliability was good (Cronbach's α = 0.80), but comparison of results based on direct observations and blinded video-recordings indicated a significant bias favoring consultants (P = 0.022). Inter-rater reliability was very good (Cronbach's α = 0.93). However, one rater assessing five procedures or two raters each assessing four procedures were necessary to secure a generalizability coefficient of 0.80. The assessment tool demonstrated construct validity by discriminating between trainees and experienced physicians (P = 0.034). Competency in mediastinal staging of NSCLC using EUS and EUS - FNA can be assessed in a reliable and valid way using the EUSAT assessment tool. Measuring and defining competency and training requirements could improve EUS quality and benefit patient care. © Georg Thieme Verlag KG Stuttgart · New York.
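
    When Cronbach's α is used as an inter-rater coefficient, as in this abstract, the raters are usually treated as the "items" of a scale. The sketch below takes that approach, which is an assumption about the method and not something the abstract states. The function name and the scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for scores[i][j] = score of procedure i by rater j.

    Raters play the role of scale items; sample variances are used for
    both the per-rater scores and the per-procedure totals.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with >= 2 rows and columns")
    rater_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(rater_variances) / total_variance)


# Hypothetical assessment scores: 5 procedures, each rated by 3 raters.
ratings = [
    [10, 11, 10],
    [14, 15, 13],
    [18, 17, 19],
    [12, 12, 11],
    [20, 21, 20],
]
print(round(cronbach_alpha(ratings), 3))
```

    These made-up scores give α of about 0.99 because the three raters rank the procedures almost identically. Note that α reflects consistency of ranking and does not penalise a rater who is systematically more lenient than the others.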

  4. Children's Physical Activity While Gardening: Development of a Valid and Reliable Direct Observation Tool.

    Science.gov (United States)

    Myers, Beth M; Wells, Nancy M

    2015-04-01

    Gardens are a promising intervention to promote physical activity (PA) and foster health. However, because of the unique characteristics of gardening, no extant tool can capture PA, postures, and motions that take place in a garden. The Physical Activity Research and Assessment tool for Garden Observation (PARAGON) was developed to assess children's PA levels, tasks, postures, and motions, associations, and interactions while gardening. PARAGON uses momentary time sampling in which a trained observer watches a focal child for 15 seconds and then records behavior for 15 seconds. Sixty-five children (38 girls, 27 boys) at 4 elementary schools in New York State were observed over 8 days. During the observation, children simultaneously wore Actigraph GT3X+ accelerometers. The overall interrater reliability was 88% agreement, and Ebel was .97. Percent agreement values for activity level (93%), garden tasks (93%), motions (80%), associations (95%), and interactions (91%) also met acceptable criteria. Validity was established by previously validated PA codes and by expected convergent validity with accelerometry. PARAGON is a valid and reliable observation tool for assessing children's PA in the context of gardening.

  5. The validity and reliability of the diagnosis of hyperkinetic disorders in the Danish Psychiatric Central Research Registry

    DEFF Research Database (Denmark)

    Jensen, Christina Mohr; Vinkel Koch, S; Lauritsen, Marlene Briciet

    2016-01-01

    were used to validate the diagnosis. Patient files were systematically scored for the presence of ICD-10 criteria for HD and oppositional defiant disorder/conduct disorder (ODD/CD; F91). Further to this, an inter-rater reliability study was also conducted, whereby two experienced child and adolescent......OBJECTIVE: To validate the diagnosis of hyperkinetic disorders (HD) in the Danish Psychiatric Central Research Registry (DPCRR) for children and adolescents aged 4 to 15 given in the years 1995 to 2005. METHOD: From a total of 4568 participants, a representative random subsample of n=387 patients...... it was not possible to reach a conclusion for 5.1% of the cases, 3.8% of the diagnoses were registration errors, and in 4.3% of the files the diagnosis had to be rejected. Inter-rater agreement was high (κ=0.83, z=10.9, Pvalidity of hyperkinetic disorders, unspecified (F90.9) was lower and comorbid CD...

  6. Validity and reliability of global operative assessment of laparoscopic skills (GOALS) in novice trainees performing a laparoscopic cholecystectomy.

    Science.gov (United States)

    Kramp, Kelvin H; van Det, Marc J; Hoff, Christiaan; Lamme, Bas; Veeger, Nic J G M; Pierie, Jean-Pierre E N

    2015-01-01

    Global Operative Assessment of Laparoscopic Skills (GOALS) assessment has been designed to evaluate skills in laparoscopic surgery. A longitudinal blinded study of randomized video fragments was conducted to estimate the validity and reliability of GOALS in novice trainees. In total, 10 trainees each performed 6 consecutive laparoscopic cholecystectomies. Sixty procedures were recorded on video. Video fragments of (1) opening of the peritoneum; (2) dissection of Calot's triangle and achievement of critical view of safety; and (3) dissection of the gallbladder from the liver bed were blinded, randomized, and rated by 2 consultant surgeons using GOALS. Also, a grade was given for overall competence. The correlation of GOALS with live observation Objective Structured Assessment of Technical Skills (OSATS) scores was calculated. Construct validity was estimated using the Friedman 2-way analysis of variance by ranks and the Wilcoxon signed-rank test. The interrater reliability was calculated using the absolute and consistency agreement 2-way random-effects model intraclass correlation coefficient. A high correlation was found between mean GOALS score (r = 0.879, p = 0.021) and mean OSATS score. The GOALS score increased significantly across the 6 procedures (p = 0.002). The trainees performed significantly better on their sixth when compared with their first cholecystectomy (p = 0.004). The consistency agreement interrater reliability was 0.37 for the mean GOALS score (p = 0.002) and 0.55 for overall competence (p < 0.001) of the 3 video fragments. The validity observed in this randomized blinded longitudinal study supports the existing evidence that GOALS is a valid tool for assessment of novice trainees. A relatively low reliability was found in this study. Copyright © 2014 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.
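
    The distinction between the "absolute" and "consistency" forms of the two-way intraclass correlation coefficient can be made concrete with a short sketch. The code below computes the single-rater forms from two-way ANOVA mean squares; it is an illustration under the assumption of a complete subjects-by-raters table, not the authors' analysis. The function name and the scores are hypothetical.

```python
def icc_two_way_single(scores):
    """Return (consistency ICC, absolute-agreement ICC), single-rater forms.

    scores[i][j] is the score given to subject i by rater j; every
    subject must be scored by every rater.
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 subjects and raters")

    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # raters
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Consistency ignores systematic rater offsets; absolute agreement
    # adds the between-rater variance to the denominator.
    consistency = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    absolute = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    return consistency, absolute


# Hypothetical scores: rater 2 always scores 2 points above rater 1.
video_scores = [[10, 12], [14, 16], [18, 20], [12, 14], [16, 18]]
consistency, absolute = icc_two_way_single(video_scores)
print(f"consistency={consistency:.3f}  absolute={absolute:.3f}")
```

    In this made-up case the consistency ICC is 1.0, since the raters order the videos identically, while the absolute-agreement ICC drops to about 0.83 because of the constant 2-point offset.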

  7. The reliability and validity of video analysis for the assessment of the clinical signs of concussion in Australian football.

    Science.gov (United States)

    Makdissi, Michael; Davis, Gavin

    2016-10-01

    The objective of this study was to determine the reliability and validity of identifying clinical signs of concussion using video analysis in Australian football. Prospective cohort study. All impacts and collisions potentially resulting in a concussion were identified during 2012 and 2013 Australian Football League seasons. Consensus definitions were developed for clinical signs associated with concussion. For intra- and inter-rater reliability analysis, two experienced clinicians independently assessed 102 randomly selected videos on two occasions. Sensitivity, specificity, positive and negative predictive values were calculated based on the diagnosis provided by team medical staff. 212 incidents resulting in possible concussion were identified in 414 Australian Football League games. The intra-rater reliability of the video-based identification of signs associated with concussion was good to excellent. Inter-rater reliability was good to excellent for impact seizure, slow to get up, motor incoordination, ragdoll appearance (2 of 4 analyses), clutching at head and facial injury. Inter-rater reliability for loss of responsiveness and blank and vacant look was only fair and did not reach statistical significance. The feature with the highest sensitivity was slow to get up (87%), but this sign had a low specificity (19%). Other video signs had a high specificity but low sensitivity. Blank and vacant look (100%) and motor incoordination (81%) had the highest positive predictive value. Video analysis may be a useful adjunct to the side-line assessment of a possible concussion. Video analysis however should not replace the need for a thorough multimodal clinical assessment. Copyright © 2016 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.

  8. Inter-rater reliability of three musculoskeletal physical examination techniques used to assess motion in three planes while standing.

    Science.gov (United States)

    Prather, Heidi; Hunt, Devyani; Steger-May, Karen; Hayes, Marcie Harris; Knaus, Evan; Clohisy, John

    2009-07-01

    The objective of the study was to measure the reliability between examiners of 3 basic maneuvers of the Total Body Functional Profile physical examination test. The hypothesis was musculoskeletal health care providers of different disciplines could reliably use the 3 basic maneuvers as part of the musculoskeletal physical examination. A prospective observational study was conducted. Twenty-eight adult volunteers were measured on both the left and right side by 2 independent raters on a single occasion. The subjects were recruited through advertisements placed by the orthopedic department at a tertiary university. Twenty-eight volunteers were recruited and completed the study. The volunteers were between the ages of 18 and 51 years of age, had no symptoms in the lower extremity or spine, had no previous history of surgery or tumor involving the lower extremity, and no medical conditions that would preclude participation. On a single occasion, 2 examiners per 1 volunteer were blinded to their own and each others' measurements. Each examiner assessed the distance of frontal and sagittal plane lunge and angle of motion for transverse plane testing. Inter-rater agreement is expressed with intraclass correlation coefficients (ICCs) and corresponding 95% confidence intervals (CIs). The difference between raters is reported with 95% CIs. Baseline demographics, University of California Los Angeles (UCLA), and Harris hip questionnaires were completed by all participants. The UCLA and Harris hip scores showed no significant activity restrictions or pain limitations in all participants. The inter-rater reliability for sagittal, frontal, and transverse plane matrix testing was good with ICCs of 0.86 (95% CI 0.77-0.91), 0.90 (95% CI 0.84-0.94), and 0.85 (95% CI 0.75-0.91), respectively. The rater reliability between disciplines for transverse, sagittal, and frontal plane matrix testing was good with ICCs of 0.89 (95% CI 0.80-0.94), 0.88 (95% CI 0.79-0.94), and 0.90 (95% CI 0

  9. Content Validity and Inter-Rater Reliability of the Halliwick-Concept-Based Instrument "Swimming with Independent Measure"

    Science.gov (United States)

    Srsen, Katja Groleger; Vidmar, Gaj; Pikl, Masa; Vrecar, Irena; Burja, Cirila; Krusec, Klavdija

    2012-01-01

    The Halliwick concept is widely used in different settings to promote joyful movement in water and swimming. To assess the swimming skills and progression of an individual swimmer, a valid and reliable measure should be used. The Halliwick-concept-based Swimming with Independent Measure (SWIM) was introduced for this purpose. We aimed to determine…

  10. Reliable and Valid Assessment of Clinical Bronchoscopy Performance

    DEFF Research Database (Denmark)

    Konge, Lars; Larsen, Klaus Richter; Clementsen, Paul

    2012-01-01

    : The interrater reliability was high, with Cronbach's α = 0.86. Assessment of 3 bronchoscopies by a single rater had a generalizability coefficient of 0.84. The correlation between experience and performance was good (Pearson correlation = 0.76). There were significant differences between the groups for all...

  11. Reliability and validity of a novel tool to comprehensively assess food and beverage marketing in recreational sport settings.

    Science.gov (United States)

    Prowse, Rachel J L; Naylor, Patti-Jean; Olstad, Dana Lee; Carson, Valerie; Mâsse, Louise C; Storey, Kate; Kirk, Sara F L; Raine, Kim D

    2018-05-31

    Current methods for evaluating food marketing to children often study a single marketing channel or approach. As the World Health Organization urges the removal of unhealthy food marketing in children's settings, methods that comprehensively explore the exposure and power of food marketing within a setting from multiple marketing channels and approaches are needed. The purpose of this study was to test the inter-rater reliability and the validity of a novel settings-based food marketing audit tool. The Food and beverage Marketing Assessment Tool for Settings (FoodMATS) was developed and its psychometric properties evaluated in five public recreation and sport facilities (sites) and subsequently used in 51 sites across Canada for a cross-sectional analysis of food marketing. Raters recorded the count of food marketing occasions, presence of child-targeted and sports-related marketing techniques, and the physical size of marketing occasions. Marketing occasions were classified by healthfulness. Inter-rater reliability was tested using Cohen's kappa (κ) and intra-class correlations (ICC). FoodMATS scores for each site were calculated using an algorithm that represented the theoretical impact of the marketing environment on food preferences, purchases, and consumption. Higher FoodMATS scores represented sites with higher exposure to, and more powerful (unhealthy, child-targeted, sports-related, large) food marketing. Validity of the scoring algorithm was tested through (1) Pearson's correlations between FoodMATS scores and facility sponsorship dollars, and (2) sequential multiple regression for predicting "Least Healthy" food sales from FoodMATS scores. Inter-rater reliability was very good to excellent (κ = 0.88-1.00, p marketing in recreation facilities, the FoodMATS provides a novel means to comprehensively track changes in food marketing environments that can assist in developing and monitoring the impact of policies and interventions.

  12. A clinician-administered severity rating scale for illness anxiety: development, reliability, and validity of the H-YBOCS-M.

    Science.gov (United States)

    Skritskaya, Natalia A; Carson-Wong, Amanda R; Moeller, James R; Shen, Sa; Barsky, Arthur J; Fallon, Brian A

    2012-07-01

    Clinician-administered measures to assess severity of illness anxiety and response to treatment are few. The authors evaluated a modified version of the hypochondriasis-Y-BOCS (H-YBOCS-M), a 19-item, semistructured, clinician-administered instrument designed to rate severity of illness-related thoughts, behaviors, and avoidance. The scale was administered to 195 treatment-seeking adults with DSM-IV hypochondriasis. Test-retest reliability was assessed in a subsample of 20 patients. Interrater reliability was assessed by 27 interviews independently rated by four raters. Sensitivity to change was evaluated in a subsample of 149 patients. Convergent and discriminant validity was examined by comparing H-YBOCS-M scores to other measures administered. Item clustering was examined with confirmatory and exploratory factor analyses. The H-YBOCS-M demonstrated good internal consistency, interrater and test-retest reliability, and sensitivity to symptom change with treatment. Construct validity was supported by significantly higher correlations with scores on other measures of hypochondriasis than with nonhypochondriacal measures. Improvement over time in response to treatment correlated with improvement both on measures of hypochondriasis and on measures of somatization, depression, anxiety, and functional status. Confirmatory factor analysis did not show adequate fit for a three-factor model. Exploratory factor analysis revealed a five-factor solution with the first two factors consistent with the separation of the H-YBOCS-M items into the subscales of illness-related avoidance and compulsions. H-YBOCS-M appears to be valid, reliable, and appropriate as an outcome measure for treatment studies of illness anxiety. Study results highlight "avoidance" as a key feature of illness anxiety, with potentially important nosologic and treatment implications. © 2012 Wiley Periodicals, Inc.

  13. A Spanish validation of the Coma Recovery Scale-Revised (CRS-R).

    Science.gov (United States)

    Tamashiro, Mercedes; Rivas, Maria Elisa; Ron, Melania; Salierno, Fernando; Dalera, Marisol; Olmos, Lisandro

    2014-01-01

    Analysis of inter-rater reliability and concurrent validity. To determine measurement properties of a Spanish version of The Coma Recovery Scale-Revised (CRS-R). A sample of 35 in-patients with severe acquired brain injury. To test concurrent validity of the translated scale, the Glasgow Coma Scale (GCS) and Disability Rating Scale (DRS) were also administered. Two experts in the field were recruited to assess inter-rater agreement. Inter-rater reliability was good for total CRS-R scores (Cronbach α = 0.973, p = 0.001). Sub-scale analysis showed moderate-to-high inter-rater agreement. Total CRS-R scores correlated significantly (p < 0.05) with total GCS (r = 0.74) and DRS (r = 0.54) scores, indicating acceptable concurrent validity. The Spanish version of CRS-R can be administered reliably by trained and experienced examiners. CRS-R appears capable of differentiating patients in Emergence from Minimally Conscious State (EMCS) or in Minimally Conscious State (MCS) from those in a Vegetative State (VS).

  14. Intra- and inter-rater reliability of 3D passive intervertebral motion in subjects with nonspecific neck pain assessed by physical therapy students: A pilot study.

    Science.gov (United States)

    Rossettini, Giacomo; Rondoni, Angie; Lovato, Tommaso; Strobe, Marco; Verzè, Elisa; Vicentini, Marco; Testa, Marco

    2016-06-03

    Passive Intervertebral Movements (PIVMs) are commonly used to assess and treat patients with nonspecific neck pain. Only very few studies have investigated 3D movements until now. This study assessed intra- and inter-rater reliability of three-dimensional (3D) cervical PIVMs performed by physical therapy students in patients with nonspecific neck pain. Thirty-one patients, mean age 47.2 ± 7.2 years, were independently evaluated by 2 physical therapy students. The raters (A and B) assessed mobility, end-feel and pain provocation, performing the 3D cervical segmental side-bending test (3D CSSB) bilaterally from levels C2-C3 to C6-C7. Percentage agreement (raw, positive and negative), Cohen's kappa (95% CI), prevalence index and bias index were calculated to estimate intra- and inter-rater reliability. Intra-rater reliability showed kappa values ranging between fair and substantial (k 0.29-0.80) for pain provocation, mobility and end-feel, with percentage agreements between 61% and 90%. Inter-rater reliability presented kappa values ranging between fair and substantial (k 0.22-0.62) for pain provocation, mobility and end-feel, with percentage agreements between 61% and 80%. Intra-rater reliability of 3D PIVMs was superior to inter-rater reliability in patients with nonspecific neck pain. The most repeatable evaluation parameter was pain. However, the overall poor reliability suggests avoiding the use of these techniques alone to examine patients and measure their outcome. Further studies are needed to investigate PIVM reliability in combination with other assessment procedures in symptomatic patients.
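    The agreement statistics named in this abstract (raw, positive and negative percentage agreement, Cohen's kappa, prevalence index, bias index) can all be derived from a single 2x2 table of two raters' binary judgements. The following is a minimal sketch using the standard textbook formulas; the function name, the cell labels `a`, `b`, `c`, `d`, and the example counts are illustrative assumptions, not data or code from the study.

```python
def agreement_stats(a: int, b: int, c: int, d: int) -> dict:
    """Agreement statistics for two raters and a binary finding.

    Cell layout (hypothetical labels, not taken from the study):
        a = both raters positive      b = rater A positive, rater B negative
        c = rater A negative, B pos.  d = both raters negative
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement expected from each rater's marginal proportions.
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return {
        "raw_agreement": p_observed,
        "positive_agreement": 2 * a / (2 * a + b + c),
        "negative_agreement": 2 * d / (2 * d + b + c),
        "kappa": (p_observed - p_expected) / (1 - p_expected),
        # Prevalence index: imbalance between the two agreement cells.
        "prevalence_index": abs(a - d) / n,
        # Bias index: imbalance between the two disagreement cells.
        "bias_index": abs(b - c) / n,
    }


# Made-up counts for 50 paired ratings.
example = agreement_stats(a=20, b=5, c=10, d=15)
```

    A high prevalence index or bias index indicates that kappa should be read alongside the raw percentage agreement, which is presumably why the abstract reports both.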

  15. Validation of the one pass measure for motivational interviewing competence.

    Science.gov (United States)

    McMaster, Fiona; Resnicow, Ken

    2015-04-01

    This paper examines the psychometric properties of the OnePass coding system: a new, user-friendly tool for evaluating practitioner competence in motivational interviewing (MI). We provide data on reliability and validity with the current gold-standard: Motivational Interviewing Treatment Integrity tool (MITI). We compared scores from 27 videotaped MI sessions performed by student counselors trained in MI and simulated patients using both OnePass and MITI, with three different raters for each tool. Reliability was estimated using intra-class coefficients (ICCs), and validity was assessed using Pearson's r. OnePass had high levels of inter-rater reliability with 19/23 items found from substantial to almost perfect agreement. Taking the pair of scores with the highest inter-rater reliability on the MITI, the concurrent validity between the two measures ranged from moderate to high. Validity was highest for evocation, autonomy, direction and empathy. OnePass appears to have good inter-rater reliability while capturing similar dimensions of MI as the MITI. Despite the moderate concurrent validity with the MITI, the OnePass shows promise in evaluating both traditional and novel interpretations of MI. OnePass may be a useful tool for developing and improving practitioner competence in MI where access to MITI coders is limited. Copyright © 2015. Published by Elsevier Ireland Ltd.

  16. Reliability and convergent validity of the five-step test in people with chronic stroke.

    Science.gov (United States)

    Ng, Shamay S M; Tse, Mimi M Y; Tam, Eric W C; Lai, Cynthia Y Y

    2018-01-10

    (i) To estimate the intra-rater, inter-rater and test-retest reliabilities of the Five-Step Test (FST), as well as the minimum detectable change in FST completion times in people with stroke. (ii) To estimate the convergent validity of the FST with other measures of stroke-specific impairments. (iii) To identify the best cut-off times for distinguishing FST performance in people with stroke from that of healthy older adults. A cross-sectional study. University-based rehabilitation centre. Forty-eight people with stroke and 39 healthy controls. None. The FST, along with (for the stroke survivors only) scores on the Fugl-Meyer Lower Extremity Assessment (FMA-LE), the Berg Balance Scale (BBS), Limits of Stability (LOS) tests, and Activities-specific Balance Confidence (ABC) scale were tested. The FST showed excellent intra-rater (intra-class correlation coefficient; ICC = 0.866-0.905), inter-rater (ICC = 0.998), and test-retest (ICC = 0.838-0.842) reliabilities. A minimum detectable change of 9.16 s was found for the FST in people with stroke. The FST correlated significantly with the FMA-LE, BBS, and LOS results in the forward and sideways directions (r = -0.411 to -0.716, p people with stroke and healthy older adults. The FST is a reliable, easy-to-administer clinical test for assessing stroke survivors' ability to negotiate steps and stairs.
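    The abstract reports a minimum detectable change but does not say how it was derived. One widely used approach, assumed here purely for illustration, computes the standard error of measurement from a test-retest ICC and then scales it to a 95% confidence level. The function names and the input values below are hypothetical and are not taken from the study.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM = SD * sqrt(1 - ICC), with SD the between-subject standard deviation."""
    return sd * math.sqrt(1.0 - icc)


def minimum_detectable_change_95(sd: float, icc: float) -> float:
    """MDC95 = 1.96 * sqrt(2) * SEM.

    The sqrt(2) term accounts for measurement error in both of the two
    measurements being compared.
    """
    return 1.96 * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


# Hypothetical inputs: SD of 10 s and a test-retest ICC of 0.84.
mdc = minimum_detectable_change_95(sd=10.0, icc=0.84)
```

    Under this formula a lower ICC or a more variable sample both raise the threshold a change score must exceed before it can be distinguished from measurement noise.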

  17. A Turkish Version of the Critical-Care Pain Observation Tool: Reliability and Validity Assessment.

    Science.gov (United States)

    Aktaş, Yeşim Yaman; Karabulut, Neziha

    2017-08-01

    The study aim was to evaluate the validity and reliability of the Critical-Care Pain Observation Tool in critically ill patients. A repeated measures design was used for the study. A convenience sample of 66 patients who had undergone open-heart surgery in the cardiovascular surgery intensive care unit in Ordu, Turkey, was recruited for the study. The patients were evaluated by using the Critical-Care Pain Observation Tool at rest, during a nociceptive procedure (suctioning), and 20 minutes after the procedure while they were conscious and intubated after surgery. The Turkish version of the Critical-Care Pain Observation Tool has shown statistically acceptable levels of validity and reliability. Inter-rater reliability was supported by moderate-to-high weighted κ coefficients (weighted κ coefficient = 0.55 to 1.00). For concurrent validity, significant associations were found between the scores on the Critical-Care Pain Observation Tool and the Behavioral Pain Scale scores. Discriminant validity was also supported by higher scores during suctioning (a nociceptive procedure) versus non-nociceptive procedures. The internal consistency of the Critical-Care Pain Observation Tool was 0.72 during a nociceptive procedure and 0.71 during a non-nociceptive procedure. The validity and reliability of the Turkish version of the Critical-Care Pain Observation Tool were determined to be acceptable for pain assessment in critical care, especially for patients who cannot communicate verbally. Copyright © 2016 American Society of PeriAnesthesia Nurses. Published by Elsevier Inc. All rights reserved.

  18. Assessing communication skills in dietetic consultations: the development of the reliable and valid DIET-COMMS tool.

    Science.gov (United States)

    Whitehead, K A; Langley-Evans, S C; Tischler, V A; Swift, J A

    2014-04-01

    There is an increasing emphasis on the development of communication skills for dietitians but few evidence-based assessment tools available. The present study aimed to develop a dietetic-specific, short, reliable and valid assessment tool for measuring communication skills in patient consultations: DIET-COMMS. A literature review and feedback from 15 qualified dietitians were used to establish face and content validity during the development of DIET-COMMS. In total, 113 dietetic students and qualified dietitians were video-recorded undertaking mock consultations, assessed using DIET-COMMS by the lead author, and used to establish intra-rater reliability, as well as construct and predictive validity. Twenty recorded consultations were reassessed by nine qualified dietitians to assess inter-rater reliability: eight of these assessors were interviewed to determine user evaluation. Significant improvements in DIET-COMMS scores were achieved as students and qualified staff progressed through their training and gained experience, demonstrating construct validity, and also by qualified staff attending a training course, indicating predictive validity (P skills in practice was questioned. DIET-COMMS is a short, user-friendly, reliable and valid tool for measuring communication skills in patient consultations with both pre- and post-registration dietitians. Additional work is required to develop a training package for assessors and to identify how DIET-COMMS assessment can acceptably be incorporated into practice. © 2013 The British Dietetic Association Ltd.

  19. The intra- and inter-rater reliability of five clinical muscle performance tests in patients with and without neck pain

    Science.gov (United States)

    2013-01-01

    Background This study investigates the reliability of muscle performance tests using cost- and time-effective methods similar to those used in clinical practice. When conducting reliability studies, great effort goes into standardising test procedures to facilitate a stable outcome. Therefore, several test trials are often performed. However, when muscle performance tests are applied in the clinical setting, clinicians often only conduct a muscle performance test once as repeated testing may produce fatigue and pain, thus variation in test results. We aimed to investigate whether cervical muscle performance tests, which have shown promising psychometric properties, would remain reliable when examined under conditions similar to those of daily clinical practice. Methods The intra-rater (between-day) and inter-rater (within-day) reliability was assessed for five cervical muscle performance tests in patients with (n = 33) and without neck pain (n = 30). The five tests were joint position error, the cranio-cervical flexion test, the neck flexor muscle endurance test performed in supine and in a 45°-upright position and a new neck extensor test. Results Intra-rater reliability ranged from moderate to almost perfect agreement for joint position error (ICC ≥ 0.48-0.82), the cranio-cervical flexion test (ICC ≥ 0.69), the neck flexor muscle endurance test performed in supine (ICC ≥ 0.68) and in a 45°-upright position (ICC ≥ 0.41) with the exception of a new test (neck extensor test), which ranged from slight to moderate agreement (ICC = 0.14-0.41). Likewise, inter-rater reliability ranged from moderate to almost perfect agreement for joint position error (ICC ≥ 0.51-0.75), the cranio-cervical flexion test (ICC ≥ 0.85), the neck flexor muscle endurance test performed in supine (ICC ≥ 0.70) and in a 45°-upright position (ICC ≥ 0.56). However, only slight to fair agreement was found for the neck extensor test (ICC

  20. Reliability and validity of the upper-body dressing scale in Japanese patients with vascular dementia with hemiparesis.

    Science.gov (United States)

    Endo, Arisa; Suzuki, Makoto; Akagi, Atsumi; Chiba, Naoyuki; Ishizaka, Ikuyo; Matsunaga, Atsuhiko; Fukuda, Michinari

    2015-03-01

    The purpose of this study was to examine the reliability and validity of the Upper-body Dressing Scale (UBDS) for buttoned shirt dressing, which evaluates the learning process of new component actions of upper-body dressing in patients diagnosed with dementia and hemiparesis. This was a preliminary correlational study of concurrent validity and reliability in which 10 vascular dementia patients with hemiparesis were enrolled and assessed repeatedly by six occupational therapists by means of the UBDS and the dressing item of the Functional Independence Measure (FIM). Intraclass correlation coefficient was 0.97 for intra-rater reliability and 0.99 for inter-rater reliability. The level of correlation between UBDS score and FIM dressing item scores was -0.93. UBDS scores for paralytic hand passed into the sleeve and sleeve pulled up beyond the shoulder joint were worse than the scores for the other components of the task. The UBDS has good reliability and validity for vascular dementia patients with hemiparesis. Further research is needed to investigate the relation between UBDS score and the effect of intervention and to clarify sensitivity or responsiveness of the scale to clinical change. Copyright © 2014 John Wiley & Sons, Ltd.
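    Several records in this listing, including the one above, report an intraclass correlation coefficient without stating which ICC form was used. As an illustration only, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater) from a subjects-by-raters table; the choice of this form, the function name, and the sample data are assumptions and not the study's method.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1) from a table with one row per subject and one column per rater."""
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares from a two-way ANOVA without replication.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Made-up scores: the second rater is always exactly one point higher.
offset_example = icc_2_1([[1, 2], [2, 3], [3, 4]])
```

    Because this form measures absolute agreement, a constant offset between raters lowers the coefficient even when their rankings are identical; a consistency-type ICC would not penalize that offset.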

  1. Inter-rater reliability of data elements from a prototype of the Paul Coverdell National Acute Stroke Registry

    Directory of Open Access Journals (Sweden)

    Wehner Susan

    2008-06-01

    Background The Paul Coverdell National Acute Stroke Registry (PCNASR) is a U.S.-based national registry designed to monitor and improve the quality of acute stroke care delivered by hospitals. The registry monitors care through specific performance measures, the accuracy of which depends in part on the reliability of the individual data elements used to construct them. This study describes the inter-rater reliability of data elements collected in Michigan's state-based prototype of the PCNASR. Methods Over a 6-month period, 15 hospitals participating in the Michigan PCNASR prototype submitted data on 2566 acute stroke admissions. Trained hospital staff prospectively identified acute stroke admissions, abstracted chart information, and submitted data to the registry. At each hospital 8 randomly selected cases were re-abstracted by an experienced research nurse. Inter-rater reliability was estimated by the kappa statistic for nominal variables, and intraclass correlation coefficient (ICC) for ordinal and continuous variables. Factors that can negatively impact the kappa statistic (i.e., trait prevalence and rater bias) were also evaluated. Results A total of 104 charts were available for re-abstraction. Excellent reliability (kappa or ICC > 0.75) was observed for many registry variables including age, gender, black race, hemorrhagic stroke, discharge medications, and modified Rankin Score. Agreement was at least moderate (i.e., 0.75 > kappa ≥ 0.40) for ischemic stroke, TIA, white race, non-ambulance arrival, hospital transfer and direct admit. However, several variables had poor reliability (kappa Conclusion The excellent reliability of many of the data elements supports the use of the PCNASR to monitor and improve care. However, the poor reliability for several variables, particularly time-related events in the emergency department, indicates the need for concerted efforts to improve the quality of data collection. 
Specific recommendations

  2. Effects of questionnaire-based diagnosis and training on inter-rater reliability among practitioners of traditional Chinese medicine.

    Science.gov (United States)

    Mist, Scott; Ritenbaugh, Cheryl; Aickin, Mikel

    2009-07-01

    To investigate whether a training process that focused on a questionnaire-based diagnosis in Traditional Chinese Medicine (TCM), and developing diagnostic consensus, would improve the agreement of TCM diagnoses among 10 TCM practitioners evaluating patients with temporomandibular joint disorder (TMJD). Evaluation of a diagnostic training program at the Department of Family and Community Medicine, University of Arizona, Tucson, Arizona, and the Oregon College of Oriental Medicine, Portland, Oregon. Screened participants for a study of TCM for TMJD. PRACTITIONERS: Ten (10) licensed acupuncturists with a minimum of 5 years licensure and education in Chinese herbs. A training session using a questionnaire-based diagnostic form was conducted, followed by waves of diagnostic sessions. Between sessions, practitioners discussed the results of the previous round of participants with a focus on reducing variability in primary diagnosis and severity rating of each diagnosis: 3 waves of 5 patients were assessed by 4 practitioner pairs for a total of 120 diagnoses. At 18 months, practitioners completed a recalibration exercise with a similar format with a total of 32 diagnoses. These diagnoses were then examined with respect to the rate of agreement among the 10 practitioners using inter-rater correlations and kappas. The inter-rater correlation with respect to the TCM diagnoses among the 10 practitioners increased from 0.112 to 0.618 with training. Statistically significant improvements were found between the baseline and 18 month exercises (p reliability of TCM diagnosis may be improved through a training process and a questionnaire-based diagnosis process. The improvements varied by diagnosis, with the greatest congruence among primary and more severe diagnoses. Future TCM studies should consider including calibration training to improve the validity of results.

  3. Actors' portrayals of depression to test interrater reliability in clinical trials.

    Science.gov (United States)

    Rosen, Jules; Mulsant, Benoit H; Bruce, Martha L; Mittal, Vikas; Fox, Debra

    2004-10-01

    This study determined if actors could portray depressed patients to establish the interrater reliability of raters using the Hamilton Depression Rating Scale (HDRS). Actors portrayed depressed patients using scripts derived from HDRS assessments obtained at three points during treatment. Four experienced raters blindly viewed videotapes of two patients and two actors. They guessed if each interviewee was a patient or an actor and rated the certainty of their guesses. For each interview, they also rated the realism of the portrayal and completed the HDRS. Experienced raters could not distinguish actors and patients better than chance and were equally certain of their right and wrong guesses. Actors and patients received high scores on the realism of their portrayals. The HDRS scores of the actor-patient pairs were correlated. Actors can effectively portray depressed patients. Future studies will determine if actors can accurately portray patients with anxiety and psychosis.

  4. Safety, reliability, and validity of a physiologic definition of bronchopulmonary dysplasia.

    Science.gov (United States)

    Walsh, Michele C; Wilson-Costello, Deanna; Zadell, Arlene; Newman, Nancy; Fanaroff, Avroy

    2003-09-01

    Bronchopulmonary dysplasia (BPD) is the focus of many intervention trials, yet the outcome measure when based solely on oxygen administration may be confounded by differing criteria for oxygen administration between physicians. Thus, we wished to define BPD by a standardized oxygen saturation monitoring at 36 weeks corrected age, and compare this physiologic definition with the standard clinical definition of BPD based solely on oxygen administration. A total of 199 consecutive very low birthweight infants (VLBW, 501 to 1500 g birthweight) were assessed prospectively at 36+/-1 weeks corrected age. Neonates on positive pressure support or receiving >30% supplemental oxygen were assigned the outcome BPD. Those receiving or =88% for 60 minutes) or "BPD" (saturation reliability, test-retest reliability, and validity of the physiologic definition vs the clinical definition were assessed. A total of 199 VLBW were assessed, of whom 45 (36%) were diagnosed with BPD by the clinical definition of oxygen use at 36 weeks corrected age. The physiologic definition identified 15 infants treated with oxygen who successfully passed the saturation monitoring test in room air. The physiologic definition diagnosed BPD in 30 (24%) of the cohort. All infants were safely studied. The test was highly reliable (inter-rater reliability, kappa=1.0; test-retest reliability, kappa=0.83) and highly correlated with discharge home in oxygen, length of hospital stay, and hospital readmissions in the first year of life. The physiologic definition of BPD is safe, feasible, reliable, and valid and improves the precision of the diagnosis of BPD. This may be of benefit in future multicenter clinical trials.

  5. The validity and reliability of the Turkish version of Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) in patients with mild and moderate Alzheimer's disease and normal subjects.

    Science.gov (United States)

    Mavioglu, H; Gedizlioglu, M; Akyel, S; Aslaner, T; Eser, E

    2006-03-01

    The cognitive subscale of the Alzheimer's Disease Assessment Scale (ADAS-Cog) is the most widely used test in clinical trials dealing with Alzheimer's disease (AD). The aim of this study was to investigate the validity and reliability of the Turkish version of ADAS-Cog. Twenty-nine patients with AD, fulfilling NINCDS-ADRDA criteria of probable AD, who were in stage 3-5 according to the Global Deterioration Scale (GDS), and 27 non-demented control subjects with similar age, gender and educational status were recruited for the study. The Turkish version of ADAS-Cog, Standardized Mini Mental Status Examination (MMSE) and Short Orientation-Memory-Concentration Test (SOMCT) were applied to both of the groups. Inter-rater reliability, internal consistency, test-retest reliability; face validity, differential validity and convergent validity were statistically analyzed. Both MMSE and ADAS-Cog have significantly differentiated patients with AD and control subjects (p ADAS-Cog scores in AD group (r: -0.739). ADAS-Cog was also highly significantly correlated with GDS (r: 0.720) and SOMCT (r: 0.738). For the group with AD, control and whole cohort coefficients of internal consistency, Cronbach's alpha: 0.800, 0.515, 0.873 were found respectively. Inter-rater reliability for total ADAS-Cog score was found as ICC: 0.99 and 0.98 and test-retest reliability was found as ICC: 0.91 and 0.95 for demented and nondemented subjects, respectively. The Turkish version of ADAS-Cog has been found to be highly reliable and valid in differentiating patients with mild and moderate AD from nondemented subjects.

  6. The risk of bias in systematic reviews tool showed fair reliability and good construct validity.

    Science.gov (United States)

    Bühn, Stefanie; Mathes, Tim; Prengel, Peggy; Wegewitz, Uta; Ostermann, Thomas; Robens, Sibylle; Pieper, Dawid

    2017-11-01

    There is a movement from generic quality checklists toward a more domain-based approach in critical appraisal tools. This study aimed to report on a first experience with the newly developed risk of bias in systematic reviews (ROBIS) tool and compare it with A Measurement Tool to Assess Systematic Reviews (AMSTAR), that is, the most commonly used tool to assess methodological quality of systematic reviews while assessing validity, reliability, and applicability. Validation study with four reviewers based on 16 systematic reviews in the field of occupational health. Interrater reliability (IRR) of all four raters was highest for domain 2 (Fleiss' kappa κ = 0.56) and lowest for domain 4 (κ = 0.04). For ROBIS, median IRR was κ = 0.52 (range 0.13-0.88) for the experienced pair of raters compared to κ = 0.32 (range 0.12-0.76) for the less experienced pair of raters. The percentage of "yes" scores of each review of ROBIS ratings was strongly correlated with the AMSTAR ratings (r_s = 0.76; P = 0.01). ROBIS has fair reliability and good construct validity to assess the risk of bias in systematic reviews. More validation studies are needed to investigate reliability and applicability, in particular. Copyright © 2017 Elsevier Inc. All rights reserved.
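    Fleiss' kappa, used above for agreement among all four raters, generalizes chance-corrected agreement to more than two raters. The following is a minimal sketch of the standard formula; the function name, the three-category example, and the counts are illustrative assumptions, not the study's data.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for a fixed number of raters per subject.

    counts[i][j] is how many raters put subject i into category j;
    every row must sum to the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Overall proportion of assignments falling in each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Per-subject agreement: share of rater pairs that agree.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_subj) / n_subjects
    p_exp = sum(p * p for p in p_cat)
    return (p_bar - p_exp) / (1 - p_exp)


# Made-up example: 4 raters, 3 reviews, categories (low, high, unclear).
kappa = fleiss_kappa([[4, 0, 0], [0, 4, 0], [2, 1, 1]])
```

    Values near zero, such as the κ = 0.04 reported for domain 4, mean the raters agreed about as often as their overall category frequencies would produce by chance.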

  7. The Achievement of Therapeutic Objectives Scale: Interrater Reliability and Sensitivity to Change in Short-Term Dynamic Psychotherapy and Cognitive Therapy

    Science.gov (United States)

    Valen, Jakob; Ryum, Truls; Svartberg, Martin; Stiles, Tore C.; McCullough, Leigh

    2011-01-01

    This study examined interrater reliability and sensitivity to change of the Achievement of Therapeutic Objectives Scale (ATOS; McCullough, Larsen, et al., 2003) in short-term dynamic psychotherapy (STDP) and cognitive therapy (CT). The ATOS is a process scale originally developed to assess patients' achievements of treatment objectives in STDP,…

  8. The Outdoor MEDIA DOT: The development and inter-rater reliability of a tool designed to measure food and beverage outlets and outdoor advertising.

    Science.gov (United States)

    Poulos, Natalie S; Pasch, Keryn E

    2015-07-01

    Few studies of the food environment have collected primary data, and even fewer have reported reliability of the tool used. This study focused on the development of an innovative electronic data collection tool used to document outdoor food and beverage (FB) advertising and establishments near 43 middle and high schools in the Outdoor MEDIA Study. Tool development used GIS-based mapping, an electronic data collection form on handheld devices, and an easily adaptable interface to efficiently collect primary data within the food environment. For the reliability study, two teams of data collectors documented all FB advertising and establishments within one half-mile of six middle schools. Inter-rater reliability was calculated overall and by advertisement or establishment category using percent agreement. A total of 824 items were documented: advertisements (n=233), establishment advertisements (n=499), and establishments (n=92) (range=8-229 per school). Overall inter-rater reliability of the developed tool ranged from 69-89% for advertisements and establishments. Results suggest that the developed tool is highly reliable and effective for documenting the outdoor FB environment. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. Reliability and concurrent validity of the iPhone® Compass application to measure thoracic rotation range of motion (ROM) in healthy participants

    Science.gov (United States)

    Schram, Ben; Cox, Alistair J.; Anderson, Sarah L.; Keogh, Justin

    2018-01-01

    Background Several water-based sports (swimming, surfing and stand up paddle boarding) require adequate thoracic mobility (specifically rotation) in order to perform the appropriate activity requirements. The measurement of thoracic spine rotation is problematic for clinicians due to a lack of convenient and reliable measurement techniques. More recently, smartphones have been used to quantify movement in various joints in the body; however, there appears to be a paucity of research using smartphones to assess thoracic spine movement. Therefore, the aim of this study is to determine the reliability (intra and inter rater) and validity of the iPhone® app (Compass) when assessing thoracic spine rotation ROM in healthy individuals. Methods A total of thirty participants were recruited for this study. Thoracic spine rotation ROM was measured using both the current clinical gold standard, a universal goniometer (UG) and the Smart Phone Compass app. Intra-rater and inter-rater reliability was determined with an Intraclass Correlation Coefficient (ICC) and associated 95% confidence intervals (CI). Validation of the Compass app in comparison to the UG was measured using Pearson’s correlation coefficient and levels of agreement were identified with Bland–Altman plots and 95% limits of agreement. Results The UG and Compass app measurements both had excellent reproducibility for intra-rater (ICC 0.94–0.98) and inter-rater reliability (ICC 0.72–0.89). However, the Compass app measurements had higher intra-rater reliability (ICC = 0.96 − 0.98; 95% CI [0.93–0.99]; vs. ICC = 0.94 − 0.98; 95% CI [0.88–0.99]) and inter-rater reliability (ICC = 0.87 − 0.89; 95% CI [0.74–0.95] vs. ICC = 0.72 − 0.82; 95% CI [0.21–0.94]). A strong and significant correlation was found between the UG and the Compass app, demonstrating good concurrent validity (r = 0.835, p reliable tool for measuring thoracic spine rotation which produces greater
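    The Bland–Altman 95% limits of agreement mentioned in the Methods of this record summarize how far two measurement methods tend to differ on the same participant. The sketch below shows the usual calculation (mean difference plus or minus 1.96 standard deviations of the differences); the function name and the paired readings are made up for illustration and are not the study's data.

```python
import statistics


def bland_altman_limits(method_a: list[float], method_b: list[float]) -> dict:
    """Mean difference (bias) and 95% limits of agreement for paired readings."""
    if len(method_a) != len(method_b):
        raise ValueError("both methods need one reading per participant")
    diffs = [x - y for x, y in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return {
        "bias": bias,
        "lower_limit": bias - 1.96 * sd,
        "upper_limit": bias + 1.96 * sd,
    }


# Hypothetical rotation readings in degrees from two devices.
limits = bland_altman_limits([10, 12, 14, 16], [9, 12, 13, 18])
```

    A strong Pearson correlation alone does not show that two devices are interchangeable, since it ignores systematic offsets; the limits of agreement address that gap, which is why the two are reported together.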

  10. Intra-Rater, Inter-Rater and Test-Retest Reliability of an Instrumented Timed Up and Go (iTUG) Test in Patients with Parkinson's Disease.

    Directory of Open Access Journals (Sweden)

    Rob C van Lummel

    The "Timed Up and Go" (TUG) is a widely used measure of physical functioning in older people and in neurological populations, including Parkinson's Disease. When using an inertial sensor measurement system (instrumented TUG [iTUG]), the individual components of the iTUG and the trunk kinematics can be measured separately, which may provide relevant additional information. The aim of this study was to determine intra-rater, inter-rater and test-retest reliability of the iTUG in patients with Parkinson's Disease. Twenty-eight PD patients, aged 50 years or older, were included. For the iTUG the DynaPort Hybrid (McRoberts, The Hague, The Netherlands) was worn at the lower back. The device measured acceleration and angular velocity in three directions at a rate of 100 samples/s. Patients performed the iTUG five times on two consecutive days. Repeated measurements by the same rater on the same day were used to calculate intra-rater reliability. Repeated measurements by different raters on the same day were used to calculate intra-rater and inter-rater reliability. Repeated measurements by the same rater on different days were used to calculate test-retest reliability. Nineteen ICC values (15%) were ≥ 0.9 which is considered as excellent reliability. Sixty-four ICC values (49%) were ≥ 0.70 and < 0.90 which is considered as good reliability. Thirty-one ICC values (24%) were ≥ 0.50 and < 0.70, indicating moderate reliability. Sixteen ICC values (12%) were ≥ 0.30 and < 0.50 indicating poor reliability. Two ICC values (2%) were < 0.30 indicating very poor reliability. In conclusion, in patients with Parkinson's disease the intra-rater, inter-rater, and test-retest reliability of the individual components of the instrumented TUG (iTUG) was excellent to good for total duration and for turning durations, and good to low for the sub durations and for the kinematics of the SiSt and StSi. The results of this fully automated analysis of instrumented TUG movements
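    The reliability bands this abstract applies to its ICC values can be written as a small lookup. The thresholds below are the ones stated in the abstract; the function name and the sample values are hypothetical, and other records in this listing use different cut-offs (for example, > 0.75 as acceptable or excellent).

```python
def classify_icc(icc: float) -> str:
    """Map an ICC value to the reliability band described in the abstract."""
    if icc >= 0.90:
        return "excellent"
    if icc >= 0.70:
        return "good"
    if icc >= 0.50:
        return "moderate"
    if icc >= 0.30:
        return "poor"
    return "very poor"


# Hypothetical ICC values for three iTUG components.
bands = [classify_icc(v) for v in (0.93, 0.74, 0.28)]
```

    Because these cut-offs are conventions rather than fixed standards, the same coefficient can be labelled differently from one study to the next.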

  11. Transcultural Adaptation of GRID Hamilton Rating Scale For Depression (GRID-HAMD) to Brazilian Portuguese and Evaluation of the Impact of Training Upon Inter-Rater Reliability.

    Science.gov (United States)

    Henrique-Araújo, Ricardo; Osório, Flávia L; Gonçalves Ribeiro, Mônica; Soares Monteiro, Ivandro; Williams, Janet B W; Kalali, Amir; Alexandre Crippa, José; Oliveira, Irismar Reis De

    2014-07-01

    GRID-HAMD is a semi-structured interview guide developed to overcome flaws in HAM-D, and has been incorporated into an increasing number of studies. Carry out the transcultural adaptation of GRID-HAMD into the Brazilian Portuguese language, evaluate the inter-rater reliability of this instrument and the training impact upon this measure, and verify the raters' opinions of said instrument. The transcultural adaptation was conducted by appropriate methodology. The measurement of inter-rater reliability was done by way of videos that were evaluated by 85 professionals before and after training for the use of this instrument. The intraclass correlation coefficient (ICC) remained between 0.76 and 0.90 for GRID-HAMD-21 and between 0.72 and 0.91 for GRID-HAMD-17. The training did not have an impact on the ICC, except for a few groups of participants with a lower level of experience. Most of the participants showed high acceptance of GRID-HAMD, when compared to other versions of HAM-D. The scale presented adequate inter-rater reliability even before training began. Training did not have an impact on this measure, except for a few groups with less experience. GRID-HAMD received favorable opinions from most of the participants.

  12. Creation and Initial Validation of the International Dysphagia Diet Standardisation Initiative Functional Diet Scale.

    Science.gov (United States)

    Steele, Catriona M; Namasivayam-MacDonald, Ashwini M; Guida, Brittany T; Cichero, Julie A; Duivestein, Janice; Hanson, Ben; Lam, Peter; Riquelme, Luis F

    2018-05-01

    To assess consensual validity, interrater reliability, and criterion validity of the International Dysphagia Diet Standardisation Initiative Functional Diet Scale, a new functional outcome scale intended to capture the severity of oropharyngeal dysphagia, as represented by the degree of diet texture restriction recommended for the patient. Participants assigned International Dysphagia Diet Standardisation Initiative Functional Diet Scale scores to 16 clinical cases. Consensual validity was measured against reference scores determined by an author reference panel. Interrater reliability was measured overall and across quartile subsets of the dataset. Criterion validity was evaluated versus Functional Oral Intake Scale (FOIS) scores assigned by survey respondents to the same case scenarios. Feedback was requested regarding ease and likelihood of use. Web-based survey. Respondents (N=170) from 29 countries. Not applicable. Consensual validity (percent agreement and Kendall τ), criterion validity (Spearman rank correlation), and interrater reliability (Kendall concordance and intraclass coefficients). The International Dysphagia Diet Standardisation Initiative Functional Diet Scale showed strong consensual validity, criterion validity, and interrater reliability. Scenarios involving liquid-only diets, transition from nonoral feeding, or trial diet advances in therapy showed the poorest consensus, indicating a need for clear instructions on how to score these situations. The International Dysphagia Diet Standardisation Initiative Functional Diet Scale showed greater sensitivity than the FOIS to specific changes in diet. Most (>70%) respondents indicated enthusiasm for implementing the International Dysphagia Diet Standardisation Initiative Functional Diet Scale. 
    This initial validation study suggests that the International Dysphagia Diet Standardisation Initiative Functional Diet Scale has strong consensual and criterion validity and can be used reliably by clinicians

  13. Inter-rater reliability of h-index scores calculated by Web of Science and Scopus for clinical epidemiology scientists.

    Science.gov (United States)

    Walker, Benjamin; Alavifard, Sepand; Roberts, Surain; Lanes, Andrea; Ramsay, Tim; Boet, Sylvain

    2016-06-01

    We investigated the inter-rater reliability of Web of Science (WoS) and Scopus when calculating the h-index of 25 senior scientists in the Clinical Epidemiology Program of the Ottawa Hospital Research Institute. Bibliometric information and the h-indices for the subjects were computed by four raters using the automatic calculators in WoS and Scopus. Correlation and agreement between ratings was assessed using Spearman's correlation coefficient and a Bland-Altman plot, respectively. Data could not be gathered from Google Scholar due to feasibility constraints. The Spearman's rank correlation between the h-index of scientists calculated with WoS was 0.81 (95% CI 0.72-0.92) and with Scopus was 0.95 (95% CI 0.92-0.99). The Bland-Altman plot showed no significant rater bias in WoS and Scopus; however, the agreement between ratings is higher in Scopus compared to WoS. Our results showed a stronger relationship and increased agreement between raters when calculating the h-index of a scientist using Scopus compared to WoS. The higher inter-rater reliability and simple user interface used in Scopus may render it the more effective database when calculating the h-index of senior scientists in epidemiology. © 2016 Health Libraries Group.
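A Bland-Altman analysis of the kind mentioned in this abstract summarises agreement as the mean difference between two sets of ratings (the bias) plus 95% limits of agreement. The sketch below assumes the usual bias ± 1.96 SD definition; the function name and the h-index values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def bland_altman(ratings_a, ratings_b):
    """Return (bias, lower limit, upper limit) for paired ratings."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(ratings_a, ratings_b)]
    bias = mean(diffs)            # systematic difference between raters
    spread = 1.96 * stdev(diffs)  # half-width of the 95% limits of agreement
    return bias, bias - spread, bias + spread


# Hypothetical h-indices computed by two raters for the same four scientists.
rater_1 = [10, 12, 14, 16]
rater_2 = [9, 12, 15, 15]
bias, lower, upper = bland_altman(rater_1, rater_2)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

A bias close to zero with narrow limits indicates that the raters can be used interchangeably; the plot itself simply graphs each difference against the pair's mean.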

  14. Pain Assessment in Critically Ill Adult Patients: Validity and Reliability Research of the Turkish Version of the Critical-Care Pain Observation Tool

    Directory of Open Access Journals (Sweden)

    Onur Gündoğan

    2016-12-01

    Objective: The Critical-Care Pain Observation Tool (CPOT) and the Behavioral Pain Scale (BPS) are behavioral pain assessment scales for unconscious intensive care unit (ICU) patients. The aim is to determine the validity and reliability of the Turkish version of the CPOT in mechanically ventilated adult ICU patients. Material and Method: This prospective observational cohort study included 50 mechanically ventilated mixed ICU patients who were unable to report pain. The CPOT and BPS were translated into Turkish, and language validity was assessed by ten intensive care specialists. Pain was assessed during painless and painful routine care procedures using the CPOT and the BPS by a resident and an intensivist concomitantly. Test reliability, interrater reliability, and validity of the CPOT and the BPS were evaluated. Results: The mean age was 57.4 years and the mean APACHE II score was 18.7. A total of 100 assessments were recorded from 50 patients using the CPOT and BPS. CPOT and BPS scores during painful procedures were both significantly higher than during painless procedures. Agreement between the CPOT and BPS during painful and painless stimuli ranged as follows: sensitivity 66.7%-90.3%; specificity 89.7%-97.9%; kappa value 0.712-0.892. Agreement between the resident and the intensivist during painful and painless stimuli ranged from 97% to 100%, and the kappa value was between 0.904 and 1.0. Conclusion: The Turkish version of the CPOT showed good correlation with the BPS. Interrater reliability between resident and intensivist was good. The study showed that the Turkish versions of the BPS and CPOT are reliable and valid tools to assess pain in daily clinical practice for intubated and unconscious ICU patients who are mechanically ventilated.

  15. Modified sphygmomanometer test for the assessment of strength of the trunk, upper and lower limbs muscles in subjects with subacute stroke: reliability and validity.

    Science.gov (United States)

    Aguiar, Larissa T; Lara, Eliza M; Martins, Julia C; Teixeira-Salmela, Luci F; Quintino, Ludmylla F; Christo, Paulo P; DE Morais Fairaa, Christina

    2016-10-01

    Limitations in activities have been related to weakness of the upper limbs (UL), lower limbs (LL) and trunk muscles after stroke. Therefore, the measurement of strength after stroke becomes essential. The Modified Sphygmomanometer Test (MST) is an alternative method for the measurement of strength, since it is cheap and provides objective values. However, no studies have investigated the measurement properties of the MST in sub-acute stroke. To investigate the test-retest and inter-rater reliabilities and criterion-related validity of the MST for the measurement of strength of the UL, LL, and trunk muscles in subjects with sub-acute stroke, and verify whether the number of trials would affect the results. Diagnostic accuracy. Local community, out-patient clinics, and university laboratory. Sixty-five subjects with sub-acute stroke (62±14 years) participated in the present study. The strength of 36 muscular groups was measured with the MST and dynamometers (criterion standard). To investigate whether the number of trials would affect the results, analysis of variance was applied. For the test-retest and inter-rater reliabilities and criterion-related validity of the MST, intra-class correlation coefficients (ICC), Pearson correlation coefficients, and coefficients of determination were calculated. Similar results were found for all muscular groups and number of trials (0.01≤F≤0.14; 0.87≤p≤0.99) with significant and adequate values of test-retest (0.57≤ICC≤0.98) (exception: first trial of the non-paretic ankle dorsiflexors) and inter-rater (0.50≤ICC≤0.99) (exception: non-paretic ankle plantar flexors) reliabilities and validity (0.70≤r≤0.95; p≤0.001). The values obtained with the MST were good predictors of those obtained with the dynamometers (0.54≤r²≤0.90). In general, the MST showed adequate reliabilities and criterion-related validity for measuring strength of subjects with sub-acute stroke, and only one trial, after familiarization

  16. Validity and reliability of chronic tic disorder and obsessive-compulsive disorder diagnoses in the Swedish National Patient Register.

    Science.gov (United States)

    Rück, Christian; Larsson, K Johan; Lind, Kristina; Perez-Vigil, Ana; Isomura, Kayoko; Sariaslan, Amir; Lichtenstein, Paul; Mataix-Cols, David

    2015-06-22

    The usefulness of cases diagnosed in administrative registers for research purposes is dependent on diagnostic validity. This study aimed to investigate the validity and inter-rater reliability of recorded diagnoses of tic disorders and obsessive-compulsive disorder (OCD) in the Swedish National Patient Register (NPR). Chart review of randomly selected register cases and controls. 100 tic disorder cases and 100 OCD cases were randomly selected from the NPR based on codes from the International Classification of Diseases (ICD) 8th, 9th and 10th editions, together with 50 epilepsy and 50 depression control cases. The obtained psychiatric records were blindly assessed by 2 senior psychiatrists according to the criteria of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR) and ICD-10. Positive predictive value (PPV; cases diagnosed correctly divided by the sum of true positives and false positives). Between 1969 and 2009, the NPR included 7286 tic disorder and 24,757 OCD cases. The vast majority (91.3% of tic cases and 80.1% of OCD cases) are coded with the most recent ICD version (ICD-10). For tic disorders, the PPV was high across all ICD versions (PPV=89% in ICD-8, 86% in ICD-9 and 97% in ICD-10). For OCD, only ICD-10 codes had high validity (PPV=91-96%). None of the epilepsy or depression control cases were wrongly diagnosed as having tic disorders or OCD, respectively. Inter-rater reliability was outstanding for both tic disorders (κ=1) and OCD (κ=0.98). The validity and reliability of ICD codes for tic disorders and OCD in the Swedish NPR are generally high. We propose simple algorithms to further increase the confidence in the validity of these codes for epidemiological research. Published by the BMJ Publishing Group Limited.
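The positive predictive value defined in this abstract is a single ratio, PPV = TP / (TP + FP). A minimal sketch follows; the function name and the counts are hypothetical, since the abstract reports percentages rather than per-version case counts.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = correctly diagnosed cases / all register cases that were reviewed."""
    reviewed = true_positives + false_positives
    if reviewed == 0:
        raise ValueError("at least one reviewed case is required")
    return true_positives / reviewed


# Hypothetical chart review: 97 of 100 register diagnoses confirmed.
print(f"PPV = {positive_predictive_value(97, 3):.0%}")
```

Because PPV depends on how common the condition is among the coded cases, a value obtained in one register does not automatically carry over to another.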

  17. Inter-rater reliability of PATH observations for assessment of ergonomic risk factors in hospital work.

    Science.gov (United States)

    Park, Jung-Keun; Boyer, Jon; Tessler, Jamie; Casey, Jeffrey; Schemm, Linda; Gore, Rebecca; Punnett, Laura

    2009-07-01

    This study examined the inter-rater reliability of expert observations of ergonomic risk factors by four analysts. Ten jobs were observed at a hospital using a newly expanded version of the PATH method (Buchholz et al. 1996), to which selected upper extremity exposures had been added. Two of the four raters simultaneously observed each worker onsite for a total of 443 observation pairs containing 18 categorical exposure items each. For most exposure items, kappa coefficients were 0.4 or higher. For some items, agreement was higher both for the jobs with less rapid hand activity and for the analysts with a higher level of ergonomic job analysis experience. These upper extremity exposures could be characterised reliably with real-time observation, given adequate experience and training of the observers. The revised version of PATH is applicable to the analysis of jobs where upper extremity musculoskeletal strain is of concern.

  18. The reliability and validity of cervical auscultation in the diagnosis of dysphagia: a systematic review.

    Science.gov (United States)

    Lagarde, Marloes L J; Kamalski, Digna M A; van den Engel-Hoek, Lenie

    2016-02-01

    To systematically review the available evidence for the reliability and validity of cervical auscultation in diagnosing the several aspects of dysphagia in adults and children suffering from dysphagia. Medline (PubMed), Embase and the Cochrane Library databases. The systematic review was carried out applying the steps of the PRISMA-statement. The methodological quality of the included studies was evaluated using the Dutch 'Cochrane checklist for diagnostic accuracy studies'. A total of 90 articles were identified through the search strategy, and after applying the inclusion and exclusion criteria, six articles were included in this review. In the six studies, 197 patients were assessed with cervical auscultation. Two of the six articles were considered to be of 'good' quality and three studies were of 'moderate' quality. One article was excluded because of 'poor' methodological quality. Sensitivity ranged from 23% to 94% and specificity ranged from 50% to 74%. Inter-rater reliability was 'poor' or 'fair' in all studies. The intra-rater reliability shows a wide variance among speech language therapists. In this systematic review, conflicting evidence is found for the validity of cervical auscultation. The reliability of cervical auscultation is insufficient when used as a stand-alone tool in the diagnosis of dysphagia in adults. There is no available evidence for the validity and reliability of cervical auscultation in children. Cervical auscultation should not be used as a stand-alone instrument to diagnose dysphagia. © The Author(s) 2015.

  19. Reliability of CSF turbulence and choroid plexus visualization on fast-sequence MRI in pediatric hydrocephalus.

    Science.gov (United States)

    Rozzelle, Curtis J; Madura, Casey; Reeder, Ron W

    2018-01-01

    OBJECTIVE Endoscopic third ventriculostomy with choroid plexus cauterization for the treatment of neonatal and infant hydrocephalus has gained popularity in the past decade. Identifying treatment failure is critically important. Results of a pilot study of 2 novel imaging markers seen on fast-sequence T2-weighted axial MRI showed potential clinical utility. However, the reliability of multiple raters detecting these markers must be established before a multicenter validation study can be performed. METHODS Two sets of de-identified single-shot T2-weighted turbo spin-echo axial images were prepared from scans of patients before and after they underwent endoscopic third ventriculostomy with choroid plexus cauterization between March 2013 and January 2016. The first set showed the lateral and third ventricles for visualization of turbulent CSF dynamics, and the second set showed the lateral ventricular atria for choroid plexus glomus detection. Three raters (Group 1) received written instructions before evaluating each image set once and then again 1 week later. Another 8 raters (Group 2) evaluated both image sets after oral instruction and group training on a pretest image set. Fleiss' kappa coefficients with 95% CIs were calculated for intrarater and interrater reliability in Group 1 and interrater reliability in Group 2. RESULTS Intrarater reliability kappa coefficients for Group 1 were ≥ 0.74 for turbulence and ≥ 0.80 for choroid plexus; their interrater kappa coefficients at the initial assessment were 0.50 (95% CI 0.37-0.62) and 0.56 (95% CI 0.43-0.69), respectively. The Group 2 interrater kappa scores were 0.82 (95% CI 0.78-0.86) for turbulence and 0.62 (95% CI 0.58-0.66) for choroid plexus. CONCLUSIONS With minimal training, intrarater reliability on visualization of turbulence and the choroid plexus was substantial, but interrater reliability was only moderate. After modestly increasing training, interrater reliability improved to near perfect and to

  20. Reliability and concurrent validity of the iPhone® Compass application to measure thoracic rotation range of motion (ROM) in healthy participants

    Directory of Open Access Journals (Sweden)

    James Furness

    2018-03-01

    Background Several water-based sports (swimming, surfing and stand up paddle boarding) require adequate thoracic mobility (specifically rotation) in order to perform the appropriate activity requirements. The measurement of thoracic spine rotation is problematic for clinicians due to a lack of convenient and reliable measurement techniques. More recently, smartphones have been used to quantify movement in various joints in the body; however, there appears to be a paucity of research using smartphones to assess thoracic spine movement. Therefore, the aim of this study is to determine the reliability (intra- and inter-rater) and validity of the iPhone® app (Compass) when assessing thoracic spine rotation ROM in healthy individuals. Methods A total of thirty participants were recruited for this study. Thoracic spine rotation ROM was measured using both the current clinical gold standard, a universal goniometer (UG), and the smartphone Compass app. Intra-rater and inter-rater reliability was determined with an Intraclass Correlation Coefficient (ICC) and associated 95% confidence intervals (CI). Validation of the Compass app in comparison to the UG was measured using Pearson’s correlation coefficient, and levels of agreement were identified with Bland–Altman plots and 95% limits of agreement. Results Both the UG and Compass app measurements had excellent reproducibility for intra-rater (ICC 0.94–0.98) and inter-rater reliability (ICC 0.72–0.89). However, the Compass app measurements had higher intra-rater reliability (ICC = 0.96–0.98; 95% CI [0.93–0.99] vs. ICC = 0.94–0.98; 95% CI [0.88–0.99]) and inter-rater reliability (ICC = 0.87–0.89; 95% CI [0.74–0.95] vs. ICC = 0.72–0.82; 95% CI [0.21–0.94]). A strong and significant correlation was found between the UG and the Compass app, demonstrating good concurrent validity (r = 0.835, p < 0.001). Levels of agreement between the two devices were 24.8° (LoA –9

  1. Validity and reliability of a novel 3D scanner for assessment of the shape and volume of amputees' residual limb models.

    Directory of Open Access Journals (Sweden)

    Elena Seminati

    Objective assessment methods to monitor residual limb volume following lower-limb amputation are required to enhance practitioner-led prosthetic fitting. Computer aided systems, including 3D scanners, present numerous advantages and the recent Artec Eva scanner, based on laser free technology, could potentially be an effective solution for monitoring residual limb volumes. The aim of this study was to assess the validity and reliability of the Artec Eva scanner (practical measurement) against a high precision laser 3D scanner (criterion measurement) for the determination of residual limb model shape and volume. Three observers completed three repeat assessments of ten residual limb models, using both scanners. Validity of the Artec Eva scanner was assessed (mean percentage error <2%) and Bland-Altman statistics were adopted to assess the agreement between the two scanners. Intra- and inter-rater reliability (repeatability coefficient <5%) of the Artec Eva scanner was calculated for measuring indices of residual limb model volume and shape (i.e. residual limb cross sectional areas and perimeters). Residual limb model volumes ranged from 885 to 4399 ml. Mean percentage error of the Artec Eva scanner (validity) was 1.4% of the criterion volumes. Correlation coefficients between the Artec Eva and the Romer determined variables were higher than 0.9. Volume intra-rater and inter-rater reliability coefficients were 0.5% and 0.7%, respectively. Shape percentage maximal error was 2% at the distal end of the residual limb, with intra-rater reliability coefficients presenting the lowest errors (0.2%), both for cross sectional areas and perimeters of the residual limb models. The Artec Eva scanner is a valid and reliable method for assessing residual limb model shapes and volumes. While the method needs to be tested on human residual limbs and the results compared with the current system used in clinical practice, it has the potential to quantify shape and volume

  2. Assessing the suitability of written stroke materials: an evaluation of the interrater reliability of the suitability assessment of materials (SAM) checklist.

    Science.gov (United States)

    Hoffmann, Tammy; Ladner, Yvette

    2012-01-01

    Written materials are frequently used to provide education to stroke patients and their carers. However, poor quality materials are a barrier to effective information provision. A quick and reliable method of evaluating material quality is needed. This study evaluated the interrater reliability of the Suitability Assessment of Materials (SAM) checklist in a sample of written stroke education materials. Two independent raters evaluated the materials (n = 25) using the SAM, and ratings were analyzed to reveal total percentage agreements and weighted kappa values for individual items and overall SAM rating. The majority of the individual SAM items had high interrater reliability, with 17 of the 22 items achieving substantial, almost perfect, or perfect weighted kappa value scores. The overall SAM rating achieved a weighted kappa value of 0.60, with a percentage total agreement of 96%. Health care professionals should evaluate the content and design characteristics of written education materials before using them with patients. A tool such as the SAM checklist can be used; however, raters should exercise caution when interpreting results from items with more subjective scoring criteria. Refinements to the scoring criteria for these items are recommended. The value of the SAM is that it can be used to identify specific elements that should be modified before education materials are provided to patients.

  3. Inter-Rater Reliability of Neck Reflex Points in Women with Chronic Neck Pain.

    Science.gov (United States)

    Weinschenk, Stefan; Göllner, Richard; Hollmann, Markus W; Hotz, Lorenz; Picardi, Susanne; Hubbert, Katharina; Strowitzki, Thomas; Meuser, Thomas

    2016-01-01

    Neck reflex points (NRP) are tender soft tissue areas of the cervical region that display reflectory changes in response to chronic inflammations of correlated regions in the visceral cranium. Six bilateral areas, NRP C0, C1, C2, C3, C4 and C7, are detectable by palpating the lateral neck. We investigated the inter-rater reliability of NRP to assess their potential clinical relevance. 32 consecutive patients with chronic neck pain were examined for NRP tenderness by an experienced physician and an inexperienced medical student in a blinded design. A detailed description of the palpation technique is included in this section. Absence of pain was defined as pain index (PI) = 0, slight tenderness = 1, and marked pain = 2. Findings were evaluated either by pair-wise Cohen's kappa (ĸ) or by percentage of agreement (PA). Examiners identified 40% and 41% of positive NRP, respectively (PI > 0, physician: 155, student: 157) with a slight preference for the left side (1.2:1). The number of patients identified with >6 positive NRP by the examiners was similar (13 vs. 12 patients). ĸ values ranged from 0.52 to 0.95. The overall kappa was ĸ = 0.80 for the left and ĸ = 0.74 for the right side. PA varied from 78.1% to 96.9% with strongest agreement at NRP C0, NRP C2, and NRP C7. Inter-rater agreement was independent of patients' age, gender, body mass index and examiner's experience. The high reproducibility suggests the clinical relevance of NRP in women. © 2016 S. Karger GmbH, Freiburg.
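Percentage of agreement and Cohen's kappa, the two statistics used in this abstract, can be computed directly from paired ratings on the 0/1/2 pain index. The sketch below uses the standard unweighted kappa, (p_o − p_e) / (1 − p_e); the function names and the ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which both raters gave the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: agreement corrected for chance."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    p_expected = sum(counts_a[c] * counts_b[c] for c in set(counts_a) | set(counts_b)) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category only")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical pain-index ratings (0 = none, 1 = slight, 2 = marked) for ten sites.
physician = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
student = [0, 0, 1, 2, 2, 2, 0, 1, 1, 0]
print(f"PA = {percent_agreement(physician, student):.1%}")
print(f"kappa = {cohens_kappa(physician, student):.2f}")
```

Kappa is lower than the raw percentage of agreement because part of the observed agreement would be expected by chance alone. For ordinal scales such as this one, a weighted kappa that penalises large disagreements more heavily is a common alternative.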

  4. Validity and reliability of the Portuguese version of the quality of life in epilepsy inventory (QOLIE-31) for Brazil.

    Science.gov (United States)

    da Silva, Tatiana Indelicato; Ciconelli, Rozana Mesquita; Alonso, Neide Barreira; Azevedo, Auro Mauro; Westphal-Guitti, Ana Carolina; Pascalicchio, Tatiana Frascarelli; Marques, Carolina Mattos; Caboclo, Luís Otávio Sales Ferreira; Cramer, Joyce A; Sakamoto, Américo Ceiki; Yacubian, Elza Márcia Targas

    2007-03-01

    We report the cultural adaptation and psychometric properties of the Quality of Life in Epilepsy-31 Inventory (QOLIE-31) for the Portuguese language and Brazilian culture. This study involved 150 outpatients: 50 presurgical patients with refractory temporal lobe epilepsy (TLE) related to mesial temporal sclerosis (MTS), 50 patients with juvenile myoclonic epilepsy (JME), and 50 seizure-free patients with TLE. They completed the QOLIE-31, Nottingham Health Profile (NHP), Beck Depression Inventory (BDI), and Adverse Events Profile (AEP) and underwent a neuropsychological evaluation (NE). Internal consistency reliability, interrater and test-retest reliability, and construct validity were assessed. QOLIE-31 mean scores were 33.1 (Social Function), 68.9 (Overall Quality of Life), 56.5 (Seizure Worry), 64.1 (Emotional Well-Being), 63.7 (Energy/Fatigue), 38.9 (Cognitive Function), and 49.7 (Medication Effects). Internal consistency was high (Cronbach's alpha), as were the associations between QOLIE-31 and the BDI, NHP, AEP, and NE. The Portuguese/Brazilian version of the QOLIE-31 inventory showed good reliability, validity, and construct validity.

  5. Detection of early psychotic symptoms: Validation of the Spanish version of the "Symptom Onset in Schizophrenia (SOS) inventory".

    Science.gov (United States)

    Mezquida, Gisela; Cabrera, Bibiana; Martínez-Arán, Anabel; Vieta, Eduard; Bernardo, Miguel

    2018-03-01

    The period of subclinical signs that precedes the onset of psychosis is referred to as the prodrome or high-risk mental state. The "Symptom Onset in Schizophrenia (SOS) inventory" is an instrument to characterize and date the initial symptoms of a psychotic illness. The present study aims to provide reliability and validity data for clinical and research use of the Spanish version of the SOS. Thirty-six participants with a first-episode of psychosis meeting DSM-IV criteria for schizophrenia/schizoaffective/schizophreniform disorder were administered the translated SOS and other clinical assessments. The internal validity, intrarater and interrater reliability were studied. We found strong interrater reliability. To detect the presence/absence of prodromal symptoms, Kappa coefficients ranged between 0.8 and 0.7. Similarly, the raters obtained an excellent level of agreement regarding the onset of each symptom and the duration of symptoms until first treatment (intraclass correlation coefficients between 0.9 and 1.0). Cronbach's alpha was 0.9-1.0 for all the items. The interrater reliability and concurrent validity were also excellent in both cases. This study provides robust psychometric properties of the Spanish version of the SOS. The translated version is adequate in terms of good internal validity, intrarater and interrater reliability, and is as time-efficient as the original version. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. An Investigation of Interrater Reliability for the Rorschach Performance Assessment System (R-PAS) in a Nonpatient U.S. Sample.

    Science.gov (United States)

    Kivisalu, Trisha M; Lewey, Jennifer H; Shaffer, Thomas W; Canfield, Merle L

    2016-01-01

    The Rorschach Performance Assessment System (R-PAS) aims to provide an evidence-based approach to administration, coding, and interpretation of the Rorschach Inkblot Method (RIM). R-PAS analyzes individualized communications given by respondents to each card to code a wide pool of possible variables. Due to the large number of possible codes that can be assigned to these responses, it is important to consider the concordance rates among different assessors. This study investigated interrater reliability for R-PAS protocols. Data were analyzed from a nonpatient convenience sample of 50 participants who were recruited through networking, local marketing, and advertising efforts from January 2013 through October 2014. Blind recoding was used and discrepancies between the initial and blind coders' ratings were analyzed for each variable with SPSS yielding percent agreement and intraclass correlation values. Data for Location, Space, Contents, Synthesis, Vague, Pairs, Form Quality, Populars, Determinants, and Cognitive and Thematic codes are presented. Rates of agreement for 1,168 responses were higher for more simplistic coding (e.g., Location), whereas agreement was lower for more complex codes (e.g., Cognitive and Thematic codes). Overall, concordance rates achieved good to excellent agreement. Results suggest R-PAS is an effective method with high interrater reliability supporting its empirical basis.

  7. Number of test trials needed for performance stability and interrater reliability of the one leg stand test in patients with a major non-traumatic lower limb amputation

    DEFF Research Database (Denmark)

    Kristensen, Morten Tange; Nielsen, Anni Østergaard; Madsen Topp, Ulla

    2014-01-01

    Balance is beneficial for daily functioning of patients with a lower limb amputation and is sometimes assessed by the one-leg stand test (OLST). The aims of the study were to examine (1) the number of trials needed to achieve performance stability, (2) the interrater reliability of the OLST in patients with a major non-traumatic lower limb amputation, and (3) to provide a test procedure....

  8. Reliability of the Cooking Task in adults with acquired brain injury.

    Science.gov (United States)

    Poncet, Frédérique; Swaine, Bonnie; Taillefer, Chantal; Lamoureux, Julie; Pradat-Diehl, Pascale; Chevignard, Mathilde

    2015-01-01

    Acquired brain injury (ABI) often leads to deficits in executive functioning (EF) responsible for severe and long-standing disabilities in daily life activities. The Cooking Task is an ecological and valid test of EF involving multi-tasking in a real environment. Given its complex scoring system, it is important to establish the tool's reliability. The objective of the study was to examine the reliability of the Cooking Task (internal consistency, inter-rater and test-retest reliability). A total of 160 patients with ABI (113 men, mean age 37 years, SD = 14.3) were tested using the Cooking Task. For test-retest reliability, patients were assessed by the same rater on two occasions (mean interval 11 days) while two raters independently and simultaneously observed and scored patients' performances to estimate inter-rater reliability. Internal consistency was high for the global scale (Cronbach α = .74). Inter-rater reliability (n = 66) for total errors was also high (ICC = .93), however the test-retest reliability (n = 11) was poor (ICC = .36). In general the Cooking Task appears to be a reliable tool. The low test-retest results were expected given the importance of EF in the performance of novel tasks.

  9. Rating of Everyday Arm-Use in the Community and Home (REACH) scale for capturing affected arm-use after stroke: development, reliability, and validity.

    Directory of Open Access Journals (Sweden)

    Lisa A Simpson

    To develop a brief, valid and reliable tool [the Rating of Everyday Arm-use in the Community and Home (REACH) scale] to classify affected upper limb use after stroke outside the clinical setting. Focus groups with clinicians, patients and caregivers (n = 33) and a literature review were employed to develop the REACH scale. A sample of community-dwelling individuals with stroke was used to assess the validity (n = 96) and inter-rater reliability (n = 73) of the new scale. The REACH consists of separate scales for dominant and non-dominant affected upper limbs, and takes five minutes to administer. Each scale consists of six categories that capture 'no use' to 'full use'. The intraclass correlation coefficient and weighted kappa for inter-rater reliability were 0.97 (95% confidence interval: 0.95-0.98) and 0.91 (0.89-0.93), respectively. REACH scores correlated with external measures of upper extremity use, function and impairment (rho = 0.64-0.94). The REACH scale is a reliable, quick-to-administer tool that has strong relationships to other measures of upper limb use, function and impairment. By providing a rich description of how the affected upper limb is used outside of the clinical setting, the REACH scale fills an important gap among current measures of upper limb use and is useful for understanding the long term effects of stroke rehabilitation.
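
    For an ordinal scale such as the six-category scale described above, a weighted kappa gives partial credit when two raters choose neighbouring categories. The sketch below is a minimal illustration that assumes linear disagreement weights; the abstract does not say which weighting scheme was used, and the function name and ratings are hypothetical rather than study data.

```python
def linear_weighted_kappa(rater_a, rater_b, n_categories):
    """Linear-weighted kappa for two raters; ratings are integers 0..n_categories-1."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b) or n_categories < 2:
        raise ValueError("need equal-length, non-empty ratings and >= 2 categories")

    # Cross-tabulate the two raters' category choices.
    observed = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_categories))
                  for j in range(n_categories)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            # Linear weight: 0 for identical categories, 1 for the two extremes.
            weight = abs(i - j) / (n_categories - 1)
            observed_disagreement += weight * observed[i][j] / n
            expected_disagreement += weight * row_totals[i] * col_totals[j] / (n * n)

    if expected_disagreement == 0:
        raise ValueError("weighted kappa is undefined when all ratings are identical")
    return 1 - observed_disagreement / expected_disagreement


# Hypothetical ratings on a 6-category scale (0 = 'no use' ... 5 = 'full use').
ratings_a = [0, 1, 2, 3, 4, 5, 2, 3]
ratings_b = [0, 1, 2, 3, 4, 5, 3, 3]
kappa_w = linear_weighted_kappa(ratings_a, ratings_b, 6)
```

    In this toy data the raters differ on one item and only by one category, so the weighted statistic stays close to 1.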

  10. Validity and reliability of an application review process using dedicated reviewers in one stage of a multi-stage admissions model.

    Science.gov (United States)

    Zeeman, Jacqueline M; McLaughlin, Jacqueline E; Cox, Wendy C

    2017-11-01

    With increased emphasis placed on non-academic skills in the workplace, a need exists to identify an admissions process that evaluates these skills. This study assessed the validity and reliability of an application review process involving three dedicated application reviewers in a multi-stage admissions model. A multi-stage admissions model was utilized during the 2014-2015 admissions cycle. After advancing through the academic review, each application was independently reviewed by two dedicated application reviewers utilizing a six-construct rubric (written communication, extracurricular and community service activities, leadership experience, pharmacy career appreciation, research experience, and resiliency). Rubric scores were extrapolated to a three-tier ranking to select candidates for on-site interviews. Kappa statistics were used to assess interrater reliability. A three-facet Many-Facet Rasch Model (MFRM) determined reviewer severity, candidate suitability, and rubric construct difficulty. The kappa statistic for candidates' tier rank score (n = 388 candidates) was 0.692 with a perfect agreement frequency of 84.3%. There was substantial interrater reliability between reviewers for the tier ranking (kappa: 0.654-0.710). Highest construct agreement occurred in written communication (kappa: 0.924-0.984). A three-facet MFRM analysis explained 36.9% of variance in the ratings, with 0.06% reflecting application reviewer scoring patterns (i.e., severity or leniency), 22.8% reflecting candidate suitability, and 14.1% reflecting construct difficulty. Utilization of dedicated application reviewers and a defined tiered rubric provided a valid and reliable method to effectively evaluate candidates during the application review process. These analyses provide insight into opportunities for improving the application review process among schools and colleges of pharmacy. Copyright © 2017 Elsevier Inc. All rights reserved.
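
    The abstract above reports both a raw agreement frequency and a kappa statistic, and the kappa is lower because it discounts agreement expected by chance. A minimal sketch of unweighted Cohen's kappa for two raters follows; the tier labels and rankings are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal categories to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: share of items on which both raters chose the same category.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical three-tier rankings from two reviewers for ten applications.
reviewer_1 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1]
reviewer_2 = [1, 1, 2, 3, 3, 3, 1, 2, 2, 1]
raw_agreement = sum(a == b for a, b in zip(reviewer_1, reviewer_2)) / len(reviewer_1)
kappa = cohen_kappa(reviewer_1, reviewer_2)
```

    Here the reviewers agree on 8 of 10 applications, yet kappa comes out near 0.70 once chance agreement is removed.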

  11. The reliability of language performance measurement in language sample analysis of children aged 5-6 years

    Directory of Open Access Journals (Sweden)

    Zahra Soleymani

    2014-04-01

    Background and Aim: Language sample analysis (LSA) is more commonly used in other languages than in Persian to study language development and assess language pathology. We studied some psychometric properties of language sample analysis in this research, such as content validity of the written story and its pictures, test-retest reliability, and inter-rater reliability. Methods: We wrote a story based on Persian culture from Schneider’s study. The validity of the written story and drawn pictures was approved by experts. To study test-retest reliability, 30 children looked at the pictures and told their own story twice with a 7-10 day interval. Children generated the story themselves and the tester did not give any cue about the story. Their audio-taped stories were transcribed and analyzed. Sentence and word structures were detected in the analysis. Results: Mean of experts' agreement with the validity of the written story was 92.28 percent. Experts scored the quality of pictures high and excellent. There was correlation between variables in sentence and word structure (p<0.05) in test-retest, except complex sentences (p=0.137). The agreement rate was 97.1 percent in inter-rater reliability assessment of transcription. The results of inter-rater reliability of language analysis showed that correlation coefficients were significant. Conclusion: The results confirmed that the tool was valid for eliciting language samples. The consistency of language performance in repeated measurement varied from mild to high in the language sample analysis approach.

  12. The validity, reliability and normative scores of the parent, teacher and self report versions of the Strengths and Difficulties Questionnaire in China

    Directory of Open Access Journals (Sweden)

    Coghill David

    2008-04-01

    Background: The Strengths and Difficulties Questionnaire (SDQ) has become one of the most widely used measurement tools in child and adolescent mental health work across the globe. The SDQ was originally developed and validated within the UK and whilst its reliability and validity have been replicated in several countries, important cross cultural issues have been raised. We describe normative data, reliability and validity of the Chinese translation of the SDQ (parent, teacher and self report versions) in a large group of children from Shanghai. Methods: The SDQ was administered to the parents and teachers of students from 12 of Shanghai's 19 districts, aged between 3 and 17 years old, and to those young people aged between 11 and 17 years. Retest data was collected from parents and teachers for 45 students six weeks later. Data was analysed to describe normative scores, bandings and cut-offs for normal, borderline and abnormal scores. Reliability was assessed from analyses of internal consistency, inter-rater agreement, and temporal stability. Structural validity, convergent and discriminant validity were assessed. Results: Full parent and teacher data was available for 1965 subjects and self report data for 690 subjects. Normative data for this Chinese urban population with bandings and cut-offs for borderline and abnormal scores are described. Principal components analysis indicates partial agreement with the original five factored subscale structure; however, this appears to hold more strongly for the Prosocial Behaviour, Hyperactivity – Inattention and Emotional Symptoms subscales than for Conduct Problems and Peer Problems. Internal consistency as measured by Cronbach's α coefficient was generally low, ranging between 0.30 and 0.83, with only parent and teacher Hyperactivity – Inattention and teacher Prosocial Behaviour subscales having α > 0.7. Inter-rater correlations were similar to those reported previously (range 0.23 – 0

  13. Pneumothorax size measurements on digital chest radiographs: Intra- and inter-rater reliability.

    Science.gov (United States)

    Thelle, Andreas; Gjerdevik, Miriam; Grydeland, Thomas; Skorge, Trude D; Wentzel-Larsen, Tore; Bakke, Per S

    2015-10-01

    Detailed and reliable methods may be important for discussions on the importance of pneumothorax size in clinical decision-making. Rhea's method is widely used to estimate pneumothorax size in percent based on chest X-rays (CXRs) from three measure points. Choi's addendum is used for anteroposterior projections. The aim of this study was to examine the intrarater and interrater reliability of the Rhea and Choi method using digital CXR on ward-based PACS monitors. Three physicians examined a retrospective series of 80 digital CXRs showing pneumothorax, using Rhea and Choi's method, then repeated the examination in a random order two weeks later. We used the analysis of variance technique by Eliasziw et al. to assess the intrarater and interrater reliability in altogether 480 estimations of pneumothorax size. Estimated pneumothorax sizes ranged between 5% and 100%. The intrarater reliability coefficient was 0.98 (95% one-sided lower-limit confidence interval 0.96), and the interrater reliability coefficient was 0.95 (95% one-sided lower-limit confidence interval 0.93). This study has shown that the Rhea and Choi method for calculating pneumothorax size has high intrarater and interrater reliability. These results are valid across gender, side of pneumothorax and whether the patient is diagnosed with primary or secondary pneumothorax. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
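
    Reliability coefficients of this kind are derived from an analysis of variance that separates between-subject variation from rater and residual variation. As a rough illustration only, the sketch below computes the two-way random-effects, absolute-agreement, single-rater intraclass correlation, often written ICC(2,1). This is a swapped-in standard formula and not necessarily the Eliasziw et al. procedure used in the study; the ratings are illustrative and are not study data.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of subjects, each a list with one rating per rater.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 subjects and >= 2 raters, with no missing ratings")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    rater_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into subject, rater and residual parts.
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Illustrative ratings: 6 subjects (rows) scored by 4 raters (columns).
example_scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example_scores)
```

    Because the raters in this toy data differ systematically in level, the absolute-agreement coefficient is low (about 0.29) even though they rank the subjects similarly.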

  14. Reliability of histologic assessment in patients with eosinophilic oesophagitis.

    Science.gov (United States)

    Warners, M J; Ambarus, C A; Bredenoord, A J; Verheij, J; Lauwers, G Y; Walsh, J C; Katzka, D A; Nelson, S; van Viegen, T; Furuta, G T; Gupta, S K; Stitt, L; Zou, G; Parker, C E; Shackelton, L M; D Haens, G R; Sandborn, W J; Dellon, E S; Feagan, B G; Collins, M H; Jairath, V; Pai, R K

    2018-04-01

    The validity of the eosinophilic oesophagitis (EoE) histologic scoring system (EoEHSS) has been demonstrated, but only preliminary reliability data exist. The aim of this study was to formally assess the reliability of the EoEHSS and additional histologic features. Four expert gastrointestinal pathologists independently reviewed slides from adult patients with EoE (N = 45) twice, in random order, using standardised training materials and scoring conventions for the EoEHSS and additional histologic features agreed upon during a modified Delphi process. Intra- and inter-rater reliability for scoring the EoEHSS, a visual analogue scale (VAS) of overall histopathologic disease severity, and additional histologic features were assessed using intra-class correlation coefficients (ICCs). Almost perfect intra-rater reliability was observed for the composite EoEHSS scores and the VAS. Inter-rater reliability was also almost perfect for the composite EoEHSS scores and substantial for the VAS. Of the EoEHSS items, eosinophilic inflammation was associated with the highest ICC estimates and consistent with almost perfect intra- and inter-rater reliability. With the exception of dyskeratotic epithelial cells and surface epithelial alteration, ICC estimates for the remaining EoEHSS items were above the benchmarks for substantial intra-rater, and moderate inter-rater reliability. Estimation of peak eosinophil count and number of lamina propria eosinophils were associated with the highest ICC estimates among the exploratory items. The composite EoEHSS and most component items are associated with substantial reliability when assessed by central pathologists. Future studies should assess responsiveness of the score to change after a therapeutic intervention to facilitate its use in clinical trials. © 2018 John Wiley & Sons Ltd.

  15. Converting three general-cognitive function scales into Persian and assessment of their validity and reliability

    Directory of Open Access Journals (Sweden)

    Payam Moin

    2011-01-01

    Objectives: Glasgow Outcome Scale Extended (GOSE), Galveston Orientation and Amnesia Test (GOAT) and Disability Rating Scale (DRS) are three popular outcome measure tools used principally in traumatic brain injury (TBI) patients. We conducted this study to provide a Farsi version of these outcome scales for use in Iran. Methods: Following a comprehensive literature review, Farsi transcripts were prepared by "forward-backward" translation and reviewed by subject experts. After a pretest on a few patients, the final versions were obtained. Thirty-eight patients with closed head injury were interviewed simultaneously by two interviewers. Main statistics used to assess validity and reliability included factor analysis for construct validity, Cronbach's alpha for internal consistency, and Pearson correlation and kappa coefficient for inter-rater agreement. Results: Factor analysis for Farsi-GOAT (FGOAT) revealed 5 independent factors with a total distribution variance of 80.2%. For Farsi-DRS (FDRS), 3 independent factors were found with a 92.3% variance. The Cronbach's alpha (95% confidence interval) was 0.84 (0.763-0.919) and 0.91 (0.901-0.919) for FGOAT and FDRS, respectively. Pearson correlation between total scores of two raters was 0.98 and 0.97 for FGOAT and FDRS, respectively. Kappa coefficient (95% CI) between outcome rankings of raters was 0.73 (0.618-0.852) and 0.68 (0.594-0.770) for FGOAT and FDRS, respectively. As for the Farsi-GOSE scale, the kappa value was 0.4 (0.285-0.507) for 8-level outcome ranking and improved to 0.7 (0.585-0.817) for the 5-level scale. We found a good correlation between FDRS and FGOSE predicted prognoses (Spearman's rho = 0.74, 95% CI: 0.676-0.802). Conclusions: FDRS and FGOAT had appropriate validity and reliability. The 8-level outcome FGOSE scale disclosed a low inter-rater agreement, but a suitable observer agreement was achieved when the 5-level outcome was applied.

  16. Reliability and Validity of a Survey of Cat Caregivers on Their Cats’ Socialization Level in the Cat’s Normal Environment

    Directory of Open Access Journals (Sweden)

    Margaret Slater

    2013-12-01

    Stray cats routinely enter animal welfare organizations each year and shelters are challenged with determining the level of human socialization these cats may possess as quickly as possible. However, there is currently no standard process to guide this determination. This study describes the development and validation of a caregiver survey designed to be filled out by a cat’s caregiver so it accurately describes a cat’s personality, background, and full range of behavior with people when in its normal environment. The results from this survey provided the basis for a socialization score that ranged from unsocialized to well socialized with people. The quality of the survey was evaluated based on inter-rater and test-retest reliability and internal consistency and estimates of construct and criterion validity. In general, our results showed moderate to high levels of inter-rater (median of 0.803, range 0.211–0.957) and test-retest agreement (median 0.92, range 0.211–0.999). Cronbach’s alpha showed high internal consistency (0.962). Estimates of validity did not highlight any major shortcomings. This survey will be used to develop and validate an effective assessment process that accurately differentiates cats by their socialization levels towards humans based on direct observation of cats’ behavior in an animal shelter.
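
    Internal consistency figures like the Cronbach's alpha quoted above compare the variance of individual items with the variance of the summed score. The following is a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of totals); the function name and survey responses are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; responses is a list of respondents, each a list of item scores."""
    n = len(responses)
    k = len(responses[0]) if n else 0
    if n < 2 or k < 2 or any(len(r) != k for r in responses):
        raise ValueError("need >= 2 respondents and >= 2 items, with no missing scores")

    # Sample variance of each item across respondents.
    item_variances = [variance([r[j] for r in responses]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(r) for r in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers from 5 respondents to 3 survey items (1-5 scale).
survey = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]
alpha = cronbach_alpha(survey)
```

    The items in this toy survey rise and fall together across respondents, so alpha is high (about 0.98).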

  17. Reliability and validity of a treatment fidelity assessment for motivational interviewing targeting sexual risk behaviors in people living with HIV/AIDS.

    Science.gov (United States)

    Seng, Elizabeth K; Lovejoy, Travis I

    2013-12-01

    This study psychometrically evaluates the Motivational Interviewing Treatment Integrity Code (MITI) to assess fidelity to motivational interviewing to reduce sexual risk behaviors in people living with HIV/AIDS. Seventy-four sessions from a pilot randomized controlled trial of motivational interviewing to reduce sexual risk behaviors in people living with HIV were coded with the MITI. Participants reported sexual behavior at baseline, 3 months, and 6 months. Regarding reliability, excellent inter-rater reliability was achieved for measures of behavior frequency across the 12 sessions coded by both coders; global scales demonstrated poor intraclass correlations, but adequate percent agreement. Regarding validity, principal components analyses indicated that a two-factor model accounted for an adequate amount of variance in the data. These factors were associated with decreases in sexual risk behaviors after treatment. The MITI is a reliable and valid measurement of treatment fidelity for motivational interviewing targeting sexual risk behaviors in people living with HIV/AIDS.

  18. Reliability and Validity of Digital Imagery Methodology for Measuring Starting Portions and Plate Waste from School Salad Bars.

    Science.gov (United States)

    Bean, Melanie K; Raynor, Hollie A; Thornton, Laura M; Sova, Alexandra; Dunne Stewart, Mary; Mazzeo, Suzanne E

    2018-04-12

    Scientifically sound methods for investigating dietary consumption patterns from self-serve salad bars are needed to inform school policies and programs. To examine the reliability and validity of digital imagery for determining starting portions and plate waste of self-serve salad bar vegetables (which have variable starting portions) compared with manual weights. In a laboratory setting, 30 mock salads with 73 vegetables were made, and consumption was simulated. Each component (initial and removed portion) was weighed; photographs of weighed reference portions and pre- and post-consumption mock salads were taken. Seven trained independent raters visually assessed images to estimate starting portions to the nearest ¼ cup and percentage consumed in 20% increments. These values were converted to grams for comparison with weighed values. Intraclass correlations between weighed and digital imagery-assessed portions and plate waste were used to assess interrater reliability and validity. Pearson's correlations between weights and digital imagery assessments were also examined. Paired samples t tests were used to evaluate mean differences (in grams) between digital imagery-assessed portions and measured weights. Interrater reliabilities were excellent for starting portions and plate waste with digital imagery. For accuracy, intraclass correlations were moderate, with lower accuracy for determining starting portions of leafy greens compared with other vegetables. However, accuracy of digital imagery-assessed plate waste was excellent. Digital imagery assessments were not significantly different from measured weights for estimating overall vegetable starting portions or waste; however, digital imagery assessments slightly underestimated starting portions (by 3.5 g) and waste (by 2.1 g) of leafy greens. This investigation provides preliminary support for use of digital imagery in estimating starting portions and plate waste from school salad bars. Results might inform

  19. Reliability of the Structured Clinical Interview for DSM-5 Sleep Disorders Module.

    Science.gov (United States)

    Taylor, Daniel J; Wilkerson, Allison K; Pruiksma, Kristi E; Williams, Jacob M; Ruggero, Camilo J; Hale, Willie; Mintz, Jim; Organek, Katherine Marczyk; Nicholson, Karin L; Litz, Brett T; Young-McCaughan, Stacey; Dondanville, Katherine A; Borah, Elisa V; Brundige, Antoinette; Peterson, Alan L

    2018-03-15

    To develop and demonstrate interrater reliability for a Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) Sleep Disorders (SCISD). The SCISD was designed to be a brief, reliable, and valid interview assessment of adult sleep disorders as defined by the DSM-5. A sample of 106 postdeployment active-duty military members seeking cognitive behavioral therapy for insomnia in a randomized clinical trial were assessed with the SCISD prior to treatment to determine eligibility. Audio recordings of these interviews were double-scored for interrater reliability. The interview is 8 pages long, includes 20 to 51 questions, and takes 10 to 20 minutes to administer. Of the nine major disorders included in the SCISD, six had prevalence rates high enough (ie, n ≥ 5) to include in analyses. Cohen kappa coefficient (κ) was used to assess interrater reliability for insomnia, hypersomnolence, obstructive sleep apnea hypopnea (OSAH), circadian rhythm sleep-wake, nightmare, and restless legs syndrome disorders. There was excellent interrater reliability for insomnia (1.0) and restless legs syndrome (0.83); very good reliability for nightmare disorder (0.78) and OSAH (0.73); and good reliability for hypersomnolence (0.50) and circadian rhythm sleep-wake disorders (0.50). The SCISD is a brief, structured clinical interview that is easy for clinicians to learn and use. The SCISD showed moderate to excellent interrater reliability for six of the major sleep disorders in the DSM-5 among active duty military seeking cognitive behavioral therapy for insomnia in a randomized clinical trial. Replication and extension studies are needed. Registry: ClinicalTrials.gov; Title: Comparing Internet and In-Person Brief Cognitive Behavioral Therapy of Insomnia; Identifier: NCT01549899; URL: https://clinicaltrials.gov/ct2/show/NCT01549899. © 2018 American Academy of Sleep Medicine.

  20. KAMUTHE video microanalysis system for use in Brazil: translation, cross-cultural adaptation and evidence of validity and reliability

    Directory of Open Access Journals (Sweden)

    Gustavo Schulz Gattino

    2016-11-01

    Background: KAMUTHE is a video microanalysis system which observes preverbal communication within the music therapy setting. This system is indicated for children with autism spectrum disorder (ASD) or multiple disabilities. The purpose of this study was to translate KAMUTHE, adapt it to the Brazilian Portuguese language and analyze some psychometric properties (reliability and validity evidence) of its administration in Brazil for individuals with ASD. Participants and procedure: Translation, back translation, analysis by judges, and pilot application were performed to obtain evidence of content and face validity. The second part of this study was to administer KAMUTHE in 39 consecutive children with ASD. An individual session of improvisational music therapy was applied to assess the different behaviors included in KAMUTHE. The intra-rater reliability, concurrent validity and convergent validity were analyzed. Results: Translation and cross-cultural adaptation were followed and some cultural adaptations were needed. Inter-rater reliability was very good (ICCs 0.95-0.99) for the three child’s behaviors analyzed. Criterion validity with a moderate negative association was found (r = –.38, p = .017) comparing the behavior “Gazes at therapist” and the level of ASD along with the Childhood Autism Rating Scale (CARS). Convergent validity was established between the behavior “Gazes at therapist” and the two nonlinguistic communication scales (social interaction and interests) of the Children’s Communication Checklist (CCC) with a moderate correlation (r = –.43, p = .005). Conclusions: The administration of the KAMUTHE video microanalysis system showed positive results in children with ASD. Further studies are needed to improve the reliability and validity of the instrument in Brazil.

  1. The Construct Validity and Reliability of an Assessment Tool for Competency in Cochlear Implant Surgery

    Directory of Open Access Journals (Sweden)

    Patorn Piromchai

    2014-01-01

    Introduction. We introduce a rating tool that objectively evaluates the skills of surgical trainees performing cochlear implant surgery. Methods. Seven residents and seven experts performed cochlear implant surgery sessions from mastoidectomy to cochleostomy on a standardized virtual reality temporal bone. A total of twenty-eight assessment videos were recorded and two consultant otolaryngologists evaluated the performance of each participant using these videos. Results. Interrater reliability was calculated using the intraclass correlation coefficient for both the global and checklist components of the assessment instrument. The overall agreement was high. The construct validity of this instrument was strongly supported by the significantly higher scores in the expert group for both components. Conclusion. Our results indicate that the proposed assessment tool for cochlear implant surgery is reliable, accurate, and easy to use. This instrument can thus be used to provide objective feedback on overall and task-specific competency in cochlear implantation.

  2. Development, reliability, and validity of the Posttraumatic Stress Disorder Interview for Vietnamese refugees: a diagnostic instrument for Vietnamese refugees.

    Science.gov (United States)

    Dao, Tam K; Poritz, Julia M P; Moody, Rachel P; Szeto, Kim

    2012-08-01

    The Posttraumatic Stress Disorder Interview for Vietnamese Refugees (PTSD-IVR) was created specifically to assess for the presence of current and lifetime history of premigration, migration, encampment, and postmigration traumas in Vietnamese refugees. The purpose of the present study was to describe the development of and investigate the interrater and test-retest reliability of the PTSD-IVR and its validity in relation to the diagnoses obtained from the Longitudinal, Expert, and All Data (LEAD; Spitzer, 1983) standard. Clinicians conducted the diagnosis process with 127 Vietnamese refugees using the LEAD standard and the PTSD-IVR. Assessment of the reliability and validity of the PTSD-IVR yielded good to excellent AUC (area under the receiver operating characteristic curve; .86, .87) and κ values (.66, .74) indicating the reliability of the PTSD-IVR and the agreement between the LEAD procedure and the PTSD-IVR. The results of the present study suggest that the PTSD-IVR performs successfully as a diagnostic instrument specifically created for Vietnamese refugees in their native language. Copyright © 2012 International Society for Traumatic Stress Studies.

  3. The reliability and validity of the informant AD8 by comparison with a series of cognitive assessment tools in primary healthcare.

    Science.gov (United States)

    Shaik, Muhammad Amin; Xu, Xin; Chan, Qun Lin; Hui, Richard Jor Yeong; Chong, Steven Shih Tsze; Chen, Christopher Li-Hsian; Dong, YanHong

    2016-03-01

    The validity and reliability of the informant AD8 in primary healthcare has not been established. Therefore, the present study examined the validity and reliability of the informant AD8 in government subsidized primary healthcare centers in Singapore. Eligible patients (≥60 years old) were recruited from primary healthcare centers and their informants received the AD8. Patient-informant dyads who agreed to further cognitive assessments received the Mini-Mental State Examination (MMSE), Montreal Cognitive Assessment (MoCA), Clinical Dementia Rating (CDR), and a locally validated formal neuropsychological battery at a research center in a tertiary hospital. 1,082 informants completed AD8 assessment at two primary healthcare centers. Of these, 309 patient-informant dyads were further assessed, of whom 243 (78.6%) were CDR = 0; 22 (7.1%) were CDR = 0.5; and 44 (14.2%) were CDR ≥ 1. The mean administration time of the informant AD8 was 2.3 ± 1.0 minutes. The informant AD8 demonstrated good internal consistency (Cronbach's α = 0.85); inter-rater reliability (Intraclass Correlation Coefficient (ICC) = 0.85); and test-retest reliability (weighted κ = 0.80). Concurrent validity, as measured by the correlation between total AD8 scores and CDR global (R = 0.65, p [...] validity, as measured by convergent validity (R ≥ 0.4) between individual items of AD8 with CDR and neuropsychological domains, was acceptable. The informant AD8 demonstrated good concurrent and construct validity and is a reliable measure to detect cognitive dysfunction in primary healthcare.

  4. Reliability and validity of the international dementia alliance schedule for the assessment and staging of care in China.

    Science.gov (United States)

    Wang, Xiao; Sun, Zhenghai; Xiong, Lingchuan; Semrau, Maya; He, Jianhua; Li, Yang; Zhu, Jianzhong; Zhang, Nan; Wang, Aimin; Jiang, Qinpu; Mu, Nan; Zhao, Yuping; Chen, Wei; Wu, Donghui; Zheng, Zhanjie; Sun, Yongan; Zhang, Jing; Xu, Jun; Meng, Xue; Zhao, Mei; Zhang, Haifeng; Lv, Xiaozhen; Sartorius, Norman; Li, Tao; Yu, Xin; Wang, Huali

    2017-11-21

    Clinical and social services are both important for dementia care. The International Dementia Alliance (IDEAL) Schedule for the Assessment and Staging of Care was developed to guide clinical and social care for dementia. Our study aimed to assess the validity and reliability of the IDEAL schedule in China. Two hundred eighty-two dementia patients and their caregivers were recruited from 15 hospitals in China. Each patient-caregiver dyad was assessed with the IDEAL schedule by a rater and an observer simultaneously. The Clinical Dementia Rating (CDR), Mini-Mental Status Examination (MMSE), and Caregiver Burden Inventory (CBI) were assessed for criterion validity. IDEAL repeated assessment was conducted 7-10 days after the initial interview for 62 dyads. Two hundred seventy-seven patient-caregiver dyads completed the IDEAL assessment. Inter-rater reliability for the total score of the IDEAL schedule was 0.93 (95%CI = 0.92-0.95). The inter-class coefficient for the total score of IDEAL was 0.95 for the interviewers and 0.93 for the silent raters. The IDEAL total score correlated with the global CDR score (ρ = 0.72, p [...] valid and reliable tool for the staging of care for dementia in the Chinese population.

  5. Validity and Reliability of Field-Based Measures for Assessing Movement Skill Competency in Lifelong Physical Activities: A Systematic Review.

    Science.gov (United States)

    Hulteen, Ryan M; Lander, Natalie J; Morgan, Philip J; Barnett, Lisa M; Robertson, Samuel J; Lubans, David R

    2015-10-01

    It has been suggested that young people should develop competence in a variety of 'lifelong physical activities' to ensure that they can be active across the lifespan. The primary aim of this systematic review is to report the methodological properties, validity, reliability, and test duration of field-based measures that assess movement skill competency in lifelong physical activities. A secondary aim was to clearly define those characteristics unique to lifelong physical activities. A search of four electronic databases (Scopus, SPORTDiscus, ProQuest, and PubMed) was conducted between June 2014 and April 2015 with no date restrictions. Studies addressing the validity and/or reliability of lifelong physical activity tests were reviewed. Included articles were required to assess lifelong physical activities using process-oriented measures, as well as report either one type of validity or reliability. Assessment criteria for methodological quality were adapted from a checklist used in a previous review of sport skill outcome assessments. Movement skill assessments for eight different lifelong physical activities (badminton, cycling, dance, golf, racquetball, resistance training, swimming, and tennis) in 17 studies were identified for inclusion. Methodological quality, validity, reliability, and test duration (time to assess a single participant), for each article were assessed. Moderate to excellent reliability results were found in 16 of 17 studies, with 71% reporting inter-rater reliability and 41% reporting intra-rater reliability. Only four studies in this review reported test-retest reliability. Ten studies reported validity results; content validity was cited in 41% of these studies. Construct validity was reported in 24% of studies, while criterion validity was only reported in 12% of studies. Numerous assessments for lifelong physical activities may exist, yet only assessments for eight lifelong physical activities were included in this review

  6. Validity and reliability of The Johns Hopkins Adapted Cognitive Exam for critically ill patients.

    Science.gov (United States)

    Lewin, John J; LeDroux, Shannon N; Shermock, Kenneth M; Thompson, Carol B; Goodwin, Haley E; Mirski, Erin A; Gill, Randeep S; Mirski, Marek A

    2012-01-01

    To validate The Johns Hopkins Adapted Cognitive Exam designed to assess and quantify cognition in critically ill patients. Prospective cohort study. Neurosciences, surgical, and medical intensive care units at The Johns Hopkins Hospital. One hundred six adult critically ill patients. One expert neurologic assessment and four measurements of the Adapted Cognitive Exam (all patients). Four measurements of the Folstein Mini-Mental State Examination in nonintubated patients only. Adapted Cognitive Exam and Mini-Mental State Examination were performed by 76 different raters. One hundred six patients were assessed, 46 intubated and 60 nonintubated, resulting in 424 Adapted Cognitive Exam and 240 Mini-Mental State Examination measurements. Criterion validity was assessed by comparing Adapted Cognitive Exam with a neurointensivist's assessment of cognitive status (ρ = 0.83, p …). … validity was assessed by comparing Adapted Cognitive Exam with Mini-Mental State Examination in nonintubated patients (ρ = 0.81, p …). … validity was assessed by surveying raters who used both the Adapted Cognitive Exam and Mini-Mental State Examination and indicated the Adapted Cognitive Exam was an accurate reflection of the patient's cognitive status, more sensitive a marker of cognition than the Mini-Mental State Examination, and easy to use. The Adapted Cognitive Exam demonstrated excellent interrater reliability (intraclass correlation coefficient = 0.997; 95% confidence interval 0.997-0.998) and interitem reliability of each of the five subscales of the Adapted Cognitive Exam and Mini-Mental State Examination (Cronbach's α: range for Adapted Cognitive Exam = 0.83-0.88; range for Mini-Mental State Examination = 0.72-0.81). The Adapted Cognitive Exam is the first valid and reliable examination for the assessment and quantification of cognition in critically ill patients. It provides a useful, objective tool that can be used by any member of the interdisciplinary critical care team to support

  7. Automated bony region identification using artificial neural networks: reliability and validation measurements

    International Nuclear Information System (INIS)

    Gassman, Esther E.; Kallemeyn, Nicole A.; DeVries, Nicole A.; Shivanna, Kiran H.; Powell, Stephanie M.; Magnotta, Vincent A.; Ramme, Austin J.; Adams, Brian D.; Grosland, Nicole M.

    2008-01-01

    The objective was to develop tools for automating the identification of bony structures, to assess the reliability of this technique against manual raters, and to validate the resulting regions of interest against physical surface scans obtained from the same specimen. Artificial intelligence-based algorithms have been used for image segmentation, specifically artificial neural networks (ANNs). For this study, an ANN was created and trained to identify the phalanges of the human hand. The relative overlap between the ANN and a manual tracer was 0.87, 0.82, and 0.76 for the proximal, middle, and distal index phalanx bones, respectively. Compared with the physical surface scans, the ANN-generated surface representations differed on average by 0.35 mm, 0.29 mm, and 0.40 mm for the proximal, middle, and distal phalanges, respectively. Furthermore, the ANN segmented the structures in less than one-tenth of the time required by a manual rater. The ANN has proven to be a reliable and valid means of segmenting the phalanx bones from CT images. Employing automated methods such as the ANN for segmentation eliminates the likelihood of rater drift and inter-rater variability. Automated methods also decrease the amount of time and manual effort required to extract the data of interest, thereby making patient-specific modeling feasible. (orig.)
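
    The abstract reports a relative overlap between the ANN and a manual tracer but does not define the measure. As a minimal sketch, assuming relative overlap means intersection over union of the two segmented voxel sets, the calculation could look like this in Python. The function name and the toy voxel sets are illustrative assumptions, not the study's code or data.

```python
def relative_overlap(voxels_a, voxels_b):
    """Intersection over union of two collections of voxel coordinates.

    Assumption: 'relative overlap' is taken to mean |A and B| / |A or B|.
    """
    a, b = set(voxels_a), set(voxels_b)
    union = a | b
    if not union:
        raise ValueError("both segmentations are empty")
    return len(a & b) / len(union)


# Hypothetical example: two 10 x 10 voxel slabs, one shifted by a single voxel.
auto_region = {(x, y, 0) for x in range(10) for y in range(10)}
manual_region = {(x, y, 0) for x in range(1, 11) for y in range(10)}

overlap = relative_overlap(auto_region, manual_region)
print(f"relative overlap = {overlap:.3f}")
```

    A value of 1 would mean the two segmentations coincide exactly, and 0 would mean they share no voxels.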

  8. Automated bony region identification using artificial neural networks: reliability and validation measurements

    Energy Technology Data Exchange (ETDEWEB)

    Gassman, Esther E.; Kallemeyn, Nicole A.; DeVries, Nicole A.; Shivanna, Kiran H. [The University of Iowa, Department of Biomedical Engineering, Seamans Center for the Engineering Arts and Sciences, Iowa City, IA (United States); The University of Iowa, Center for Computer-Aided Design, Iowa City, IA (United States); Powell, Stephanie M. [The University of Iowa, Department of Biomedical Engineering, Seamans Center for the Engineering Arts and Sciences, Iowa City, IA (United States); University of Iowa Hospitals and Clinics, The University of Iowa, Department of Radiology, Iowa City, IA (United States); Magnotta, Vincent A. [The University of Iowa, Department of Biomedical Engineering, Seamans Center for the Engineering Arts and Sciences, Iowa City, IA (United States); The University of Iowa, Center for Computer-Aided Design, Iowa City, IA (United States); University of Iowa Hospitals and Clinics, The University of Iowa, Department of Radiology, Iowa City, IA (United States); Ramme, Austin J. [University of Iowa Hospitals and Clinics, The University of Iowa, Department of Radiology, Iowa City, IA (United States); Adams, Brian D. [The University of Iowa, Department of Biomedical Engineering, Seamans Center for the Engineering Arts and Sciences, Iowa City, IA (United States); University of Iowa Hospitals and Clinics, The University of Iowa, Department of Orthopaedics and Rehabilitation, Iowa City, IA (United States); Grosland, Nicole M. [The University of Iowa, Department of Biomedical Engineering, Seamans Center for the Engineering Arts and Sciences, Iowa City, IA (United States); University of Iowa Hospitals and Clinics, The University of Iowa, Department of Orthopaedics and Rehabilitation, Iowa City, IA (United States); The University of Iowa, Center for Computer-Aided Design, Iowa City, IA (United States)

    2008-04-15

    The objective was to develop tools for automating the identification of bony structures, to assess the reliability of this technique against manual raters, and to validate the resulting regions of interest against physical surface scans obtained from the same specimen. Artificial intelligence-based algorithms have been used for image segmentation, specifically artificial neural networks (ANNs). For this study, an ANN was created and trained to identify the phalanges of the human hand. The relative overlap between the ANN and a manual tracer was 0.87, 0.82, and 0.76 for the proximal, middle, and distal index phalanx bones, respectively. Compared with the physical surface scans, the ANN-generated surface representations differed on average by 0.35 mm, 0.29 mm, and 0.40 mm for the proximal, middle, and distal phalanges, respectively. Furthermore, the ANN segmented the structures in less than one-tenth of the time required by a manual rater. The ANN has proven to be a reliable and valid means of segmenting the phalanx bones from CT images. Employing automated methods such as the ANN for segmentation eliminates the likelihood of rater drift and inter-rater variability. Automated methods also decrease the amount of time and manual effort required to extract the data of interest, thereby making patient-specific modeling feasible. (orig.)

  9. A New Tool for Nutrition App Quality Evaluation (AQEL): Development, Validation, and Reliability Testing.

    Science.gov (United States)

    DiFilippo, Kristen Nicole; Huang, Wenhao; Chapman-Novakofski, Karen M

    2017-10-27

    The extensive availability and increasing use of mobile apps for nutrition-based health interventions make evaluation of the quality of these apps crucial for integration of apps into nutritional counseling. The goal of this research was the development, validation, and reliability testing of the app quality evaluation (AQEL) tool, an instrument for evaluating apps' educational quality and technical functionality. Items for evaluating app quality were adapted from website evaluations, with additional items added to evaluate the specific characteristics of apps, resulting in 79 initial items. Expert panels of nutrition and technology professionals and app users reviewed items for face and content validation. After recommended revisions, nutrition experts completed a second AQEL review to ensure clarity. On the basis of 150 sets of responses using the revised AQEL, principal component analysis was completed, reducing AQEL into 5 factors that underwent reliability testing, including internal consistency, split-half reliability, test-retest reliability, and interrater reliability (IRR). Two additional modifiable constructs for evaluating apps based on the age and needs of the target audience as selected by the evaluator were also tested for construct reliability. IRR testing using intraclass correlations (ICC) with all 7 constructs was conducted, with 15 dietitians evaluating one app. Development and validation resulted in the 51-item AQEL. These were reduced to 25 items in 5 factors after principal component analysis, plus 9 modifiable items in two constructs that were not included in principal component analysis. Internal consistency and split-half reliability of the following constructs derived from principal component analysis were good (Cronbach alpha >.80, Spearman-Brown coefficient >.80): behavior change potential, support of knowledge acquisition, app function, and skill development. App purpose split-half reliability was .65. Test-retest reliability showed no

  10. Reliability and validity in a nutshell.

    Science.gov (United States)

    Bannigan, Katrina; Watson, Roger

    2009-12-01

    To explore and explain the different concepts of reliability and validity as they are related to measurement instruments in social science and health care. There are different concepts contained in the terms reliability and validity and these are often explained poorly and there is often confusion between them. To develop some clarity about reliability and validity a conceptual framework was built based on the existing literature. The concepts of reliability, validity and utility are explored and explained. Reliability contains the concepts of internal consistency and stability and equivalence. Validity contains the concepts of content, face, criterion, concurrent, predictive, construct, convergent (and divergent), factorial and discriminant. In addition, for clinical practice and research, it is essential to establish the utility of a measurement instrument. To use measurement instruments appropriately in clinical practice, the extent to which they are reliable, valid and usable must be established.

  11. Development, content validity and test-retest reliability of the Lifelong Physical Activity Skills Battery in adolescents.

    Science.gov (United States)

    Hulteen, Ryan M; Barnett, Lisa M; Morgan, Philip J; Robinson, Leah E; Barton, Christian J; Wrotniak, Brian H; Lubans, David R

    2018-03-28

    Numerous skill batteries assess fundamental motor skill (e.g., kick, hop) competence. Few skill batteries examine lifelong physical activity skill competence (e.g., resistance training). This study aimed to develop and assess the content validity, test-retest and inter-rater reliability of the "Lifelong Physical Activity Skills Battery". Development of the skill battery occurred in three stages: i) systematic reviews of lifelong physical activity participation rates and existing motor skill assessment tools, ii) practitioner consultation and iii) research expert consultation. The final battery included eight skills: grapevine, golf swing, jog, push-up, squat, tennis forehand, upward dog and warrior I. Adolescents (28 boys, 29 girls; M = 15.8 years, SD = 0.4 years) completed the Lifelong Physical Activity Skills Battery on two occasions two weeks apart. The skill battery was highly reliable (ICC = 0.84, 95% CI = 0.72-0.90) with individual skill reliability scores ranging from moderate (warrior I; ICC = 0.56) to high (tennis forehand; ICC = 0.82). Typical error (4.0; 95% CI 3.4-5.0) and proportional bias (r = -0.21, p = .323) were low. This study has provided preliminary evidence for the content validity and reliability of the Lifelong Physical Activity Skills Battery in an adolescent population.

  12. The reliability and validity of using the urine dipstick test by patient self-assessment for urinary tract infection screening in spinal cord injury patients.

    Science.gov (United States)

    Duanngai, Krit; Sirasaporn, Patpiya; Ngaosinchai, Siriwan Surapaitoon

    2017-01-01

    The aim of this study was to evaluate the reliability of the urine dipstick test by patients' self-assessment for urinary tract infection (UTI) screening and to determine the validity of the urine dipstick test. Rehabilitation Department, Srinagarind Hospital, Thailand. A diagnostic study. This study compared the urine dipstick test (index test) with the National Institute on Disability and Rehabilitation Research (NIDRR) criteria (gold standard test) in spinal cord injury (SCI) patients. The urine dipstick test gave positive or negative results, and the NIDRR criteria classified patients as UTI or no UTI. Interrater reliability was measured with the kappa statistic, whereas the validity of the urine dipstick test was reported in terms of sensitivity, specificity, positive likelihood ratio (+LR), negative likelihood ratio (-LR), positive predictive value (PPV), and negative predictive value (NPV). Among the 56 participants, the kappa values of the urine dipstick test for leukocyte esterase, nitrite, and combined leukocyte esterase and nitrite were 0.09, 0.21, and 0.52, respectively. The nitrite urine dipstick test showed the highest sensitivity (90%). The combined leukocyte esterase and nitrite urine dipstick test gave the highest specificity (87%), PPV (60%), NPV (93%), and +LR (5.63). The interrater reliability of the combined leukocyte esterase and nitrite urine dipstick test showed moderate agreement. The combined leukocyte esterase and nitrite urine dipstick test showed a high level of both sensitivity and specificity. The combined leukocyte esterase and nitrite urine dipstick test should be promoted for patients' self-assessment for UTI screening in SCI patients.
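
    The validity measures named in this abstract all derive from a 2 x 2 table of index-test results against the gold standard. The following Python sketch shows the standard definitions. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard validity measures from a 2 x 2 diagnostic table.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives against the gold standard.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # +LR is undefined (infinite) when specificity is exactly 1.
        "lr_pos": sensitivity / (1 - specificity) if specificity < 1 else float("inf"),
        # -LR is undefined (infinite) when specificity is exactly 0.
        "lr_neg": (1 - sensitivity) / specificity if specificity > 0 else float("inf"),
    }


# Hypothetical counts (NOT taken from the study): 10 patients with UTI, 40 without.
metrics = diagnostic_metrics(tp=8, fp=4, fn=2, tn=36)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

    Sensitivity and specificity describe the test itself, while PPV and NPV also depend on how common UTI is in the tested group, so predictive values from one sample may not transfer to a population with a different prevalence.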

  13. Reliability of the ECHOWS Tool for Assessment of Patient Interviewing Skills.

    Science.gov (United States)

    Boissonnault, Jill S; Evans, Kerrie; Tuttle, Neil; Hetzel, Scott J; Boissonnault, William G

    2016-04-01

    History taking is an important component of patient/client management. Assessment of student history-taking competency can be achieved via a standardized tool. The ECHOWS tool was shown to be valid with modest intrarater reliability in a previous study, but that study lacked sufficient power to definitively establish its stability. The purposes of this study were: (1) to assess the reliability of the ECHOWS tool for student assessment of patient interviewing skills and (2) to determine whether the tool discerns between novice and experienced skill levels. A reliability and construct validity assessment was conducted. Three faculty members from the United States and Australia scored videotaped histories from standardized patients taken by students and experienced clinicians from each of these countries. The tapes were scored twice, 3 to 6 weeks apart. Reliability was assessed using intraclass correlation coefficients (ICCs) and repeated measures. Analysis of variance models assessed the ability of the tool to discern between novice and experienced skill levels. The ECHOWS tool showed excellent intrarater reliability (ICC [3,1]=.74-.89) and good interrater reliability (ICC [2,1]=.55) as a whole. The summary of performance (S) section showed poor interrater reliability (ICC [2,1]=.27). There was no statistical difference in performance on the tool between novice and experienced clinicians. A possible ceiling effect may occur when standardized patients are not coached to provide complex and obtuse responses to interviewer questions. Variation in familiarity with the ECHOWS tool and in use of the online training may have influenced scoring of the S section. The ECHOWS tool demonstrates excellent intrarater reliability and moderate interrater reliability. Sufficient training with the tool prior to student assessment is recommended. The S section must evolve in order to provide a more discerning measure of interviewing skills.
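
    The ICC [2,1] and ICC [3,1] labels above follow the Shrout and Fleiss notation for single-rater intraclass correlations. As a rough sketch of how the two differ, the Python code below computes both from two-way ANOVA mean squares. The function name and the ratings matrix (6 subjects by 4 raters) are illustrative assumptions and are unrelated to the ECHOWS data.

```python
def icc_single_rater(ratings):
    """Return (ICC(2,1), ICC(3,1)) for a subjects x raters table of scores."""
    n = len(ratings)        # number of subjects (rows)
    k = len(ratings[0])     # number of raters (columns)
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(col) / n for col in zip(*ratings)]

    # Two-way ANOVA sums of squares without replication.
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                  # between-subject mean square
    ms_cols = ss_cols / (k - 1)                  # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))    # residual mean square

    # ICC(2,1): raters treated as random, so rater bias counts as error.
    icc21 = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    # ICC(3,1): raters treated as fixed, so rater bias is ignored.
    icc31 = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return icc21, icc31


# Illustrative ratings only: rows are subjects, columns are raters.
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc21, icc31 = icc_single_rater(example_ratings)
print(f"ICC(2,1) = {icc21:.3f}, ICC(3,1) = {icc31:.3f}")
```

    In this example the raters rank subjects similarly but differ in overall level, so ICC(3,1) comes out well above ICC(2,1). That gap is one reason the two forms should not be compared as if they measured the same thing.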

  14. The Americleft Speech Project: A Training and Reliability Study.

    Science.gov (United States)

    Chapman, Kathy L; Baylis, Adriane; Trost-Cardamone, Judith; Cordero, Kelly Nett; Dixon, Angela; Dobbelsteyn, Cindy; Thurmes, Anna; Wilson, Kristina; Harding-Bell, Anne; Sweeney, Triona; Stoddard, Gregory; Sell, Debbie

    2016-01-01

    To describe the results of two reliability studies and to assess the effect of training on interrater reliability scores. The first study (1) examined interrater and intrarater reliability scores (weighted and unweighted kappas) and (2) compared interrater reliability scores before and after training on the use of the Cleft Audit Protocol for Speech-Augmented (CAPS-A) with British English-speaking children. The second study examined interrater and intrarater reliability on a modified version of the CAPS-A (CAPS-A Americleft Modification) with American and Canadian English-speaking children. Finally, comparisons were made between the interrater and intrarater reliability scores obtained for Study 1 and Study 2. The participants were speech-language pathologists from the Americleft Speech Project. In Study 1, interrater reliability scores improved for 6 of the 13 parameters following training on the CAPS-A protocol. Comparison of the reliability results for the two studies indicated lower scores for Study 2 compared with Study 1. However, this appeared to be an artifact of the kappa statistic that occurred due to insufficient variability in the reliability samples for Study 2. When percent agreement scores were also calculated, the ratings appeared similar across Study 1 and Study 2. The findings of this study suggested that improvements in interrater reliability could be obtained following a program of systematic training. However, improvements were not uniform across all parameters. Acceptable levels of reliability were achieved for those parameters most important for evaluation of velopharyngeal function.
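
    The abstract attributes the lower kappa values in Study 2 to insufficient variability in the rated samples. The Python sketch below illustrates that general effect with unweighted Cohen's kappa. Both pairs of rating lists are invented for illustration and have the same percent agreement; they are not the project's data, and the function names are assumptions.

```python
from collections import Counter


def percent_agreement(rater1, rater2):
    """Proportion of items on which the two raters gave the same label."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)


def cohen_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    n = len(rater1)
    observed = percent_agreement(rater1, rater2)
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # Chance agreement from the product of each rater's marginal proportions.
    expected = sum(counts1[c] * counts2[c] for c in set(counts1) | set(counts2)) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


# Invented sample with balanced categories: 90 agreements out of 100.
balanced_r1 = ["yes"] * 45 + ["no"] * 45 + ["yes"] * 5 + ["no"] * 5
balanced_r2 = ["yes"] * 45 + ["no"] * 45 + ["no"] * 5 + ["yes"] * 5

# Invented sample with little variability: also 90 agreements out of 100.
skewed_r1 = ["no"] * 85 + ["yes"] * 5 + ["yes"] * 5 + ["no"] * 5
skewed_r2 = ["no"] * 85 + ["yes"] * 5 + ["no"] * 5 + ["yes"] * 5

kappa_balanced = cohen_kappa(balanced_r1, balanced_r2)
kappa_skewed = cohen_kappa(skewed_r1, skewed_r2)
print(percent_agreement(balanced_r1, balanced_r2), round(kappa_balanced, 3))
print(percent_agreement(skewed_r1, skewed_r2), round(kappa_skewed, 3))
```

    When nearly all items fall into one category, chance agreement is already high, so the same raw agreement yields a much smaller kappa. Reporting percent agreement alongside kappa, as the authors did, makes this visible.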

  15. Reliability of the Balance Evaluation Systems Test (BESTest) and BESTest sections for adults with hemiparesis

    Science.gov (United States)

    Rodrigues, Letícia C.; Marques, Aline P.; Barros, Paula B.; Michaelsen, Stella M.

    2014-01-01

    BACKGROUND: The Balance Evaluation Systems Test (BESTest) was recently created to allow the development of treatments according to the specific balance system affected in each patient. The Brazilian version of the BESTest has not been specifically tested after stroke. OBJECTIVE: To evaluate the intra- and inter-rater reliability and concurrent and convergent validity of the total score of the BESTest and BESTest sections for adults with hemiparesis after stroke. METHOD: The study included 16 subjects (61.1±7.5 years) with chronic hemiparesis (54.5±43.5 months after stroke). The BESTest was administered by two raters in the same week and one of the raters repeated the test after a one-week interval. Intraclass correlation coefficient (ICC) was calculated to assess intra- and interrater reliability. Concurrent validity with the Berg Balance Scale (BBS) and convergent validity with the Activities-specific Balance Confidence scale (ABC-Brazil) were assessed using Pearson's correlation coefficient. RESULTS: Both the BESTest total score (ICC=0.98) and the BESTest sections (ICC between 0.85 and 0.96) have excellent intrarater reliability. Interrater reliability for the total score was excellent (ICC=0.93) and, for the sections, it ranged between 0.71 and 0.94. The correlation coefficients between the BESTest and the BBS and ABC-Brazil were 0.78 and 0.59, respectively. CONCLUSIONS: The Brazilian version of the BESTest demonstrated adequate reliability when measured by sections and could identify which balance system was affected in patients after stroke. Concurrent validity was excellent with the BBS total score and good to excellent with the sections. The total scores but not the sections present adequate convergent validity with the ABC-Brazil. However, other psychometric properties should be further investigated. PMID:25003281

  16. Reliability and Validity of the Clinical Dementia Rating for Community-Living Elderly Subjects without an Informant

    Directory of Open Access Journals (Sweden)

    Ma Shwe Zin Nyunt

    2013-10-01

    Background: The Clinical Dementia Rating (CDR) scale is widely used to assess cognitive impairment in Alzheimer's disease. It requires collateral information from a reliable informant, who is not available in many instances. We adapted the original CDR scale for use with elderly subjects without an informant (CDR-NI) and evaluated its reliability and validity for assessing mild cognitive impairment (MCI) and dementia among community-dwelling elderly subjects. Method: At two consecutive visits 1 week apart, nurses trained in CDR assessment interviewed, observed and rated cognitive and functional performance according to a protocol in 90 elderly subjects with suboptimal cognitive performance [Mini-Mental State Examination (MMSE) …]. Results: The CDR-NI scores (0, 0.5, 1) showed good internal consistency (Cronbach's α 0.83-0.84), inter-rater reliability (κ 0.77-1.00 for six domains and 0.95 for global rating) and test-retest reliability (κ 0.75-1.00 for six domains and 0.80 for global rating), good agreement (κ 0.79) with the clinical assessment status of MCI (n = 37) and dementia (n = 4) and significant differences in the mean scores for MMSE, MOCA and Instrumental Activities of Daily Living (ANOVA global p …). Conclusion: Owing to the protocol of the interviews, assessments and structured observations gathered during the two visits, CDR-NI provides valid and reliable assessment of MCI and dementia in community-living elderly subjects without an informant.

  17. Examining the interrater reliability of the Hare Psychopathy Checklist-Revised across a large sample of trained raters.

    Science.gov (United States)

    Blais, Julie; Forth, Adelle E; Hare, Robert D

    2017-06-01

    The goal of the current study was to assess the interrater reliability of the Psychopathy Checklist-Revised (PCL-R) among a large sample of trained raters (N = 280). All raters completed PCL-R training at some point between 1989 and 2012 and subsequently provided complete coding for the same 6 practice cases. Overall, 3 major conclusions can be drawn from the results: (a) reliability of individual PCL-R items largely fell below any appropriate standards while the estimates for Total PCL-R scores and factor scores were good (but not excellent); (b) the cases representing individuals with high psychopathy scores showed better reliability than did the cases of individuals in the moderate to low PCL-R score range; and (c) there was a high degree of variability among raters; however, rater specific differences had no consistent effect on scoring the PCL-R. Therefore, despite low reliability estimates for individual items, Total scores and factor scores can be reliably scored among trained raters. We temper these conclusions by noting that scoring standardized videotaped case studies does not allow the rater to interact directly with the offender. Real-world PCL-R assessments typically involve a face-to-face interview and much more extensive collateral information. We offer recommendations for new web-based training procedures. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  18. Developing and Validating the Communication Function Classification System for Individuals with Cerebral Palsy

    Science.gov (United States)

    Hidecker, Mary Jo Cooley; Paneth, Nigel; Rosenbaum, Peter L.; Kent, Raymond D.; Lillie, Janet; Eulenberg, John B.; Chester, Ken, Jr.; Johnson, Brenda; Michalsen, Lauren; Evatt, Morgan; Taylor, Kara

    2011-01-01

    Aim: The purpose of this study was to create and validate the Communication Function Classification System (CFCS) for children with cerebral palsy (CP), for use by a wide variety of individuals who are interested in CP. This paper reports the content validity, interrater reliability, and test-retest reliability of the CFCS for children with CP.…

  19. Reliability of the International Spinal Cord Injury Musculoskeletal Basic Data Set

    DEFF Research Database (Denmark)

    Baunsgaard, C B; Chhabra, H S; Harvey, L A

    2016-01-01

    STUDY DESIGN: Psychometric study. OBJECTIVES: To determine the intra- and inter-rater reliability and content validity of the International Spinal Cord Injury (SCI) Musculoskeletal Basic Data Set (ISCIMSBDS). SETTING: Four centers with one in each of the countries in Australia, England, India and...

  20. The reliability and validity of the standardized Mensendieck test in relation to disability in patients with chronic pain.

    Science.gov (United States)

    Keessen, Paul; Maaskant, Jolanda; Visser, Bart

    2018-08-01

    The standardized Mensendieck test (SMT) was developed to quantify posture, movement, gait, and respiration. In the hands of an experienced therapist, the SMT is proven to be a reliable tool. It is unclear whether posture, movement, gait, and respiration are related to the degree of functional disability in patients with chronic pain. The objective of this study was to assess the reliability and convergent validity of the SMT in a heterogeneous sample of 50 patients with chronic pain. Internal consistency was determined by Cronbach's α and interrater reliability by the intraclass correlation coefficient (ICC). Convergent validity was assessed by determining the Spearman rank correlation coefficient between the movement quality measured in the SMT and functional limitation measured on the disability rating index (DRI). The internal consistency was Cronbach's α 0.91. Substantial reliability was found for the items: movement (ICC = 0.68), gait (ICC = 0.69), sitting posture (ICC = 0.63), and respiration (ICC = 0.64). Insufficient reliability was found for standing posture (ICC = 0.23). A moderate correlation was found between average test score SMT and the DRI (r = -0.37) and respiration and DRI (r = -0.45). The SMT is a reasonably reliable tool to assess movement, gait, sitting posture, and respiration. None of the items in the domain standing posture has sufficient reliability. A thorough study of this domain should be considered. The results show little evidence for convergent validity. Several items of the SMT correlated moderately with functional limitation with the DRI. These items were global movement, hip flexion, pelvis rotation, and all respiration items.
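
    Internal consistency in this abstract is summarised with Cronbach's α. A minimal Python sketch of the usual formula follows, using sample variances of each item and of the total score. The function name and the score matrix are hypothetical and are not SMT data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for rows of respondents and columns of items."""
    k = len(scores[0])                                     # number of items
    item_variances = [variance(col) for col in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: 5 respondents rated on 3 items.
example_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(example_scores)
print(f"Cronbach's alpha = {alpha:.3f}")
```

    Alpha rises when items vary together across respondents, so a high value indicates that the items move together but does not by itself show that they measure a single construct.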

  1. Intra- and inter-rater reliability of movement and palpation tests in patients with neck pain: A systematic review.

    Science.gov (United States)

    Jonsson, Anders; Rasmussen-Barr, Eva

    2018-03-01

    Neck pain is common and often becomes chronic. Various clinical tests of the cervical spine are used to direct and evaluate treatment. This systematic review aimed to identify studies examining the intra- and/or interrater reliability of tests used in clinical examination of patients with neck pain. A database search up to April 2016 was conducted in PubMed, CINAHL, and AMED. The Quality Appraisal of Reliability Studies Checklist (QAREL) was used to assess risk of bias. Eleven studies were included, comprising tests of active and passive movement and pain evaluating participants with ongoing neck pain. One study was assessed with a low risk of bias, three with medium risk, while the rest were assessed with high risk of bias. The results showed differing reliabilities for the included tests ranging from poor to almost perfect. In conclusion, active movement and pain for pain or mobility overall presented acceptable to very good reliability (Kappa >0.40); while passive intervertebral tests had lower Kappa values, suggesting poor reliability. It may be a coincidence that the studies indicating very good reliability tended to be of higher quality (low to moderate risk of bias), while studies finding poor reliability tended to be of lower quality (high risk of bias). Regardless, the current recommendation from this review would suggest the clinical use of tests with acceptable reliability and avoiding the use of tests that have been shown to not be reliable. Finally, it is critical that all future reliability studies are of higher quality with low risk of bias.

  2. Intra- and inter-rater reliabilities of measurement of ultrasound imaging for muscle thickness and pennation angle of tibialis anterior muscle in stroke patients.

    Science.gov (United States)

    Cho, Ki Hun; Lee, Hwang Jae; Lee, Wan Hee

    2017-07-01

    Dysfunction of skeletal muscle has been commonly reported in stroke patients. The purpose of this study was to investigate the intra- and inter-rater reliabilities of measurement of ultrasound imaging (USI) for pennation angle (PA) and muscle thickness (MT) of tibialis anterior muscle in stroke patients. Thirty-four stroke patients (19 men) participated in this study. USI was used for measurement of PA and MT of the tibialis anterior muscles at rest and during maximum voluntary contraction (MVC). Two examiners acquired images from all participants during two separate testing sessions, seven days apart. Intra-class correlation coefficients (ICCs), confidence interval (CI), standard error of measurement, minimal detectable change, and Bland-Altman plots were used for estimation of reliability. In the intra-rater reliability between measures, for all variables (PA and MT of the paretic and non-paretic sides of tibialis anterior muscles at rest and during MVC), the ICCs ranged between 0.639 and 0.998 and the CI was within an acceptable range of 0.388-0.999. In inter-rater reliability between examiners for the two tests, for all variables, the ICCs ranged between 0.690 and 0.995 and the CI was within an acceptable range of 0.463-0.997. In addition, a significant difference was observed between the paretic and non-paretic sides of the tibialis anterior muscle architecture (p …). … stroke patients. In addition, objective and quantitative measurements of tibialis anterior muscle using USI may provide appropriate management for the walking recovery of stroke patients.
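
    The abstract lists standard error of measurement (SEM) and minimal detectable change (MDC) among its reliability statistics without giving formulas. The Python sketch below uses the commonly cited forms SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. Whether the study used exactly these forms is an assumption, and the input values are hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the sample standard deviation and a reliability coefficient."""
    if not 0 <= icc <= 1:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1 - icc)


def minimal_detectable_change_95(sem):
    """Smallest change exceeding measurement error at 95% confidence."""
    return 1.96 * math.sqrt(2) * sem


# Hypothetical inputs: SD of 2.0 mm for muscle thickness and an ICC of 0.91.
sem_value = standard_error_of_measurement(sd=2.0, icc=0.91)
mdc_value = minimal_detectable_change_95(sem_value)
print(f"SEM = {sem_value:.2f} mm, MDC95 = {mdc_value:.2f} mm")
```

    Under these assumptions, a change between two sessions smaller than the MDC cannot be distinguished from measurement error, which is why SEM and MDC are often reported next to the ICC.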

  3. Reliability of the Bulb Dynamometer for Assessing Grip Strength

    Directory of Open Access Journals (Sweden)

    Colleen Maher

    2018-04-01

    Background: Hand function is an overall indicator of health and is often measured using grip strength. Handheld dynamometry is the most common method of measuring grip strength. The purpose of this study was to determine the inter-rater and test-retest reliability, the reliability of one trial versus three trials, and the preliminary norms for a young adult population using the Baseline® Pneumatic Squeeze Bulb Dynamometer (30 psi). Methods: This study used a one-group methodological design. One hundred and three healthy adults (30 males and 73 females) were recruited. Six measurements were collected for each hand per participant. The data were analyzed using Intraclass Correlation Coefficients (ICC), two-way effects model (2,2), and paired-samples t-tests. Results: The ICC for inter-rater reliability ranged from 0.955 to 0.977. Conclusion: The results of this study suggest that the bulb dynamometer is a reliable tool to measure grip strength and should be further explored for reliable and valid use in diverse populations and as an alternative to the Jamar dynamometer.

  4. Reliability and validity of four alternative definitions of rapid-cycling bipolar disorder.

    Science.gov (United States)

    Maj, M; Pirozzi, R; Formicola, A M; Tortorella, A

    1999-09-01

    This study tested the reliability and validity of four definitions of rapid cycling. Two trained psychiatrists, using the Schedule for Affective Disorders and Schizophrenia, independently assessed 210 patients with bipolar disorder. They checked whether each patient met four definitions of rapid cycling: one consistent with DSM-IV criteria, one waiving criteria for duration of affective episodes, one waiving such criteria and requiring at least one switch from mania to depression or vice versa during the reference year, and one waiving duration criteria and requiring at least 8 weeks of fully symptomatic affective illness during the reference year. The interrater reliability was calculated by Cohen's kappa statistic. Patients who met each definition according to both psychiatrists were compared to those who did not meet any definition (nonrapid-cycling group) on demographic and clinical variables. All patients were followed up for 1 year. Kappa values were 0.93, 0.73, 0.75, and 0.80, respectively, for the four definitions of rapid cycling. The groups meeting the second and third definitions included significantly more female and bipolar II patients than did the nonrapid-cycling group. Those two groups also had the lowest proportion of patients with a favorable lithium prophylaxis outcome and the highest stability of the rapid-cycling pattern on follow-up. The four groups of rapid-cycling patients did not differ significantly among themselves on any of the assessed variables. The expression "rapid cycling" encompasses a spectrum of conditions. The DSM-IV definition, although quite reliable, covers only part of this spectrum, and the conditions that are excluded are very typical in terms of key validators and are relatively stable over time.
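
    The study above reports interrater reliability as Cohen's kappa for two psychiatrists making a yes/no judgment per definition. A minimal sketch of that statistic, kappa = (p_o - p_e) / (1 - p_e), is given below. The function name and the example ratings are hypothetical and are not the study's data.

```python
import math
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per subject."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of subjects")
    n = len(rater_a)
    # Observed proportion of subjects on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return math.nan  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings for 10 patients (1 = meets the definition, 0 = does not).
psychiatrist_1 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
psychiatrist_2 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
kappa = cohens_kappa(psychiatrist_1, psychiatrist_2)
print(f"kappa = {kappa:.2f}")
```

    In this toy case the raters agree on 8 of 10 patients, yet kappa is only about 0.52 because much of that agreement is expected by chance when most patients are rated negative.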

  5. Reliability of physical examination tests for the diagnosis of knee disorders: Evidence from a systematic review.

    Science.gov (United States)

    Décary, Simon; Ouellet, Philippe; Vendittoli, Pascal-André; Desmeules, François

    2016-12-01

    Clinicians often rely on physical examination tests to guide them in the diagnostic process of knee disorders. However, reliability of these tests is often overlooked and may influence the consistency of results and overall diagnostic validity. Therefore, the objective of this study was to systematically review evidence on the reliability of physical examination tests for the diagnosis of knee disorders. A structured literature search was conducted in databases up to January 2016. Included studies needed to report reliability measures of at least one physical test for any knee disorder. Methodological quality was evaluated using the QAREL checklist. A qualitative synthesis of the evidence was performed. Thirty-three studies were included with a mean QAREL score of 5.5 ± 0.5. Based on low to moderate quality evidence, the Thessaly test for meniscal injuries reached moderate inter-rater reliability (k = 0.54). Based on moderate to excellent quality evidence, the Lachman for anterior cruciate ligament injuries reached moderate to excellent inter-rater reliability (k = 0.42 to 0.81). Based on low to moderate quality evidence, the Tibiofemoral Crepitus, Joint Line and Patellofemoral Pain/Tenderness, Bony Enlargement and Joint Pain on Movement tests for knee osteoarthritis reached fair to excellent inter-rater reliability (k = 0.29 to 0.93). Based on low to moderate quality evidence, the Lateral Glide, Lateral Tilt, Lateral Pull and Quality of Movement tests for patellofemoral pain reached moderate to good inter-rater reliability (k = 0.49 to 0.73). Many physical tests appear to reach good inter-rater reliability, but this is based on low-quality and conflicting evidence. High-quality research is required to evaluate the reliability of knee physical examination tests. Copyright © 2016 Elsevier Ltd. All rights reserved.

  6. Standards Performance Continuum: Development and Validation of a Measure of Effective Pedagogy.

    Science.gov (United States)

    Doherty, R. William; Hilberg, R. Soleste; Epaloose, Georgia; Tharp, Roland G.

    2002-01-01

    Describes the development and validation of the Standards Performance Continuum (SPC) for assessing teacher performance of the Standards for Effective Pedagogy. Three studies involving Florida, California, and New Mexico public school teachers provided evidence of inter-rater reliability, concurrent validity, and criterion-related validity…

  7. Assessment of the nursing care product (APROCENF): a reliability and construct validity study.

    Science.gov (United States)

    Cucolo, Danielle Fabiana; Perroca, Márcia Galan

    2017-04-06

    To verify the reliability and construct validity estimates of the "Assessment of nursing care product" scale (APROCENF) and its applicability. This validation study included a sample of 40 (inter-rater reliability) and 172 (construct validity) assessments performed by nurses at the end of the work shift at nine inpatient services of a teaching hospital in the Brazilian Southeast. The data were collected between February and September 2014, with interruptions. Cronbach's alpha and Spearman's correlation coefficients (internal consistency) were calculated, as well as the intraclass correlation and the weighted kappa index (inter-rater reliability). Exploratory factor analysis was used with principal component extraction and varimax rotation (construct validity). The internal consistency revealed an alpha coefficient of 0.85, item-item correlation ranging between 0.13 and 0.61 and item-total correlation between 0.43 and 0.69. Inter-rater equivalence was obtained and all items evidenced significant factor loadings. This research evidenced the reliability and construct validity of the scale to assess the nursing care product. Its application in nursing practice permits identifying improvements needed in the production process, contributing to management and care decisions.

  8. Development of a Standardized Kalamazoo Communication Skills Assessment Tool for Radiologists: Validation, Multisource Reliability, and Lessons Learned.

    Science.gov (United States)

    Brown, Stephen D; Rider, Elizabeth A; Jamieson, Katherine; Meyer, Elaine C; Callahan, Michael J; DeBenedectis, Carolynn M; Bixby, Sarah D; Walters, Michele; Forman, Sara F; Varrin, Pamela H; Forbes, Peter; Roussin, Christopher J

    2017-08-01

    The purpose of this study was to develop and test a standardized communication skills assessment instrument for radiology. The Delphi method was used to validate the Kalamazoo Communication Skills Assessment instrument for radiology by revising and achieving consensus on the 43 items of the preexisting instrument among an interdisciplinary team of experts consisting of five radiologists and four nonradiologists (two men, seven women). Reviewers assessed the applicability of the instrument to evaluation of conversations between radiology trainees and trained actors portraying concerned parents in enactments about bad news, radiation risks, and diagnostic errors that were video recorded during a communication workshop. Interrater reliability was assessed by use of the revised instrument to rate a series of enactments between trainees and actors video recorded in a hospital-based simulator center. Eight raters evaluated each of seven different video-recorded interactions between physicians and parent-actors. The final instrument contained 43 items. After three review rounds, 42 of 43 (98%) items had an average rating of relevant or very relevant for bad news conversations. All items were rated as relevant or very relevant for conversations about error disclosure and radiation risk. Reliability and rater agreement measures were moderate. The intraclass correlation coefficient range was 0.07-0.58; mean, 0.30; SD, 0.13; and median, 0.30. The range of weighted kappa values was 0.03-0.47; mean, 0.23; SD, 0.12; and median, 0.22. Ratings varied significantly among conversations (χ²(6) = 1186; p [...] communication skills assessment instrument is highly relevant for radiology, having moderate interrater reliability. These findings have important implications for assessing the relational competencies of radiology trainees.

  9. How to assess and compare inter-rater reliability, agreement and correlation of ratings: an exemplary analysis of mother-father and parent-teacher expressive vocabulary rating pairs.

    Science.gov (United States)

    Stolarova, Margarita; Wolf, Corinna; Rinker, Tanja; Brielmann, Aenne

    2014-01-01

    This report has two main purposes. First, we combine well-known analytical approaches to conduct a comprehensive assessment of agreement and correlation of rating-pairs and to disentangle these often confused concepts, providing a best-practice example on concrete data and a tutorial for future reference. Second, we explore whether a screening questionnaire developed for use with parents can be reliably employed with daycare teachers when assessing early expressive vocabulary. A total of 53 vocabulary rating pairs (34 parent-teacher and 19 mother-father pairs) collected for two-year-old children (12 bilingual) are evaluated. First, inter-rater reliability both within and across subgroups is assessed using the intra-class correlation coefficient (ICC). Next, based on this analysis of reliability and on the test-retest reliability of the employed tool, inter-rater agreement is analyzed, and the magnitude and direction of rating differences are considered. Finally, Pearson correlation coefficients of standardized vocabulary scores are calculated and compared across subgroups. The results underline the necessity to distinguish between reliability measures, agreement and correlation. They also demonstrate the impact of the employed reliability on agreement evaluations. This study provides evidence that parent-teacher ratings of children's early vocabulary can achieve agreement and correlation comparable to those of mother-father ratings on the assessed vocabulary scale. Bilingualism of the evaluated child decreased the likelihood of raters' agreement. We conclude that future reports of agreement, correlation and reliability of ratings will benefit from better definition of terms and stricter methodological approaches. The methodological tutorial provided here holds the potential to increase comparability across empirical reports and can help improve research practices and knowledge transfer to educational and therapeutic settings.
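
    The distinction this abstract draws between reliability, agreement and correlation can be illustrated with a small sketch. It compares a Pearson correlation with an absolute-agreement ICC on toy ratings where one rater scores every child exactly 10 points higher. The ICC(2,1) form (two-way random effects, absolute agreement, single measures) is an assumption, since the abstract does not name the ICC variant used, and the ratings are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation: sensitive to linear association, not to offsets."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def icc_2_1(ratings):
    """ICC(2,1) from a subjects-by-raters table (list of equal-length rows)."""
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical vocabulary scores: rater B is always 10 points above rater A.
rater_a = [10, 20, 30, 40, 50]
rater_b = [score + 10 for score in rater_a]
r = pearson_r(rater_a, rater_b)
icc = icc_2_1([list(pair) for pair in zip(rater_a, rater_b)])
print(f"Pearson r = {r:.2f}, ICC(2,1) = {icc:.2f}")
```

    Here the correlation is perfect while the absolute-agreement ICC is lower, because the constant offset between raters counts against agreement but not against correlation.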

  10. How to assess and compare inter-rater reliability, agreement and correlation of ratings: an exemplary analysis of mother-father and parent-teacher expressive vocabulary rating pairs

    Science.gov (United States)

    Stolarova, Margarita; Wolf, Corinna; Rinker, Tanja; Brielmann, Aenne

    2014-01-01

    This report has two main purposes. First, we combine well-known analytical approaches to conduct a comprehensive assessment of agreement and correlation of rating-pairs and to disentangle these often confused concepts, providing a best-practice example on concrete data and a tutorial for future reference. Second, we explore whether a screening questionnaire developed for use with parents can be reliably employed with daycare teachers when assessing early expressive vocabulary. A total of 53 vocabulary rating pairs (34 parent–teacher and 19 mother–father pairs) collected for two-year-old children (12 bilingual) are evaluated. First, inter-rater reliability both within and across subgroups is assessed using the intra-class correlation coefficient (ICC). Next, based on this analysis of reliability and on the test-retest reliability of the employed tool, inter-rater agreement is analyzed, and the magnitude and direction of rating differences are considered. Finally, Pearson correlation coefficients of standardized vocabulary scores are calculated and compared across subgroups. The results underline the necessity to distinguish between reliability measures, agreement and correlation. They also demonstrate the impact of the employed reliability on agreement evaluations. This study provides evidence that parent–teacher ratings of children's early vocabulary can achieve agreement and correlation comparable to those of mother–father ratings on the assessed vocabulary scale. Bilingualism of the evaluated child decreased the likelihood of raters' agreement. We conclude that future reports of agreement, correlation and reliability of ratings will benefit from better definition of terms and stricter methodological approaches. The methodological tutorial provided here holds the potential to increase comparability across empirical reports and can help improve research practices and knowledge transfer to educational and therapeutic settings. PMID:24994985

  11. Anxiety Disorders Interview Schedule – Autism Addendum: Reliability and Validity in Children with Autism Spectrum Disorder

    Science.gov (United States)

    Kerns, Connor Morrow; Renno, Patricia; Kendall, Philip C.; Wood, Jeffrey J.; Storch, Eric A.

    2017-01-01

    Objective: Assessing anxiety in autism spectrum disorder (ASD) is inherently challenging due to overlapping (e.g., social avoidance) and ambiguous symptoms (e.g., fears of change). An ASD addendum to the Anxiety Disorders Interview Schedule–Child/Parent, Parent Version (ADIS/ASA) was developed to provide a systematic approach for differentiating traditional anxiety disorders from symptoms of ASD and more ambiguous, ASD-related anxiety symptoms. Method: Inter-rater reliability and convergent and discriminant validity were examined in a sample of 69 youth with ASD (8–13 years, 75% male, IQ: 68–143) seeking treatment for anxiety. The parents of participants completed the ADIS/ASA and a battery of behavioral measures. A second rater independently observed and scored recordings of the original interviews. Results: Findings suggest reliable measurement of comorbid (ICC=0.85–0.98; κ=0.67–0.91) as well as ambiguous anxiety-like symptoms (ICC=0.87–0.95; κ=0.77–0.90) in children with ASD. Convergent and discriminant validity were supported for the traditional anxiety symptoms on the ADIS/ASA, whereas convergent and discriminant validity were partially supported for the ambiguous anxiety-like symptoms. Conclusions: Results provide evidence for the reliability and validity of the ADIS/ASA as a measure of traditional anxiety categories in youth with ASD, with partial support for the validity of the ambiguous anxiety-like categories. Unlike other measures, the ADIS/ASA differentiates comorbid anxiety disorders from overlapping and ambiguous anxiety-like symptoms in ASD, allowing for more precise measurement and clinical conceptualization. Ambiguous anxiety-like symptoms appear phenomenologically distinct from comorbid anxiety disorders and may reflect either symptoms of ASD or a novel variant of anxiety in ASD. PMID:27925775

  12. Spanish translation, cross-cultural adaptation, and validation of the Questionnaire for Diabetes-Related Foot Disease (Q-DFD).

    Science.gov (United States)

    Castillo-Tandazo, Wilson; Flores-Fortty, Adolfo; Feraud, Lourdes; Tettamanti, Daniel

    2013-01-01

    To translate, cross-culturally adapt, and validate the Questionnaire for Diabetes-Related Foot Disease (Q-DFD), originally created and validated in Australia, for its use in Spanish-speaking patients with diabetes mellitus. The translation and cross-cultural adaptation were based on international guidelines. The Spanish version of the survey was applied to a community-based (sample A) and a hospital clinic-based sample (samples B and C). Samples A and B were used to determine criterion and construct validity comparing the survey findings with clinical evaluation and medical records, respectively; while sample C was used to determine intra- and inter-rater reliability. After completing the rigorous translation process, only four items were considered problematic and required a new translation. In total, 127 patients were included in the validation study: 76 to determine criterion and construct validity and 41 to establish intra- and inter-rater reliability. For an overall diagnosis of diabetes-related foot disease, a substantial level of agreement was obtained when we compared the Q-DFD with the clinical assessment (kappa 0.77, sensitivity 80.4%, specificity 91.5%, positive likelihood ratio [LR+] 9.46, negative likelihood ratio [LR-] 0.21); while an almost perfect level of agreement was obtained when it was compared with medical records (kappa 0.88, sensitivity 87%, specificity 97%, LR+ 29.0, LR- 0.13). Survey reliability showed substantial levels of agreement, with kappa scores of 0.63 and 0.73 for intra- and inter-rater reliability, respectively. The translated and cross-culturally adapted Q-DFD showed good psychometric properties (validity, reproducibility, and reliability) that allow its use in Spanish-speaking diabetic populations.
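
    The likelihood ratios quoted in this abstract follow directly from the reported sensitivity and specificity via LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. A short sketch that recomputes them from the rounded percentages in the abstract is shown below; the function name is hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    if not (0.0 < specificity < 1.0 and 0.0 <= sensitivity <= 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Values as reported in the abstract (proportions instead of percentages).
vs_clinical = likelihood_ratios(0.804, 0.915)   # Q-DFD vs clinical assessment
vs_records = likelihood_ratios(0.87, 0.97)      # Q-DFD vs medical records

print(f"clinical assessment: LR+ = {vs_clinical[0]:.2f}, LR- = {vs_clinical[1]:.2f}")
print(f"medical records:     LR+ = {vs_records[0]:.1f}, LR- = {vs_records[1]:.2f}")
```

    Recomputing from the rounded inputs gives LR+ 9.46 and LR- 0.21 against clinical assessment, and LR+ 29.0 and LR- 0.13 against medical records, matching the figures in the abstract.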

  13. Reproducibility of tender point examination in chronic low back pain patients as measured by intrarater and inter-rater reliability and agreement

    DEFF Research Database (Denmark)

    Jensen, Ole Kudsk; Callesen, Jacob; Nielsen, Merete Graakjaer

    2013-01-01

    back examination and return-to-work intervention, 43 and 39 patients, respectively (18 women, 46%) entered and completed the study. MAIN OUTCOME MEASURES: The reliability was estimated by the intraclass correlation coefficient (ICC), and agreement was calculated for up to ±3 TPs. Furthermore, the smallest detectable difference was calculated. RESULTS: TP examination was performed twice by two consultants in rheumatology and rehabilitation at 20 min intervals and repeated 1 week later. Intrarater reliability in the more and less experienced rater was ICC 0.84 (95% CI 0.69 to 0.98) and 0.72 (95% CI 0.49 to 0.95), respectively. The figures for inter-rater reliability were intermediate between these figures. In more than 70% of the cases, the raters agreed within ±3 TPs in both men and women and between test days. The smallest detectable difference between raters was 5, and for the more and less...

  14. Reliability, Construct Validity and Interpretability of the Brazilian version of the Rapid Upper Limb Assessment (RULA) and Strain Index (SI).

    Science.gov (United States)

    Valentim, Daniela Pereira; Sato, Tatiana de Oliveira; Comper, Maria Luiza Caíres; Silva, Anderson Martins da; Boas, Cristiana Villas; Padula, Rosimeire Simprini

    There are very few observational methods for analysis of biomechanical exposure available in Brazilian Portuguese. This study aimed to cross-culturally adapt and test the measurement properties of the Rapid Upper Limb Assessment (RULA) and Strain Index (SI). The cross-cultural adaptation and measurement properties test were established according to Beaton et al. and COSMIN guidelines, respectively. Several tasks that required static posture and/or repetitive motion of upper limbs were evaluated (n>100). The intra-rater reliability for the RULA ranged from poor to almost perfect (k: 0.00-0.93), and for the SI from poor to excellent (ICC(2,1): 0.05-0.99). The inter-rater reliability was very poor for RULA (k: -0.12 to 0.13) and ranged from very poor to moderate for SI (ICC(2,1): 0.00-0.53). The agreement was good for RULA (75-100% intra-rater, and 42.24-100% inter-rater) and for SI (EPM: -1.03% to 1.97% intra-rater, and -0.17% to 1.51% inter-rater). The internal consistency was appropriate for RULA (α=0.88), and low for SI (α=0.65). Moderate construct validity was observed between RULA and SI, in wrist/hand-wrist posture (rho: 0.61) and strength/intensity of exertion (rho: 0.39). The adapted versions of the RULA and SI presented semantic and cultural equivalence for Brazilian Portuguese. The reliability estimates of the RULA and SI ranged from very poor to almost perfect. The internal consistency of the RULA was better than that of the SI. The correlation between methods was moderate only for muscle request/movement repetition. Previous training is mandatory for the use of observational methods for biomechanical exposure assessment, although it does not guarantee good reproducibility of these measures. Copyright © 2017 Associação Brasileira de Pesquisa e Pós-Graduação em Fisioterapia. Published by Elsevier Editora Ltda. All rights reserved.

  15. A preliminary examination of the validity and reliability of a new brief rating scale for symptom domains of psychosis: Brief Evaluation of Psychosis Symptom Domains (BE-PSD).

    Science.gov (United States)

    Takeuchi, Hiroyoshi; Fervaha, Gagan; Lee, Jimmy; Agid, Ofer; Remington, Gary

    2016-09-01

    Brief assessments have the potential to be widely adopted as outcome measures in research but also routine clinical practice. Existing brief rating scales that assess symptoms of schizophrenia or psychosis have a number of limitations including inability to capture five symptom domains of psychosis and a lack of clearly defined operational anchor points for scoring. We developed a new brief rating scale for five symptom domains of psychosis with clearly defined operational anchor points - the Brief Evaluation of Psychosis Symptom Domains (BE-PSD). To examine the psychometric properties of the BE-PSD, fifty patients with schizophrenia or schizoaffective disorder were included in this preliminary cross-sectional study. To test the convergent and discriminant validity of the BE-PSD, correlational analyses were employed using the consensus Positive and Negative Syndrome Scale (PANSS) five-factor model. To examine the inter-rater reliability of the BE-PSD, single measures intraclass correlation coefficients (ICCs) were calculated for 11 patients. The BE-PSD domain scores demonstrated high convergent validity with the corresponding PANSS factor score (rs = 0.81-0.93) as well as good discriminant validity, as evidenced by lower correlations with the other PANSS factors (rs = 0.23-0.62). The BE-PSD also demonstrated excellent inter-rater reliability for each of the domain scores and the total scores (ICC(2,1) = 0.79-0.96). The present preliminary study found the BE-PSD measure to be valid and reliable; however, further studies are needed to establish the psychometric properties of the BE-PSD because of the limitations such as the small sample size and lacking data on test-retest reliability or sensitivity to change. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. A Systematic Review of the Reliability and Validity of Behavioural Tests Used to Assess Behavioural Characteristics Important in Working Dogs.

    Science.gov (United States)

    Brady, Karen; Cracknell, Nina; Zulch, Helen; Mills, Daniel Simon

    2018-01-01

    Working dogs are selected based on predictions from tests that they will be able to perform specific tasks in often challenging environments. However, withdrawal from service in working dogs is still a big problem, bringing into question the reliability of the selection tests used to make these predictions. A systematic review was undertaken aimed at bringing together available information on the reliability and predictive validity of the assessment of behavioural characteristics used with working dogs to establish the quality of selection tests currently available for use to predict success in working dogs. The search procedures resulted in 16 papers meeting the criteria for inclusion. A large range of behaviour tests and parameters were used in the identified papers, and so behaviour tests and their underpinning constructs were grouped on the basis of their relationship with positive core affect (willingness to work, human-directed social behaviour, object-directed play tendencies) and negative core affect (human-directed aggression, approach withdrawal tendencies, sensitivity to aversives). We then examined the papers for reports of inter-rater reliability, within-session intra-rater reliability, test-retest validity and predictive validity. The review revealed a widespread lack of information relating to the reliability and validity of measures to assess behaviour and inconsistencies in terminologies, study parameters and indices of success. There is a need to standardise the reporting of these aspects of behavioural tests in order to improve the knowledge base of what characteristics are predictive of optimal performance in working dog roles, improving selection processes and reducing working dog redundancy. We suggest the use of a framework based on explaining the direct or indirect relationship of the test with core affect.

  17. Clinical global impression of cognition in schizophrenia (CGI-CogS): reliability and validity of a co-primary measure of cognition.

    Science.gov (United States)

    Ventura, Joseph; Cienfuegos, Angel; Boxer, Oren; Bilder, Robert

    2008-11-01

    Cognitive deficits are core features of schizophrenia that have been associated reliably with functional outcomes and now are a focus of treatment research. New rating scales are needed to complement current psychometric testing procedures, both to enable wider clinical use, and to serve as endpoints in clinical trials. Subjects were 35 schizophrenia patient-and-caregiver pairs recruited from the UCLA and West Los Angeles VA Outpatient Psychiatry Departments. Participants were assessed with the Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS), an interview-based rating scale of cognitive functioning, on 3 occasions (baseline, 1 month, and 3 months). A computerized neurocognitive battery (Cogtest), an assessment of functioning, and symptom measures were administered at two occasions (baseline and one month). The CGI-CogS ratings generally showed a high level of internal consistency (Cronbach's alpha=.69 to .96), adequate levels of inter-rater reliability (ICC's=.71 to .80), and high test-retest stability (ICC's=.92 to .95). Correlations of caregiver and rater global (but not "patient only rating") CGI-CogS ratings with neurocognitive performance were in the moderate range (r's=-.27 to -.48), while most of the correlations with functional outcome were moderate to high (r's=-.41 to -.72). In fact, the CGI-CogS ratings were significantly more correlated with Social Functioning than were objective neurocognitive test scores (p=.02) and showed a trend in the same direction for predicting Instrumental Functioning (p=.06). We found moderate correlations between CGI-CogS global ratings and PANSS positive (r's=.36 to .49) and SANS negative symptoms (r=.41 to .61), but not with BPRS depression (r's=.11 to .13). An interview-based measure of cognition demonstrated high internal consistency, good inter-rater reliability, and high test-retest reliability. Caregiver ratings appear to add important clinical information over patient-only ratings. The CGI

  18. Korean Version of the Delirium Rating Scale-Revised-98: Reliability and Validity

    Science.gov (United States)

    Ryu, Jian; Lee, Jinyoung; Kim, Hwi-Jung; Shin, Im Hee; Kim, Jeong-Lan; Trzepacz, Paula T.

    2011-01-01

    Objective: The aims of the present study were 1) to standardize the validity and reliability of the Korean version of Delirium Rating Scale-Revised-98 (DRS-R98-K) and 2) to establish the optimum cut-off value, sensitivity, and specificity for discriminating delirium from other non-delirious psychiatric conditions. Methods: Using DSM-IV criteria, 157 subjects (69 delirium, 29 dementia, 32 schizophrenia, and 27 other psychiatric patients) were enrolled. Subjects were evaluated using DRS-R98-K, DRS-K, Mini-Mental State Examination (MMSE-K), and Clinical Global Impression-Severity (CGI-S) scale. Results: DRS-R98-K total and severity scores showed high correlations with DRS-K. They were significantly different across all groups (p=0.000). However, neither MMSE-K nor CGI-S distinguished delirium from dementia. All DRS-R98-K diagnostic items (#14-16) and items #1 and 2 significantly discriminated delirium from dementia. Cronbach's alpha coefficient revealed high internal consistency for DRS-R98-K total (r=0.91) and severity (r=0.89) scales. Interrater reliability (ICC between 0.96 and 1) was very high. Using receiver operating characteristic analysis, the area under the curve of DRS-R98-K total score was 0.948 between the delirium group and all other groups and 0.873 between the delirium and dementia groups. The best cut-off scores in DRS-R98-K total score were 18.5 and 19.5 between the delirium and the other three groups and 20.5 between the delirium and dementia groups. Conclusion: We demonstrated that DRS-R98-K is a valid and reliable instrument for assessing delirium severity and diagnosis and discriminating delirium from dementia and other psychiatric disorders in Korean patients. PMID:21519534

  19. How to assess and compare inter-rater reliability, agreement and correlation of ratings: an exemplary analysis of mother-father and parent-teacher expressive vocabulary rating pairs

    Directory of Open Access Journals (Sweden)

    Margarita Stolarova

    2014-06-01

    This report has two main purposes. First, we combine well-known analytical approaches to conduct a comprehensive assessment of agreement and correlation of rating-pairs and to disentangle these often confused concepts, providing a best-practice example on concrete data and a tutorial for future reference. Second, we explore whether a screening questionnaire developed for use with parents can be reliably employed with daycare teachers when assessing early expressive vocabulary. A total of 53 vocabulary rating pairs (34 parent-teacher and 19 mother-father pairs) collected for two-year-old children (12 bilingual) are evaluated. First, inter-rater reliability both within and across subgroups is assessed using the intra-class correlation coefficient (ICC). Next, based on this analysis of reliability and on the test-retest reliability of the employed tool, inter-rater agreement is analyzed, and the magnitude and direction of rating differences are considered. Finally, Pearson correlation coefficients of standardized vocabulary scores are calculated and compared across subgroups. The results underline the necessity to distinguish between reliability measures, agreement and correlation. They also demonstrate the impact of the employed reliability on agreement evaluations. This study provides evidence that parent-teacher ratings of children’s early vocabulary can achieve agreement and correlation comparable to those of mother-father ratings on the assessed vocabulary scale. Bilingualism of the evaluated child decreased the likelihood of raters’ agreement. We conclude that future reports of agreement, correlation and reliability of ratings will benefit from better definition of terms and stricter methodological approaches. The methodological tutorial provided here holds the potential to increase comparability across empirical reports and can help improve research practices and knowledge transfer to educational and therapeutic settings.

  20. Validity and Reliability of Clinical Examination in the Diagnosis of Myofascial Pain Syndrome and Myofascial Trigger Points in Upper Quarter Muscles.

    Science.gov (United States)

    Mayoral Del Moral, Orlando; Torres Lacomba, María; Russell, I Jon; Sánchez Méndez, Óscar; Sánchez Sánchez, Beatriz

    2017-12-15

    To determine whether two independent examiners can agree on a diagnosis of myofascial pain syndrome (MPS). To evaluate interexaminer reliability in identifying myofascial trigger points in upper quarter muscles. To evaluate the reliability of clinical diagnostic criteria for the diagnosis of MPS. To evaluate the validity of clinical diagnostic criteria for the diagnosis of MPS. Validity and reliability study. Provincial Hospital. Toledo, Spain. Twenty myofascial pain syndrome patients and 20 healthy, normal control subjects, enrolled by a trained and experienced examiner. Ten bilateral muscles from the upper quarter were evaluated by two experienced examiners. The second examiner was blinded to the diagnosis group. The MPS diagnosis required at least one muscle to have an active myofascial trigger point. Three to four days separated the two examinations. The primary outcome measure was the frequency with which the two examiners agreed on the classification of the subjects as patients or as healthy controls. The kappa statistic (K) was used to determine the level of agreement between both examinations, interpreted as very good (0.81-1.00), good (0.61-0.80), moderate (0.41-0.60), fair (0.21-0.40), or poor (≤0.20). Interexaminer reliability for identifying subjects with MPS was very good (K = 1.0). Interexaminer reliability for identifying muscles leading to a diagnosis of MPS was also very good (K = 0.81). Sensitivity and specificity showed high values for most examination tests in all muscles, which confirms the validity of clinical diagnostic criteria in the diagnosis of MPS. Interrater reliability between two expert examiners identifying subjects with MPS involving upper quarter muscles exhibited substantial agreement. These results suggest that clinical criteria can be valid and reliable in the diagnosis of this condition. © 2017 American Academy of Pain Medicine. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
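
    As a rough illustration of the kappa statistic and the interpretation bands quoted in this abstract, the following sketch computes Cohen's kappa for two raters and maps it to a band. The function names and the example labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each labelled the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per label.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when both raters use a single label
    return (observed - expected) / (1.0 - expected)


def agreement_band(kappa):
    """Map kappa to the bands quoted in the abstract above."""
    if kappa >= 0.81:
        return "very good"
    if kappa >= 0.61:
        return "good"
    if kappa >= 0.41:
        return "moderate"
    if kappa >= 0.21:
        return "fair"
    return "poor"  # the abstract defines poor as <= 0.20


# Hypothetical classifications (1 = patient, 0 = healthy control).
examiner_1 = [1, 1, 0, 0]
examiner_2 = [1, 0, 0, 0]
k = cohen_kappa(examiner_1, examiner_2)
print(k, agreement_band(k))
```

    In this toy case the examiners agree on three of four subjects, chance agreement is 0.5, and kappa is 0.5, which falls in the "moderate" band.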

  1. The reliability, validity, sensitivity, specificity and predictive values of the Chinese version of the Rowland Universal Dementia Assessment Scale.

    Science.gov (United States)

    Chen, Chia-Wei; Chu, Hsin; Tsai, Chia-Fen; Yang, Hui-Ling; Tsai, Jui-Chen; Chung, Min-Huey; Liao, Yuan-Mei; Chi, Mei-Ju; Chou, Kuei-Ru

    2015-11-01

    The purpose of this study was to translate the Rowland Universal Dementia Assessment Scale into Chinese and to evaluate the psychometric properties (reliability and validity) and the diagnostic properties (sensitivity, specificity and predictive values) of the Chinese version of the Rowland Universal Dementia Assessment Scale. The accurate detection of early dementia requires screening tools with favourable cross-cultural linguistic and appropriate sensitivity, specificity, and predictive values, particularly for Chinese-speaking populations. This was a cross-sectional, descriptive study. Overall, 130 participants suspected to have cognitive impairment were enrolled in the study. A test-retest for determining reliability was scheduled four weeks after the initial test. Content validity was determined by five experts, whereas construct validity was established by using contrasted group technique. The participants' clinical diagnoses were used as the standard in calculating the sensitivity, specificity, positive predictive value and negative predictive value. The study revealed that the Chinese version of the Rowland Universal Dementia Assessment Scale exhibited a test-retest reliability of 0.90, an internal consistency reliability of 0.71, an inter-rater reliability (kappa value) of 0.88 and a content validity index of 0.97. Both the patients and healthy contrast group exhibited significant differences in their cognitive ability. The optimal cut-off points for the Chinese version of the Rowland Universal Dementia Assessment Scale in the test for mild cognitive impairment and dementia were 24 and 22, respectively; moreover, for these two conditions, the sensitivities of the scale were 0.79 and 0.76, the specificities were 0.91 and 0.81, the areas under the curve were 0.85 and 0.78, the positive predictive values were 0.99 and 0.83 and the negative predictive values were 0.96 and 0.91 respectively. 
The Chinese version of the Rowland Universal Dementia Assessment Scale
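
    The diagnostic properties named in this abstract (sensitivity, specificity, and the predictive values) all derive from a 2x2 table of test result against clinical diagnosis. A minimal sketch follows; the function name and the counts are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Derive screening metrics from the four cells of a 2x2 table.

    tp/fn: diagnosed cases the test flags / misses.
    tn/fp: non-cases the test clears / wrongly flags.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases cleared
        "ppv": tp / (tp + fp),          # chance a positive result is a case
        "npv": tn / (tn + fn),          # chance a negative result is a non-case
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=40, fp=5, fn=10, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

    Sensitivity and specificity describe the test itself, whereas the predictive values also depend on how common the condition is in the sample.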

  2. Preliminary findings on the reliability and validity of the Cantonese Birmingham Cognitive Screen in patients with acute ischemic stroke.

    Science.gov (United States)

    Pan, Xiaoping; Chen, Haobo; Bickerton, Wai-Ling; Lau, Johnny King Lam; Kong, Anthony Pak Hin; Rotshtein, Pia; Guo, Aihua; Hu, Jianxi; Humphreys, Glyn W

    2015-01-01

    There are no currently effective cognitive assessment tools for patients who have suffered stroke in the People's Republic of China. The Birmingham Cognitive Screen (BCoS) has been shown to be a promising tool for revealing patients' poststroke cognitive deficits in specific domains, which facilitates more individually designed rehabilitation in the long run. Hence we examined the reliability and validity of a Cantonese version BCoS in patients with acute ischemic stroke, in Guangzhou. A total of 98 patients with acute ischemic stroke were assessed with the Cantonese version of the BCoS, and an additional 133 healthy individuals were recruited as controls. Apart from the BCoS, the patients also completed a number of external cognitive tests, including the Montreal Cognitive Assessment Test (MoCA), Mini Mental State Examination (MMSE), Albert's cancellation test, the Rey-Osterrieth Complex Figure Test, and six gesture matching tasks. Cutoff scores for failing each subtest, ie, deficits, were computed based on the performance of the controls. The validity and reliability of the Cantonese BCoS were examined, as well as interrater and test-retest reliability. We also compared the proportions of cases being classified as deficits in controlled attention, memory, character writing, and praxis, between patients with and without spoken language impairment. Analyses showed high test-retest reliability and agreement across independent raters on the qualitative aspects of measurement. Significant correlations were observed between the subtests of the Cantonese BCoS and the other external cognitive tests, providing evidence for convergent validity of the Cantonese BCoS. The screen was also able to generate measures of cognitive functions that were relatively uncontaminated by the presence of aphasia. This study suggests good reliability and validity of the Cantonese version of the BCoS. The Cantonese BCoS is a very promising tool for the detection of cognitive problems in

  3. Reliability of two social cognition tests: The combined stories test and the social knowledge test.

    Science.gov (United States)

    Thibaudeau, Élisabeth; Cellard, Caroline; Legendre, Maxime; Villeneuve, Karèle; Achim, Amélie M

    2018-04-01

    Deficits in social cognition are common in psychiatric disorders. Validated social cognition measures with good psychometric properties are necessary to assess and target social cognitive deficits. Two recent social cognition tests, the Combined Stories Test (COST) and the Social Knowledge Test (SKT), respectively assess theory of mind and social knowledge. Previous studies have shown good psychometric properties for these tests, but the test-retest reliability has never been documented. The aim of this study was to evaluate the test-retest reliability and the inter-rater reliability of the COST and the SKT. The COST and the SKT were administered twice to a group of forty-two healthy adults, with a delay of approximately four weeks between the assessments. Excellent test-retest reliability was observed for the COST, and a good test-retest reliability was observed for the SKT. There was no evidence of practice effect. Furthermore, an excellent inter-rater reliability was observed for both tests. This study shows a good reliability of the COST and the SKT that adds to the good validity previously reported for these two tests. These good psychometrics properties thus support that the COST and the SKT are adequate measures for the assessment of social cognition. Copyright © 2018. Published by Elsevier B.V.

  4. [French version of structured interviews for the Glasgow Outcome Scale: guidelines and first studies of validation].

    Science.gov (United States)

    Fayol, P; Carrière, H; Habonimana, D; Preux, P-M; Dumond, J-J

    2004-05-01

    The Glasgow Outcome Scale (GOS) is the most widely used outcome measure after traumatic brain injury. The GOS's reliability is improved by a structured interview. The two aims of this paper were to present a French version of the structured interview for the five-point Glasgow Outcome Scale and the extended eight-point GOS (GOSE) and to study their validity. The French version was developed using back-translation. Concurrent validity was studied by comparison with GOS/GOSE without structured interview. Inter-rater reliability was studied by comparison between assignments made by untrained head injury observers and trained head injury observers. Strength of agreement between ratings was assessed using the Kappa statistic. The French version and the guidelines for their use are given in the Appendix. Ratings were made for 25 brain injured patients and 25 relatives. Concurrent validity was good and inter-rater reliability was excellent. Using the structured interview for the GOS will give a more reliable assessment of the outcome of brain injured patients by French-speaking rehabilitation teams and a more precise assessment with the extended GOS.

  5. Autism detection in early childhood (ADEC): reliability and validity data for a Level 2 screening tool for autistic disorder.

    Science.gov (United States)

    Nah, Yong-Hwee; Young, Robyn L; Brewer, Neil; Berlingeri, Genna

    2014-03-01

    The Autism Detection in Early Childhood (ADEC; Young, 2007) was developed as a Level 2 clinician-administered autistic disorder (AD) screening tool that was time-efficient, suitable for children under 3 years, easy to administer, and suitable for persons with minimal training and experience with AD. A best estimate clinical Diagnostic and Statistical Manual of Mental Disorders (4th ed., text rev.; DSM-IV-TR; American Psychiatric Association, 2000) diagnosis of AD was made for 70 children using all available information and assessment results, except for the ADEC data. A screening study compared these children on the ADEC with 57 children with other developmental disorders and 64 typically developing children. Results indicated high internal consistency (α = .91). Interrater reliability and test-retest reliability of the ADEC were also adequate. ADEC scores reliably discriminated different diagnostic groups after controlling for nonverbal IQ and Vineland Adaptive Behavior Composite scores. Construct validity (using exploratory factor analysis) and concurrent validity using performance on the Autism Diagnostic Observation Schedule (Lord et al., 2000), the Autism Diagnostic Interview-Revised (Le Couteur, Lord, & Rutter, 2003), and DSM-IV-TR criteria were also demonstrated. Signal detection analysis identified the optimal ADEC cutoff score, with the ADEC identifying all children who had an AD (N = 70, sensitivity = 1.0) but overincluding children with other disabilities (N = 13, specificity ranging from .74 to .90). Together, the reliability and validity data indicate that the ADEC has potential to be established as a suitable and efficient screening tool for infants with AD. 2014 APA

  6. Validation of a new assessment tool for qualitative research articles

    DEFF Research Database (Denmark)

    Schou, Lone; Høstrup, Helle; Lyngsø, Elin

    2012-01-01

    Schou L., Høstrup H., Lyngsø E.E., Larsen S. & Poulsen I. (2011) Validation of a new assessment tool for qualitative research articles. Journal of Advanced Nursing 00(0), 000-000. doi: 10.1111/j.1365-2648.2011.05898.x ABSTRACT: Aim. This paper presents the development and validation of a new...... assessment tool for qualitative research articles, which could assess trustworthiness of qualitative research articles as defined by Guba and at the same time aid clinicians in their assessment. Background. There are more than 100 sets of proposals for quality criteria for qualitative research. However, we...... is the Danish acronym for Appraisal of Qualitative Studies. Phase 1 was to develop the tool based on a literature review and on consultation with qualitative researchers. Phase 2 was an inter-rater reliability test in which 40 health professionals participated. Phase 3 was an inter-rater reliability test among...

  7. Validation of the prosthetic esthetic index

    DEFF Research Database (Denmark)

    Özhayat, Esben B; Dannemand, Katrine

    2014-01-01

    OBJECTIVES: In order to diagnose impaired esthetics and evaluate treatments for these, it is crucial to evaluate all aspects of oral and prosthetic esthetics. No professionally administered index currently exists that sufficiently encompasses comprehensive prosthetic esthetics. This study aimed...... to validate a new comprehensive index, the Prosthetic Esthetic Index (PEI), for professional evaluation of esthetics in prosthodontic patients. MATERIAL AND METHODS: The content, criterion, and construct validity; the test-retest, inter-rater, and internal consistency reliability; and the sensitivity...... furthermore distinguish between participants and controls, indicating sufficient sensitivity. CONCLUSION: The PEI is considered a valid and reliable instrument involving sufficient aspects for assessment of the professionally evaluated esthetics in prosthodontic patients. CLINICAL RELEVANCE...

  8. Validation of a Spanish Version of the Lille Apathy Rating Scale for Parkinson’s Disease

    Directory of Open Access Journals (Sweden)

    Rocio García-Ramos

    2014-01-01

    Full Text Available Introduction. To date, no rating scales for detecting apathy in Parkinson’s disease (PD) patients have been validated in Spanish. For this reason, the aim of this study was to validate a Spanish version of the Lille apathy rating scale (LARS) in a cohort of PD patients from Spain. Participants and Methods. 130 PD patients and 70 healthy controls were recruited to participate in the study. Apathy was measured using the Spanish version of LARS and the neuropsychiatric inventory (NPI). Reliability (internal consistency, test-retest, and interrater reliability) and validity (construct, content, and criterion validity) were measured. Results. Interrater reliability was 0.93. Cronbach’s α for LARS was 0.81. The test-retest correlation coefficient was 0.97. The correlation between LARS and NPI scores was 0.61. The optimal cutoff point under the ROC curve was -14, whereas the value derived from healthy controls was -11. The prevalence of apathy in our population tested by LARS was 42%. Conclusions. The Spanish version of LARS is a reliable and useful tool for diagnosing apathy in PD patients. Total LARS score is influenced by the presence of depression and cognitive impairment. However, both disorders are independent entities with respect to apathy. The satisfactory reliability and validity of the scale make it an appropriate instrument for screening and diagnosing apathy in clinical practice or for research purposes.

  9. Validation and reliability of a modified sphygmomanometer for the assessment of handgrip strength in Parkinson's disease

    Directory of Open Access Journals (Sweden)

    Soraia M. Silva

    2015-04-01

    Full Text Available BACKGROUND: Handgrip strength is currently considered a predictor of overall muscle strength and functional capacity. Therefore, it is important to find reliable and affordable instruments for this analysis, such as the modified sphygmomanometer test (MST). OBJECTIVES: To assess the concurrent criterion validity of the MST, to compare the MST with the Jamar dynamometer, and to analyze the reproducibility (i.e., reliability and agreement) of the MST in individuals with Parkinson's disease (PD). METHOD: The authors recruited 50 subjects, 24 with PD (65.5±6.2 years of age) and 26 healthy elderly subjects (63.4±7.2 years of age). The handgrip strength was measured using the Jamar dynamometer and modified sphygmomanometer. The concurrent criterion validity was analyzed using Pearson's correlation coefficient and a simple linear regression test. The reproducibility of the MST was evaluated with the intraclass correlation coefficient (ICC2,1), the standard error of measurement (SEM), the minimal detectable change (MDC), and the Bland-Altman plot. For all of the analyses, a significance level of α≤0.05 was adopted. RESULTS: There was a significant correlation of moderate magnitude (r≥0.45) between the MST and the Jamar dynamometer. The MST had excellent reliability (ICC2,1≥0.7). The SEM and the MDC were adequate; however, the Bland-Altman plot indicated an unsatisfactory interrater agreement. CONCLUSIONS: The MST exhibited adequate validity and excellent reliability and is, therefore, suitable for monitoring the handgrip strength in PD. However, if the goal is to compare the measurements between examiners, the authors recommend that the data be interpreted with caution.
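
    The reproducibility indices listed in this abstract (SEM, MDC, and Bland-Altman analysis) can be sketched as follows. The abstract does not state which formulas the authors used; the conventions below (SEM = SD·√(1 − ICC), MDC at 95% confidence = 1.96·√2·SEM, and limits of agreement = bias ± 1.96·SD of the differences) are common ones and are assumptions here, as are the function names and sample values.

```python
import math
from statistics import mean, stdev


def sem_from_icc(sd, icc):
    """Standard error of measurement from the sample SD and the ICC (assumed convention)."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at 95% confidence (assumed convention)."""
    return 1.96 * math.sqrt(2.0) * sem


def bland_altman_limits(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical values for illustration only.
sem = sem_from_icc(sd=10.0, icc=0.75)
print(f"SEM = {sem:.2f}, MDC95 = {mdc95(sem):.2f}")
print(bland_altman_limits([10, 12, 14], [9, 12, 13]))
```

    A high ICC together with wide Bland-Altman limits is possible, which is consistent with the abstract reporting excellent reliability alongside unsatisfactory interrater agreement.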

  10. Inter-Rater Reliability and Agreement of the 6-Minute Walk Test in Women With Hip Fracture

    DEFF Research Database (Denmark)

    Larsen, Camilla Marie; Overgaard, Jan; Tange Kristensen, Morten

    MWT in individuals with hip fractures. Methods: Two senior physiotherapy students independently examined (randomized order) a convenience sample of 20 participants; their assessments were separated by two days, and testing followed instructions from the American Thoracic Society(1). Hip pain...... was assessed with the Verbal Ranking Scale. Results: Participants (all women) with a mean (SD) age of 78.1 ± 5.9 years performed the test within a mean of 31.5 ± 5.8 days post-surgery; 10 had a cervical and 10 a trochanteric fracture. Excellent inter-rater reliability; ICC2.1 = 0.92 (95% CI, 0.81 - 0...... = -0.196, P = 0.41). On the contrary, participants walked a mean of 21.7 ± 22.6 meters longer at the second trial (P = 0.002). Participants with moderate hip fracture-related pain walked a shorter distance than those with no or light pain during the first test (P = 0.04), while this was not the case...

  11. Validation of a method for assessing resident physicians' quality improvement proposals.

    Science.gov (United States)

    Leenstra, James L; Beckman, Thomas J; Reed, Darcy A; Mundell, William C; Thomas, Kris G; Krajicek, Bryan J; Cha, Stephen S; Kolars, Joseph C; McDonald, Furman S

    2007-09-01

    Residency programs involve trainees in quality improvement (QI) projects to evaluate competency in systems-based practice and practice-based learning and improvement. Valid approaches to assess QI proposals are lacking. We developed an instrument for assessing resident QI proposals--the Quality Improvement Proposal Assessment Tool (QIPAT-7)-and determined its validity and reliability. QIPAT-7 content was initially obtained from a national panel of QI experts. Through an iterative process, the instrument was refined, pilot-tested, and revised. Seven raters used the instrument to assess 45 resident QI proposals. Principal factor analysis was used to explore the dimensionality of instrument scores. Cronbach's alpha and intraclass correlations were calculated to determine internal consistency and interrater reliability, respectively. QIPAT-7 items comprised a single factor (eigenvalue = 3.4) suggesting a single assessment dimension. Interrater reliability for each item (range 0.79 to 0.93) and internal consistency reliability among the items (Cronbach's alpha = 0.87) were high. This method for assessing resident physician QI proposals is supported by content and internal structure validity evidence. QIPAT-7 is a useful tool for assessing resident QI proposals. Future research should determine the reliability of QIPAT-7 scores in other residency and fellowship training programs. Correlations should also be made between assessment scores and criteria for QI proposal success such as implementation of QI proposals, resident scholarly productivity, and improved patient outcomes.
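
    Cronbach's alpha, used in this abstract as the index of internal consistency, can be computed from an items-by-subjects score table. The sketch below is a minimal illustration; the function name and the scores are hypothetical and are not QIPAT-7 data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table with one row per subject and one column per item."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two subjects")
    # Sample variance of each item across subjects.
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each subject's total score.
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical ratings: four proposals scored on three items.
ratings = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 4],
    [1, 2, 2],
]
print(f"alpha = {cronbach_alpha(ratings):.3f}")
```

    Alpha approaches 1 when the items rise and fall together across subjects, and it equals 1 when every item gives identical scores.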

  12. The interrater reliability of rating non-exercise activity of inpatients with eating disorders using a visual analogue scale.

    Science.gov (United States)

    Mazloum, A; Johnston, M; Lundrigan, M; Birmingham, C L

    2008-12-01

    Non-exercise activity thermogenesis (NEAT) is the energy expended by body movement, other than sleeping, eating or sports-like activities. The obese have been reported to have a lower NEAT (walking, standing, and fidgeting) than controls. We hypothesize that an elevated NEAT could explain why some patients with anorexia nervosa are resistant to weight gain. To evaluate the interrater reliability of a rating of non-exercise activity of inpatients with eating disorders (ED) using a visual analogue scale (VAS). Health care providers were asked to rate the non-exercise activity of inpatients by marking a VAS. Eight patients were individually rated by 10 clinicians. Results were analyzed using the intraclass correlation coefficient (ICC) and Cohen's multi-rater kappa statistic (kappa). The ICC(3,k) was 0.257 (pexercise activity and physiological measurements should be used.

  13. Reliability and validity of the photogrammetry for scoliosis evaluation: a cross-sectional prospective study.

    Science.gov (United States)

    Saad, Karen Ruggeri; Colombo, Alexandra S; João, Silvia M Amado

    2009-01-01

    The purpose of this study was to investigate the reliability and validity of photogrammetry in measuring the lateral spinal inclination angles. Forty subjects (32 females and 8 males) with a mean age of 23.4 ± 11.2 years had their scoliosis evaluated by radiographs of their trunk, determined by the Cobb angle method, and by photogrammetry. The statistical methods used included Cronbach alpha, Pearson/Spearman correlation coefficients, and regression analyses. The Cronbach alpha values showed that the photogrammetric measures had high internal consistency, which indicated that the sample was bias free. The radiograph method proved to be more precise with intrarater reliabilities of 0.936, 0.975, and 0.945 for the thoracic, lumbar, and thoracolumbar curves, respectively, and interrater reliabilities of 0.942 and 0.879 for the angular measures of the thoracic and thoracolumbar segments, respectively. The regression analyses revealed a high determination coefficient although limited to the adjusted linear model between the radiographic and photographic measures. It was found that with more severe scoliosis, the lateral curve measures obtained with the photogrammetry were for the thoracic and lumbar regions (R = 0.619 and 0.551). The photogrammetric measures were found to be reproducible in this study and could be used as supplementary information to decrease the number of radiographs necessary for the monitoring of scoliosis.

  14. Inter-rater reliability of the German version of the Nurses' Global Assessment of Suicide Risk scale.

    Science.gov (United States)

    Kozel, Bernd; Grieser, Manuela; Abderhalden, Christoph; Cutcliffe, John R

    2016-10-01

    In comparison to the general population, the suicide rates of psychiatric inpatient populations in Germany and Switzerland are very high. An important preventive contribution to the lowering of the suicide rates in mental health care is to ensure that the risk of suicide of psychiatric inpatients is assessed as accurately as possible. While risk-assessment instruments can serve an important function in determining such risk, very few have been translated to German. Therefore, in the present study, we reported on the German version of Nurses' Global Assessment of Suicide Risk (NGASR) scale. After translating the original instrument into German and pretesting the German version, we tested the inter-rater reliability of the instrument. Twelve video case studies were evaluated by 13 raters with the NGASR scale in a 'laboratory' trial. In each case, the observer's agreement was calculated for the single items, the overall scale, the risk levels, and the sum scores. The statistical data analysis was conducted with kappa and AC1 statistics for dichotomous (items, scale) scales. A high-to-very high observers' agreement (AC1: 0.62-1.00, kappa: 0.00-1.00) was determined for 16 items of the German version of the NGASR scale. We conclude that the German version of the NGASR scale is a reliable instrument for evaluating risk factors for suicide. A reliable application in the clinical practice appears to be enhanced by training in the use of the instrument and the right implementation instructions. © 2016 Australian College of Mental Health Nurses Inc.
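
    The gap between the kappa range (0.00-1.00) and the AC1 range (0.62-1.00) reported in this abstract is typical of items where one answer dominates. The sketch below shows the effect for the simplest case of two raters and a dichotomous item. The study itself used 13 raters, so this two-rater form is a simplifying assumption, and the function names and ratings are hypothetical.

```python
def observed_agreement(a, b):
    """Share of cases on which two raters give the same 0/1 rating."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa_binary(a, b):
    """Cohen's kappa for two raters and 0/1 ratings (undefined if chance agreement is 1)."""
    n = len(a)
    po = observed_agreement(a, b)
    pa, pb = sum(a) / n, sum(b) / n
    pe = pa * pb + (1 - pa) * (1 - pb)  # chance agreement from each rater's marginals
    return (po - pe) / (1 - pe)


def gwet_ac1_binary(a, b):
    """Gwet's AC1 for two raters and 0/1 ratings."""
    n = len(a)
    po = observed_agreement(a, b)
    pi = (sum(a) / n + sum(b) / n) / 2  # average prevalence of the "1" rating
    pe = 2 * pi * (1 - pi)              # AC1 chance-agreement term
    return (po - pe) / (1 - pe)


# Hypothetical item that is rarely rated present: 18 of 20 ratings agree.
rater_1 = [1] + [0] * 19
rater_2 = [0] * 19 + [1]
print(f"agreement = {observed_agreement(rater_1, rater_2):.2f}")
print(f"kappa     = {cohen_kappa_binary(rater_1, rater_2):.3f}")
print(f"AC1       = {gwet_ac1_binary(rater_1, rater_2):.3f}")
```

    With 90% raw agreement on a rarely endorsed item, kappa comes out slightly below zero while AC1 is about 0.89, because the two statistics estimate chance agreement differently.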

  15. The Myotonometer: Not a Valid Measurement Tool for Active Hamstring Musculotendinous Stiffness.

    Science.gov (United States)

    Pamukoff, Derek N; Bell, Sarah E; Ryan, Eric D; Blackburn, J Troy

    2016-05-01

    Hamstring musculotendinous stiffness (MTS) is associated with lower-extremity injury risk (ie, hamstring strain, anterior cruciate ligament injury) and is commonly assessed using the damped oscillatory technique. However, despite a preponderance of studies that measure MTS reliably in laboratory settings, there are no valid clinical measurement tools. A valid clinical measurement technique is needed to assess MTS and permit identification of individuals at heightened risk of injury and track rehabilitation progress. To determine the validity and reliability of the Myotonometer for measuring active hamstring MTS. Descriptive laboratory study. Laboratory. 33 healthy participants (15 men, age 21.33 ± 2.94 y, height 172.03 ± 16.36 cm, mass 74.21 ± 16.36 kg). Hamstring MTS was assessed using the damped oscillatory technique and the Myotonometer. Intraclass correlations were used to determine the intrasession, intersession, and interrater reliability of the Myotonometer. Criterion validity was assessed via Pearson product-moment correlation between MTS measures obtained from the Myotonometer and from the damped oscillatory technique. The Myotonometer demonstrated good intrasession (ICC3,1 = .807) and interrater reliability (ICC2,k = .830) and moderate intersession reliability (ICC2,k = .693). However, it did not provide a valid measurement of MTS compared with the damped oscillatory technique (r = .346, P = .061). The Myotonometer does not provide a valid measure of active hamstring MTS. Although the Myotonometer does not measure active MTS, it possesses good reliability and portability and could be used clinically to measure tissue compliance, muscle tone, or spasticity associated with multiple musculoskeletal disorders. Future research should focus on portable and clinically applicable tools to measure active hamstring MTS in efforts to prevent and monitor injuries.

  16. The development and validation of a custom built device for assessing frontal knee joint laxity.

    Science.gov (United States)

    Ismail, Shiek Abdullah; Simic, Milena; Clarke, Jillian L; Lopes, Thiago Jambo Alves; Pappas, Evangelos

    2017-12-01

    This study reports the development and validation of a quantitative technique of assessing frontal knee joint laxity through a custom built device named KLICP. The objectives of this study were to determine: (i) the intra- and inter-rater reliability and (ii) the validity of the device when compared to real time ultrasound. Twenty-five participants had their frontal knee joint laxity assessed by the KLICP, by manual varus/valgus tests and by ultrasound. Two raters independently assessed laxity manually by three repeated measurements, repeated at least 48h later. Results were validated by comparing them to the medial and lateral joint space opening measured by the ultrasound. Intraclass correlation coefficients and standard error of measurement reliability were calculated. Pearson's correlation coefficients were calculated to determine the correlation between the KLICP and the joint space. Intra-rater reliability (intra-session) for each rater was good on both sessions (0.91-0.98), intra-rater reliability (inter-sessions) was moderate to good (0.62-0.87), and inter-rater reliability (intra-session) was good (0.75-0.80). There is low agreement for intra-rater (inter-session) and for inter-rater (intra-session) reliability. The KLICP measurement has a significant positive fair to moderate correlation to the ultrasound measurement at the left (r: 0.61, p: 0.01) and right (r: 0.48, p: 0.02) knee in the valgus direction and at the left (r: 0.51, p: 0.01) and right (r: 0.39, p: 0.05) knee in the varus direction. There is low agreement between the KLICP and the RTU. Reliability and agreement were good only when measured for intra-rater, within session. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. Validation of Land Cover Products Using Reliability Evaluation Methods

    Directory of Open Access Journals (Sweden)

    Wenzhong Shi

    2015-06-01

    Full Text Available Validation of land cover products is a fundamental task prior to data applications. Current validation schemes and methods are, however, suited only for assessing classification accuracy and disregard the reliability of land cover products. The reliability evaluation of land cover products should be undertaken to provide reliable land cover information. In addition, the lack of high-quality reference data often constrains validation and affects the reliability results of land cover products. This study proposes a validation schema to evaluate the reliability of land cover products, including two methods, namely, result reliability evaluation and process reliability evaluation. Result reliability evaluation computes the reliability of land cover products using seven reliability indicators. Process reliability evaluation analyzes the reliability propagation in the data production process to obtain the reliability of land cover products. Fuzzy fault tree analysis is introduced and improved in the reliability analysis of a data production process. Research results show that the proposed reliability evaluation scheme is reasonable and can be applied to validate land cover products. Through the analysis of the seven indicators of result reliability evaluation, more information on land cover can be obtained for strategic decision-making and planning, compared with traditional accuracy assessment methods. Process reliability evaluation without the need for reference data can facilitate the validation and reflect the change trends of reliabilities to some extent.

  18. Inter-rater reliability of postnatal ultrasound interpretation in infants with congenital hydronephrosis.

    Science.gov (United States)

    Vemulakonda, V M; Wilcox, D T; Torok, M R; Hou, A; Campbell, J B; Kempe, A

    2015-09-01

    The most common measurements of hydronephrosis are the anterior-posterior (AP) diameter and the Society for Fetal Urology (SFU) grading systems. To date, the inter-rater reliability (IRR) of these measures has not been compared in the postnatal period. The objectives of this study were to compare the IRR of the AP diameter and the SFU grading system in infants and to determine whether ultrasound findings other than pelvicalyceal dilation are associated with higher SFU grades. Initial postnatal ultrasounds of infants seen from February 1, 2011, to January 31, 2012, with a primary diagnosis of congenital hydronephrosis were included for review. Ultrasound images were de-identified and reviewed by four pediatric urologists. IRR was calculated using the intraclass correlation (ICC) measure. A paired t test was used to compare ICCs. Associations between SFU grade and other ultrasound findings were tested using Chi-square or Fisher's exact tests. A total of 112 kidneys in 56 patients were reviewed. IRR of the SFU grading system was high (right kidney ICC = 0.83, left kidney ICC = 0.85); however, IRR of AP diameter measurement was higher (right kidney ICC = 0.97, left kidney ICC = 0.98; p hydronephrosis on bivariable and multivariable analysis. The SFU grading system is associated with excellent IRR, although the AP diameter appears to have higher IRR. Physicians may consider ultrasound findings that are not explicitly included in the SFU system when assigning hydronephrosis grade, which may lead to variability in use of this classification system.

  19. Inter-Rater Reliability of Historical Data Collected by Non-Medical Research Assistants and Physicians in Patients with Acute Abdominal Pain

    Directory of Open Access Journals (Sweden)

    Mills, Angela M

    2009-02-01

    OBJECTIVES: In many academic emergency departments (ED), physicians are asked to record clinical data for research that may be time consuming and distracting from patient care. We hypothesized that non-medical research assistants (RAs) could obtain historical information from patients with acute abdominal pain as accurately as physicians. METHODS: Prospective comparative study conducted in an academic ED of 29 RAs to 32 resident physicians (RPs) to assess inter-rater reliability in obtaining historical information in abdominal pain patients. Historical features were independently recorded on standardized data forms by a RA and RP blinded to each other's answers. Discrepancies were resolved by a third person (RA) who asked the patient to state the correct answer on a third questionnaire, constituting the "criterion standard." Inter-rater reliability was assessed using kappa statistics (kappa) and percent crude agreement (CrA). RESULTS: Sixty-five patients were enrolled (mean age 43). Of 43 historical variables assessed, the median agreement was moderate (kappa 0.59 [interquartile range 0.37-0.69]; CrA 85.9%) and varied across data categories: initial pain location (kappa 0.61 [0.59-0.73]; CrA 87.7%), current pain location (kappa 0.60 [0.47-0.67]; CrA 82.8%), past medical history (kappa 0.60 [0.48-0.74]; CrA 93.8%), associated symptoms (kappa 0.38 [0.37-0.74]; CrA 87.7%), and aggravating/alleviating factors (kappa 0.09 [-0.01-0.21]; CrA 61.5%). When there was disagreement between the RP and the RA, the RA more often agreed with the criterion standard (64% [55-71%]) than the RP (36% [29-45%]). CONCLUSION: Non-medical research assistants who focus on clinical research are often more accurate than physicians, who may be distracted by patient care responsibilities, at obtaining historical information from ED patients with abdominal pain.
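The abstract above reports both kappa and percent crude agreement, and the two can diverge sharply (for example, CrA 61.5% alongside kappa 0.09). A minimal sketch of both statistics for two raters follows, using the standard unweighted Cohen's kappa. The function names and the yes/no answers are hypothetical illustrations, not study data.

```python
from collections import Counter


def crude_agreement(a, b):
    """Proportion of items on which the two raters gave the same answer."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa: agreement corrected for chance."""
    n = len(a)
    p_observed = crude_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from the product of each rater's marginal proportions
    p_expected = sum(counts_a[c] * counts_b[c] for c in set(a) | set(b)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical answers to one yes/no history item (1 = yes, 0 = no)
assistant = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
physician = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
print(crude_agreement(assistant, physician))        # 0.8
print(round(cohen_kappa(assistant, physician), 2))  # 0.58
```

In this toy case the raters agree on 80% of items, yet kappa is only 0.58, because about half of that agreement would be expected by chance given how often each rater answers "yes".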

  20. Development and reliability testing of the Nordic Housing Enabler – an instrument for accessibility assessment of the physical housing

    DEFF Research Database (Denmark)

    Helle, Tina

    and adapted according to accessibility norms and guidelines for housing design in Sweden, Denmark, Iceland and Finland. This iterative process involved occupational therapists, architects, building engineers and professional translators, resulting in the Nordic Housing Enabler. For reliability testing...... serious deficits when it comes to accessibility. This study addresses development of a content valid cross-Nordic version of the Housing Enabler and investigation of inter-rater reliability, when used in occupational therapy practice. The instrument was translated from the original Swedish version......, the sample strategy and data collection procedures were the same in all countries. In total, twenty voluntary occupational therapists collected data from 106 cases by means of the Nordic Housing Enabler. Inter-rater reliability was calculated by means of percentage agreement and kappa statistics. Overall...

  1. Measuring the Value of New Drugs: Validity and Reliability of 4 Value Assessment Frameworks in the Oncology Setting.

    Science.gov (United States)

    Bentley, Tanya G K; Cohen, Joshua T; Elkin, Elena B; Huynh, Julie; Mukherjea, Arnab; Neville, Thanh H; Mei, Matthew; Copher, Ronda; Knoth, Russell; Popescu, Ioana; Lee, Jackie; Zambrano, Jenelle M; Broder, Michael S

    2017-06-01

    Several organizations have developed frameworks to systematically assess the value of new drugs. To evaluate the convergent validity and interrater reliability of 4 value frameworks to understand the extent to which these tools can facilitate value-based treatment decisions in oncology. Eight panelists used the American Society of Clinical Oncology (ASCO), European Society for Medical Oncology (ESMO), Institute for Clinical and Economic Review (ICER), and National Comprehensive Cancer Network (NCCN) frameworks to conduct value assessments of 15 drugs for advanced lung and breast cancers and castration-refractory prostate cancer. Panelists received instructions and published clinical data required to complete the assessments, assigning each drug a numeric or letter score. Kendall's Coefficient of Concordance for Ranks (Kendall's W) was used to measure convergent validity by cancer type among the 4 frameworks. Intraclass correlation coefficients (ICCs) were used to measure interrater reliability for each framework across cancers. Panelists were surveyed on their experiences. Kendall's W across all 4 frameworks for breast, lung, and prostate cancer drugs was 0.560 (P = 0.010), 0.562 (P = 0.010), and 0.920 (P fair to excellent, increasing with clinical benefit subdomain concordance and simplicity of drug trial data. Interrater reliability, highest for ASCO and ESMO, improved with clarity of instructions and specificity of score definitions. Continued use, analyses, and refinements of these frameworks will bring us closer to the ultimate goal of using value-based treatment decisions to improve patient care and outcomes. This work was funded by Eisai Inc. Copher and Knoth are employees of Eisai Inc. Bentley, Lee, Zambrano, and Broder are employees of Partnership for Health Analytic Research, a health services research company paid by Eisai Inc. to conduct this research. For this study, Cohen, Huynh, and Neville report fees from Partnership for Health Analytic Research
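Kendall's W, used in the abstract above to compare how the four frameworks rank the same drugs, can be sketched as follows. This is a minimal illustration of the textbook formula without a correction for tied ranks. How the study converted framework scores to ranks is not stated, so the function name and the rankings below are hypothetical.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance, no tie correction.

    rankings: one list per judge (here, per framework), each giving the
    rank assigned to every item (here, every drug), from 1 to n.
    """
    m = len(rankings)        # number of judges
    n = len(rankings[0])     # number of items ranked
    rank_sums = [sum(judge[i] for judge in rankings) for i in range(n)]
    mean_sum = sum(rank_sums) / n
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical ranks of 4 drugs under 3 frameworks
ranks = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
]
print(round(kendalls_w(ranks), 3))  # 0.778
```

W ranges from 0 (no agreement among rankings) to 1 (identical rankings), which is why values of about 0.56 and 0.92 for different cancer types indicate quite different degrees of concordance.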

  2. Reliability of one-repetition maximum performance in people with chronic heart failure.

    Science.gov (United States)

    Ellis, Rachel; Holland, Anne E; Dodd, Karen; Shields, Nora

    2018-02-24

    Evaluate intra-rater and inter-rater reliability of the one-repetition maximum strength test in people with chronic heart failure. Intra-rater and inter-rater reliability study. A public tertiary hospital in northern metropolitan Melbourne. Twenty-four participants (nine female, mean age 71.8 ± 13.1 years) with mild to moderate heart failure of any aetiology. Lower limb strength was assessed by determining the maximum weight that could be lifted using a leg press. Intra-rater reliability was tested by one assessor on two separate occasions. Inter-rater reliability was tested by two assessors in random order. Intra-class correlation coefficients and 95% confidence intervals were calculated. Bland and Altman analyses were also conducted, including calculation of mean differences between measures and limits of agreement. Ten intra-rater and 21 inter-rater assessments were completed. Excellent intra-rater (intra-class correlation coefficient [2,1] = 0.96) and inter-rater (intra-class correlation coefficient [2,1] = 0.93) reliability was found. Intra-rater assessment showed less variability (mean difference 4.5 kg, limits of agreement -8.11 to 17.11 kg) than inter-rater agreement (mean difference -3.81 kg, limits of agreement -23.39 to 15.77 kg). One-repetition maximum determined using a leg press is a reliable measure in people with heart failure. Given its smaller limits of agreement, intra-rater testing is recommended. Implications for Rehabilitation: Using a leg press to determine a one-repetition maximum we were able to demonstrate excellent inter-rater and intra-rater reliability using an intra-class correlation coefficient. The Bland and Altman levels of agreement were wide for inter-rater reliability and so we recommend using one assessor if measuring change in strength within an individual over time.
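The limits of agreement reported above are symmetric about the mean difference (for instance, -8.11 and 17.11 kg lie 12.61 kg either side of 4.5 kg), which matches the usual Bland and Altman construction of mean difference ± 1.96 standard deviations of the paired differences. A minimal sketch under that assumption follows. The function name and the weights are hypothetical, not study data.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (mean difference, lower limit, upper limit) for paired measures.

    Limits of agreement are taken as mean difference +/- 1.96 * SD of the
    differences, using the sample standard deviation.
    """
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical leg-press one-repetition maxima (kg) from two sessions
session_1 = [100, 110, 95, 120, 105]
session_2 = [98, 112, 90, 118, 100]
bias, lower, upper = bland_altman(session_1, session_2)
print(round(bias, 2), round(lower, 2), round(upper, 2))  # 2.4 -3.25 8.05
```

Unlike a correlation coefficient, the limits are expressed in the measurement's own units, so they show directly how large a change in an individual must be before it exceeds measurement disagreement.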

  3. The Use of Bayesian Networks to Assess the Quality of Evidence from Research Synthesis: 2. Inter-Rater Reliability and Comparison with Standard GRADE Assessment.

    Directory of Open Access Journals (Sweden)

    Alexis Llewellyn

    The grades of recommendation, assessment, development and evaluation (GRADE) approach is widely implemented in systematic reviews, health technology assessment and guideline development organisations throughout the world. We have previously reported on the development of the Semi-Automated Quality Assessment Tool (SAQAT), which enables a semi-automated validity assessment based on GRADE criteria. The main advantage to our approach is the potential to improve inter-rater agreement of GRADE assessments particularly when used by less experienced researchers, because such judgements can be complex and challenging to apply without training. This is the first study examining the inter-rater agreement of the SAQAT. We conducted two studies to compare: (a) the inter-rater agreement of two researchers using the SAQAT independently on 28 meta-analyses and (b) the inter-rater agreement between a researcher using the SAQAT (who had no experience of using GRADE) and an experienced member of the GRADE working group conducting a standard GRADE assessment on 15 meta-analyses. There was substantial agreement between independent researchers using the Quality Assessment Tool for all domains (for example, overall GRADE rating: weighted kappa 0.79; 95% CI 0.65 to 0.93). Comparison between the SAQAT and a standard GRADE assessment suggested that inconsistency was parameterised too conservatively by the SAQAT. Therefore the tool was amended. Following amendment we found fair-to-moderate agreement between the standard GRADE assessment and the SAQAT (for example, overall GRADE rating: weighted kappa 0.35; 95% CI 0.09 to 0.87). Despite a need for further research, the SAQAT may aid consistent application of GRADE, particularly by less experienced researchers.

  4. Validation of the Falls Efficacy Scale – International in a sample of Portuguese elderly

    Directory of Open Access Journals (Sweden)

    Cristina Maria Alves Marques-Vieira

    ABSTRACT Objective: to translate and adapt the Falls Efficacy Scale – International (FES-I) and to analyze the psychometric properties of the FES-I Portugal version. Method: psychometric study. Sample consisting of 170 elderly people residing in the Autonomous Region of Madeira. A two-part form was used (sociodemographic characterization and FES-I Portugal). The cross-cultural adaptation was performed and the following psychometric properties were evaluated: validity (construct, predictive, and discriminant), reliability (Cronbach’s alpha), and inter-rater reliability. Results: the results allow us to verify a dimension of less demanding physical activities and another of more demanding physical activities. The inter-rater reliability study was 0.62, with an intraclass correlation coefficient of 0.859, for a 95% confidence interval. The internal consistency of the Portuguese version was 0.962. Conclusion: the validity and reliability of the FES-I Portugal are consistent with the original version and proved to be appropriate instruments for evaluating the “impaired walking” and “risk of falls” nursing diagnoses in the older people.

  5. Reliability and Validity of the Hip Stability Isometric Test (HipSIT): A New Method to Assess Hip Posterolateral Muscle Strength.

    Science.gov (United States)

    Almeida, Gabriel Peixoto Leão; das Neves Rodrigues, Helena Larissa; de Freitas, Bruno Wesley; de Paula Lima, Pedro Olavo

    2017-12-01

    Study Design: Cross-sectional study. Background: The Hip Stability Isometric Test (HipSIT) evaluates the strength of the hip posterolateral stabilizers in a position that favors greater activation of the gluteus maximus and gluteus medius and lower activation of the tensor fascia lata. Objectives: To check the validity and reliability of the HipSIT and to evaluate the HipSIT in women with patellofemoral pain (PFP). Methods: The HipSIT was evaluated with a handheld dynamometer. During testing, the participants were sidelying, with their legs positioned at 45° of hip flexion and 90° of knee flexion. Participants were instructed to raise the knee of the upper leg while keeping the upper and lower heels in contact. To establish reliability and validity, 49 women were tested with the HipSIT by 2 different evaluators on day 1, and then again 7 days later. The strength of the hip extensors, abductors, and external rotators was also evaluated. Twenty women with unilateral PFP were also evaluated. Results: The HipSIT has excellent intrarater and interrater reliability. The standard error of measurement was 0.01 kgf/kg, and the minimal detectable change was 0.036 kgf/kg. The HipSIT showed good validity in isolated hip abduction, external rotation, and extension (Pstrength deficits in women with PFP. J Orthop Sports Phys Ther 2017;47(12):906-913. Epub 9 Oct 2017. doi:10.2519/jospt.2017.7274.

  6. Validity and reliability of a new tool to evaluate handwriting difficulties in Parkinson's disease.

    Directory of Open Access Journals (Sweden)

    Evelien Nackaerts

    Handwriting in Parkinson's disease (PD) features specific abnormalities which are difficult to assess in clinical practice since no specific tool for evaluation of spontaneous movement is currently available. This study aims to validate the 'Systematic Screening of Handwriting Difficulties' (SOS-test) in patients with PD. Handwriting performance of 87 patients and 26 healthy age-matched controls was examined using the SOS-test. Sixty-seven patients were tested a second time within a period of one month. Participants were asked to copy as much as possible of a text within 5 minutes with the instruction to write as neatly and quickly as in daily life. Writing speed (letters in 5 minutes), size (mm) and quality of handwriting were compared. Correlation analysis was performed between SOS outcomes and other fine motor skill measurements and disease characteristics. Intrarater, interrater and test-retest reliability were assessed using the intraclass correlation coefficient (ICC) and Spearman correlation coefficient. Patients with PD had a smaller (p = 0.043) and slower (p 0.769 for both groups. The SOS-test is a short and effective tool to detect handwriting problems in PD with excellent reliability. It can therefore be recommended as a clinical instrument for standardized screening of handwriting deficits in PD.

  7. Development of Reliable and Validated Tools to Evaluate Technical Resuscitation Skills in a Pediatric Simulation Setting: Resuscitation and Emergency Simulation Checklist for Assessment in Pediatrics.

    Science.gov (United States)

    Faudeux, Camille; Tran, Antoine; Dupont, Audrey; Desmontils, Jonathan; Montaudié, Isabelle; Bréaud, Jean; Braun, Marc; Fournier, Jean-Paul; Bérard, Etienne; Berlengi, Noémie; Schweitzer, Cyril; Haas, Hervé; Caci, Hervé; Gatin, Amélie; Giovannini-Chami, Lisa

    2017-09-01

    To develop a reliable and validated tool to evaluate technical resuscitation skills in a pediatric simulation setting. Four Resuscitation and Emergency Simulation Checklist for Assessment in Pediatrics (RESCAPE) evaluation tools were created, following international guidelines: intraosseous needle insertion, bag mask ventilation, endotracheal intubation, and cardiac massage. We applied a modified Delphi methodology evaluation to binary rating items. Reliability was assessed comparing the ratings of 2 observers (1 in real time and 1 after a video-recorded review). The tools were assessed for content, construct, and criterion validity, and for sensitivity to change. Inter-rater reliability, evaluated with Cohen kappa coefficients, was perfect or near-perfect (>0.8) for 92.5% of items and each Cronbach alpha coefficient was ≥0.91. Principal component analyses showed that all 4 tools were unidimensional. Significant increases in median scores with increasing levels of medical expertise were demonstrated for RESCAPE-intraosseous needle insertion (P = .0002), RESCAPE-bag mask ventilation (P = .0002), RESCAPE-endotracheal intubation (P = .0001), and RESCAPE-cardiac massage (P = .0037). Significantly increased median scores over time were also demonstrated during a simulation-based educational program. RESCAPE tools are reliable and validated tools for the evaluation of technical resuscitation skills in pediatric settings during simulation-based educational programs. They might also be used for medical practice performance evaluations. Copyright © 2017 Elsevier Inc. All rights reserved.

  8. RELIABILITY OF THE DYNAMIC OCCUPATIONAL THERAPY COGNITIVE ASSESSMENT FOR CHILDREN (DOTCA-CH): THAI VERSION OF ORIENTATION, SPATIAL PERCEPTION, AND THINKING OPERATIONS SUBTESTS

    Directory of Open Access Journals (Sweden)

    Suchitporn Lersilp

    2014-06-01

    The Dynamic Occupational Therapy Cognitive Assessment for Children (DOTCA-Ch) is a tool for identifying cognitive problems in school-aged children. However, the DOTCA-Ch was developed in English for Western children. For this reason, it is not appropriate for Thai children because of differences in culture and language. The objectives of this study were to translate the Orientation, Spatial Perception, and Thinking Operations subtests of the DOTCA-Ch into Thai using a World Health Organization back-translation process, and to examine their internal consistency, inter-rater reliability and test-retest reliability. The participants consisted of 38 intellectually impaired and learning disabled individuals between the ages of 6–12. Results from this study revealed high internal consistency in the Orientation subtest (α = .83), Spatial Perception subtest (α = .82) and Thinking Operations subtest (α = .82); high inter-rater reliability in the Orientation subtest (ICC = .83), Spatial Perception subtest (ICC = .84) and Thinking Operations subtest (ICC = .74); and high test-retest reliability in the Orientation subtest (ICC = .84), Spatial Perception subtest (ICC = .86) and Thinking Operations subtest (ICC = .85). These results indicate that the Thai version of the DOTCA-Ch in Orientation, Spatial Perception, and Thinking Operations subtests might be used as an appropriate assessment tool for Thai children, based on psychometric evidence including internal consistency, inter-rater reliability and test-retest reliability. However, additional study of other psychometric properties, including predictive validity, concurrent reliability, and inter-rater reliability during the mediation process of this assessment tool, needs to be carried out.

  9. [Reliability and validity of the Severe Impairment Battery, short form (SIB-s), in patients with dementia in Spain].

    Science.gov (United States)

    Cruz-Orduña, Isabel; Agüera-Ortiz, Luis F; Montorio-Cerrato, Ignacio; León-Salas, Beatriz; Valle de Juan, M Cristina; Martínez-Martín, Pablo

    2015-01-01

    People with progressive dementia evolve into a state where traditional neuropsychological tests are not effective. Severe Impairment Battery (SIB) and short form (SIB-s) were developed for evaluating the cognitive status in patients with severe dementia. To evaluate the psychometric attributes of the SIB-s in patients with severe dementia. 127 institutionalized patients (female: 86.6%; mean age: 82.6 ± 7.5 years-old) with dementia were assessed with the SIB-s, the Global Deterioration Scale (GDS), Mini-Mental State Examination (MMSE), Severe Mini-Mental State Examination (sMMSE), Barthel Index and FAST. SIB-s acceptability, reliability, validity and precision were analyzed. The mean total score for scale was 19.1 ± 15.34 (range: 0-48). Floor effect was 18.1%, only marginally higher than the desirable 15%. Factor analysis identified a single factor explaining 68% of the total variance of the scale. Cronbach's alpha coefficient was 0.96 and the item-total corrected correlation ranged from 0.27 to 0.83. The item homogeneity value was 0.43. Test-retest and inter-rater reliability for the total score was satisfactory (ICC: 0.96 and 0.95, respectively). The SIB-s showed moderate correlation with functional dependency scales (Barthel Index: 0.48, FAST: -0.74). Standard error of measurement was 3.07 for the total score. The SIB-s is a reliable and valid instrument for evaluating patients with severe dementia in the Spanish population of relatively brief instruments.
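The standard error of measurement reported above can be related to the other figures in the abstract through the common formula SEM = SD × sqrt(1 − reliability). As a hedged check, assuming the total-score SD of 15.34 and the test-retest ICC of 0.96 were the inputs (the abstract does not say so), the formula gives approximately 3.07, the reported value. The minimal detectable change at 95% confidence, MDC95 = 1.96 × sqrt(2) × SEM, is not reported in the abstract and is shown only as an illustration. Function names are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)


def minimal_detectable_change_95(sem):
    """MDC95 = 1.96 * sqrt(2) * SEM (change between two measurements)."""
    return 1.96 * math.sqrt(2) * sem


# Inputs taken from the abstract; using them together is an assumption
sem = standard_error_of_measurement(15.34, 0.96)
print(round(sem, 2))                                # 3.07
# Illustrative only: the abstract does not report an MDC
print(round(minimal_detectable_change_95(sem), 1))  # 8.5
```

The factor sqrt(2) appears because a change score involves two measurements, each carrying its own measurement error.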

  10. Spanish translation, cross-cultural adaptation, and validation of the Questionnaire for Diabetes-Related Foot Disease (Q-DFD)

    Directory of Open Access Journals (Sweden)

    Castillo-Tandazo W

    2013-08-01

    Wilson Castillo-Tandazo, Adolfo Flores-Fortty, Lourdes Feraud, Daniel Tettamanti. School of Medicine, Universidad Espíritu Santo – Ecuador, Samborondón, Guayas, Ecuador. Purpose: To translate, cross-culturally adapt, and validate the Questionnaire for Diabetes-Related Foot Disease (Q-DFD), originally created and validated in Australia, for its use in Spanish-speaking patients with diabetes mellitus. Patients and methods: The translation and cross-cultural adaptation were based on international guidelines. The Spanish version of the survey was applied to a community-based sample (sample A) and a hospital clinic-based sample (samples B and C). Samples A and B were used to determine criterion and construct validity comparing the survey findings with clinical evaluation and medical records, respectively; while sample C was used to determine intra- and inter-rater reliability. Results: After completing the rigorous translation process, only four items were considered problematic and required a new translation. In total, 127 patients were included in the validation study: 76 to determine criterion and construct validity and 41 to establish intra- and inter-rater reliability. For an overall diagnosis of diabetes-related foot disease, a substantial level of agreement was obtained when we compared the Q-DFD with the clinical assessment (kappa 0.77, sensitivity 80.4%, specificity 91.5%, positive likelihood ratio [LR+] 9.46, negative likelihood ratio [LR-] 0.21); while an almost perfect level of agreement was obtained when it was compared with medical records (kappa 0.88, sensitivity 87%, specificity 97%, LR+ 29.0, LR- 0.13). Survey reliability showed substantial levels of agreement, with kappa scores of 0.63 and 0.73 for intra- and inter-rater reliability, respectively. Conclusion: The translated and cross-culturally adapted Q-DFD showed good psychometric properties (validity, reproducibility, and reliability) that allow its use in Spanish-speaking diabetic populations.
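The likelihood ratios quoted above follow directly from the reported sensitivity and specificity through the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. A short sketch that reproduces the abstract's figures from its own inputs is given below. The function names are hypothetical.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity), both given as proportions."""
    return sensitivity / (1 - specificity)


def negative_likelihood_ratio(sensitivity, specificity):
    """LR- = (1 - sensitivity) / specificity, both given as proportions."""
    return (1 - sensitivity) / specificity


# Q-DFD versus clinical assessment: sensitivity 80.4%, specificity 91.5%
print(round(positive_likelihood_ratio(0.804, 0.915), 2))  # 9.46
print(round(negative_likelihood_ratio(0.804, 0.915), 2))  # 0.21

# Q-DFD versus medical records: sensitivity 87%, specificity 97%
print(round(positive_likelihood_ratio(0.87, 0.97), 1))    # 29.0
print(round(negative_likelihood_ratio(0.87, 0.97), 2))    # 0.13
```

Because LR+ divides by one minus the specificity, a rise in specificity from 91.5% to 97% roughly triples the positive likelihood ratio even though sensitivity changes only modestly.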

  11. Assessment of teacher competence using video portfolios: reliability, construct validity and consequential validity

    NARCIS (Netherlands)

    Admiraal, W.; Hoeksma, M.; van de Kamp, M.-T.; van Duin, G.

    2011-01-01

    The richness and complexity of video portfolios endanger both the reliability and validity of the assessment of teacher competencies. In a post-graduate teacher education program, the assessment of video portfolios was evaluated for its reliability, construct validity, and consequential validity.

  12. Development of the Modified Four Square Step Test and its reliability and validity in people with stroke.

    Science.gov (United States)

    Roos, Margaret A; Reisman, Darcy S; Hicks, Gregory; Rose, William; Rudolph, Katherine S

    2016-01-01

    Adults with stroke have difficulty avoiding obstacles when walking, especially when a time constraint is imposed. The Four Square Step Test (FSST) evaluates dynamic balance by requiring individuals to step over canes in multiple directions while being timed, but many people with stroke are unable to complete it. The purposes of this study were to (1) modify the FSST by replacing the canes with tape so that more persons with stroke could successfully complete the test and (2) examine the reliability and validity of the modified version. Fifty-five subjects completed the Modified FSST (mFSST) by stepping over tape in all four directions while being timed. The mFSST resulted in significantly greater numbers of subjects completing the test than the FSST (39/55 [71%] and 33/55 [60%], respectively) (p < 0.04). The test-retest, intrarater, and interrater reliability of the mFSST were excellent (intraclass correlation coefficient ranges: 0.81-0.99). Construct and concurrent validity of the mFSST were also established. The minimal detectable change was 6.73 s. The mFSST, an ideal measure of dynamic balance, can identify progress in people with stroke in varied settings and can be completed by a wide range of people with stroke in approximately 5 min with the use of minimal equipment (tape, stop watch).

  13. Are Validity and Reliability "Relevant" in Qualitative Evaluation Research?

    Science.gov (United States)

    Goodwin, Laura D.; Goodwin, William L.

    1984-01-01

    The views of prominent qualitative methodologists on the appropriateness of validity and reliability estimation for the measurement strategies employed in qualitative evaluations are summarized. A case is made for the relevance of validity and reliability estimation. Definitions of validity and reliability for qualitative measurement are presented…

  14. The Korean Version of the Cognitive Assessment Scale for Stroke Patients (K-CASP): A Reliability and Validity Study.

    Science.gov (United States)

    Park, Kwon-Hee; Lee, Hee-Won; Park, Kee-Boem; Lee, Jin-Youn; Cho, Ah-Ra; Oh, Hyun-Mi; Park, Joo Hyun

    2017-06-01

    To develop the Korean version of the Cognitive Assessment Scale for Stroke Patients (K-CASP) and to evaluate the test reliability and validity of the K-CASP in stroke patients. The original CASP was translated into Korean, back-translated into English, then reviewed and compared with the original version. Thirty-three stroke patients were assessed independently by two examiners using the K-CASP twice, with a one-day interval, for a total of four test results. To evaluate the reliability of the K-CASP, intra-class correlation coefficients were used. Pearson correlations were calculated and simple regression analyses performed with the Korean version of Mini-Mental State Examination (K-MMSE) and the aphasia quotient (AQ) to assess the validity. The mean score was 24.42±9.47 (total score 36) for the K-CASP and 21.50±7.01 (total score 30) for the K-MMSE. The inter-rater correlation coefficients of the K-CASP were 0.992 on the first day and 0.995 on the second day. The intra-rater correlation coefficients of the K-CASP were 0.997 for examiner 1 and 0.996 for examiner 2. In the Pearson correlation analysis, the K-CASP score significantly correlated with the K-MMSE score (r=0.825, preliable and valid instrument for cognitive dysfunction screening in post-stroke patients. It is more applicable than other cognitive assessment tools in stroke patients with aphasia.

  15. Elaboration and Validation of the Medication Prescription Safety Checklist

    Science.gov (United States)

    Pires, Aline de Oliveira Meireles; Ferreira, Maria Beatriz Guimarães; do Nascimento, Kleiton Gonçalves; Felix, Márcia Marques dos Santos; Pires, Patrícia da Silva; Barbosa, Maria Helena

    2017-01-01

    ABSTRACT Objective: to elaborate and validate a checklist to identify compliance with the recommendations for the structure of medication prescriptions, based on the Protocol of the Ministry of Health and the Brazilian Health Surveillance Agency. Method: methodological research, conducted through the validation and reliability analysis process, using a sample of 27 electronic prescriptions. Results: the analyses confirmed the content validity and reliability of the tool. The content validity, obtained by expert assessment, was considered satisfactory as it covered items that represent the compliance with the recommendations regarding the structure of the medication prescriptions. The reliability, assessed through interrater agreement, was excellent (ICC=1.00) and showed perfect agreement (K=1.00). Conclusion: the Medication Prescription Safety Checklist showed to be a valid and reliable tool for the group studied. We hope that this study can contribute to the prevention of adverse events, as well as to the improvement of care quality and safety in medication use. PMID:28793128

  16. Reliability, Validity, and Minimal Detectable Change of Balance Evaluation Systems Test and Its Short Versions in Older Cancer Survivors: A Pilot Study.

    Science.gov (United States)

    Huang, Min H; Miller, Kara; Smith, Kristin; Fredrickson, Kayle; Shilling, Tracy

    2016-01-01

    Cancer is primarily a disease of older adults. About 77% of all cancers are diagnosed in persons aged 55 years and older. Cancer and its treatment can cause diverse sequelae impacting body systems underlying balance control. No study has examined the psychometric properties of balance assessment tools in older cancer survivors, presenting a significant challenge in the selection of outcome measures for clinicians treating this fast-growing population. This study aimed to determine the reliability, validity, and minimal detectable change (MDC) of the Balance Evaluation System Test (BESTest), Mini-Balance Evaluation Systems Test (Mini-BESTest), and Brief-Balance Evaluation Systems Test (Brief-BESTest) in community-dwelling older cancer survivors. This study was a cross-sectional design. Twenty breast and 8 prostate cancer survivors participated [age (SD) = 68.4 (8.13) years]. The BESTest and Activity-specific Balance Confidence (ABC) Scale were administered during the first session. Scores of Mini-BESTest and Brief-BESTest were extracted on the basis of the scores of BESTest. The BESTest was repeated within 1 to 2 weeks by the same rater to determine the test-retest reliability. For the analysis of the inter-rater reliability, 21 participants were randomly selected to be evaluated by 2 raters. A primary rater administered the test. The 2 raters independently and concurrently scored the performance of the participants. Each rater recorded the ratings separately on the scoring sheet. No discussion among the raters was allowed throughout the testing. Intraclass correlation coefficients (ICCs), standard error of measurement, minimal detectable change (MDC), and Bland-Altman plots were calculated. Concurrent validity of these balance tests with the ABC Scale was examined using the Spearman correlation. The BESTest, Mini-BESTest, and Brief-BESTest had high test-retest (ICC = 0.90-0.94) and interrater reliability (ICC = 0.86-0.96), small standard error of measurement (0

  17. Reliability and validity of revised Turkish version of Mini Mental State Examination (rMMSE-T) in community-dwelling educated and uneducated elderly.

    Science.gov (United States)

    Keskinoglu, Pembe; Ucku, Reyhan; Yener, Görsev; Yaka, Erdem; Kurt, Pinar; Tunca, Zeliha

    2009-11-01

    To evaluate the reliability and validity of the revised Turkish version of Mini Mental State Examination (rMMSE-T) in educated and uneducated community-dwelling elderly, to re-organize the present Turkish version of MMSE and to determine cut-off point of the revised test. This cross-sectional and analytical study involved a total of 490 elderly subjects selected by cluster sampling. Receiver operating characteristic (ROC) analysis, kappa analysis and Cronbach's alpha coefficients were used for statistical analysis. Areas under the ROC curve in educated and uneducated elderly were 0.953 and 0.907, respectively. Cut-off point of 22/23 of rMMSE-T in educated elderly had the highest sensitivity (90.9), specificity (97.0) and positive likelihood ratio (30.3), whereas cut-off point of 18/19 of the test in uneducated elderly had the highest sensitivity (82.7), specificity (92.3) and positive likelihood ratio (10.7). The Cronbach's alpha values of the rMMSE-T for educated and uneducated elderly were higher than 0.7 (sign of good internal consistency of the test). Significant intrarater and interrater test-retest correlations were observed in educated elderly subjects (0.966 (p = 0.000); 0.855 (p = 0.000), respectively), and also in uneducated elderly (0.988 (p = 0.000); 0.934 (p = 0.000), respectively). Kappa values of the test in educated and uneducated elderly showed perfect agreement between raters (1.000) and substantial agreement within raters (1.000, 0.784; 0.826, 0.656, respectively). rMMSE-T had a high reliability and validity. It will be more appropriate to use the revised test and the new cut-off point for the diagnosis and screening of dementia among community-dwelling Turkish elderly population. Copyright 2009 John Wiley & Sons, Ltd.

  18. A validation study of the Keyboard Personal Computer Style instrument (K-PeCS) for use with children.

    Science.gov (United States)

    Green, Dido; Meroz, Anat; Margalit, Adi Edit; Ratzon, Navah Z

    2012-11-01

    This study examines a potential instrument for measurement of typing postures of children. This paper describes inter-rater, test-retest reliability and concurrent validity of the Keyboard Personal Computer Style instrument (K-PeCS), an observational measurement of postures and movements during keyboarding, for use with children. Two trained raters independently rated videos of 24 children (aged 7-10 years). Six children returned one week later for identifying test-retest reliability. Concurrent validity was assessed by comparing ratings obtained using the K-PeCS to scores from a 3D motion analysis system. Inter-rater reliability was moderate to high for 12 out of 16 items (Kappa: 0.46 to 1.00; correlation coefficients: 0.77-0.95) and test-retest reliability varied across items (Kappa: 0.25 to 0.67; correlation coefficients: r = 0.20 to r = 0.95). Concurrent validity compared favourably across arm pathlength, wrist extension and ulnar deviation. In light of the limitations of other tools, the K-PeCS offers a fairly affordable, reliable and valid instrument to address the gap for measurement of typing styles of children, despite the shortcomings of some items. However, further research is required to refine the instrument for use in evaluating typing among children. Copyright © 2012 Elsevier Ltd and The Ergonomics Society. All rights reserved.

  19. A Surgery Oral Examination: Interrater Agreement and the Influence of Rater Characteristics.

    Science.gov (United States)

    Burchard, Kenneth W.; And Others

    1995-01-01

    A study measured interrater reliability among 140 United States and Canadian surgery exam raters and the influences of age, years in practice, and experience as an examiner on individual scores. Results indicate three aspects of examinee performance influenced scores: verbal style, dress, and content of answers. No rater characteristic…

  20. Reliability, Validity, and Optimal Cutoff Score of the Montreal Cognitive Assessment (Changsha Version) in Ischemic Cerebrovascular Disease Patients of Hunan Province, China

    Science.gov (United States)

    Tu, Qiu-yun; Jin, Hui; Ding, Bin-rong; Yang, Xia; Lei, Zeng-hui; Bai, Song; Zhang, Ying-dong; Tang, Xiang-qi

    2013-01-01

    Background/Aims The goal of this study was to examine the reliability and validity of the Changsha version of the Montreal Cognitive Assessment (MoCA-CS) in ischemic cerebrovascular disease patients of Hunan Province, China, and to explore the optimal cutoff score for detecting vascular cognitive impairment-no dementia (VCI-ND) and vascular dementia (VD). Methods Three hundred and thirty-eight ischemic cerebrovascular disease patients (131 with normal cognition, 111 with VCI-ND, and 96 with VD) and 132 healthy controls were recruited. All participants accepted examination by the MoCA-CS, Mini-Mental State Examination (MMSE), and other related scales. A detailed neuropsychological battery was used for making a final cognitive diagnosis. SPSS 16.0 statistical software was used for reliability, validity examination, and optimal cutoff score detection. Results Cronbach's α of the MoCA-CS was 0.884, and test-retest and interrater reliability of the MoCA-CS were 0.966 and 0.926, respectively. MoCA-CS scores were highly correlated with MMSE scores (r = 0.867) and simplified intelligence quotients (r = 0.822). The results indicate that 1 point should be added for subjects with less than 6 years of education, and that the optimal cutoff score for detecting VCI-ND is 26/27 (sensitivity 96.1%, specificity 75.6%), whereas the optimal cutoff score for detecting VD is 16/17 (sensitivity 92.7%, specificity 96.3%). Conclusion The MoCA-CS has good reliability and validity, and is a useful cognitive screening instrument for detecting VCI in the Chinese population. PMID:23637698

  1. Reliability, Validity, and Optimal Cutoff Score of the Montreal Cognitive Assessment (Changsha Version) in Ischemic Cerebrovascular Disease Patients of Hunan Province, China

    Directory of Open Access Journals (Sweden)

    Qiu-yun Tu

    2013-02-01

    Background/Aims: The goal of this study was to examine the reliability and validity of the Changsha version of the Montreal Cognitive Assessment (MoCA-CS) in ischemic cerebrovascular disease patients of Hunan Province, China, and to explore the optimal cutoff score for detecting vascular cognitive impairment-no dementia (VCI-ND) and vascular dementia (VD). Methods: Three hundred and thirty-eight ischemic cerebrovascular disease patients (131 with normal cognition, 111 with VCI-ND, and 96 with VD) and 132 healthy controls were recruited. All participants accepted examination by the MoCA-CS, Mini-Mental State Examination (MMSE), and other related scales. A detailed neuropsychological battery was used for making a final cognitive diagnosis. SPSS 16.0 statistical software was used for reliability, validity examination, and optimal cutoff score detection. Results: Cronbach’s α of the MoCA-CS was 0.884, and test-retest and interrater reliability of the MoCA-CS were 0.966 and 0.926, respectively. MoCA-CS scores were highly correlated with MMSE scores (r = 0.867) and simplified intelligence quotients (r = 0.822). The results indicate that 1 point should be added for subjects with less than 6 years of education, and that the optimal cutoff score for detecting VCI-ND is 26/27 (sensitivity 96.1%, specificity 75.6%), whereas the optimal cutoff score for detecting VD is 16/17 (sensitivity 92.7%, specificity 96.3%). Conclusion: The MoCA-CS has good reliability and validity, and is a useful cognitive screening instrument for detecting VCI in the Chinese population.

  2. How reliable are Functional Movement Screening scores? A systematic review of rater reliability.

    Science.gov (United States)

    Moran, Robert W; Schneiders, Anthony G; Major, Katherine M; Sullivan, S John

    2016-05-01

    Several physical assessment protocols to identify intrinsic risk factors for injury aetiology related to movement quality have been described. The Functional Movement Screen (FMS) is a standardised, field-expedient test battery intended to assess movement quality and has been used clinically in preparticipation screening and in sports injury research. To critically appraise and summarise research investigating the reliability of scores obtained using the FMS battery. Systematic literature review. Systematic search of Google Scholar, Scopus (including ScienceDirect and PubMed), EBSCO (including Academic Search Complete, AMED, CINAHL, Health Source: Nursing/Academic Edition), MEDLINE and SPORTDiscus. Studies meeting eligibility criteria were assessed by 2 reviewers for risk of bias using the Quality Appraisal of Reliability Studies checklist. Overall quality of evidence was determined using van Tulder's levels of evidence approach. 12 studies were appraised. Overall, there was a 'moderate' level of evidence in favour of 'acceptable' (intraclass correlation coefficient ≥0.6) inter-rater and intra-rater reliability for composite scores derived from live scoring. For inter-rater reliability of composite scores derived from video recordings there was 'conflicting' evidence, and 'limited' evidence for intra-rater reliability. For inter-rater reliability based on live scoring of individual subtests there was 'moderate' evidence of 'acceptable' reliability (κ≥0.4) for 4 subtests (Deep Squat, Shoulder Mobility, Active Straight-leg Raise, Trunk Stability Push-up) and 'conflicting' evidence for the remaining 3 (Hurdle Step, In-line Lunge, Rotary Stability). This review found 'moderate' evidence that raters can achieve acceptable levels of inter-rater and intra-rater reliability of composite FMS scores when using live ratings. Overall, there were few high-quality studies, and the quality of several studies was impacted by poor study reporting particularly in relation to

  3. Assessment of apraxia: inter-rater reliability of a new apraxia test, association between apraxia and other cognitive deficits and prevalence of apraxia in a rehabilitation setting.

    Science.gov (United States)

    Zwinkels, Angeliek; Geusgens, Chantal; van de Sande, Peter; Van Heugten, Caroline

    2004-11-01

    To investigate the inter-rater reliability of a new apraxia test. Furthermore to examine the association of apraxia with other neuropsychological impairments and the prevalence of apraxia in a rehabilitation setting on the basis of the new test. Cross-sectional cohort study, involving 100 patients with a first stroke admitted to a rehabilitation centre in the Netherlands. General patient characteristics and stroke-related aspects. Cognitive screening involving apraxia, visuospatial scanning, abstract thinking and reasoning, memory, attention, planning and aphasia. The indices for inter-rater agreement range from excellent to poor. Significant correlations are found between apraxia and visuospatial scanning, memory, attention, planning and aphasia. The patients with apraxia perform significantly worse than the patients without apraxia on memory, the time needed to complete the tests for scanning and attention, and aphasia. The prevalence of apraxia is 25.3% in the total group, 51.3% in the left hemisphere stroke patients and 6.0% in the right hemisphere stroke patients. Patients with and without apraxia do not differ significantly concerning age, gender and type of stroke. The apraxia test has been shown to be a reliable instrument. Apraxia is often associated with aphasia, memory problems and mental slowness. This study shows that on the basis of the apraxia test, the prevalence of apraxia among patients in the rehabilitation centre is high, especially among patients with left hemisphere lesions.

  4. Reliability of Alberta Infant Motor Scale Using Recorded Video Observations Among the Preterm Infants in India: A Reliability Study

    Directory of Open Access Journals (Sweden)

    Veena Kirthika S

    2017-10-01

    Background: Assessment of motor function is a vital characteristic of infant development. The Alberta Infant Motor Scale (AIMS) is considered one of the tools available for screening developmental delays, but this scale was formulated using Western samples. Every country has its own ethnic and cultural background, and various differences are observed in culture and ethnicity. Therefore, there is a need to establish reliability for the use of the AIMS in the south Indian population. Purpose: To find the intra-rater and inter-rater reliability of the Alberta Infant Motor Scale (AIMS) on pre-term infants using recorded video observations in the Indian population. Method: 30 preterm infants in three age groups, 0-3 months (10 infants), 4-7 months (10 infants) and 8-18 months (10 infants), were recruited for this reliability study. The AIMS was administered to the preterm infants and the performance was videotaped. The performance was then rescored by the same therapist, immediately from the video and on another two consecutive months, to estimate intra-rater reliability using ICC (3,1), two-way mixed effects model. For reporting inter-rater reliability, the AIMS was scored by three different raters, using ICC (2,k), two-way random effects model, and by two other therapists to examine the inter- and intra-rater reliability. Results: The two-way mixed effects model for intra-rater reliability of the AIMS gave ICC (3,1) = 0.99, and the two-way random effects model for inter-rater reliability of the AIMS gave ICC (2,k) = 0.96. Conclusion: The AIMS has excellent intra- and inter-rater reliability using recorded video observations among preterm infants in India.
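
    The two coefficients named in this abstract, ICC (3,1) and ICC (2,k), can both be derived from the mean squares of a two-way subjects-by-raters ANOVA. The following is a minimal sketch assuming the Shrout and Fleiss formulas and a complete table with one score per subject per rater; the function name and the toy scores are hypothetical and are not data from the study.

```python
from typing import Sequence, Tuple


def icc_two_way(scores: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (ICC(3,1), ICC(2,k)) for an n-subjects x k-raters table.

    Assumes a complete design (every rater scores every subject once) and
    the Shrout and Fleiss mean-square formulas.
    """
    n = len(scores)          # number of subjects (rows)
    k = len(scores[0])       # number of raters (columns)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    bms = ss_rows / (n - 1)                 # between-subjects mean square
    jms = ss_cols / (k - 1)                 # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))      # residual mean square

    icc_3_1 = (bms - ems) / (bms + (k - 1) * ems)   # consistency, single rater
    icc_2_k = (bms - ems) / (bms + (jms - ems) / n) # agreement, mean of k raters
    return icc_3_1, icc_2_k


# Toy table: the second rater always scores exactly one point higher.
toy = [[1, 2], [2, 3], [3, 4]]
print(icc_two_way(toy))
```

    In the toy table the raters rank subjects identically, so the consistency form ICC (3,1) is at its maximum, while the agreement form ICC (2,k) is lower because it penalises the constant offset between raters.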

  5. Strength and Pain Threshold Handheld Dynamometry Test Reliability in Patellofemoral Pain.

    Science.gov (United States)

    van der Heijden, R A; Vollebregt, T; Bierma-Zeinstra, S M A; van Middelkoop, M

    2015-12-01

    Patellofemoral pain syndrome (PFPS), characterized by peri- and retropatellar pain, is a common disorder in young, active people. The etiology is unclear; however, quadriceps strength seems to be a contributing factor, and sensitization might play a role. The study purpose is determining the inter-rater reliability of handheld dynamometry to test both quadriceps strength and pressure pain threshold (PPT), a measure for sensitization, in patients with PFPS. This cross-sectional case-control study comprises 3 quadriceps strength and one PPT measurements performed by 2 independent investigators in 22 PFPS patients and 16 matched controls. Inter-rater reliability was analyzed using intraclass correlation coefficients (ICC) and Bland-Altman plots. Inter-rater reliability of quadriceps strength testing was fair to good in PFPS patients (ICC=0.72) and controls (ICC=0.63). Bland-Altman plots showed an increased difference between assessors when average quadriceps strength values exceeded 250 N. Inter-rater reliability of PPT was excellent in patients (ICC=0.79) and fair to good in controls (ICC=0.52). Handheld dynamometry seems to be a reliable method to test both quadriceps strength and PPT in PFPS patients. Inter-rater reliability was higher in PFPS patients compared to control subjects. With regard to quadriceps testing, a higher variance between assessors occurs when quadriceps strength increases. © Georg Thieme Verlag KG Stuttgart · New York.

  6. Interrater Reliability in Analysis of Laryngoscopic Features for Unilateral Vocal Fold Paresis.

    Science.gov (United States)

    Isseroff, Tova F; Parasher, Arjun K; Richards, Amanda; Sivak, Mark; Woo, Peak

    2016-11-01

    The diagnosis of paresis in patients with vocal fold motion impairment remains a challenge. In particular, laryngoscopy examination may result in significant disagreement in diagnosis among providers. We hypothesize that systematically evaluating for a standard set of clinical parameters will increase the diagnostic concordance among providers. Prospective case series conducted at a Tertiary referral Laryngology office. Two laryngologists (rater 1) and two trainees (rater 2) rated laryngoscopy findings in 19 patients suspected of paresis. The diagnosis was confirmed with laryngeal electromyogram. A standard set of 27 ratings was used for each examination that included movement, laryngeal configuration, and stroboscopy signs. A kappa coefficient was calculated for agreement in laryngoscopy findings and effectiveness in predicting the laterality of paresis. A substantial agreement (kappa coefficient > 0.61) existed between the raters for vocal fold length, vocal fold thickness, bowing, and reduction in movement. A moderate agreement (kappa coefficient > 0.41) existed between raters for piriform opening and reduced kinesis. The senior author was accurately able to diagnose the side of paresis in 89.5% of cases for a kappa coefficient of 0.78, whereas the trainees correctly predicted the side of paresis in 63.1% for a kappa coefficient of 0.35. The raters agreed on the diagnosis in 73.7% of cases for a kappa coefficient of 0.50. Using a standard set of laryngoscopy findings may improve the provider's ability to identify the laterality of vocal fold paresis and increase interrater reliability compared with other series. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  7. Reliability and validity of risk analysis

    International Nuclear Information System (INIS)

    Aven, Terje; Heide, Bjornar

    2009-01-01

    In this paper we investigate to what extent risk analysis meets the scientific quality requirements of reliability and validity. We distinguish between two types of approaches within risk analysis, relative frequency-based approaches and Bayesian approaches. The former category includes both traditional statistical inference methods and the so-called probability of frequency approach. Depending on the risk analysis approach, the aim of the analysis is different, the results are presented in different ways and consequently the meaning of the concepts reliability and validity are not the same.

  8. The reliability of three psoriasis assessment tools: Psoriasis area and severity index, body surface area and physician global assessment.

    Science.gov (United States)

    Bożek, Agnieszka; Reich, Adam

    2017-08-01

    A wide variety of psoriasis assessment tools have been proposed to evaluate the severity of psoriasis in clinical trials and daily practice. The most frequently used clinical instrument is the psoriasis area and severity index (PASI); however, none of the currently published severity scores used for psoriasis meets all the validation criteria required for an ideal score. The aim of this study was to compare and assess the reliability of 3 commonly used assessment instruments for psoriasis severity: the psoriasis area and severity index (PASI), body surface area (BSA) and physician global assessment (PGA). On the scoring day, 10 trained dermatologists evaluated 9 adult patients with plaque-type psoriasis using the PASI, BSA and PGA. All the subjects were assessed twice by each physician. Correlations between the assessments were analyzed using the Pearson correlation coefficient. Intra-class correlation coefficient (ICC) was calculated to analyze intra-rater reliability, and the coefficient of variation (CV) was used to assess inter-rater variability. Significant correlations were observed among the 3 scales in both assessments. In all 3 scales the ICCs were > 0.75, indicating high intra-rater reliability. The highest ICC was for the BSA (0.96) and the lowest one for the PGA (0.87). The CV for the PGA and PASI were 29.3 and 36.9, respectively, indicating moderate inter-rater variability. The CV for the BSA was 57.1, indicating high inter-rater variability. Comparing the PASI, PGA and BSA, it was shown that the PGA had the highest inter-rater reliability, whereas the BSA had the highest intra-rater reliability. The PASI showed intermediate values in terms of inter- and intra-rater reliability. None of the 3 assessment instruments showed a significant advantage over the other. A reliable assessment of psoriasis severity requires the use of several independent evaluations simultaneously.
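
    The coefficient of variation used here for inter-rater variability is the standard deviation of the raters' scores expressed as a percentage of their mean. A minimal sketch follows, assuming the sample (n - 1) standard deviation; the abstract does not say which estimator was used, and the function name and scores are hypothetical.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Return the CV in percent: 100 * sample standard deviation / mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # statistics.stdev uses the n - 1 denominator (an assumption of this sketch).
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical scores given to one patient by three raters.
print(coefficient_of_variation([10, 20, 30]))
```

    A larger CV means the raters' scores for the same patient are more spread out relative to their average, which is why a higher CV is read as lower inter-rater reliability.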

  9. Validity of the Consensual Assessment Technique--Evidence with Three Groups of Judges and an Elementary School Student Sample

    Science.gov (United States)

    Long, Haiying

    2012-01-01

    As one of the most widely used creativity assessment tools, the Consensual Assessment Technique (CAT) has been praised as a valid tool to assess creativity. In Amabile's (1982) seminal work, the inter-rater reliability was defined as construct validity of the CAT. During the past three decades, researchers followed this definition and…

  10. Reliability of visual and instrumental color matching.

    Science.gov (United States)

    Igiel, Christopher; Lehmann, Karl Martin; Ghinea, Razvan; Weyhrauch, Michael; Hangx, Ysbrand; Scheller, Herbert; Paravina, Rade D

    2017-09-01

    The aim of this investigation was to evaluate intra-rater and inter-rater reliability of visual and instrumental shade matching. Forty individuals with normal color perception participated in this study. The right maxillary central incisor of a teaching model was prepared and restored with 10 feldspathic all-ceramic crowns of different shades. A shade matching session consisted of the observer (rater) visually selecting the best match by using VITA classical A1-D4 (VC) and VITA Toothguide 3D Master (3D) shade guides and the VITA Easyshade Advance intraoral spectrophotometer (ES) to obtain both VC and 3D matches. Three shade matching sessions were held with 4 to 6 weeks between sessions. Intra-rater reliability was assessed based on the percentage of agreement for the three sessions for the same observer, whereas the inter-rater reliability was calculated as mean percentage of agreement between different observers. The Fleiss' Kappa statistical analysis was used to evaluate visual inter-rater reliability. The mean intra-rater reliability for the visual shade selection was 64(11) for VC and 48(10) for 3D. The corresponding ES values were 96(4) for both VC and 3D. The percentages of observers who matched the same shade with VC and 3D were 55(10) and 43(12), respectively, while corresponding ES values were 88(8) for VC and 92(4) for 3D. The results for visual shade matching exhibited a high to moderate level of inconsistency for both intra-rater and inter-rater comparisons. The VITA Easyshade Advance intraoral spectrophotometer exhibited significantly better reliability compared with visual shade selection. This study evaluates the ability of observers to consistently match the same shade visually and with a dental spectrophotometer in different sessions. The intra-rater and inter-rater reliability (agreement of repeated shade matching) of visual and instrumental tooth color matching strongly suggest the use of color matching instruments as a supplementary tool in

  11. What to Do With "Moderate" Reliability and Validity Coefficients?

    NARCIS (Netherlands)

    Post, Marcel W

    Clinimetric studies may use criteria for test-retest reliability and convergent validity such that correlation coefficients as low as .40 are supportive of reliability and validity. It can be argued that moderate (.40-.60) correlations should not be interpreted in this way and that reliability

  12. Verification, validation, and reliability of predictions

    International Nuclear Information System (INIS)

    Pigford, T.H.; Chambre, P.L.

    1987-04-01

    The objective of predicting long-term performance should be to make reliable determinations of whether the prediction falls within the criteria for acceptable performance. Establishing reliable predictions of long-term performance of a waste repository requires emphasis on valid theories to predict performance. The validation process must establish the validity of the theory, the parameters used in applying the theory, the arithmetic of calculations, and the interpretation of results; but validation of such performance predictions is not possible unless there are clear criteria for acceptable performance. Validation programs should emphasize identification of the substantive issues of prediction that need to be resolved. Examples relevant to waste package performance are predicting the life of waste containers and the time distribution of container failures, establishing the criteria for defining container failure, validating theories for time-dependent waste dissolution that depend on details of the repository environment, and determining the extent of congruent dissolution of radionuclides in the UO 2 matrix of spent fuel. Prediction and validation should go hand in hand and should be done and reviewed frequently, as essential tools for the programs to design and develop repositories. 29 refs

  13. Initial Validation of a Technical Writing Rubric for Engineering Design

    Directory of Open Access Journals (Sweden)

    Cheryl Bodnar

    2018-02-01

    Engineering design serves as the capstone experience of most undergraduate engineering programs. One of the key elements of the engineering design process is the compilation of results obtained into a technical report that can be shared and distributed to interested stakeholders including industry, faculty members and other relevant parties. In an effort to expand the tools available for assessment of engineering design technical reports, this study performed an initial validation of a previously developed Technical Writing rubric. The rubric was evaluated for its reliability to measure the intended construct, inter-rater reliability and external validity in comparison to an existing generalized written communication rubric. It was found that the rubric was reliable with Cronbach’s alpha for all dimensions between 0.817 and 0.976. The inter-rater reliability for the overall instrument was also found to be excellent at 0.85. Finally, it was observed that there were no statistically significant differences observed between the measurements obtained on the Technical Writing rubric in comparison to the more generalized Written Communication Value rubric. This demonstrates that although specific to engineering design environments the Technical Writing rubric was able to measure key constructs associated with written communication practice. This rubric can now serve as one additional tool for assessment of communication skills within engineering capstone design experiences.

  14. Reliability of Lactation Assessment Tools Applied to Overweight and Obese Women.

    Science.gov (United States)

    Chapman, Donna J; Doughty, Katherine; Mullin, Elizabeth M; Pérez-Escamilla, Rafael

    2016-05-01

    The interrater reliability of lactation assessment tools has not been evaluated in overweight/obese women. This study aimed to compare the interrater reliability of 4 lactation assessment tools in this population. A convenience sample of 45 women (body mass index > 27.0) was videotaped while breastfeeding (twice daily on days 2, 4, and 7 postpartum). Three International Board Certified Lactation Consultants independently rated each videotaped session using 4 tools (Infant Breastfeeding Assessment Tool [IBFAT], modified LATCH [mLATCH], modified Via Christi [mVC], and Riordan's Tool [RT]). For each day and tool, we evaluated interrater reliability with 1-way repeated-measures analyses of variance, intraclass correlation coefficients (ICCs), and percentage absolute agreement between raters. Analyses of variance showed significant differences between raters' scores on day 2 (all scales) and day 7 (RT). Intraclass correlation coefficient values reflected good (mLATCH) to excellent reliability (IBFAT, mVC, and RT) on days 2 and 7. All day 4 ICCs reflected good reliability. The ICC for mLATCH was significantly lower than all others on day 2 and was significantly lower than IBFAT (day 7). Percentage absolute interrater agreement for scale components ranged from 31% (day 2: observable swallowing, RT) to 92% (day 7: IBFAT, fixing; and mVC, latch time). Swallowing scores on all scales had the lowest levels of interrater agreement (31%-64%). We demonstrated differences in the interrater reliability of 4 lactation assessment tools when applied to overweight/obese women, with the lowest values observed on day 4. Swallowing assessment was particularly unreliable. Researchers and clinicians using these scales should be aware of the differences in their psychometric behavior. © The Author(s) 2015.

  15. The health preoccupation diagnostic interview: inter-rater reliability of a structured interview for diagnostic assessment of DSM-5 somatic symptom disorder and illness anxiety disorder.

    Science.gov (United States)

    Axelsson, Erland; Andersson, Erik; Ljótsson, Brjánn; Wallhed Finn, Daniel; Hedman, Erik

    2016-06-01

    Somatic symptom disorder (SSD) and illness anxiety disorder (IAD) are two new diagnoses introduced in the DSM-5. There is a need for reliable instruments to facilitate the assessment of these disorders. We therefore developed a structured diagnostic interview, the Health Preoccupation Diagnostic Interview (HPDI), which we hypothesized would reliably differentiate between SSD, IAD, and no diagnosis. Persons with clinically significant health anxiety (n = 52) and healthy controls (n = 52) were interviewed using the HPDI. Diagnoses were then compared with those made by an independent assessor, who listened to audio recordings of the interviews. Ratings generally indicated moderate to almost perfect inter-rater agreement, as illustrated by an overall Cohen's κ of .85. Disagreements primarily concerned (a) the severity of somatic symptoms, (b) the differential diagnosis of panic disorder, and (c) SSD specifiers. We conclude that the HPDI can be used to reliably diagnose DSM-5 SSD and IAD.

  16. Systematic reviews need to consider applicability to disadvantaged populations: inter-rater agreement for a health equity plausibility algorithm.

    Science.gov (United States)

    Welch, Vivian; Brand, Kevin; Kristjansson, Elizabeth; Smylie, Janet; Wells, George; Tugwell, Peter

    2012-12-19

    Systematic reviews have been challenged to consider effects on disadvantaged groups. A priori specification of subgroup analyses is recommended to increase the credibility of these analyses. This study aimed to develop and assess inter-rater agreement for an algorithm for systematic review authors to predict whether differences in effect measures are likely for disadvantaged populations relative to advantaged populations (only relative effect measures were addressed). A health equity plausibility algorithm was developed using clinimetric methods with three items based on literature review, key informant interviews and methodology studies. The three items dealt with the plausibility of differences in relative effects across sex or socioeconomic status (SES) due to: 1) patient characteristics; 2) intervention delivery (i.e., implementation); and 3) comparators. Thirty-five respondents (consisting of clinicians, methodologists and research users) assessed the likelihood of differences across sex and SES for ten systematic reviews with these questions. We assessed inter-rater reliability using Fleiss multi-rater kappa. The proportion agreement was 66% for patient characteristics (95% confidence interval: 61%-71%), 67% for intervention delivery (95% confidence interval: 62% to 72%) and 55% for the comparator (95% confidence interval: 50% to 60%). Inter-rater kappa, assessed with Fleiss kappa, ranged from 0 to 0.199, representing very low agreement beyond chance. Users of systematic reviews rated that important differences in relative effects across sex and socioeconomic status were plausible for a range of individual and population-level interventions. However, there was very low inter-rater agreement for these assessments. There is an unmet need for discussion of plausibility of differential effects in systematic reviews. Increased consideration of external validity and applicability to different populations and settings is warranted in systematic reviews to meet this
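
    This abstract reports fairly high raw proportion agreement alongside very low Fleiss kappa, which is possible because kappa subtracts the agreement expected by chance. A minimal sketch of Fleiss' multi-rater kappa is given below, assuming every item is rated by the same number of raters; the function name and the small count tables are hypothetical and do not reproduce the study's data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Return Fleiss' kappa.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every row must sum to the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_exp) / (1.0 - p_exp)


# Three raters, two categories: full agreement versus 2-to-1 splits.
print(fleiss_kappa([[3, 0], [0, 3]]))
print(fleiss_kappa([[2, 1], [1, 2]]))
```

    In the second table two of three raters agree on every item, yet kappa is negative because that level of agreement is no better than what the category proportions would produce by chance.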

  17. Predictive validity of the SVR-20 and Static-99 in a Dutch sample of treated sex offenders

    NARCIS (Netherlands)

    de Vogel, V.; de Ruiter, C.; van Beek, D.; Mead, G.

    2004-01-01

    In this retrospective study, the interrater reliability and predictive validity of 2 risk assessment instruments for sexual violence are presented. The SVR-20, an instrument for structured professional judgment, and the Static-99, an actuarial risk assessment instrument, were coded from file

  18. EasyDIAg: A tool for easy determination of interrater agreement.

    Science.gov (United States)

    Holle, Henning; Rein, Robert

    2015-09-01

    Reliable measurements are fundamental for the empirical sciences. In observational research, measurements often consist of observers categorizing behavior into nominal-scaled units. Since the categorization is the outcome of a complex judgment process, it is important to evaluate the extent to which these judgments are reproducible, by having multiple observers independently rate the same behavior. A challenge in determining interrater agreement for timed-event sequential data is to develop clear objective criteria to determine whether two raters' judgments relate to the same event (the linking problem). Furthermore, many studies presently report only raw agreement indices, without considering the degree to which agreement can occur by chance alone. Here, we present a novel, free, and open-source toolbox (EasyDIAg) designed to assist researchers with the linking problem, while also providing chance-corrected estimates of interrater agreement. Additional tools are included to facilitate the development of coding schemes and rater training.
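
    The contrast drawn here between raw agreement and chance-corrected agreement can be illustrated with Cohen's kappa for two raters. The sketch below assumes the events have already been linked, so that both raters' labels refer to the same units; it is not EasyDIAg's own algorithm, and the function name and labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Return Cohen's kappa for two raters labelling the same units."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of units")
    n = len(rater_a)

    # Raw (observed) agreement: share of units given the same label.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_exp = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical nominal codes for four already-linked events.
a = ["gesture", "gesture", "rest", "rest"]
b = ["gesture", "rest", "rest", "rest"]
print(cohens_kappa(a, b))
```

    Here the raters agree on three of four events (raw agreement 0.75), but half of that agreement is expected by chance from their label frequencies, so kappa comes out at 0.5.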

  19. Reliability and Validity of Qualitative and Operational Research Paradigm

    Directory of Open Access Journals (Sweden)

    Muhammad Bashir

    2008-01-01

    Both qualitative and quantitative paradigms try to find the same result: the truth. Qualitative studies are tools used in understanding and describing the world of human experience. Since we maintain our humanity throughout the research process, it is largely impossible to escape the subjective experience, even for the most experienced of researchers. Reliability and validity are issues that have been described in great detail by advocates of quantitative research. The validity and the norms of rigor that are applied to quantitative research are not entirely applicable to qualitative research. Validity in qualitative research means the extent to which the data are plausible, credible and trustworthy, and thus can be defended when challenged. Reliability and validity remain appropriate concepts for attaining rigor in qualitative research. Qualitative researchers have to salvage responsibility for reliability and validity by implementing verification strategies that are integral and self-correcting during the conduct of inquiry itself. This ensures the attainment of rigor using strategies inherent within each qualitative design, and moves the responsibility for incorporating and maintaining reliability and validity from external reviewers’ judgments to the investigators themselves. There are different opinions on validity, with some suggesting that the concept of validity is incompatible with qualitative research and should be abandoned, while others argue that efforts should be made to ensure validity so as to lend credibility to the results. This paper is an attempt to clarify the meaning and use of reliability and validity in the qualitative research paradigm.

  20. The six-item Clock Drawing Test – reliability and validity in mild Alzheimer’s disease

    DEFF Research Database (Denmark)

    Jørgensen, Kasper; Kristensen, Maria K; Waldemar, Gunhild

    2015-01-01

    This study presents a reliable, short and practical version of the Clock Drawing Test (CDT) for clinical use and examines its diagnostic accuracy in mild Alzheimer's disease versus elderly nonpatients. Clock drawings from 231 participants were scored independently by four clinical neuropsychologists blind to diagnostic classification. The interrater agreement of individual scoring criteria was analyzed and items with poor or moderate reliability were excluded. The classification accuracy of the resulting scoring system - the six-item CDT - was examined. We explored the effect of further...

  1. An Italian multicentre validation study of the coma recovery scale-revised.

    Science.gov (United States)

    Estraneo, A; Moretta, P; De Tanti, A; Gatta, G; Giacino, J T; Trojano, L

    2015-10-01

    Rate of misdiagnosis of disorders of consciousness (DoC) can be reduced by employing validated clinical diagnostic tools, such as the Coma Recovery Scale-Revised (CRS-R). An Italian version of the CRS-R has been recently developed, but its applicability across different clinical settings, and its concurrent validity and diagnostic sensitivity have not been estimated yet. To perform a multicentre validation study of the Italian version of the Coma Recovery Scale-Revised (CRS-R). Analysis of inter-rater reliability, concurrent validity and diagnostic sensitivity of the scale. One Intensive Care Unit, 8 Post-acute rehabilitation centres and 2 Long-term facilities. Twenty-seven professionals (physicians, N.=11; psychologists, N.=5; physiotherapists, N.=3; speech therapists, N.=6; nurses, N.=2) from 11 Italian Centres. CRS-R and Disability Rating Scale (DRS) applied to 122 patients with clinical diagnosis of Vegetative State (VS) or Minimally Conscious State (MCS). CRS-R has good-to-excellent inter-rater reliability for all subscales, particularly for the communication subscale. The Italian version of the CRS-R showed a high sensitivity and specificity in detecting MCS with reference to clinical consensus diagnosis. The CRS-R showed good concurrent validity with the Disability Rating Scale, which had very low specificity with reference to clinical consensus diagnosis. The Italian version of the CRS-R is a valid scale for use from the sub-acute to chronic stages of DoC. It can be administered reliably by all members of the rehabilitation team with different specialties, levels of experience and settings. The present study promotes use of the Italian version of the CRS-R to improve diagnosis of DoC patients, and to plan tailored rehabilitation treatment.

  2. Using the Hemophilia Joint Health Score for assessment of children: Reliability of the Spanish version.

    Science.gov (United States)

    R, Cuesta-Barriuso; A, Torres-Ortuño; S, Pérez-Alenda; J, Carrasco Juan; F, Querol; J, Nieto-Munuera; Ja, López-Pina

    2018-02-27

    Numerous measuring instruments for the evaluation of hemophilic arthropathy have been developed. One of the most used systems is the Hemophilia Joint Health Score (HJHS) given its sensitivity to clinical changes appearing in the joints because of recurrent hemarthrosis. Assessing the interrater reliability, using the Spanish version of the HJHS (version 2.1) in children with hemophilia. Reliability study to assess the interrater reliability of the Spanish version of HJHS. A sample of 36 children aged 7-13 years diagnosed with hemophilia A or B was used. Two physiotherapists performed physical assessments with the Spanish version of the HJHS. Descriptive statistics (range, mean, standard deviation) and the analysis of interrater reliability were calculated. The interrater reliability was heterogeneous since the Kappa coefficient range (ĸ), although significant (p reliability of the Spanish population version of the HJHS is high. This scale should be used generically in evaluating musculoskeletal pediatric patients with hemophilia.

  3. Emergency Severity Index version 4: a valid and reliable tool in pediatric emergency department triage.

    Science.gov (United States)

    Green, Nicole A; Durani, Yamini; Brecher, Deena; DePiero, Andrew; Loiselle, John; Attia, Magdy

    2012-08-01

    The Emergency Severity Index version 4 (ESI v.4) is the most recently implemented 5-level triage system. The validity and reliability of this triage tool in the pediatric population have not been extensively established. The goals of this study were to assess the validity of ESI v.4 in predicting hospital admission, emergency department (ED) length of stay (LOS), and number of resources utilized, as well as its reliability in a prospective cohort of pediatric patients. The first arm of the study was a retrospective chart review of 780 pediatric patients presenting to a pediatric ED to determine the validity of ESI v.4. Abstracted data included acuity level assigned by the triage nurse using ESI v.4 algorithm, disposition (admission vs discharge), LOS, and number of resources utilized in the ED. To analyze the validity of ESI v.4, patients were divided into 2 groups for comparison: higher-acuity patients (ESI levels 1, 2, and 3) and lower-acuity patients (ESI levels 4 and 5). Pearson χ² analysis was performed for categorical variables. For continuous variables, we conducted a comparison of means based on parametric distribution of variables. The second arm was a prospective cohort study to determine the interrater reliability of ESI v.4 among and between pediatric triage (PT) nurses and pediatric emergency medicine (PEM) physicians. Three raters (2 PT nurses and 1 PEM physician) independently assigned triage scores to 100 patients; κ and interclass correlation coefficient were calculated among PT nurses and between the primary PT nurses and physicians. In the validity arm, the distribution of ESI score levels among the 780 cases is as follows: ESI 1: 2 (0.25%); ESI 2: 73 (9.4%); ESI 3: 289 (37%); ESI 4: 251 (32%); and ESI 5: 165 (21%). Hospital admission rates by ESI level were 1: 100%, 2: 42%, 3: 14.9%, 4: 1.2%, and 5: 0.6%. 
The admission rate of the higher-acuity group (76/364, 21%) was significantly greater than the lower-acuity group (4/415, 0.96%), P group was

  4. Validity and reliability of smartphone magnetometer-based goniometer evaluation of shoulder abduction--A pilot study.

    Science.gov (United States)

    Johnson, Linda B; Sumner, Sean; Duong, Tina; Yan, Posu; Bajcsy, Ruzena; Abresch, R Ted; de Bie, Evan; Han, Jay J

    2015-12-01

    Goniometers are commonly used by physical therapists to measure range-of-motion (ROM) in the musculoskeletal system. These measurements are used to assist in diagnosis and to help monitor treatment efficacy. With newly emerging technologies, smartphone-based applications are being explored for measuring joint angles and movement. This pilot study investigates the intra- and inter-rater reliability as well as concurrent validity of a newly developed smartphone magnetometer-based goniometer (MG) application for measuring passive shoulder abduction in both sitting and supine positions, and compares it against the traditional universal goniometer (UG). This is a comparative study with repeated measurement design. Three physical therapists utilized both the smartphone MG and a traditional UG to measure various angles of passive shoulder abduction in a healthy subject, whose shoulder was positioned in eight different positions with pre-determined degree of abduction while seated or supine. Each therapist was blinded to the measured angles. Concordance correlation coefficients (CCCs), Bland-Altman plotting methods, and Analysis of Variance (ANOVA) were used for statistical analyses. Both traditional UG and smartphone MG were reliable in repeated measures of standardized joint angle positions (average CCC > 0.997) with similar variability in both measurement tools (standard deviation (SD) ± 4°). Agreement between the UG and MG measurements was greater than 0.99 in all positions. Our results show that the smartphone MG has equivalent reliability compared to the traditional UG when measuring passive shoulder abduction ROM. With concordant measures and comparable reliability to the UG, the newly developed MG application shows potential as a useful tool to assess joint angles. Published by Elsevier Ltd.
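
    The abstract above relies on concordance correlation coefficients and Bland-Altman analysis to compare two measurement tools. The following is a minimal sketch of both, assuming Lin's definition of the CCC (with 1/n variances) and the conventional 1.96 SD limits of agreement; the angle values are invented for illustration and are not the study's data.

```python
from statistics import mean, stdev


def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements."""
    n = len(x)
    mean_x, mean_y = mean(x), mean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Penalises both scatter and any systematic offset between the methods.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


def bland_altman_limits(x, y):
    """Return (bias, lower limit, upper limit) of agreement for x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical paired angle readings (degrees) from two devices.
device_a = [30.0, 60.0, 90.0, 120.0]
device_b = [31.0, 59.0, 92.0, 119.0]
print(concordance_ccc(device_a, device_b))
print(bland_altman_limits(device_a, device_b))
```

    Unlike a Pearson correlation, the CCC drops below 1 when one device reads consistently higher than the other, which is why it suits method-comparison questions of this kind.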

  5. Validation of the Intelligibility in Context Scale for Jamaican Creole-Speaking Preschoolers.

    Science.gov (United States)

    Washington, Karla N; McDonald, Megan M; McLeod, Sharynne; Crowe, Kathryn; Devonish, Hubert

    2017-08-15

    To describe validation of the Intelligibility in Context Scale (ICS; McLeod, Harrison, & McCormack, 2012a) and ICS-Jamaican Creole (ICS-JC; McLeod, Harrison, & McCormack, 2012b) in a sample of typically developing 3- to 6-year-old Jamaicans. One-hundred and forty-five preschooler-parent dyads participated in the study. Parents completed the 7-item ICS (n = 145) and ICS-JC (n = 98) to rate children's speech intelligibility (5-point scale) across communication partners (parents, immediate family, extended family, friends, acquaintances, strangers). Preschoolers completed the Diagnostic Evaluation of Articulation and Phonology (DEAP; Dodd, Hua, Crosbie, Holm, & Ozanne, 2006) in English and Jamaican Creole to establish speech-sound competency. For this sample, we examined validity and reliability (interrater, test-retest, internal consistency) evidence using measures of speech-sound production: (a) percentage of consonants correct, (b) percentage of vowels correct, and (c) percentage of phonemes correct. ICS and ICS-JC ratings showed preschoolers were always (5) to usually (4) understood across communication partners (ICS, M = 4.43; ICS-JC, M = 4.50). Both tools demonstrated excellent internal consistency (α = .91), high interrater, and test-retest reliability. Significant correlations between the two tools and between each measure and language-specific percentage of consonants correct, percentage of vowels correct, and percentage of phonemes correct provided criterion-validity evidence. A positive correlation between the ICS and age further strengthened validity evidence for that measure. Both tools show promising evidence of reliability and validity in describing functional speech intelligibility for this group of typically developing Jamaican preschoolers.

  6. Doloplus-2, a valid tool for behavioural pain assessment?

    Directory of Open Access Journals (Sweden)

    Loge Jon H

    2007-12-01

    Background: The Doloplus-2 is used for behavioural pain assessment in cognitively impaired patients. Little data exists on the psychometric properties of the Doloplus-2. Our objectives were to test the criterion validity and inter-rater reliability of the Doloplus-2, and to explore a design for validations of behavioural pain assessment tools. Methods: Fifty-one nursing home patients and 22 patients admitted to a geriatric hospital ward were included. All were cognitively impaired and unable to self-report pain. Each patient was examined by an expert in pain evaluation and treatment, who rated the pain on a numerical rating scale. The ratings were based on information from the medical record, reports from nurses and patients (if possible) about pain during the past 24 hours, and a clinical examination. These ratings were used as pain criterion. The Doloplus-2 was administered by the attending nurse. Regression analyses were used to estimate the ability of the Doloplus-2 to explain the expert's ratings. The inter-rater reliability of the Doloplus-2 was evaluated in 16 patients by comparing the ratings of two nurses administering the Doloplus-2. Results: There was no association between the Doloplus-2 and the expert's pain ratings (R2 = 0.02). There was an association (R2 = 0.54) between the expert's ratings and the Doloplus-2 scores in a subgroup of 16 patients assessed by a geriatric expert nurse (the most experienced Doloplus-2 administrator). The inter-rater reliability between the Doloplus-2 administrators assessed by the intra-class coefficient was 0.77. The pain expert's ratings were compared with ratings of two independent geriatricians in a subsample of 15, and were found satisfactory (intra-class correlation 0.74). Conclusion: It was challenging to conduct such a study in patients with cognitive impairment and the study has several limitations. The results do not support the validity of the Doloplus-2 in its present version and they

  7. Inter-rater agreement of the PEWS tools used in Central Denmark Region

    DEFF Research Database (Denmark)

    Jensen, Claus Sixtus; Aagaard, Hanne; Olesen, Hanne Vebert

    2017-01-01

    BACKGROUND: Paediatric early warning score (PEWS) assessment tools can assist healthcare providers in the timely detection and recognition of subtle patient condition changes signalling clinical deterioration. However, PEWS tools instrument data are only as reliable and accurate as the caregivers...... agreement. The nurses assigned the exact same aggregated score for both PEWS models in 76% of the cases. In 98% of the PEWS assessments, the aggregated PEWS scores assigned by the nurses were equal to or below 1 point in both models. CONCLUSION: The study showed good to very good inter-rater reliability...

  8. Inter-rater Reliability of the Dysphagia Outcome and Severity Scale (DOSS): Effects of Clinical Experience, Audio-Recording and Training.

    Science.gov (United States)

    Zarkada, Angeliki; Regan, Julie

    2017-10-19

    The Dysphagia Outcome and Severity Scale (DOSS) is widely used to measure dysphagia severity based on videofluoroscopy (VFSS). This study investigated inter-rater reliability (IRR) of the DOSS. It also determined the effect of clinical experience, VFSS audio-recording and training on DOSS IRR. A quantitative prospective research design was used. Seventeen speech and language pathologists (SLPs) were recruited from an acute teaching hospital, Dublin (> 3 years' VFSS experience, n = 10) and from a postgraduate dysphagia programme in a university setting (training session on DOSS rating after which DOSS IRR was re-tested. Cohen's kappa co-efficient was used to establish IRR. IRR of the DOSS presented only fair agreement (κ = 0.36, p training (κ = 0.328) was significantly better comparing to post-training (κ = 0.218) (p < 0.05). Findings raise concerns as the DOSS is frequently used in clinical practice to capture dysphagia severity and to monitor changes.

  9. Nerve ultrasound reliability of upper limbs: Effects of examiner training.

    Science.gov (United States)

    Garcia-Santibanez, Rocio; Dietz, Alexander R; Bucelli, Robert C; Zaidman, Craig M

    2018-02-01

    Duration of training to reliably measure nerve cross-sectional area with ultrasound is unknown. A retrospective review was performed of ultrasound data, acquired and recorded by 2 examiners: an expert and either a trainee with 2 months (novice) or a trainee with 12 months (experienced) of experience. Data on median, ulnar, and radial nerves were reviewed for 42 patients. Interrater reliability was good and varied most with nerve site but little with experience. Coefficient of variation (CoV) range was 9.33%-22.5%. Intraclass correlation coefficient (ICC) was good to excellent (0.65-0.95) except ulnar nerve-wrist/forearm and radial nerve-humerus (ICC = 0.39-0.59). Interrater differences did not vary with nerve size or body mass index. Expert-novice and expert-experienced interrater differences and CoV were similar. The ulnar nerve-wrist expert-novice interrater difference decreased with time (rs = -0.68, P = 0.001). A trainee with at least 2 months of experience can reliably measure upper limb nerves. Reliability varies by nerve and location and slightly improves with time. Muscle Nerve 57: 189-192, 2018. © 2017 Wiley Periodicals, Inc.
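
    The abstract above summarises interrater reliability with an intraclass correlation coefficient and a coefficient of variation but does not say which ICC model was used. Purely as an illustration, the sketch below computes the simplest form, the one-way random-effects ICC(1,1), and a CoV from the sample standard deviation. Both choices are assumptions, and the measurements are invented.

```python
from statistics import mean, stdev


def icc_oneway(rows):
    """One-way random-effects ICC(1,1); rows = subjects, columns = raters."""
    n, k = len(rows), len(rows[0])
    grand_mean = sum(sum(row) for row in rows) / (n * k)
    subject_means = [sum(row) / k for row in rows]
    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (value - m) ** 2 for row, m in zip(rows, subject_means) for value in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100.0


# Hypothetical cross-sectional areas (mm^2): two examiners, three nerves.
areas = [[8.0, 9.0], [12.0, 11.0], [15.0, 16.0]]
print(icc_oneway(areas))
print(coefficient_of_variation([9.0, 11.0]))
```

    Two-way ICC models, which separate systematic rater offsets from random error, would generally give different values on the same data.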

  10. Validity and Reliability of Turkish Male Breast Self-Examination Instrument.

    Science.gov (United States)

    Erkin, Özüm; Göl, İlknur

    2018-04-01

    This study aims to measure the validity and reliability of the Turkish male breast self-examination (MBSE) instrument. The methodological study was performed in 2016 at Ege University, Faculty of Nursing, İzmir, Turkey. The MBSE includes ten steps. For validity studies, face validity, content validity, and construct validity (exploratory factor analysis) were done. For the reliability study, Kuder-Richardson was calculated. The content validity index was found to be 0.94. Kendall W coefficient was 0.80 (p=0.551). The total variance explained by the two factors was found to be 63.24%. Kuder-Richardson 21 was done for the reliability study and found to be 0.97 for the instrument. The final instrument included 10 steps and two stages. The Turkish version of MBSE is a valid and reliable instrument for early diagnosis. The MBSE can be used in Turkish-speaking countries and cultures with two stages and 10 steps.
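
    The Kuder-Richardson 21 coefficient mentioned above needs only the number of dichotomous items, the mean total score and the variance of total scores. A minimal sketch of the textbook formula follows, assuming the population variance is used (conventions differ on this point); the scores are hypothetical and do not reproduce the study's 0.97.

```python
from statistics import mean, pvariance


def kr21(total_scores, n_items):
    """Kuder-Richardson formula 21 for tests made of right/wrong items.

    KR-21 = k / (k - 1) * (1 - M * (k - M) / (k * variance))
    where M is the mean total score and k the number of items.
    """
    k = n_items
    m = mean(total_scores)
    var = pvariance(total_scores)  # assumption: population variance
    return (k / (k - 1)) * (1 - (m * (k - m)) / (k * var))


# Hypothetical total scores on a 10-step checklist.
scores = [2, 5, 8]
print(round(kr21(scores, n_items=10), 3))
```

    KR-21 assumes all items are equally difficult, so it is typically a lower bound on KR-20 when item difficulties vary.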

  11. The reliability and validity of a sexual functioning questionnaire.

    Science.gov (United States)

    Corty, E W; Althof, S E; Kurit, D M

    1996-01-01

    The present study assessed the reliability and validity of a measure of sexual functioning, the CMSH-SFQ, for male patients and their partners. The CMSH-SFQ measures erectile and orgasmic functioning, sexual drive, frequency of sexual behavior, and sexual satisfaction. Test-retest reliability was assessed with 19 males and 19 females for the baseline CMSH-SFQ. Criterion validity was measured by comparing the answers of 25 male patients to those of their partners at baseline and follow-up. The majority of items had acceptable levels of reliability and validity. The CMSH-SFQ provides a reliable and valid device that can be used to measure global sexual functioning in men and their partners and may be used to evaluate the efficacy of treatments for sexual dysfunctions. Limitations and suggestions for use of the CMSH-SFQ are addressed.

  12. Reliability and validity of the McDonald Play Inventory.

    Science.gov (United States)

    McDonald, Ann E; Vigen, Cheryl

    2012-01-01

    This study examined the ability of a two-part self-report instrument, the McDonald Play Inventory, to reliably and validly measure the play activities and play styles of 7- to 11-yr-old children and to discriminate between the play of neurotypical children and children with known learning and developmental disabilities. A total of 124 children ages 7-11 recruited from a sample of convenience and a subsample of 17 parents participated in this study. Reliability estimates yielded moderate correlations for internal consistency, total test intercorrelations, and test-retest reliability. Validity estimates were established for content and construct validity. The results suggest that a self-report instrument yields reliable and valid measures of a child's perceived play performance and discriminates between the play of children with and without disabilities. Copyright © 2012 by the American Occupational Therapy Association, Inc.

  13. TWO CRITERIA FOR GOOD MEASUREMENTS IN RESEARCH: VALIDITY AND RELIABILITY

    Directory of Open Access Journals (Sweden)

    Haradhan Kumar Mohajan

    2017-12-01

    Reliability and validity are the two most important and fundamental features in the evaluation of any measurement instrument or tool for good research. The purpose of this research is to discuss the validity and reliability of measurement instruments that are used in research. Validity concerns what an instrument measures, and how well it does so. Reliability concerns the faith that one can have in the data obtained from use of an instrument, that is, the degree to which any measuring tool controls for random error. An attempt has been made here to review reliability and validity, and threats to them, in some detail.

  14. Reliability and Validity of the Assessment of Neurological Soft-Signs in Children with and without Attention-Deficit-Hyperactivity Disorder

    Science.gov (United States)

    Gustafsson, Peik; Svedin, Carl Goran; Ericsson, Ingegerd; Linden, Christian; Karlsson, Magnus K.; Thernlund, Gunilla

    2010-01-01

    Aim: To study the value and reliability of an examination of neurological soft-signs, often used in Sweden, in the assessment of children with attention-deficit-hyperactivity disorder (ADHD), by examining children with and without ADHD, as diagnosed by an experienced clinician using the DSM-III-R. Method: We have examined interrater reliability…

  15. Validity and Reliability of the Turkish Chronic Pain Acceptance Questionnaire

    Science.gov (United States)

    Akmaz, Hazel Ekin; Uyar, Meltem; Kuzeyli Yıldırım, Yasemin; Akın Korhan, Esra

    2018-05-29

    Pain acceptance is the process of giving up the struggle with pain and learning to live a worthwhile life despite it. In assessing patients with chronic pain in Turkey, making a diagnosis and tracking the effectiveness of treatment is done with scales that have been translated into Turkish. However, there is as yet no valid and reliable scale in Turkish to assess the acceptance of pain. To validate a Turkish version of the Chronic Pain Acceptance Questionnaire developed by McCracken and colleagues. Methodological and cross sectional study. A simple randomized sampling method was used in selecting the study sample. The sample was composed of 201 patients, more than 10 times the number of items examined for validity and reliability in the study, which totaled 20. A patient identification form, the Chronic Pain Acceptance Questionnaire, and the Brief Pain Inventory were used to collect data. Data were collected by face-to-face interviews. In the validity testing, the content validity index was used to evaluate linguistic equivalence, content validity, construct validity, and expert views. In reliability testing of the scale, Cronbach’s α coefficient was calculated, and item analysis and split-test reliability methods were used. Principal component analysis and varimax rotation were used in factor analysis and to examine factor structure for construct concept validity. The item analysis established that the scale, all items, and item-total correlations were satisfactory. The mean total score of the scale was 21.78. The internal consistency coefficient was 0.94, and the correlation between the two halves of the scale was 0.89. The Chronic Pain Acceptance Questionnaire, which is intended to be used in Turkey upon confirmation of its validity and reliability, is an evaluation instrument with sufficient validity and reliability, and it can be reliably used to examine patients’ acceptance of chronic pain.
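
    The abstract above reports an internal consistency coefficient (Cronbach's α) and a correlation between two halves of the scale. As a minimal sketch of the standard formulas, assuming sample variances and a Spearman-Brown step-up for split-half reliability (the abstract does not say whether such a correction was applied), with invented item scores:

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows = respondents, columns = item scores."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def spearman_brown(half_correlation):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * half_correlation / (1 + half_correlation)


# Hypothetical responses: three respondents, two items.
responses = [[1, 2], [2, 1], [3, 3]]
print(cronbach_alpha(responses))
print(spearman_brown(0.5))
```

    Alpha rises with both the number of items and the average inter-item correlation, so a high value alone does not show that a scale is unidimensional.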

  16. Preliminary findings on the reliability and validity of the Cantonese Birmingham Cognitive Screen in patients with acute ischemic stroke

    Directory of Open Access Journals (Sweden)

    Pan X

    2015-09-01

    Xiaoping Pan,1,* Haobo Chen,1,2,* Wai-Ling Bickerton,2 Johnny King Lam Lau,2 Anthony Pak Hin Kong,3 Pia Rotshtein,2 Aihua Guo,1 Jianxi Hu,1 Glyn W Humphreys4 1Department of Neurology, Guangzhou First People’s Hospital, Guangzhou Medical University, Guangzhou, People’s Republic of China; 2School of Psychology, University of Birmingham, Birmingham, UK; 3Department of Communication Sciences and Disorders, University of Central Florida, Orlando, FL, USA; 4Department of Experimental Psychology, University of Oxford, Oxford, UK *These authors contributed equally to this work. Background: There are no currently effective cognitive assessment tools for patients who have suffered stroke in the People’s Republic of China. The Birmingham Cognitive Screen (BCoS) has been shown to be a promising tool for revealing patients’ poststroke cognitive deficits in specific domains, which facilitates more individually designed rehabilitation in the long run. Hence we examined the reliability and validity of a Cantonese version BCoS in patients with acute ischemic stroke, in Guangzhou. Method: A total of 98 patients with acute ischemic stroke were assessed with the Cantonese version of the BCoS, and an additional 133 healthy individuals were recruited as controls. Apart from the BCoS, the patients also completed a number of external cognitive tests, including the Montreal Cognitive Assessment Test (MoCA), Mini Mental State Examination (MMSE), Albert’s cancellation test, the Rey–Osterrieth Complex Figure Test, and six gesture matching tasks. Cutoff scores for failing each subtest, ie, deficits, were computed based on the performance of the controls. The validity and reliability of the Cantonese BCoS were examined, as well as interrater and test–retest reliability. We also compared the proportions of cases being classified as deficits in controlled attention, memory, character writing, and praxis, between patients with and without spoken language impairment

  17. Reliability and validity of the Incontinence Quiz-Turkish version.

    Science.gov (United States)

    Kara, Kerime C; Çıtak Karakaya, İlkim; Tunalı, Nur; Karakaya, Mehmet G

    2018-01-01

    The aim of this study was to investigate the reliability and validity of the Turkish version of the Incontinence Quiz, which was developed by Branch et al. (1994), to assess women's knowledge of and attitudes toward urinary incontinence. Comprehensibility of the Turkish version of the 14-item Incontinence Quiz, which was prepared following translation-back translation procedures, was tested on a pilot group of eight women, and its internal reliability, test-retest reliability and construct validity were assessed in 150 women who attended the gynecology clinics of three hospitals in İçel, Turkey. Physical and sociodemographic characteristics and presence of incontinence complaints were also recorded. Data were analyzed at the 0.05 alpha level, using SPSS version 22. The scale had good reliability and validity. The internal reliability coefficient (Cronbach α) was 0.80, test-retest correlation coefficients were 0.83-0.94; and with regard to construct validity, Kaiser-Meyer-Olkin coefficient was 0.76 and Bartlett sphericity test was 562.777 (P = 0.000). The Turkish version of the Incontinence Quiz had a four-factor structure, with eigenvalues ranging from 1.17 to 4.08. The Incontinence Quiz-Turkish version is a highly comprehensible, reliable and valid scale, which may be used to assess Turkish-speaking women's knowledge of and attitudes toward urinary incontinence. © 2017 Japan Society of Obstetrics and Gynecology.

  18. Validation in Colombia of the Oswestry disability questionnaire in patients with low back pain.

    Science.gov (United States)

    Payares, Kelly; Lugo, Luz Helena; Morales, Victoria; Londoño, Alejandro

    2011-12-15

    Observational study to validate a scale. To translate, culturally adapt, and validate the Oswestry Disability Index (ODI), version 2.1a. The ODI is one of the most frequently used tools to evaluate disability in patients with low back pain. Its psychometric properties have been shown to be highly reliable. Currently, no validated Colombian version is available. The ODI (2.1a) was translated into Spanish and this translated version was analyzed in terms of semantic and linguistic equivalence. Then, the Spanish version was translated back into English. The first time, the ODI was administered to a total of 111 patients with back pain. Internal consistency, construct validity, content validity and criterion validity were evaluated for the scale. The inter-rater reliability was evaluated by 2 different observers a day apart from each other and the intra-rater reliability was determined by the same observer, 7 days apart. A sensitivity-to-change analysis was performed on 81 patients. Of the sample, 67.6% were women, with a mean (SD) age of 44.88 (16.38) years. Cronbach alpha coefficient was 0.86. Inter-rater reliability yielded an intraclass correlation coefficient (ICC) of 0.94 whereas intra-rater reliability yielded an ICC of 0.95. Pearson correlation between ODI and each of the 8 domains of SF-36 was statistically significant. Construct validity, when comparing extremely acute and chronic groups, did not show any differences (P = 0.409). Concurrent criterion validity between ODI and Roland-Morris Disability Questionnaire (RMQ) was r = 0.75; between ODI and the Visual Analog Scale (VAS) was r = 0.540. For patients who received an intervention, the value of this change was 1.2. ODI-C is a helpful, reliable and valid tool in Colombia for back pain patient follow-up and assessment, regardless of the stage of the evolution. It is an observational study to validate the Oswestry disability index (ODI) in the Spanish language. ODI is the most used tool in evaluating disability

  19. Psychometric validation of the Chinese version of the Johns Hopkins Fall Risk Assessment Tool for older Chinese inpatients.

    Science.gov (United States)

    Zhang, Junhong; Wang, Min; Liu, Yu

    2016-10-01

    To culturally adapt and evaluate the reliability and validity of the Chinese version of the Johns Hopkins Fall Risk Assessment Tool among older inpatients in the mainland of China. Patient falls are an important safety consideration within hospitals among older inpatients. Nurses need specific risk assessment tools for older inpatients to reliably identify at-risk populations and guide interventions that highlight fixable risk factors for falls and consequent injuries. In China, a few tools have been developed to measure fall risk. However, they lack the solid psychometric development necessary to establish their validity and reliability, and they are not widely used for elderly inpatients. A cross-sectional study. Convenience sampling was used to recruit 201 older inpatients from two tertiary-level hospitals in Beijing and Xiamen, China. The Johns Hopkins Fall Risk Assessment Tool was translated using forward and backward translation procedures and was administered to these 201 older inpatients. Reliability of the tool was calculated by inter-rater reliability and Cronbach's alpha. Validity was analysed through content validity index and construct validity. The inter-rater reliability of the Chinese version of the Johns Hopkins Fall Risk Assessment Tool was 97·14% agreement with Cohen's Kappa of 0·903. Cronbach's α was 0·703. Content Validity Index was 0·833. Two factors representing intrinsic and extrinsic risk factors were explored that together explained 58·89% of the variance. This study provided evidence that the Johns Hopkins Fall Risk Assessment Tool is an acceptable, valid and reliable tool to identify older inpatients at risk of falls and falls with injury. Further psychometric testing on criterion validity and evaluation of its advanced utility in geriatric clinical settings are warranted. The Chinese version of the Johns Hopkins Fall Risk Assessment Tool may be useful for health care personnel to identify older Chinese inpatients at risk of falls and falls

  20. Validity and Reliability in Social Science Research

    Science.gov (United States)

    Drost, Ellen A.

    2011-01-01

    In this paper, the author aims to provide novice researchers with an understanding of the general problem of validity in social science research and to acquaint them with approaches to developing strong support for the validity of their research. She provides insight into these two important concepts, namely (1) validity; and (2) reliability, and…

  1. Ethical Implications of Validity-vs.-Reliability Trade-Offs in Educational Research

    Science.gov (United States)

    Fendler, Lynn

    2016-01-01

    In educational research that calls itself empirical, the relationship between validity and reliability is that of trade-off: the stronger the bases for validity, the weaker the bases for reliability (and vice versa). Validity and reliability are widely regarded as basic criteria for evaluating research; however, there are ethical implications of…

  2. Validity and Reliability of the 8-Item Work Limitations Questionnaire.

    Science.gov (United States)

    Walker, Timothy J; Tullar, Jessica M; Diamond, Pamela M; Kohl, Harold W; Amick, Benjamin C

    2017-12-01

    Purpose To evaluate factorial validity, scale reliability, test-retest reliability, convergent validity, and discriminant validity of the 8-item Work Limitations Questionnaire (WLQ) among employees from a public university system. Methods A secondary analysis using de-identified data from employees who completed an annual Health Assessment between the years 2009-2015 tested research aims. Confirmatory factor analysis (CFA) (n = 10,165) tested the latent structure of the 8-item WLQ. Scale reliability was determined using a CFA-based approach while test-retest reliability was determined using the intraclass correlation coefficient. Convergent/discriminant validity was tested by evaluating relations between the 8-item WLQ with health/performance variables for convergent validity (health-related work performance, number of chronic conditions, and general health) and demographic variables for discriminant validity (gender and institution type). Results A 1-factor model with three correlated residuals demonstrated excellent model fit (CFI = 0.99, TLI = 0.99, RMSEA = 0.03, and SRMR = 0.01). The scale reliability was acceptable (0.69, 95% CI 0.68-0.70) and the test-retest reliability was very good (ICC = 0.78). Low-to-moderate associations were observed between the 8-item WLQ and the health/performance variables while weak associations were observed between the demographic variables. Conclusions The 8-item WLQ demonstrated sufficient reliability and validity among employees from a public university system. Results suggest the 8-item WLQ is a usable alternative for studies when the more comprehensive 25-item WLQ is not available.

  3. Educational testing validity and reliability in pharmacy and medical education literature.

    Science.gov (United States)

    Hoover, Matthew J; Jung, Rose; Jacobs, David M; Peeters, Michael J

    2013-12-16

    To evaluate and compare the reliability and validity of educational testing reported in pharmacy education journals with that reported in the medical education literature. Descriptions of validity evidence sources (content, construct, criterion, and reliability) were extracted from articles that reported educational testing of learners' knowledge, skills, and/or abilities. Using educational testing, the findings of 108 pharmacy education articles were compared to the findings of 198 medical education articles. For pharmacy educational testing, 14 articles (13%) reported more than 1 validity evidence source while 83 articles (77%) reported 1 validity evidence source and 11 articles (10%) did not have evidence. Among validity evidence sources, content validity was reported most frequently. Compared with pharmacy education literature, more medical education articles reported both validity and reliability (59%; p […]). […] articles in pharmacy education compared to medical education, validity and reliability reporting were limited in the pharmacy education literature.

  4. Feasibility and reliability of a newly developed antenatal risk score card in routine care

    NARCIS (Netherlands)

    E. Birnie; E.A.P. Steegers; Drs. H.W. Torij; M.J. Veen; J. Poeran; G.J. Bonsel

    2015-01-01

    A population-based cross-sectional study (feasibility) and a cohort study (inter-rater reliability) to study in routine care the feasibility and inter-rater reliability of the Rotterdam Reproductive Risk Reduction risk score card (R4U), a new semi-quantitative score card for use during the antenatal

  5. Validity and Reliability of the Turkish Chronic Pain Acceptance Questionnaire

    Directory of Open Access Journals (Sweden)

    Hazel Ekin Akmaz

    2018-05-01

    Background: Pain acceptance is the process of giving up the struggle with pain and learning to live a worthwhile life despite it. In assessing patients with chronic pain in Turkey, making a diagnosis and tracking the effectiveness of treatment are done with scales that have been translated into Turkish. However, there is as yet no valid and reliable scale in Turkish to assess the acceptance of pain. Aims: To validate a Turkish version of the Chronic Pain Acceptance Questionnaire developed by McCracken and colleagues. Study Design: Methodological and cross-sectional study. Methods: A simple randomized sampling method was used in selecting the study sample. The sample was composed of 201 patients, more than 10 times the number of items examined for validity and reliability in the study, which totaled 20. A patient identification form, the Chronic Pain Acceptance Questionnaire, and the Brief Pain Inventory were used to collect data. Data were collected by face-to-face interviews. In the validity testing, the content validity index was used to evaluate linguistic equivalence, content validity, construct validity, and expert views. In reliability testing of the scale, Cronbach’s α coefficient was calculated, and item analysis and split-test reliability methods were used. Principal component analysis and varimax rotation were used in factor analysis and to examine factor structure for construct validity. Results: The item analysis established that the scale, all items, and item-total correlations were satisfactory. The mean total score of the scale was 21.78. The internal consistency coefficient was 0.94, and the correlation between the two halves of the scale was 0.89. Conclusion: The Chronic Pain Acceptance Questionnaire, which is intended to be used in Turkey upon confirmation of its validity and reliability, is an evaluation instrument with sufficient validity and reliability, and it can be reliably used to examine patients’ acceptance

  6. The Validity and Reliability of the Mini-Mental State Examination-2 for Detecting Mild Cognitive Impairment and Alzheimer's Disease in a Korean Population.

    Directory of Open Access Journals (Sweden)

    Min Jae Baek

    To examine the validity and reliability of the MMSE-2 for assessing patients with mild cognitive impairment (MCI) and Alzheimer's disease (AD) in a Korean population. Specifically, the usefulness of the MMSE-2 as a screening measure for detecting early cognitive change, which has not been detectable through the MMSE, was examined. Two-hundred and twenty-six patients with MCI, 97 patients with AD, and 91 healthy older adults were recruited. All participants consented to examination with the MMSE-2, the MMSE, and other detailed neuropsychological assessments. The MMSE-2 performed well in discriminating participants across Clinical Dementia Rating (CDR) stages and CDR-Sum of Boxes (CDR-SOB), and it showed excellent internal consistency, high test-retest reliability, high interrater reliability, and good concurrent validity with the MMSE and other detailed neuropsychological assessments. The MMSE-2 was divided into two factors (tests that are sensitive to decline in cognitive functions vs. tests that are not sensitive to decline in cognitive functions) in normal cognitive aging. Moreover, the MMSE-2 was divided into two factors (tests related overall cognitive functioning other than memory vs. tests related to episodic memory) in patients with AD. Finally, the MMSE-2 was divided into three factors (tests related to working memory and frontal lobe functioning vs. tests related to verbal memory vs. tests related to orientation and immediate recall) in patients with MCI. The sensitivity and specificity of the three versions of the MMSE-2 were relatively high in discriminating participants with normal cognitive aging from patients with MCI and AD. The MMSE-2 is a valid and reliable cognitive screening instrument for assessing cognitive impairment in a Korean population, but its ability to distinguish patients with MCI from those with normal cognitive aging may not be as highly sensitive as expected.

  7. The Validity and Reliability of the Mini-Mental State Examination-2 for Detecting Mild Cognitive Impairment and Alzheimer's Disease in a Korean Population.

    Science.gov (United States)

    Baek, Min Jae; Kim, Karyeong; Park, Young Ho; Kim, SangYun

    To examine the validity and reliability of the MMSE-2 for assessing patients with mild cognitive impairment (MCI) and Alzheimer's disease (AD) in a Korean population. Specifically, the usefulness of the MMSE-2 as a screening measure for detecting early cognitive change, which has not been detectable through the MMSE, was examined. Two-hundred and twenty-six patients with MCI, 97 patients with AD, and 91 healthy older adults were recruited. All participants consented to examination with the MMSE-2, the MMSE, and other detailed neuropsychological assessments. The MMSE-2 performed well in discriminating participants across Clinical Dementia Rating (CDR) stages and CDR-Sum of Boxes (CDR-SOB), and it showed excellent internal consistency, high test-retest reliability, high interrater reliability, and good concurrent validity with the MMSE and other detailed neuropsychological assessments. The MMSE-2 was divided into two factors (tests that are sensitive to decline in cognitive functions vs. tests that are not sensitive to decline in cognitive functions) in normal cognitive aging. Moreover, the MMSE-2 was divided into two factors (tests related overall cognitive functioning other than memory vs. tests related to episodic memory) in patients with AD. Finally, the MMSE-2 was divided into three factors (tests related to working memory and frontal lobe functioning vs. tests related to verbal memory vs. tests related to orientation and immediate recall) in patients with MCI. The sensitivity and specificity of the three versions of the MMSE-2 were relatively high in discriminating participants with normal cognitive aging from patients with MCI and AD. The MMSE-2 is a valid and reliable cognitive screening instrument for assessing cognitive impairment in a Korean population, but its ability to distinguish patients with MCI from those with normal cognitive aging may not be as highly sensitive as expected.

  8. The Reliability of Assessing Radiographic Healing of Osteochondritis Dissecans of the Knee.

    Science.gov (United States)

    Wall, Eric J; Milewski, Matthew D; Carey, James L; Shea, Kevin G; Ganley, Theodore J; Polousky, John D; Grimm, Nathan L; Eismann, Emily A; Jacobs, Jake C; Murnaghan, Lucas; Nissen, Carl W; Myer, Gregory D; Weiss, Jennifer; Edmonds, Eric W; Anderson, Allen F; Lyon, Roger M; Heyworth, Benton E; Fabricant, Peter D; Zbojniewicz, Andy

    2017-05-01

    The reliability of assessing healing on plain radiographs has not been well-established for knee osteochondritis dissecans (OCD). To determine the inter- and intrarater reliability of specific radiographic criteria in judging healing of femoral condyle OCD. Cohort study (Diagnosis); Level of evidence, 3. Ten orthopedic sports surgeons rated the radiographic healing of 30 knee OCD lesions at 2 time points, a minimum of 1 month apart. First, raters compared pretreatment and 2-year follow-up radiographs on "overall healing" and on 5 subfeatures of healing, including OCD boundary, sclerosis, size, shape, and ossification using a continuous slider scale. "Overall healing" was also rated using a 7-tier ordinal scale. Raters then compared the same 30 pretreatment knee radiographs in a stepwise progression to the 2-, 4-, 7-, 12-, and 24-month follow-up radiographs on "overall healing" using a continuous slider scale. Interrater and intrarater reliability were assessed using intraclass correlations (ICC) derived from a 2-way mixed effects analysis of variance for absolute agreement. Overall healing of the OCD lesions from pretreatment to 2-year follow-up radiographs was rated with excellent interrater reliability (ICC = 0.94) and intrarater reliability (ICC = 0.84) when using a continuous scale. The reliability of the 5 subfeatures of healing was also excellent (interrater ICCs of 0.87-0.89; intrarater ICCs of 0.74-0.84). The 7-tier ordinal scale rating of overall healing had lower interrater (ICC = 0.61) and intrarater (ICC = 0.68) reliability. The overall healing of OCD lesions at the 5 time points up to 24 months had interrater ICCs of 0.81-0.88 and intrarater ICCs of 0.65-0.70. Interrater reliability was excellent when judging the overall healing of OCD femoral condyle lesions on radiographs as well as on 5 specific features of healing on 2-year follow-up radiographs. 
    Continuous scale rating of OCD radiographic healing yielded higher reliability than the ordinal scale

  9. Validity and reliability of the NAB Naming Test.

    Science.gov (United States)

    Sachs, Bonnie C; Rush, Beth K; Pedraza, Otto

    2016-05-01

    Confrontation naming is commonly assessed in neuropsychological practice, but few standardized measures of naming exist and those that do are susceptible to the effects of education and culture. The Neuropsychological Assessment Battery (NAB) Naming Test is a 31-item measure used to assess confrontation naming. Despite adequate psychometric information provided by the test publisher, there has been limited independent validation of the test. In this study, we investigated the convergent and discriminant validity, internal consistency, and alternate forms reliability of the NAB Naming Test in a sample of adults (Form 1: n = 247, Form 2: n = 151) clinically referred for neuropsychological evaluation. Results indicate adequate-to-good internal consistency and alternate forms reliability. We also found strong convergent validity as demonstrated by relationships with other neurocognitive measures. We found preliminary evidence that the NAB Naming Test demonstrates a more pronounced ceiling effect than other commonly used measures of naming. To our knowledge, this represents the largest published independent validation study of the NAB Naming Test in a clinical sample. Our findings suggest that the NAB Naming Test demonstrates adequate validity and reliability and merits consideration in the test arsenal of clinical neuropsychologists.

  10. Validation of a survey instrument to assess home environments for physical activity and healthy eating in overweight children

    Directory of Open Access Journals (Sweden)

    Crane Lori A

    2008-01-01

    Background: Few measures exist to measure the overall home environment for its ability to support physical activity (PA) and healthy eating in overweight children. The purpose of this study was to develop and test the reliability and validity of such a measure. Methods: The Home Environment Survey (HES) was developed to reflect availability, accessibility, parental role modelling, and parental policies related to PA resources, fruits and vegetables (F&V), and sugar sweetened drinks and snacks (SS). Parents of overweight children (n = 219) completed the HES and concurrent behavioural assessments. Children completed the Block Kids survey and wore an accelerometer for one week. A subset of parents (n = 156) completed the HES a second time to determine test-retest reliability. Finally, 41 parent dyads living in the same home (n = 41) completed the survey to determine inter-rater reliability. Initial psychometric analyses were completed to trim items from the measure based on lack of variability in responses, moderate or higher item to scale correlation, or contribution to strong internal consistency. Inter-rater and test-retest reliability were completed using intraclass correlation coefficients. Validity was assessed using Pearson correlations between the HES scores and child and parent nutrition and PA. Results: Eight items were removed and acceptable internal consistency was documented for all scales (α = .66–.84) with the exception of the F&V accessibility. The F&V accessibility was reduced to a single item because the other two items did not meet reliability standards. Test-retest reliability was high (r > .75) for all scales. Inter-rater reliability varied across scales (r = .22–.89). PA accessibility, parent role modelling, and parental policies were all related significantly to child (r = .14–.21) and parent (r = .15–.31) PA. Similarly, availability of F&V and SS, parental role modelling, and parental policies were related to child (r

  11. Translation, adaptation and validation of "Community Integration Questionnaire"

    Directory of Open Access Journals (Sweden)

    Helena Maria Silveira Fraga-Maia

    2015-05-01

    Objective: To translate, adapt, and validate the "Community Integration Questionnaire (CIQ)," a tool that evaluates community integration after traumatic brain injury (TBI). Methods: A study of 61 TBI survivors was carried out. The appraisal of the measurement equivalence was based on a reliability assessment by estimating inter-rater agreement, item-scale correlation and internal consistency of CIQ scales, concurrent validity, and construct validity. Results: Inter-rater agreement ranged from substantial to almost perfect. The item-scale correlations were generally higher between the items and their respective domains, whereas the intra-class correlation coefficients were high for both the overall scale and the CIQ domains. The correlation between the CIQ and Disability Rating Scale (DRS), the Extended Glasgow Outcome Scale (GOSE), and the Rancho Los Amigos Level of Cognitive Functioning Scale (RLA) reached values considered satisfactory. However, the factor analysis generated four factors (dimensions) that did not correspond with the dimensional structure of the original tool. Conclusion: The resulting tool herein may be useful in globally assessing community integration after TBI in the Brazilian context, at least until new CIQ psychometric assessment studies are developed with larger samples.

  12. Interrater reliability of the Volume-Viscosity Swallow Test; screening for dysphagia among hospitalized elderly medical patients

    DEFF Research Database (Denmark)

    Jørgensen, Lise Walther; Søndergaard, Kasper; Melgaard, Dorte

    2017-01-01

    Background: Oropharyngeal dysphagia (OD) is prevalent among medical and geriatric patients admitted due to acute illness and it is associated with malnutrition, increased length of stay and increased mortality. A valid and reliable bedside screening test for patients at risk of OD is essential...... in order to detect patients in need of further assessment. The Volume-Viscosity Swallow Test (V-VST) has been shown to be a valid screening test for OD in mixed outpatient populations. However, as reliability of the test has yet to be investigated in a population of medical and geriatric patients admitted...... skilled occupational therapists examined an unselected group of 110 patients admitted to geriatric or medical wards. In an overall agreement phase raters reached ≥80% agreement before data collection phase was commenced. The V-VST was applied to patients twice within maximum one hour by raters who...

  13. Outcomes validity and reliability of the modified Rankin scale: implications for stroke clinical trials: a literature review and synthesis.

    Science.gov (United States)

    Banks, Jamie L; Marotta, Charles A

    2007-03-01

    The modified Rankin scale (mRS), a clinician-reported measure of global disability, is widely applied for evaluating stroke patient outcomes and as an end point in randomized clinical trials. Extensive evidence on the validity of the mRS exists across a large but fragmented literature. As new treatments for acute ischemic stroke are submitted for agency approval, an appreciation of the mRS's attributes, specifically its relationship to other stroke evaluation scales, would be valuable for decision-makers to properly assess the impact of a new drug on treatment paradigms. The purpose of this report is to assemble and systematically assess the properties of the mRS to provide decision-makers with pertinent evaluative information. A Medline search was conducted to identify reports in the peer-reviewed medical literature (1957-2006) that provide information on the structure, validation, scoring, and psychometric properties of the mRS and its use in clinical trials. The selection of articles was based on defined criteria that included relevance, study design and use of appropriate statistical methods. Of 224 articles identified by the literature search, 50 were selected for detailed assessment. Inter-rater reliability with the mRS is moderate and improves with structured interviews (kappa 0.56 versus 0.78); strong test-re-test reliability (kappa=0.81 to 0.95) has been reported. Numerous studies demonstrate the construct validity of the mRS by its relationships to physiological indicators such as stroke type, lesion size, perfusion and neurological impairment. Convergent validity between the mRS and other disability scales is well documented. Patient comorbidities and socioeconomic factors should be considered in properly applying and interpreting the mRS. Recent analyses suggest that randomized clinical trials of acute stroke treatments may require a smaller sample size if the mRS is used as a primary end point rather than the Barthel Index. 
    Multiple types of evidence

  14. Cross-cultural validation of the Persian version of the Functional Independence Measure for patients with stroke.

    Science.gov (United States)

    Naghdi, Soofia; Ansari, Noureddin Nakhostin; Raji, Parvin; Shamili, Aryan; Amini, Malek; Hasson, Scott

    2016-01-01

    To translate and cross-culturally adapt the Functional Independence Measure (FIM) into the Persian language and to test the reliability and validity of the Persian FIM (PFIM) in patients with stroke. In this cross-sectional study carried out in an outpatient stroke rehabilitation center, 40 patients with stroke (mean age 60 years) participated. A standard forward-backward translation method and expert panel validation was followed to develop the PFIM. Two experienced occupational therapists (OTs) assessed the patients independently in all items of the PFIM in a single session for inter-rater reliability. One of the OTs reassessed the patients after 1 week for intra-rater reliability. There were no floor or ceiling effects for the PFIM. Excellent inter-rater and intra-rater reliability was noted for the PFIM total score, motor and cognitive subscales (ICC(agreement) 0.88-0.98). According to the Bland-Altman agreement analysis, there was no systematic bias between raters and within raters. The internal consistency of the PFIM, measured by Cronbach's alpha, ranged from 0.70 to 0.96. The principal component analysis with varimax rotation indicated a three-factor structure: (1) self-care and mobility; (2) sphincter control and (3) cognitive that jointly accounted for 74.8% of the total variance. Construct validity was supported by a significant Pearson correlation between the PFIM and the Persian Barthel Index (r = 0.95; p […]). […] Persian patients with stroke. The Functional Independence Measure (FIM) is an outcome measure for disability based on the International Classification of Functioning, Disability and Health (ICF). The FIM was cross-culturally adapted and validated into Persian language. The Persian version of the FIM (PFIM) is reliable and valid for assessing functional status of patients with stroke. The PFIM can be used in Persian speaking countries to assess the limitations in activities of daily living of patients with stroke.

  15. Development of a new assessment tool for cervical myelopathy using hand-tracking sensor: Part 1: validity and reliability.

    Science.gov (United States)

    Alagha, M Abdulhadi; Alagha, Mahmoud A; Dunstan, Eleanor; Sperwer, Olaf; Timmins, Kate A; Boszczyk, Bronek M

    2017-04-01

    To assess the reliability and validity of a hand motion sensor, Leap Motion Controller (LMC), in the 15-s hand grip-and-release test, as compared against human inspection of an external digital camera recording. Fifty healthy participants were asked to fully grip-and-release their dominant hand as rapidly as possible for two trials with a 10-min rest in-between, while wearing a non-metal wrist splint. Each test lasted for 15 s, and a digital camera was used to film the anterolateral side of the hand on the first test. Three assessors counted the frequency of grip-and-release (G-R) cycles independently and in a blinded fashion. The average mean of the three was compared with that measured by LMC using the Bland-Altman method. Test-retest reliability was examined by comparing the two 15-s tests. The mean number of G-R cycles recorded was: 47.8 ± 6.4 (test 1, video observer); 47.7 ± 6.5 (test 1, LMC); and 50.2 ± 6.5 (test 2, LMC). Bland-Altman indicated good agreement, with a low bias (0.15 cycles) and narrow limits of agreement. The ICC showed high inter-rater agreement and the coefficient of repeatability for the number of cycles was ±5.393, with a mean bias of 3.63. LMC appears to be valid and reliable in the 15-s grip-and-release test. This serves as a first step towards the development of an objective myelopathy assessment device and platform for the assessment of neuromotor hand function in general. Further assessment in a clinical setting and to gauge healthy benchmark values is warranted.
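
    The Bland-Altman comparison named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name, the cycle counts, and the 1.96 multiplier (the conventional choice for 95% limits of agreement) are assumptions, since the abstract reports only the resulting bias.

```python
import statistics


def bland_altman(method_a, method_b, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements.

    z = 1.96 is the conventional multiplier for 95% limits of agreement;
    the abstract does not state which multiplier was used (assumption).
    """
    if len(method_a) != len(method_b):
        raise ValueError("paired measurements must have equal length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)        # mean difference between methods
    sd = statistics.stdev(diffs)         # sample SD of the differences
    return bias, bias - z * sd, bias + z * sd


# Hypothetical grip-and-release counts, NOT data from the study.
video_counts = [46, 50, 41, 55, 48]
sensor_counts = [47, 50, 40, 56, 47]

bias, lower, upper = bland_altman(video_counts, sensor_counts)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

    A bias near zero with narrow limits, as in this toy data, is the pattern the abstract describes as good agreement.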

  16. Validating the Danish adaptation of the World Health Organization's International Classification for Patient Safety classification of patient safety incident types

    DEFF Research Database (Denmark)

    Mikkelsen, Kim Lyngby; Thommesen, Jacob; Andersen, Henning Boje

    2013-01-01

    Objectives Validation of a Danish patient safety incident classification adapted from the World Health Organization's International Classification for Patient Safety (ICPS-WHO). Design Thirty-three hospital safety management experts classified 58 safety incident cases selected to represent all types.......513 (range: 0.193–0.804). Kappa and ICC showed high correlation (r = 0.99). An inverse correlation was found between the prevalence of type and inter-rater reliability. Results are discussed according to four factors known to determine the inter-rater agreement: skill and motivation of raters; clarity...

  17. Reliability of the Brazilian Portuguese version of the Gross Motor Function Measure in children with cerebral palsy

    Science.gov (United States)

    Almeida, Kênnea M.; Albuquerque, Karolina A.; Ferreira, Marina L.; Aguiar, Stéphany K. B.; Mancini, Marisa C.

    2016-01-01

    OBJECTIVE: To test the intra- and interrater reliability of the Brazilian Portuguese version of the 66-item Gross Motor Function Measure (GMFM-66). METHOD: The sample included 48 children with cerebral palsy (CP), ranging from 2-17 years old, classified at levels I to IV of the Gross Motor Function Classification System (GMFCS) and four child rehabilitation examiners. A main examiner evaluated all children using the GMFM-66 and video-recorded the assessments. The other examiners watched the video recordings and scored them independently for the assessment of interrater reliability. For the intrarater reliability evaluation, the main examiner watched the video recordings one month after the evaluation and re-scored each child. We calculated reliability by using intraclass correlation coefficients (ICC) with their respective 95% confidence intervals. RESULTS: Excellent test reliability was documented. The intrarater reliability of the total sample was ICC=0.99 (95% CI 0.98-0.99), and the interrater reliability was ICC=0.97 (95% CI 0.95-0.98). The reliability across GMFCS levels ranged from ICC=0.92 (95% CI 0.72-0.98) to ICC=0.99 (95% CI 0.99-0.99); the lowest value was the interrater reliability for the GMFCS IV group. Reliability in the five GMFM dimensions varied from ICC=0.95 (95% CI 0.93-0.97) to ICC=0.99 (95% CI 0.99-0.99). CONCLUSION: The Brazilian Portuguese version of the GMFM-66 showed excellent intra- and interrater reliability when used in Brazilian children with CP levels GMFCS I to IV. PMID:26786081

  18. Evaluating the reliability of an injury prevention screening tool: Test-retest study.

    Science.gov (United States)

    Gittelman, Michael A; Kincaid, Madeline; Denny, Sarah; Wervey Arnold, Melissa; FitzGerald, Michael; Carle, Adam C; Mara, Constance A

    2016-10-01

    A standardized injury prevention (IP) screening tool can identify family risks and allow pediatricians to address behaviors. To assess behavior changes on later screens, the tool must be reliable for an individual and ideally between household members. Little research has examined the reliability of safety screening tool questions. This study utilized test-retest reliability of parent responses on an existing IP questionnaire and also compared responses between household parents. Investigators recruited parents of children 0 to 1 year of age during admission to a tertiary care children's hospital. When both parents were present, one was chosen as the "primary" respondent. Primary respondents completed the 30-question IP screening tool after consent, and they were re-screened approximately 4 hours later to test individual reliability. The "second" parent, when present, only completed the tool once. All participants received a 10-dollar gift card. Cohen's Kappa was used to estimate test-retest reliability and inter-rater agreement. Standard test-retest criteria consider Kappa values: 0.0 to 0.40 poor to fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, and 0.81 to 1.00 as almost perfect reliability. One hundred five families participated, with five lost to follow-up. Thirty-two (30.5%) parent dyads completed the tool. Primary respondents were generally mothers (88%) and Caucasian (72%). Test-retest of the primary respondents showed their responses to be almost perfect; average 0.82 (SD = 0.13, range 0.49-1.00). Seventeen questions had almost perfect test-retest reliability and 11 had substantial reliability. However, inter-rater agreement between household members for 12 objective questions showed little agreement between responses; inter-rater agreement averaged 0.35 (SD = 0.34, range -0.19-1.00). One question had almost perfect inter-rater agreement and two had substantial inter-rater agreement. 
    The IP screening tool used by a single individual had excellent
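
    The kappa bands quoted in this abstract can be applied mechanically, as in the sketch below. It computes Cohen's kappa for two raters from its standard definition and maps the value to the abstract's labels. The function names and the parent responses are hypothetical, and treating negative kappa as a separate label is an assumption, because the quoted bands start at 0.0.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters scoring the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement from the product of each rater's marginal proportions.
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Map kappa to the bands quoted in the abstract (two-decimal rounding)."""
    k = round(kappa, 2)
    if k < 0.0:
        return "below chance"  # assumption: the quoted bands do not cover this
    if k <= 0.40:
        return "poor to fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical yes/no answers from two parents on ten questions.
parent_a = ["yes"] * 7 + ["no"] * 3
parent_b = ["yes"] * 6 + ["no"] * 4

kappa = cohens_kappa(parent_a, parent_b)
label = agreement_label(kappa)
print(f"kappa={kappa:.2f} ({label})")
```

    Rounding to two decimals before comparing keeps values reported as, say, 0.60 inside the band the abstract assigns them.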

  19. Training less-experienced faculty improves reliability of skills assessment in cardiac surgery.

    Science.gov (United States)

    Lou, Xiaoying; Lee, Richard; Feins, Richard H; Enter, Daniel; Hicks, George L; Verrier, Edward D; Fann, James I

    2014-12-01

    Previous work has demonstrated high inter-rater reliability in the objective assessment of simulated anastomoses among experienced educators. We evaluated the inter-rater reliability of less-experienced educators and the impact of focused training with a video-embedded coronary anastomosis assessment tool. Nine less-experienced cardiothoracic surgery faculty members from different institutions evaluated 2 videos of simulated coronary anastomoses (1 by a medical student and 1 by a resident) at the Thoracic Surgery Directors Association Boot Camp. They then underwent a 30-minute training session using an assessment tool with embedded videos to anchor rating scores for 10 components of coronary artery anastomosis. Afterward, they evaluated 2 videos of a different student and resident performing the task. Components were scored on a 1 to 5 Likert scale, yielding an average composite score. Inter-rater reliabilities of component and composite scores were assessed using intraclass correlation coefficients (ICCs) and overall pass/fail ratings with kappa. All components of the assessment tool exhibited improvement in reliability, with 4 (bite, needle holder use, needle angles, and hand mechanics) improving the most from poor (ICC range, 0.09-0.48) to strong (ICC range, 0.80-0.90) agreement. After training, inter-rater reliabilities for composite scores improved from moderate (ICC, 0.76) to strong (ICC, 0.90) agreement, and for overall pass/fail ratings, from poor (kappa = 0.20) to moderate (kappa = 0.78) agreement. Focused, video-based anchor training facilitates greater inter-rater reliability in the objective assessment of simulated coronary anastomoses. Among raters with less teaching experience, such training may be needed before objective evaluation of technical skills. Published by Elsevier Inc.

  20. Relative and Absolute Interrater Reliabilities of a Hand-Held Myotonometer to Quantify Mechanical Muscle Properties in Patients with Acute Stroke in an Inpatient Ward

    Directory of Open Access Journals (Sweden)

    Wai Leung Ambrose Lo

    2017-01-01

    Introduction. The reliability of using MyotonPRO to quantify muscle mechanical properties in a ward setting for the acute stroke population remains unknown. Aims. To investigate the within-session relative and absolute interrater reliability of MyotonPRO. Methods. Mechanical properties of biceps brachii, brachioradialis, rectus femoris, and tibialis anterior were recorded at bedside. Participants were within 1 month of the first occurrence of stroke. Relative reliability was assessed by intraclass correlation coefficient (ICC). Absolute reliability was assessed by standard error of measurement (SEM, SEM%), smallest real difference (SRD, SRD%), and the Bland-Altman 95% limits of agreement. Results. ICCs of all studied muscles ranged between 0.63 and 0.97. The SEM of all muscles ranged within 0.30–0.88 Hz for tone, 0.07–0.19 for decrement, 6.42–20.20 N/m for stiffness, and 0.04–0.07 for creep. The SRD of all muscles ranged within 0.70–2.05 Hz for tone, 0.16–0.45 for decrement, 14.98–47.15 N/m for stiffness, and 0.09–0.17 for creep. Conclusions. MyotonPRO demonstrated acceptable relative and absolute reliability in a ward setting for patients with acute stroke. However, results must be interpreted with caution, due to the varying level of consistency between different muscles, as well as between different parameters within a muscle.
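
    The SEM and SRD ranges reported in this record can be related with a short calculation. The sketch below assumes the common definitions SEM = SD × sqrt(1 − ICC) and SRD = z × sqrt(2) × SEM; the multiplier z = 1.65 (a 90% confidence level) is an assumption chosen because it roughly reproduces the reported SRD values from the reported SEM values, and the abstract itself does not state which formula was used. The function names are illustrative only.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from a between-subject SD and an ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def smallest_real_difference(sem: float, z: float = 1.65) -> float:
    """SRD = z * sqrt(2) * SEM; z = 1.65 is an assumed 90% multiplier."""
    return z * math.sqrt(2.0) * sem


# SEM ranges copied from the abstract (lower bound, upper bound).
reported_sem = {
    "tone (Hz)": (0.30, 0.88),
    "stiffness (N/m)": (6.42, 20.20),
}

for parameter, (low, high) in reported_sem.items():
    srd_low = smallest_real_difference(low)
    srd_high = smallest_real_difference(high)
    print(f"{parameter}: SRD range {srd_low:.2f} to {srd_high:.2f}")
```

    Under these assumptions the computed ranges land close to the published SRD ranges; small differences are expected because the published SEM values are themselves rounded.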

  1. The Validity and Reliability of the Mobbing Scale (MS)

    Science.gov (United States)

    Yaman, Erkan

    2009-01-01

    The aim of this research is to develop the Mobbing Scale and examine its validity and reliability. The sample of the study consisted of 515 persons from Sakarya and Bursa. In this study, construct validity, internal consistency, test-retest reliability, and item analysis of the scale were examined. As a result of factor analysis for construct…

  2. Validity and Reliability of Agoraphobic Cognitions Questionnaire-Turkish Version

    Directory of Open Access Journals (Sweden)

    Ayşegül KART

    2013-11-01

    Objective: The aim of this study is to investigate the validity and reliability of the Agoraphobic Cognitions Questionnaire-Turkish Version (ACQ). Method: ACQ was administered to 92 patients with agoraphobia or panic disorder with agoraphobia. BSQ Turkish version was completed by translation, back-translation and pilot assessment. Reliability of ACQ was analyzed by test-retest correlation, split-half technique, and Cronbach’s alpha coefficient. Construct validity was evaluated by factor analysis after the Kaiser-Meyer-Olkin (KMO) and Bartlett tests had been performed. Principal component analysis and varimax rotation were used for factor analysis. Results: 64% of patients evaluated in the study were female and 36% were male. Ages ranged from 18 to 58 years; mean age was 31.5±10.4. The Cronbach’s alpha coefficient was 0.91. Analysis of test-retest evaluations revealed statistically significant correlations ranging between 24% and 84% for the questionnaire components. In the analysis performed by the split-half method, the reliability coefficients of the half questionnaires were 0.77 and 0.91, and the Spearman-Brown coefficient from the same analysis was 0.87. To assess construct validity of ACQ, factor analysis was performed and two basic factors were found. These two factors explained 57.6% of the total variance (Factor 1: 34.6%, Factor 2: 23%). Conclusion: Our findings support that the ACQ-Turkish version has a satisfactory level of reliability and validity.

  3. Predictive validity of the Hendrich fall risk model II in an acute geriatric unit.

    Science.gov (United States)

    Ivziku, Dhurata; Matarese, Maria; Pedone, Claudio

    2011-04-01

    Falls are the most common adverse events reported in acute care hospitals, and older patients are the most likely to fall. The risk of falling cannot be completely eliminated, but it can be reduced through the implementation of a fall prevention program. A major evidence-based intervention to prevent falls has been the use of fall-risk assessment tools. Many tools have been developed in recent years, but most instruments have not been investigated regarding reliability, validity and clinical usefulness. This study intends to evaluate the predictive validity and inter-rater reliability of Hendrich fall risk model II (HFRM II) in order to identify older patients at risk of falling in geriatric units and recommend its use in clinical practice. A prospective descriptive design was used. The study was carried out in a geriatric acute care unit of an Italian University hospital. All patients over 65 years old consecutively admitted to the unit over an 8-month period were enrolled. The patients enrolled were screened for fall risk by nurses with the HFRM II within 24 h of admission. The falls occurring during the patient's hospital stay were registered. Inter-rater reliability, area under the ROC curve, sensitivity, specificity, positive and negative predictive values and time for the administration were evaluated. 179 elderly patients were included. The inter-rater reliability was 0.87 (95% CI 0.71-1.00). The administration time was about 1 min. The most frequently reported risk factors were depression, incontinence, and vertigo. Sensitivity and specificity were 86% and 43%, respectively. The optimal cut-off score for screening at-risk patients was 5, with an area under the ROC curve of 0.72. The risk factors more strongly associated with falls were confusion and depression. As falls of older patients are a common problem in acute care settings, it is necessary that nurses use specific validated and reliable

  4. Validity and reliability of eating disorder assessments used with athletes: A review

    Directory of Open Access Journals (Sweden)

    Zachary Pope

    2015-09-01

    Conclusion: Only seven studies calculated validity coefficients within the study whereas 47 cited the validity coefficient. Twenty-six calculated a reliability coefficient whereas 47 cited the reliability of the ED measures. Four studies found validity evidence for the EAT, EDI, BULIT-R, QEDD, and EDE-Q in an athlete population. Few studies reviewed calculated validity and reliability coefficients of ED measures. Cross-validation of these measures in athlete populations is clearly needed.

  5. Internal Consistency, Retest Reliability, and their Implications For Personality Scale Validity

    Science.gov (United States)

    McCrae, Robert R.; Kurtz, John E.; Yamagata, Shinji; Terracciano, Antonio

    2010-01-01

    We examined data (N = 34,108) on the differential reliability and validity of facet scales from the NEO Inventories. We evaluated the extent to which (a) psychometric properties of facet scales are generalizable across ages, cultures, and methods of measurement; and (b) validity criteria are associated with different forms of reliability. Composite estimates of facet scale stability, heritability, and cross-observer validity were broadly generalizable. Two estimates of retest reliability were independent predictors of the three validity criteria; none of three estimates of internal consistency was. Available evidence suggests the same pattern of results for other personality inventories. Internal consistency of scales can be useful as a check on data quality, but appears to be of limited utility for evaluating the potential validity of developed scales, and it should not be used as a substitute for retest reliability. Further research on the nature and determinants of retest reliability is needed. PMID:20435807

  6. Methods to achieve high interrater reliability in data collection from primary care medical records.

    Science.gov (United States)

    Liddy, Clare; Wiens, Miriam; Hogg, William

    2011-01-01

    We assessed interrater reliability (IRR) of chart abstractors within a randomized trial of cardiovascular care in primary care. We report our findings, and outline issues and provide recommendations related to determining sample size, frequency of verification, and minimum thresholds for 2 measures of IRR: the κ statistic and percent agreement. We designed a data quality monitoring procedure having 4 parts: use of standardized protocols and forms, extensive training, continuous monitoring of IRR, and a quality improvement feedback mechanism. Four abstractors checked a 5% sample of charts at 3 time points for a predefined set of indicators of the quality of care. We set our quality threshold for IRR at a κ of 0.75, a percent agreement of 95%, or both. Abstractors reabstracted a sample of charts in 16 of 27 primary care practices, checking a total of 132 charts with 38 indicators per chart. The overall κ across all items was 0.91 (95% confidence interval, 0.90-0.92) and the overall percent agreement was 94.3%, signifying excellent agreement between abstractors. We gave feedback to the abstractors to highlight items that had a κ of less than 0.70 or a percent agreement less than 95%. No practice had to have its charts abstracted again because of poor quality. A 5% sampling of charts for quality control using IRR analysis yielded κ and agreement levels that met or exceeded our quality thresholds. Using 3 time points during the chart audit phase allows for early quality control as well as ongoing quality monitoring. Our results can be used as a guide and benchmark for other medical chart review studies in primary care.
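
    The two IRR measures named in this record, percent agreement and the κ statistic, can be sketched for a pair of abstractors as follows. This is a minimal illustration assuming Cohen's unweighted κ on categorical indicator values; the abstract does not say which κ variant was used or how the four abstractors were paired, and the function names and sample data are hypothetical. The threshold check mirrors the stated rule of a κ of 0.75, a percent agreement of 95%, or both.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which the two raters recorded the same value."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's unweighted kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement from the product of each rater's marginal proportions.
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # kappa is undefined when only one category is used
    return (p_observed - p_expected) / (1.0 - p_expected)


def meets_quality_threshold(kappa, agreement, min_kappa=0.75, min_agreement=0.95):
    """True if either threshold quoted in the abstract is met."""
    return kappa >= min_kappa or agreement >= min_agreement


# Hypothetical yes/no indicator values abstracted twice from ten charts.
first_pass = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
second_pass = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]

agreement = percent_agreement(first_pass, second_pass)
kappa = cohens_kappa(first_pass, second_pass)
print(f"agreement={agreement:.2f} kappa={kappa:.2f} "
      f"ok={meets_quality_threshold(kappa, agreement)}")
```

    Reporting both measures is useful because percent agreement alone does not correct for agreement expected by chance, while κ can be unstable when one category dominates.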

  7. Validity and Reliability of the Upper Extremity Work Demands Scale.

    Science.gov (United States)

    Jacobs, Nora W; Berduszek, Redmar J; Dijkstra, Pieter U; van der Sluis, Corry K

    2017-12-01

    Purpose: To evaluate validity and reliability of the upper extremity work demands (UEWD) scale. Methods: Participants from different levels of physical work demands, based on the Dictionary of Occupational Titles categories, were included. A historical database of 74 workers was added for factor analysis. Criterion validity was evaluated by comparing observed and self-reported UEWD scores. To assess structural validity, a factor analysis was executed. For reliability, the difference between two self-reported UEWD scores, the smallest detectable change (SDC), test-retest reliability and internal consistency were determined. Results: Fifty-four participants were observed at work and 51 of them filled in the UEWD twice with a mean interval of 16.6 days (SD 3.3, range = 10-25 days). Criterion validity of the UEWD scale was moderate (r = .44, p = .001). Factor analysis revealed that 'force and posture' and 'repetition' subscales could be distinguished with Cronbach's alpha of .79 and .84, respectively. Reliability was good; there was no significant difference between repeated measurements. An SDC of 5.0 was found. Test-retest reliability was good (intraclass correlation coefficient for agreement = .84) and all item-total correlations were >.30. There were two pairs of highly related items. Conclusion: Reliability of the UEWD scale was good, but criterion validity was moderate. Based on current results, a modified UEWD scale (2 items removed, 1 item reworded, divided into 2 subscales) was proposed. Since observation appeared to be an inappropriate gold standard, we advise investigating other types of validity, such as construct validity, in further research.

  8. Determination of validity and reliability of performance assessments tasks developed for selected topics in high school chemistry

    Science.gov (United States)

    Zichittella, Gail Eberhardt

    The primary purpose of this study was to validate performance assessments, which can be used as teaching and assessment instruments in high school science classrooms. This study evaluated the classroom usability of these performance instruments and established the interrater reliability of the scoring rubrics when used by classroom teachers. The assessment instruments were designed to represent two levels of scientific inquiry. The high-inquiry tasks are relatively unstructured in terms of student directions; the low-inquiry tasks provide more structure for the student. The tasks cover two content topics studied in chemistry (scientific observation and density). Students from a variety of Western New York school districts who were enrolled in chemistry classes and other science courses completed the tasks at the two levels of inquiry. The chemistry students completed the NYS Regents Examination in Chemistry. Their classroom teachers were interviewed and completed a questionnaire to help establish their epistemological view on the inclusion of inquiry-based learning in the science classroom. Data showed that the performance assessment tasks were reliable, valid and helpful for obtaining a more complete picture of the students' scientific understanding. The teacher participants reported no difficulty with the usability of the tasks in the high school chemistry setting. Collected data gave no evidence of gender bias with reference to the performance tasks or the NYS Regents Chemistry Examination. Additionally, it was shown that the instructors' classroom practices do have an effect upon the students' achievement on the performance tasks and the NYS Regents examination. Data also showed that achievement on the performance tasks was influenced by the number of years of science instruction students had received.

  9. Development and validation of the mindfulness-based interventions - teaching assessment criteria (MBI:TAC).

    Science.gov (United States)

    Crane, Rebecca S; Eames, Catrin; Kuyken, Willem; Hastings, Richard P; Williams, J Mark G; Bartley, Trish; Evans, Alison; Silverton, Sara; Soulsby, Judith G; Surawy, Christina

    2013-12-01

    The assessment of intervention integrity is essential in psychotherapeutic intervention outcome research and psychotherapist training. There has been little attention given to it in mindfulness-based interventions research, training programs, and practice. To address this, the Mindfulness-Based Interventions: Teaching Assessment Criteria (MBI:TAC) was developed. This article describes the MBI:TAC and its development and presents initial data on reliability and validity. Sixteen assessors from three centers evaluated teaching integrity of 43 teachers using the MBI:TAC. Internal consistency (α = .94) and interrater reliability (overall intraclass correlation coefficient = .81; range = .60-.81) were high. Face and content validity were established through the MBI:TAC development process. Data on construct validity were acceptable. Initial data indicate that the MBI:TAC is a reliable and valid tool. It can be used in Mindfulness-Based Stress Reduction/Mindfulness-Based Cognitive Therapy outcome evaluation research, training and pragmatic practice settings, and in research to assess the impact of teaching integrity on participant outcome.

  10. Estudo da confiabilidade e validade da utilização do hidropletismômetro para medida de edema no tornozelo [Study of the reliability and validity of the water plethysmographer for use in measurement of the edema at the ankle/foot]

    Directory of Open Access Journals (Sweden)

    Ian Lara Lamounier Andrade

    2011-03-01

    Reliable and valid instruments are necessary to evaluate the effectiveness of rehabilitation techniques. The objective of this study was to evaluate the reliability and validity of the water plethysmograph (Hidropletismômetro). Fourteen subjects between the ages of 18 and 59 years were selected to participate in the study. Ankles were assessed by two trained examiners with the water plethysmograph. Three measures of each ankle were obtained by the examiners in random order to investigate intra- and inter-rater reliability. Twenty-six measures were also obtained with the water plethysmograph and compared with measures obtained with the water displacer, considered the "gold standard", using graduated glass cylinders of 10 ml to 1000 ml. Intra- and inter-rater reliability was assessed with the intraclass correlation coefficient (ICC) and validity with the paired t-test. Pearson's correlation coefficient was used to establish the association between the two instruments. The data demonstrated intra- and inter-rater reliability of ICC(3,1) = 0.99 and ICC(3,2) = 0.99, respectively. No difference was found between the measures obtained with the water plethysmograph and those obtained with the graduated cylinders (p = 0.404). Pearson's correlation coefficient showed a high-magnitude, significant correlation between the measures (r = 1.0).

  11. Development and first validation of a simplified CT-based classification system of soft tissue changes in large-head metal-on-metal total hip replacement: intra- and interrater reliability and association with revision rates in a uniform cohort of 664 arthroplasties

    International Nuclear Information System (INIS)

    Boomsma, Martijn F.; Warringa, Niek; Edens, Mireille A.; Lingen, Christiaan P. van; Ettema, Harmen B.; Verheyen, Cees C.P.M.; Maas, Mario

    2015-01-01

    analysis was performed two-tailed using alpha 5 % as the significance level. In total, 664 scores from 664 MoM hips obtained by two observers were available for analyses. Interobserver reliability for the non-simplified version (I-V) was κw = 0.71 (95 % CI: 0.62-0.79), which indicates good agreement between the two musculoskeletal radiologists. Intra- and interobserver reliability for the simplified version (A-C) were respectively κw 0.78 (95 % CI: 0.68-0.87), and κw = 0.71 (95 % CI: 0.65-0.76). This indicates good agreement within and between the two observers. The simplified A-C version is significantly associated with revision exclusively due to MoM pathology, in both patients with unilateral MoM THA (p < 0.001) and patients with bilateral MoM THA (p < 0.044). The simplified A-C version is associated with several clinical measures. In patients with unilateral MoM THA, with or without contralateral THA, in situ time (p < 0.008), cobalt and chromium (p < 0.001) were statistically significant. In patients with bilateral MoM, cobalt (p < 0.001) and chromium (p < 0.027) were statistically significant. Revision is significantly associated with cup size (p < 0.001), anteversion of the cup (p < 0.004), serum ion levels of cobalt and chromium (p < 0.001) and the adapted classification system (p < 0.001). In univariate logistic regression analysis on revision, cup, anteversion of the cup, cobalt-chromium ion serum levels, and the simplified (A-C) CT category system were statistically significant. The simplified (A-C) CT category system was an independent associate of revision, in several multiple logistic regression models. The presented simplified CT grading system (A-C) in its first clinical validation on 48- and 64-multislice systems is reliable, showing good intra- and interrater reliability and is independently associated with revision surgery. (orig.)

  12. Development and first validation of a simplified CT-based classification system of soft tissue changes in large-head metal-on-metal total hip replacement: intra- and interrater reliability and association with revision rates in a uniform cohort of 664 arthroplasties

    Energy Technology Data Exchange (ETDEWEB)

    Boomsma, Martijn F.; Warringa, Niek [Isala Hospital, Department of Radiology, Zwolle (Netherlands); Edens, Mireille A. [Isala Hospital, Department of Innovation and Science, Zwolle (Netherlands); Lingen, Christiaan P. van; Ettema, Harmen B.; Verheyen, Cees C.P.M. [Isala Hospital, Department of Orthopaedics, Zwolle (Netherlands); Maas, Mario [AMC, Department of Radiology, Amsterdam (Netherlands)

    2015-08-15

    analysis was performed two-tailed using alpha 5 % as the significance level. In total, 664 scores from 664 MoM hips obtained by two observers were available for analyses. Interobserver reliability for the non-simplified version (I-V) was κw = 0.71 (95 % CI: 0.62-0.79), which indicates good agreement between the two musculoskeletal radiologists. Intra- and interobserver reliability for the simplified version (A-C) were respectively κw 0.78 (95 % CI: 0.68-0.87), and κw = 0.71 (95 % CI: 0.65-0.76). This indicates good agreement within and between the two observers. The simplified A-C version is significantly associated with revision exclusively due to MoM pathology, in both patients with unilateral MoM THA (p < 0.001) and patients with bilateral MoM THA (p < 0.044). The simplified A-C version is associated with several clinical measures. In patients with unilateral MoM THA, with or without contralateral THA, in situ time (p < 0.008), cobalt and chromium (p < 0.001) were statistically significant. In patients with bilateral MoM, cobalt (p < 0.001) and chromium (p < 0.027) were statistically significant. Revision is significantly associated with cup size (p < 0.001), anteversion of the cup (p < 0.004), serum ion levels of cobalt and chromium (p < 0.001) and the adapted classification system (p < 0.001). In univariate logistic regression analysis on revision, cup, anteversion of the cup, cobalt-chromium ion serum levels, and the simplified (A-C) CT category system were statistically significant. The simplified (A-C) CT category system was an independent associate of revision, in several multiple logistic regression models. The presented simplified CT grading system (A-C) in its first clinical validation on 48- and 64-multislice systems is reliable, showing good intra- and interrater reliability and is independently associated with revision surgery. (orig.)

  13. Development and validation of a toddler silhouette scale.

    Science.gov (United States)

    Hager, Erin R; McGill, Adrienne E; Black, Maureen M

    2010-02-01

    The purpose of this study is to develop and validate a toddler silhouette scale. A seven-point scale was developed by an artist based on photographs of 15 toddlers (6 males, 9 females) varying in race/ethnicity and body size, and a list of phenotypic descriptions of toddlers of varying body sizes. Content validity, age-appropriateness, and gender and race/ethnicity neutrality were assessed among 180 pediatric health professionals and 129 parents of toddlers. Inter- and intrarater reliability and concurrent validity were assessed by having 138 pediatric health professionals match the silhouettes with photographs of toddlers. Assessments of content validity revealed that most health professionals (74.6%) and parents of toddlers (63.6%) ordered all seven silhouettes correctly, and interobserver agreement for weight status classification was high (kappa = 0.710, r = 0.827, P […] gender (68.5%) and race/ethnicity (77.3%) neutral. The inter-rater reliability, based on matching silhouettes with photographs, was 0.787 (Cronbach's alpha) and the intrarater reliability was 0.855 (P […] parents' perception of and satisfaction with their toddler's body size. Interventions can be targeted toward parents who have inaccurate perceptions of or are dissatisfied with their toddler's body size.

  14. Development of a Conservative Model Validation Approach for Reliable Analysis

    Science.gov (United States)

    2015-01-01

    CIE 2015 August 2-5, 2015, Boston, Massachusetts, USA [DRAFT] DETC2015-46982 DEVELOPMENT OF A CONSERVATIVE MODEL VALIDATION APPROACH FOR RELIABLE...obtain a conservative simulation model for reliable design even with limited experimental data. Very little research has taken into account the...3, the proposed conservative model validation is briefly compared to the conventional model validation approach. Section 4 describes how to account

  15. Reliability and Concurrent Validity of the International Personality ...

    African Journals Online (AJOL)

    Reliability and Concurrent Validity of the International Personality item Pool (IPIP) Big-five Factor Markers in Nigeria. ... Nigerian Journal of Psychiatry ... Aims: The aim of this study was to assess the internal consistency and concurrent validity ...

  16. The Reliability of a Novel Mobile 3-dimensional Wound Measurement Device.

    Science.gov (United States)

    Anghel, Ersilia L; Kumar, Anagha; Bigham, Thomas E; Maselli, Kathryn M; Steinberg, John S; Evans, Karen K; Kim, Paul J; Attinger, Christopher E

    2016-11-01

    Objective assessment of wound dimensions is essential for tracking progression and determining treatment effectiveness. A reliability study was designed to establish intrarater and interrater reliability of a novel mobile 3-dimensional wound measurement (3DWM) device. Forty-five wounds were assessed by 2 raters using a 3DWM device to obtain length, width, area, depth, and volume measurements. Wounds were also measured manually, using a disposable ruler and digital planimetry. The intraclass correlation coefficient (ICC) was used to establish intrarater and interrater reliability. High levels of intrarater and interrater agreement were observed for area, length, and width; ICC = 0.998, 0.977, 0.955 and 0.999, 0.997, 0.995, respectively. Moderate levels of intrarater (ICC = 0.888) and interrater (ICC = 0.696) agreement were observed for volume. Lastly, depth yielded an intrarater ICC of 0.360 and an interrater ICC of 0.649. Measures from the 3DWM device were highly correlated with those obtained from scaled photography for length, width, and area (ρ = 0.997, 0.988, 0.997, P […] device yielded correlations of ρ = 0.990, 0.987, 0.996 with P […] device was found to be highly reliable for measuring wound areas for a range of wound sizes and types as compared to manual measurement and digital planimetry. The depth and therefore volume measurement using the 3DWM device was found to have a lower ICC, but volume ICC alone was moderate. Overall, this device offers a mobile option for objective wound measurement in the clinical setting.
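
    Several records in this listing report interrater reliability as an intraclass correlation coefficient. As a rough sketch of how such a value is obtained from a subjects-by-raters table, the code below computes two single-measure forms from two-way ANOVA mean squares: ICC(2,1) (absolute agreement, raters treated as random) and ICC(3,1) (consistency, raters treated as fixed). Which form the wound-measurement study used is not stated in the abstract, so the choice here is an assumption, and the ratings table is illustrative rather than study data.

```python
def icc_single_measure(ratings):
    """Return (ICC(2,1), ICC(3,1)) for a table of n subjects x k raters."""
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least 2 rows and 2 columns")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one rating per subject-rater cell).
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)              # between-subject mean square
    ms_cols = ss_cols / (k - 1)              # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc_2_1 = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    icc_3_1 = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return icc_2_1, icc_3_1


# Illustrative table: 6 subjects (rows) scored by 4 raters (columns).
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

agreement_icc, consistency_icc = icc_single_measure(example_ratings)
print(f"ICC(2,1)={agreement_icc:.2f}  ICC(3,1)={consistency_icc:.2f}")
```

    The gap between the two values in this example comes from systematic differences between raters, which lower absolute agreement but do not affect consistency; this is one reason studies should state which ICC form they report.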

  17. The Children's Play Therapy Instrument (CPTI). Description, development, and reliability studies.

    Science.gov (United States)

    Kernberg, P F; Chazan, S E; Normandin, L

    1998-01-01

    The Children's Play Therapy Instrument (CPTI), its development, and reliability studies are described. The CPTI is a new instrument to examine a child's play activity in individual psychotherapy. Three independent raters used the CPTI to rate eight videotaped play therapy vignettes. Results were compared with the authors' consensual scores from a preliminary study. Generally good to excellent levels of interrater reliability were obtained for the independent raters on intraclass correlation coefficients for ordinal categories of the CPTI. Likewise, kappa levels were acceptable to excellent for nominal categories of the scale. The CPTI holds promise to become a reliable measure of play activity in child psychotherapy. Further research is needed to assess discriminant validity of the CPTI for use as a diagnostic tool and as a measure of process and outcome.

  18. Validity and reliability of the Thai version of the Confusion Assessment Method for the Intensive Care Unit (CAM-ICU)

    Directory of Open Access Journals (Sweden)

    Pipanmekaporn T

    2014-05-01

    Tanyong Pipanmekaporn (1), Nahathai Wongpakaran (2), Sirirat Mueankwan (3), Piyawat Dendumrongkul (2), Kaweesak Chittawatanarat (3), Nantiya Khongpheng (3), Nongnut Duangsoy (3). (1) Department of Anesthesiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand; (2) Department of Psychiatry, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand; (3) Division of Surgical Critical Care and Trauma, Department of Surgery, Chiang Mai University Hospital, Chiang Mai, Thailand. Purpose: The purpose of this study was to determine the validity and reliability of the Thai version of the Confusion Assessment Method for the Intensive Care Unit (CAM-ICU) when compared to the diagnoses made by delirium experts. Patients and methods: This was a cross-sectional study conducted in both surgical intensive care and subintensive care units in Thailand between February and June 2011. Seventy patients aged 60 years or older who had been admitted to the units were enrolled into the study within the first 48 hours of admission. Each patient was randomly assessed as to whether they had delirium by a nurse using the Thai version of the CAM-ICU algorithm (Thai CAM-ICU) or by a delirium expert using the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision. Results: The prevalence of delirium was found to be 18.6% (n=13) by the delirium experts. The sensitivity of the Thai CAM-ICU’s algorithms was found to be 92.3% (95% confidence interval [CI] = 64.0%-99.8%), while the specificity was 94.7% (95% CI = 85.4%-98.9%). The instrument displayed good interrater reliability (Cohen’s κ = 0.81; 95% CI = 0.64-0.99). The time taken to complete the Thai CAM-ICU was 1 minute (interquartile range, 1-2 minutes). Conclusion: The Thai CAM-ICU demonstrated good validity, reliability, and ease of use when diagnosing delirium in a surgical intensive care unit setting. The use of this diagnostic tool should be encouraged for daily, routine use, so as to promote the early detection
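
    The accuracy figures in this record can be checked with a 2×2 table. The cell counts below are not given in the abstract; they are inferred as the only whole-number split of 70 patients with 13 expert-diagnosed cases that matches the reported sensitivity and specificity, so they should be read as an assumption. The function name is illustrative.

```python
def diagnostic_accuracy(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Basic accuracy measures of a screening test against a reference standard."""
    total = tp + fn + tn + fp
    return {
        "prevalence": (tp + fn) / total,
        "sensitivity": tp / (tp + fn),   # positives correctly flagged
        "specificity": tn / (tn + fp),   # negatives correctly cleared
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Inferred counts (assumption): 13 delirious patients, 57 non-delirious patients.
measures = diagnostic_accuracy(tp=12, fn=1, tn=54, fp=3)
for name, value in measures.items():
    print(f"{name}: {value:.1%}")
```

    With only 13 reference-positive patients, each missed case moves sensitivity by several percentage points, which is consistent with the wide confidence interval reported for sensitivity.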

  19. Reliable and Valid Assessment of Point-of-care Ultrasonography

    DEFF Research Database (Denmark)

    Todsen, Tobias; Tolsgaard, Martin Grønnebæk; Olsen, Beth Härstedt

    2015-01-01

    OBJECTIVE: To explore the reliability and validity of the Objective Structured Assessment of Ultrasound Skills (OSAUS) scale for point-of-care ultrasonography (POC US) performance. BACKGROUND: POC US is increasingly used by clinicians and is an essential part of the management of acute surgical... conditions. However, the quality of performance is highly operator-dependent. Therefore, reliable and valid assessment of trainees' ultrasonography competence is needed to ensure patient safety. METHODS: Twenty-four physicians, representing novices, intermediates, and experts in POC US, scanned 4 different... physicians' OSAUS scores with diagnostic accuracy. RESULTS: The generalizability coefficient was high (0.81) and a D-study demonstrated that 1 assessor and 5 cases would result in similar reliability. The construct validity of the OSAUS scale was supported by a significant difference in the mean scores...

  20. Reliability and Validity of the Greek Migraine Disability Assessment (MIDAS) Questionnaire.

    Science.gov (United States)

    Oikonomidi, Theodora; Vikelis, Michail; Artemiadis, Artemios; Chrousos, George P; Darviri, Christina

    2018-03-01

    The Migraine Disability Assessment (MIDAS) Questionnaire is a reliable and valid instrument for migraine-related disability. Such a tool is needed to quantify migraine-related disability in the Greek population. This validation study aims to assess the test-retest reliability, internal consistency, item discriminant and convergent validity of the Greek translation of the MIDAS. Adults diagnosed with migraine completed the MIDAS Questionnaire on two occasions 3 weeks apart to assess reliability, and completed the RAND-36 to assess validity. Participants (n = 152) had a median MIDAS score of 24 and mostly severe disability (58% were grade IV). The test-retest reliability analysis (N = 59) revealed excellent reliability for the total score. Internal consistency was α = 0.71 for initial and α = 0.82 for retest completion. For item discriminant validity, the correlations between each question and the total score were significant, with high correlations for questions 2-5 (range 0.67 ≤ r ≤ 0.79; p […] MIDAS score tended to have better wellbeing. Psychometric properties are comparable with those of other published validation studies of the MIDAS and the original. Findings on question 1 show that missing work/school days may be closely related with increased affect issues. The Greek version of the MIDAS Questionnaire has good reliability and validity. This study allowed for cross-cultural comparability of research findings.

  1. Validity of an observation method for assessing pain behavior in individuals with multiple sclerosis.

    Science.gov (United States)

    Cook, Karon F; Roddey, Toni S; Bamer, Alyssa M; Amtmann, Dagmar; Keefe, Francis J

    2013-09-01

    Pain is a common and complex experience for individuals who live with multiple sclerosis (MS) and it interferes with physical, psychological, and social function. A valid and reliable tool for quantifying observed pain behaviors in MS is critical to understand how pain behaviors contribute to pain-related disability in this clinical population. To evaluate the reliability and validity of a pain behavioral observation protocol in individuals who have MS. Community-dwelling volunteers with MS (N=30), back pain (N=5), or arthritis (N=8) were recruited based on clinician referrals, advertisements, fliers, web postings, and participation in previous research. Participants completed the measures of pain severity, pain interference, and self-reported pain behaviors and were videotaped doing typical activities (e.g., walking and sitting). Two coders independently recorded frequencies of pain behaviors by category (e.g., guarding and bracing) and interrater reliability statistics were calculated. Naïve observers reviewed videotapes of individuals with MS and rated their pain. The Spearman's correlations were calculated between pain behavior frequencies and self-reported pain and pain ratings by naïve observers. Interrater reliability estimates indicated the reliability of pain codes in the MS sample. Kappa coefficients ranged from moderate (sighing=0.40) to substantial agreements (guarding=0.83). These values were comparable with those obtained in the combined back pain and arthritis sample. Concurrent validity was supported by correlations with self-reported pain (0.46-0.53) and with self-reports of pain behaviors (0.58). Construct validity was supported by a finding of 0.87 correlation between total pain behaviors observed by coders and mean pain ratings by naïve observers. Results support the use of the pain behavior observation protocol for assessing pain behaviors of individuals with MS. Valid assessments of pain behaviors of individuals with MS could lead to

  2. Reliable and valid assessment of performance in thoracoscopy

    DEFF Research Database (Denmark)

    Konge, Lars; Lehnert, Per; Hansen, Henrik Jessen

    2012-01-01

    BACKGROUND: As we move toward competency-based education in medicine, we have lagged in developing competency-based evaluation methods. In the era of minimally invasive surgery, there is a need for a reliable and valid tool dedicated to measure competence in video-assisted thoracoscopic surgery....... The purpose of this study is to create such an assessment tool, and to explore its reliability and validity. METHODS: An expert group of physicians created an assessment tool consisting of 10 items rated on a five-point rating scale. The following factors were included: economy and confidence of movement...

  3. NDE reliability and advanced NDE technology validation

    International Nuclear Information System (INIS)

    Doctor, S.R.; Deffenbaugh, J.D.; Good, M.S.; Green, E.R.; Heasler, P.G.; Hutton, P.H.; Reid, L.D.; Simonen, F.A.; Spanner, J.C.; Vo, T.V.

    1989-01-01

    This paper reports on progress for three programs: (1) evaluation and improvement in nondestructive examination reliability for inservice inspection of light water reactors (LWR) (NDE Reliability Program), (2) field validation acceptance, and training for advanced NDE technology, and (3) evaluation of computer-based NDE techniques and regional support of inspection activities. The NDE Reliability Program objectives are to quantify the reliability of inservice inspection techniques for LWR primary system components through independent research and establish means for obtaining improvements in the reliability of inservice inspections. The areas of significant progress will be described concerning ASME Code activities, re-analysis of the PISC-II data, the equipment interaction matrix study, new inspection criteria, and PISC-III. The objectives of the second program are to develop field procedures for the AE and SAFT-UT techniques, perform field validation testing of these techniques, provide training in the techniques for NRC headquarters and regional staff, and work with the ASME Code for the use of these advanced technologies. The final program's objective is to evaluate the reliability and accuracy of interpretation of results from computer-based ultrasonic inservice inspection systems, and to develop guidelines for NRC staff to monitor and evaluate the effectiveness of inservice inspections conducted on nuclear power reactors. This program started in the last quarter of FY89, and the extent of the program was to prepare a work plan for presentation to and approval from a technical advisory group of NRC staff

  4. Preliminary Validation and Reliability Testing of the Montreal Instrument for Cat Arthritis Testing, for Use by Veterinarians, in a Colony of Laboratory Cats

    Directory of Open Access Journals (Sweden)

    Mary P. Klinck

    2015-12-01

    Subtle signs and conflicting physical and radiographic findings make feline osteoarthritis (OA) challenging to diagnose. A physical examination-based assessment was developed, consisting of eight items: Interaction, Exploration, Posture, Gait, Body Condition, Coat and Claws, (joint) Palpation–Findings, and Palpation–Cat Reaction. Content (experts) and face (veterinary students) validity were excellent. Construct validity, internal consistency, and intra- and inter-rater reliability were assessed via a pilot and main study, using laboratory-housed cats with and without OA. Gait distinguished OA status in the pilot (p = 0.05) study. In the main study, no scale item achieved statistically significant OA detection. Forelimb peak vertical ground reaction force (PVF) correlated inversely with Gait (Rho_s = −0.38 (p = 0.03) to −0.41 (p = 0.02)). Body Posture correlated with Gait, and inversely with forelimb PVF at two of three time points (Rho_s = −0.38 (p = 0.03) to −0.43 (p = 0.01)). Palpation (Findings, Cat Reaction) did not distinguish OA from non-OA cats. Palpation–Cat Reaction (Forelimbs) correlated inversely with forelimb PVF at two time points (Rho_s = −0.41 (p = 0.02) to −0.41 (p = 0.01)), but scores were highly variable, and poorly reliable. Gait and Posture require improved sensitivity, and Palpation should be interpreted cautiously, in diagnosing feline OA.

  5. Reliability and Validity of Athletes Disability Index Questionnaire.

    Science.gov (United States)

    Noormohammadpour, Pardis; Hosseini Khezri, Alireza; Farahbakhsh, Farzin; Mansournia, Mohammad Ali; Smuck, Matthew; Kordi, Ramin

    2018-03-01

    The purpose of this study was to evaluate validity and reliability of a new proposed questionnaire for assessment of functional disability in athletes with low back pain (LBP). Validity and reliability study. Elite athletes participating in different fields of sports. Participants were 165 male and female athletes (between 12 and 50 years old) with LBP. Athlete Disability Index (ADI) Questionnaire which is developed by the authors for assessing LBP-related disability in athletes, Oswestry Disability Index (ODI), and the Roland-Morris Disability Questionnaire (RDQ). Self-reported responses were collected regarding LBP-related disability through ADI, ODI, and RDQ. The test-retest reliability was strong, and intraclass correlation value ranged between 0.74 and 0.94. The Cronbach alpha coefficient value of 0.91 (P visual analog scale was r = 0.626 (P disability levels were mild in the large majority of subjects (91.5% and 86.0%, respectively). Alternatively, disability assessments by the ADI did not cluster at the mild level and ranged more broadly from mild to very high. The ADI is a reliable and valid instrument for assessing disability in athletes with LBP. Compared with the available LBP disability questionnaires used in the general population, ADI can more precisely stratify the disability levels of athletes due to LBP.

  6. The Danish anal sphincter rupture questionnaire: Validity and reliability

    DEFF Research Database (Denmark)

    Due, Ulla; Ottesen, Marianne

    2008-01-01

    Objective. To revise, validate and test for reliability an anal sphincter rupture questionnaire in relation to construct, content and face validity. Setting and background. Since 1996 women with anal sphincter rupture (ASR) at one of the public university hospitals in Copenhagen, Denmark have been...... main questions but one. Two questions needed further explanation. Seven women made minor errors. Conclusion. The validated Danish questionnaire has a good construct, content and face validity. It is a well accepted, reliable, simple and clinically relevant screening tool. It reveals physical problems...... offered pelvic floor muscle examination and instruction by a specialist physiotherapist. In relation to that, a non-validated questionnaire about anal and urinary incontinence was to be answered six months after childbirth. Method. The original questionnaire was revised and a pilot test was performed...

  7. Reliability and Validity Assessment of a Linear Position Transducer

    Science.gov (United States)

    Garnacho-Castaño, Manuel V.; López-Lastra, Silvia; Maté-Muñoz, José L.

    2015-01-01

    The objectives of the study were to determine the validity and reliability of peak velocity (PV), average velocity (AV), peak power (PP) and average power (AP) measurements made using a linear position transducer. Validity was assessed by comparing measurements simultaneously obtained using the Tendo Weightlifting Analyzer System and T-Force Dynamic Measurement System (Ergotech, Murcia, Spain) during two resistance exercises, bench press (BP) and full back squat (BS), performed by 71 trained male subjects. For the reliability study, a further 32 men completed both lifts using the Tendo Weightlifting Analyzer System in two identical testing sessions one week apart (session 1 vs. session 2). Intraclass correlation coefficients (ICCs) indicating the validity of the Tendo Weightlifting Analyzer System were high, with values ranging from 0.853 to 0.989. Systematic biases and random errors were low to moderate for almost all variables, being higher in the case of PP (bias ±157.56 W; error ±131.84 W). Proportional biases were identified for almost all variables. Test-retest reliability was strong with ICCs ranging from 0.922 to 0.988. Reliability results also showed minimal systematic biases and random errors, which were only significant for PP (bias -19.19 W; error ±67.57 W). Only PV recorded in the BS showed no significant proportional bias. The Tendo Weightlifting Analyzer System emerged as a reliable system for measuring movement velocity and estimating power in resistance exercises. The low biases and random errors observed here (mainly AV, AP) make this device a useful tool for monitoring resistance training. Key points: This study determined the validity and reliability of peak velocity, average velocity, peak power and average power measurements made using a linear position transducer. The Tendo Weightlifting Analyzer System emerged as a reliable system for measuring movement velocity and power. PMID:25729300

  8. Dental Management Survey Brazil (DMS-BR): creation and validation of a management instrument

    Science.gov (United States)

    Gonzales, Paola Sampaio; Martins, Ismar Eduardo; Biazevic, Maria Gabriela Haye; Silva, Paulo Roberto da; Michel-Crosato, Edgard

    2017-04-10

    Questionnaires for the assessment of knowledge and self-perception can be useful to diagnose what a dentist knows about management and administration. The aim of the present study was to create and validate the Dental Management Survey Brazil (DMS-BR) scale, based on meetings with experts in the field. After having elaborated the first version, 10 audits were performed in dental offices in order to produce the final version, which included nine dimensions: location, patient, finance, marketing, competition, quality, staff, career, and productivity. The accuracy of the instrument was measured by intrarater and interrater reliability. In the validation phase, 247 Brazilian dentists answered a web-based questionnaire. The data were processed using Stata 13.0 and the significance level was set at 95%. The instrument had intrarater and interrater reliability (ICC = 0.93 and 0.94). The overall average of respondents for the DMS-BR scale was 3.77 (SD = 0.45). Skewness and kurtosis were below absolute values 3 and 7, respectively. Internal validity measured by Cronbach's alpha was 0.925 and the correlation of each dimension with the final result of the DMS-BR ranged between 0.606 and 0.810. Correlation with the job satisfaction scale was 0.661. The SEM data ranged between 0.80 and 0.56. The questionnaire presented satisfactory indicators of dentists' self-perception about management and administration activities.

  9. Learning Style Scales: a valid and reliable questionnaire

    Directory of Open Access Journals (Sweden)

    Abdolghani Abdollahimohammad

    2014-08-01

    Purpose: Learning-style instruments assist students in developing their own learning strategies and outcomes, in eliminating learning barriers, and in acknowledging peer diversity. Only a few psychometrically validated learning-style instruments are available. This study aimed to develop a valid and reliable learning-style instrument for nursing students. Methods: A cross-sectional survey study was conducted in two nursing schools in two countries. A purposive sample of 156 undergraduate nursing students participated in the study. Face and content validity was obtained from an expert panel. The LSS construct was established using principal axis factoring (PAF) with oblimin rotation, a scree plot test, and parallel analysis (PA). The reliability of LSS was tested using Cronbach’s α, corrected item-total correlation, and test-retest. Results: Factor analysis revealed five components, confirmed by PA and a relatively clear curve on the scree plot. Component strength and interpretability were also confirmed. The factors were labeled as perceptive, solitary, analytic, competitive, and imaginative learning styles. Cronbach’s α was > 0.70 for all subscales in both study populations. The corrected item-total correlations were > 0.30 for the items in each component. Conclusion: The LSS is a valid and reliable inventory for evaluating learning style preferences in nursing students in various multicultural environments.

  10. Learning Style Scales: a valid and reliable questionnaire.

    Science.gov (United States)

    Abdollahimohammad, Abdolghani; Ja'afar, Rogayah

    2014-01-01

    Learning-style instruments assist students in developing their own learning strategies and outcomes, in eliminating learning barriers, and in acknowledging peer diversity. Only a few psychometrically validated learning-style instruments are available. This study aimed to develop a valid and reliable learning-style instrument for nursing students. A cross-sectional survey study was conducted in two nursing schools in two countries. A purposive sample of 156 undergraduate nursing students participated in the study. Face and content validity was obtained from an expert panel. The LSS construct was established using principal axis factoring (PAF) with oblimin rotation, a scree plot test, and parallel analysis (PA). The reliability of LSS was tested using Cronbach's α, corrected item-total correlation, and test-retest. Factor analysis revealed five components, confirmed by PA and a relatively clear curve on the scree plot. Component strength and interpretability were also confirmed. The factors were labeled as perceptive, solitary, analytic, competitive, and imaginative learning styles. Cronbach's α was >0.70 for all subscales in both study populations. The corrected item-total correlations were >0.30 for the items in each component. The LSS is a valid and reliable inventory for evaluating learning style preferences in nursing students in various multicultural environments.
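
The two reliability statistics named in this record, Cronbach's α and the corrected item-total correlation, can be illustrated with a minimal sketch. It uses the textbook formulas and a small made-up response matrix (`RESPONSES` is hypothetical, not data from the study), and it assumes nothing about how the authors actually ran their analysis.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])                                   # number of items
    item_variances = [variance(col) for col in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(scores):
    """Correlate each item with the sum of the *other* items (item removed)."""
    result = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]
        result.append(pearson(item, rest))
    return result


# Hypothetical data: 5 respondents x 4 Likert-type items.
RESPONSES = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

if __name__ == "__main__":
    print("alpha:", round(cronbach_alpha(RESPONSES), 3))
    print("item-total:", [round(r, 3) for r in corrected_item_total(RESPONSES)])
```

Under the thresholds quoted in the abstract, a subscale would pass if `cronbach_alpha` exceeds 0.70 and every value from `corrected_item_total` exceeds 0.30.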

  11. Inter- and intra-rater reliability of nasal auscultation in daycare children.

    Science.gov (United States)

    Santos, Rita; Silva Alexandrino, Ana; Tomé, David; Melo, Cristina; Mesquita Montes, António; Costa, Daniel; Pinto Ferreira, João

    2018-02-01

    The aim of this study was to assess nasal auscultation's intra- and inter-rater reliability and to analyze ear and respiratory clinical condition according to nasal auscultation. Cross-sectional study performed in 125 children aged up to 3 years old attending daycare centers. Nasal auscultation, tympanometry and Paediatric Respiratory Severity Score (PRSS) were applied to all children. Nasal sounds were classified by an expert panel in order to determine nasal auscultation's intra- and inter-rater reliability. The classification of nasal sounds was assessed against tympanometric and PRSS values. Nasal auscultation revealed substantial inter-rater (K=0.75) and intra-rater (K=0.69; K=0.61 and K=0.72) reliability. Children with a "non-obstructed" classification revealed a lower peak pressure (t=-3.599, Pauscultation revealed substantial intra- and inter-rater reliability. Nasal auscultation exhibited important differences according to ear and respiratory clinical conditions. Nasal auscultation in pediatrics seems to be an original topic as well as a simple method that can be used to identify early signs of nasopharyngeal obstruction.

  12. Validity, Reliability, and the Questionable Role of Psychometrics in Plastic Surgery

    Science.gov (United States)

    2014-01-01

    Summary: This report examines the meaning of validity and reliability and the role of psychometrics in plastic surgery. Study titles increasingly include the word “valid” to support the authors’ claims. Studies by other investigators may be labeled “not validated.” Validity simply refers to the ability of a device to measure what it intends to measure. Validity is not an intrinsic test property. It is a relative term most credibly assigned by the independent user. Similarly, the word “reliable” is subject to interpretation. In psychometrics, its meaning is synonymous with “reproducible.” The definitions of valid and reliable are analogous to accuracy and precision. Reliability (both the reliability of the data and the consistency of measurements) is a prerequisite for validity. Outcome measures in plastic surgery are intended to be surveys, not tests. The role of psychometric modeling in plastic surgery is unclear, and this discipline introduces difficult jargon that can discourage investigators. Standard statistical tests suffice. The unambiguous term “reproducible” is preferred when discussing data consistency. Study design and methodology are essential considerations when assessing a study’s validity. PMID:25289354

  13. Validity, Reliability, and the Questionable Role of Psychometrics in Plastic Surgery

    Directory of Open Access Journals (Sweden)

    Eric Swanson, MD

    2014-06-01

    Summary: This report examines the meaning of validity and reliability and the role of psychometrics in plastic surgery. Study titles increasingly include the word “valid” to support the authors’ claims. Studies by other investigators may be labeled “not validated.” Validity simply refers to the ability of a device to measure what it intends to measure. Validity is not an intrinsic test property. It is a relative term most credibly assigned by the independent user. Similarly, the word “reliable” is subject to interpretation. In psychometrics, its meaning is synonymous with “reproducible.” The definitions of valid and reliable are analogous to accuracy and precision. Reliability (both the reliability of the data and the consistency of measurements) is a prerequisite for validity. Outcome measures in plastic surgery are intended to be surveys, not tests. The role of psychometric modeling in plastic surgery is unclear, and this discipline introduces difficult jargon that can discourage investigators. Standard statistical tests suffice. The unambiguous term “reproducible” is preferred when discussing data consistency. Study design and methodology are essential considerations when assessing a study’s validity.

  14. The Brief Negative Symptom Scale (BNSS): Independent validation in a large sample of Italian patients with schizophrenia.

    Science.gov (United States)

    Mucci, A; Galderisi, S; Merlotti, E; Rossi, A; Rocca, P; Bucci, P; Piegari, G; Chieffi, M; Vignapiano, A; Maj, M

    2015-07-01

    The Brief Negative Symptom Scale (BNSS) was developed to address the main limitations of the existing scales for the assessment of negative symptoms of schizophrenia. The initial validation of the scale by the group involved in its development demonstrated good convergent and discriminant validity, and a factor structure confirming the two domains of negative symptoms (reduced emotional/verbal expression and anhedonia/asociality/avolition). However, only relatively small samples of patients with schizophrenia were investigated. Further independent validation in large clinical samples might be instrumental to the broad diffusion of the scale in clinical research. The present study aimed to examine the BNSS inter-rater reliability, convergent/discriminant validity and factor structure in a large Italian sample of outpatients with schizophrenia. Our results confirmed the excellent inter-rater reliability of the BNSS (the intraclass correlation coefficient ranged from 0.81 to 0.98 for individual items and was 0.98 for the total score). The convergent validity measures had r values from 0.62 to 0.77, while the divergent validity measures had r values from 0.20 to 0.28 in the main sample (n=912) and in a subsample without clinically significant levels of depression and extrapyramidal symptoms (n=496). The BNSS factor structure was supported in both groups. The study confirms that the BNSS is a promising measure for quantifying negative symptoms of schizophrenia in large multicenter clinical studies. Copyright © 2015 Elsevier Masson SAS. All rights reserved.

  15. Discomfort Intolerance Scale: A Study of Reliability and Validity

    Directory of Open Access Journals (Sweden)

    Kadir ÖZDEL

    2012-03-01

    Objective: The Discomfort Intolerance Scale (DIS) was developed by Norman B. Schmidt et al. (2006) to assess individual differences in the capacity to withstand physical perturbations or uncomfortable bodily states. The aim of this study is to investigate the validity and reliability of the Discomfort Intolerance Scale-Turkish Version (RDÖ). Method: A total of 225 students (male=167, female=58) from two different universities participated in this study. To determine criterion validity, the Beck Anxiety Inventory (BAI) and the State-Trait Anxiety Inventory (STAI) were used. Construct validity was evaluated by factor analysis after the Kaiser-Meyer-Olkin (KMO) and Bartlett tests had been performed. To assess test-retest reliability, the scale was re-administered to 54 participants 6 weeks later. Results: To assess the construct validity of the DIS, factor analyses were performed using principal components analysis with varimax rotation. The factor analysis resulted in two factors named “discomfort (in)tolerance” and “discomfort avoidance”. The Cronbach’s alpha coefficients for the entire scale, the discomfort-(in)tolerance subscale, and the discomfort-avoidance subscale were .592, .670, and .600, respectively. Correlations between the two factors of the DIS, discomfort intolerance and discomfort avoidance, and the Trait Anxiety Inventory of the STAI were statistically significant at the level of 0.05. Test-retest reliability was statistically significant at the level of 0.01. Conclusion: The analysis demonstrated that the DIS had a satisfactory level of reliability and validity in Turkish university students.

  16. Validity and Reliability of a Medicine Ball Explosive Power Test.

    Science.gov (United States)

    Stockbrugger, Barry A.; Haennel, Robert G.

    2001-01-01

    Evaluated the validity and reliability of a medicine ball throw test to evaluate explosive power. Data on competitive sand volleyball players who performed a medicine ball throw and a standard countermovement jump indicated that the medicine ball throw test was a valid and reliable way to assess explosive power for an analogous total-body movement…

  17. Inter-rater Reliability for Metrics Scored in a Binary Fashion-Performance Assessment for an Arthroscopic Bankart Repair.

    Science.gov (United States)

    Gallagher, Anthony G; Ryu, Richard K N; Pedowitz, Robert A; Henn, Patrick; Angelo, Richard L

    2018-05-02

    To determine the inter-rater reliability (IRR) of a procedure-specific checklist scored in a binary fashion for the evaluation of surgical skill and whether it meets a minimum level of agreement (≥0.8 between 2 raters) required for high-stakes assessment. In a prospective randomized and blinded fashion, and after detailed assessment training, 10 Arthroscopy Association of North America Master/Associate Master faculty arthroscopic surgeons (in 5 pairs) with an average of 21 years of surgical experience assessed the video-recorded 3-anchor arthroscopic Bankart repair performance of 44 postgraduate year 4 or 5 residents from 21 Accreditation Council for Graduate Medical Education orthopaedic residency training programs from across the United States. No paired scores of resident surgeon performance evaluated by the 5 teams of faculty assessors dropped below the 0.8 IRR level (mean = 0.93; range 0.84-0.99; standard deviation = 0.035). A comparison between the 5 assessor groups with 1 factor analysis of variance showed that there was no significant difference between the groups (P = .205). Pearson's product-moment correlation coefficient revealed a strong and statistically significant negative correlation, that is, -0.856 (P fashion meet the need and can show a high (>80%) IRR. Copyright © 2018 Arthroscopy Association of North America. Published by Elsevier Inc. All rights reserved.

  18. Reasoning with Inductive Argument Test: A Study of Validity and Reliability

    Directory of Open Access Journals (Sweden)

    Mehmet Emrah Karadere

    2013-12-01

    Conclusion: The preliminary data obtained from the study of reliability and validity of the scale show that the ‘Reasoning with Inductive Argument Test’ is reliable and valid in the Turkish population. [JCBPR 2013; 2(3): 156-161]

  19. Harmonization process and reliability assessment of anthropometric measurements in the elderly EXERNET multi-centre study.

    Directory of Open Access Journals (Sweden)

    Alba Gómez-Cabello

    BACKGROUND: The elderly EXERNET multi-centre study aims to collect normative anthropometric data for older, functionally independent adults living in Spain. PURPOSE: To describe the standardization process and reliability of the anthropometric measurements carried out in the pilot study and during the final workshop, examining both intra- and inter-rater errors for measurements. MATERIALS AND METHODS: A total of 98 elderly people from five different regions participated in the intra-rater error assessment, and 10 different seniors living in the city of Toledo (Spain) participated in the inter-rater assessment. We examined both intra- and inter-rater errors for heights and circumferences. RESULTS: For height, intra-rater technical errors of measurement (TEMs) were smaller than 0.25 cm. For circumferences and knee height, TEMs were smaller than 1 cm, except for waist circumference in the city of Cáceres. Reliability for heights and circumferences was greater than 98% in all cases. Inter-rater TEMs were 0.61 cm for height, 0.75 cm for knee-height and ranged between 2.70 and 3.09 cm for the circumferences measured. Inter-rater reliabilities for anthropometric measurements were always higher than 90%. CONCLUSION: The harmonization process, including the workshop and pilot study, guarantees the quality of the anthropometric measurements in the elderly EXERNET multi-centre study. High reliability and low TEM may be expected when assessing anthropometry in the elderly population.
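
For readers unfamiliar with the technical error of measurement (TEM) reported in this record, the sketch below shows the conventional two-measurement formula, TEM = sqrt(Σd² / 2n), and the associated reliability coefficient R = 1 − TEM² / SD². The abstract does not say exactly how the authors computed their figures, so this is an assumption based on the standard anthropometry definitions, and the height values are invented for illustration.

```python
from math import sqrt
from statistics import variance


def tem(first, second):
    """Technical error of measurement for two repeated measurement series.

    `first` and `second` hold one value per subject (same order), taken
    either by the same rater twice (intra-rater) or by two raters (inter-rater).
    """
    n = len(first)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    return sqrt(squared_diffs / (2 * n))


def reliability(first, second):
    """Reliability coefficient R = 1 - TEM^2 / SD^2.

    SD^2 is taken here as the sample variance of all pooled measurements,
    which is one common convention (an assumption, not the study's stated method).
    """
    pooled = list(first) + list(second)
    return 1 - tem(first, second) ** 2 / variance(pooled)


# Hypothetical standing heights in cm, measured twice on the same five people.
HEIGHT_TRIAL_1 = [158.2, 164.5, 171.0, 149.8, 166.3]
HEIGHT_TRIAL_2 = [158.4, 164.3, 171.3, 149.9, 166.0]

if __name__ == "__main__":
    print("TEM (cm):", round(tem(HEIGHT_TRIAL_1, HEIGHT_TRIAL_2), 3))
    print("R (%):", round(100 * reliability(HEIGHT_TRIAL_1, HEIGHT_TRIAL_2), 2))
```

Because R depends on the between-subject variance, the same TEM yields a lower R in a homogeneous sample than in a heterogeneous one, which is worth keeping in mind when comparing reliabilities across studies.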

  20. Validity and Reliability of the Arabic Token Test for Children

    Science.gov (United States)

    Alkhamra, Rana A.; Al-Jazi, Aya B.

    2016-01-01

    Background: The Token Test for Children (2nd edition) (TTFC) is a measure for assessing receptive language. In this study we describe the translation process, validity and reliability of the Arabic Token Test for Children (A-TTFC). Aims: The aim of this study is to translate, validate and establish the reliability of the Arabic Token Test for…

  1. Conceptualizing Essay Tests' Reliability and Validity: From Research to Theory

    Science.gov (United States)

    Badjadi, Nour El Imane

    2013-01-01

    The current paper on writing assessment surveys the literature on the reliability and validity of essay tests. The paper aims to examine the two concepts in relationship with essay testing as well as to provide a snapshot of the current understandings of the reliability and validity of essay tests as drawn in recent research studies. Bearing in…

  2. Construction of Valid and Reliable Test for Assessment of Students

    Science.gov (United States)

    Osadebe, P. U.

    2015-01-01

    The study was carried out to construct a valid and reliable test in Economics for secondary school students. Two research questions were drawn to guide the establishment of validity and reliability for the Economics Achievement Test (EAT). It is a multiple choice objective test of five options with 100 items. A sample of 1000 students was randomly…

  3. Validation of a patient interview for assessing reasons for antipsychotic discontinuation and continuation

    Directory of Open Access Journals (Sweden)

    Matza LS

    2012-07-01

    Louis S Matza,1 Glenn A Phillips,2 Dennis A Revicki,1 Haya Ascher-Svanum,3 Karen G Malley,4 Andrew C Palsgrove,1 Douglas E Faries,3 Virginia Stauffer,3 Bruce J Kinon,3 A George Awad,5 Richard SE Keefe,6 Dieter Naber7. 1Outcomes Research, United BioSource Corporation, Bethesda, MD, 2Formerly with Eli Lilly and Company, Indianapolis, IN, 3Eli Lilly and Company, Indianapolis, IN, 4Malley Research Programming, Inc, Rockville, MD, USA; 5Department of Psychiatry and Behavioral Sciences; University of Toronto, Toronto, Canada; 6Duke University Medical Center, Durham NC, USA; 7Universitaetsklinikum Hamburg-Eppendorf, Hamburg, Germany. Introduction: The Reasons for Antipsychotic Discontinuation Interview (RAD-I) was developed to assess patients’ perceptions of reasons for discontinuing or continuing an antipsychotic. The current study examined reliability and validity of domain scores representing three factors contributing to these treatment decisions: treatment benefits, adverse events, and distal reasons other than direct effects of the medication. Methods: Data were collected from patients with schizophrenia or schizoaffective disorder and their treating clinicians. For approximately 25% of patients, a second rater completed the RAD-I for assessment of inter-rater reliability. Results: All patients (n = 121; 81 discontinuation, 40 continuation) reported at least one reason for discontinuation or continuation (mean = 2.8 reasons for discontinuation; 3.4 for continuation). Inter-rater reliability was supported (kappas = 0.63–1.0). Validity of the discontinuation domain scores was supported by associations with symptom measures (the Positive and Negative Syndrome Scale for Schizophrenia, the Clinical Global Impression – Schizophrenia Scale; r = 0.30 to 0.51; all P < 0.01), patients’ primary reasons for discontinuation, and adverse events. However, the continuation domain scores were not significantly associated with these other indicators

  4. Reliability of the craniocervical posture assessment: visual and angular measurements using photographs and radiographs.

    Science.gov (United States)

    Gadotti, Inae C; Armijo-Olivo, Susan; Silveira, Anelise; Magee, David

    2013-01-01

    The purposes of this study were to determine the intrarater and interrater reliability of the craniocervical posture in a sagittal view using quantitative measurements on photographs and radiographs and to determine the agreement of the visual assessment of posture between raters. One photograph and 1 radiograph of the sagittal craniocervical posture were simultaneously taken from 39 healthy female subjects. Three angles were measured on the photographs and 10 angles on the radiographs of 22 subjects using Alcimage software (Alcimage; Uberlândia, MG, Brazil). Two repeated measurements were performed by 2 raters. The measurements were compared within and between raters to test the intrarater and interrater reliability, respectively. Intraclass correlation coefficient and SEM were used. κ Agreement was calculated for the visual assessment of 39 subjects using photographs and radiographs between 2 raters. Good to excellent intrarater and interrater intraclass correlation coefficient values were found on both photographs and radiographs. Interrater SEM was large and clinically significant for cervical lordosis photogrammetry and for 1 angle measuring cervical lordosis on radiographs. Interrater κ agreement for the visual assessment using photographs was poor (κ = 0.37). The raters were reliable to measure angles in photographs and radiographs to quantify craniocervical posture with exception of 2 angles measuring lordosis of the cervical spine when compared between raters. The visual assessment of posture between raters was not reliable. © 2013. Published by National University of Health Sciences All rights reserved.

  5. Optimal number of tests to achieve and validate product reliability

    International Nuclear Information System (INIS)

    Ahmed, Hussam; Chateauneuf, Alaa

    2014-01-01

    The reliability validation of engineering products and systems is mandatory for choosing the best cost-effective design among a series of alternatives. Decisions at early design stages have a large effect on the overall life cycle performance and cost of products. In this paper, an optimization-based formulation is proposed by coupling the costs of product design and validation testing, in order to ensure the product reliability with the minimum number of tests. This formulation addresses the question about the number of tests to be specified through reliability demonstration necessary to validate the product under appropriate confidence level. The proposed formulation takes into account the product cost, the failure cost and the testing cost. The optimization problem can be considered as a decision making system according to the hierarchy of structural reliability measures. The numerical examples show the interest of coupling design and testing parameters. - Highlights: • Coupled formulation for design and testing costs, with lifetime degradation. • Cost-effective testing optimization to achieve reliability target. • Solution procedure for nested aleatoric and epistemic variable spaces

  6. Distress Tolerance Scale: A Study of Reliability and Validity

    Directory of Open Access Journals (Sweden)

    Ahmet Emre SARGIN

    2012-11-01

    Full Text Available Objective: The Distress Tolerance Scale (DTS) was developed by Simons and Gaher in order to measure individual differences in the capacity of distress tolerance. The aim of this study is to assess the reliability and validity of the Turkish version of DTS. Method: One hundred and sixty seven university students (male=66, female=101) participated in this study. Beck Anxiety Inventory (BAI), State-trait Anxiety Inventory (STAI) and Discomfort Intolerance Scale (DIS) were used to determine the criterion validity. Construct validity was evaluated with factor analysis after the Kaiser-Meyer-Olkin (KMO) and Bartlett tests had been performed. To assess the test-retest reliability, the scale was re-applied to 79 participants six weeks later. Results: To assess construct validity, factor analyses were performed using principal components analysis with varimax rotation. While there were factors in the original study, our factor analysis resulted in three factors. Cronbach’s alpha coefficients for the entire scale and tolerance, regulation, self-efficacy subscales were .89, .90, .80 and .64 respectively. There were correlations at the level of 0.01 between the Trait Anxiety Inventory of STAI and BAI, and all the subscales of DTS and also between the State Anxiety Inventory and regulation subscale. Both of the subscales of DIS were correlated with the entire subscale and all the subscales except regulation at the level of 0.05. Test-retest reliability was statistically significant at the level of 0.01. Conclusion: Analysis demonstrated that DTS had a satisfactory level of reliability and validity in Turkish university students.

  7. Reliable and valid assessment of Lichtenstein hernia repair skills

    DEFF Research Database (Denmark)

    Carlsen, C G; Lindorff Larsen, Karen; Funch-Jensen, P

    2014-01-01

    PURPOSE: Lichtenstein hernia repair is a common surgical procedure and one of the first procedures performed by a surgical trainee. However, formal assessment tools developed for this procedure are few and sparsely validated. The aim of this study was to determine the reliability and validity...... of an assessment tool designed to measure surgical skills in Lichtenstein hernia repair. METHODS: Key issues were identified through a focus group interview. On this basis, an assessment tool with eight items was designed. Ten surgeons and surgical trainees were video recorded while performing Lichtenstein hernia...... a significant difference between the three groups which indicates construct validity, p skills can be assessed blindly by a single rater in a reliable and valid fashion with the new procedure-specific assessment tool. We recommend this tool for future assessment...

  8. Reliability and validity of the Dutch Recovery Stress Questionnaire for athletes

    NARCIS (Netherlands)

    Nederhof, Esther; Brink, Michel S.; Lemmink, Koen A. P. M.

    2008-01-01

    The purpose of the present study was to investigate the cross-cultural validity of the Recovery Stress Questionnaire for Athletes (RESTQ-sport) by analysing reliability and validity of a Dutch translation. Two studies were performed to assess test-retest reliability with a one week interval,

  9. Linguistic adaptation and validation into Spanish of the Diagnostic Interview for Borderline Personality Disorders-Revised (DIB-R).

    Science.gov (United States)

    Szerman, Néstor; Peris, M Dolores; Ruiz, Ana; Ruiz, Manuel; Gunderson, John G; Rejas, Javier

    2005-08-01

    This paper describes the linguistic adaptation and psychometric validation into Spanish of the Diagnostic Interview for Borderlines-Revised (DIB-R) scale for diagnosing borderline personality disorder (BPD). A conceptual equivalence approach was undertaken, including forward and backward translations of the scale and patient debriefing in a pilot phase. BPD and control patients were included in the validation study, and all of them were administered the scale by well-trained interviewers, blinded to the clinical diagnosis. Reference diagnosis for BPD was made according to DSM-IV criteria. The interview was independently administered in a subset of patients by a different interviewer to test inter-rater reliability. Reliability and validity of the instrument were tested by calculating the Cronbach alpha and Guttman split-half coefficients and by receiver operating characteristic (ROC) curve analysis, kappa agreement coefficient determination and assessment of sensitivity and specificity of the scale. A cohort of 111 subjects, 84 BPD patients (33.6 ± 9.3 years) and 27 control subjects (34.9 ± 9.3 years), were included in the study. A cut-off point ≥ 7 showed a kappa agreement coefficient of 0.853 (95% confidence intervals: 0.739-0.967, p < 0.00001). The figures for sensitivity and specificity values were 0.964 (0.899-0.993) and 0.889 (0.708-0.977) respectively. Inter-rater reliability showed a kappa coefficient of 0.783 (p < 0.0001). The Spanish version of the DIB-R showed adequate psychometric properties for diagnosing BPD in Spain.
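
The sensitivity, specificity and kappa figures in this record can be cross-checked with a short calculation. The sketch below is an illustration only, not the authors' analysis. The 2x2 cell counts (81, 3, 3, 24) are an assumption: they are not given in the abstract and were reconstructed from the reported rates and group sizes (84 BPD patients, 27 controls). The function name is hypothetical.

```python
def diagnostic_agreement(tp, fn, fp, tn):
    """Return (sensitivity, specificity, Cohen's kappa) for a 2x2 table.

    tp/fn: reference-positive cases classified positive/negative by the scale.
    fp/tn: reference-negative cases classified positive/negative by the scale.
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Observed agreement between the scale and the reference diagnosis.
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Assumed counts, reconstructed from the abstract (not reported there).
sens, spec, kappa = diagnostic_agreement(tp=81, fn=3, fp=3, tn=24)
print(round(sens, 3), round(spec, 3), round(kappa, 3))  # 0.964 0.889 0.853
```

Under these assumed counts the three values agree with the figures reported in the abstract, which suggests the kappa of 0.853 describes agreement between the cut-off classification and the DSM-IV reference diagnosis.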

  10. A validation study using a modified version of Postural Assessment Scale for Stroke Patients: Postural Stroke Study in Gothenburg (POSTGOT)

    Directory of Open Access Journals (Sweden)

    Danielsson Anna

    2011-10-01

    Full Text Available Abstract Background: A modified version of Postural Assessment Scale for Stroke Patients (PASS) was created with some changes in the description of the items and clarifications in the manual (e.g. much help was defined as support from 2 persons). The aim of this validation study was to assess intrarater and interrater reliability using this modified version of PASS, at a stroke unit, for patients in the acute phase after their first event of stroke. Methods: In the intrarater reliability study 114 patients and in the interrater reliability study 15 patients were examined twice with the test within one to 24 hours in the first week after stroke. Spearman's rank correlation, Kappa coefficients, Percentage Agreement and the newer rank-invariant methods (Relative Position, Relative Concentration and Relative rank Variance) were used for the statistical analysis. Results: For the intrarater reliability Spearman's rank correlations were 0.88-0.98 and κ values were 0.70-0.93 for the individual items. Small, statistically significant, differences were found for two items regarding Relative Position and for one item regarding Relative Concentration. There was no Relative rank Variance for any single item. For the interrater reliability, Spearman's rank correlations were 0.77-0.99 for individual items. For some items there was a possible, even if not proved, reliability problem regarding Relative Position and Relative Concentration. There was no Relative rank Variance for the single items, except for a small Relative rank Variance for one item.
Conclusions: The high intrarater and interrater reliability shown for the modified Postural Assessment Scale for Stroke Patients, the Swedish version of Postural Assessment Scale for Stroke Patients, with traditional and newer statistical analyses, particularly for assessments performed by the same rater, support the use of the Swedish version of Postural Assessment Scale for Stroke Patients, in the acute stage after stroke both

  11. Facial Angiofibroma Severity Index (FASI): reliability assessment of a new tool developed to measure severity and responsiveness to therapy in tuberous sclerosis-associated facial angiofibroma.

    Science.gov (United States)

    Salido-Vallejo, R; Ruano, J; Garnacho-Saucedo, G; Godoy-Gijón, E; Llorca, D; Gómez-Fernández, C; Moreno-Giménez, J C

    2014-12-01

    Tuberous sclerosis complex (TSC) is an autosomal dominant neurocutaneous disorder characterized by the development of multisystem hamartomatous tumours. Topical sirolimus has recently been suggested as a potential treatment for TSC-associated facial angiofibroma (FA). To validate a reproducible scale created for the assessment of clinical severity and treatment response in these patients. We developed a new tool, the Facial Angiofibroma Severity Index (FASI) to evaluate the grade of erythema and the size and extent of FAs. In total, 30 different photographs of patients with TSC were shown to 56 dermatologists at each evaluation. Three evaluations using the same photographs but in a different random order were performed 1 week apart. Test and retest reliability and interobserver reproducibility were determined. There was good agreement between the investigators. Inter-rater reliability showed strong correlations (> 0.98; range 0.97-0.99) with inter-rater correlation coefficients (ICCs) for the FASI. The global estimated kappa coefficient for the degree of intra-rater agreement (test-retest) was 0.94 (range 0.91-0.97). The FASI is a valid and reliable tool for measuring the clinical severity of TSC-associated FAs, which can be applied in clinical practice to evaluate the response to treatment in these patients. © 2014 British Association of Dermatologists.

  12. Validity evidence and reliability of a simulated patient feedback instrument.

    Science.gov (United States)

    Schlegel, Claudia; Woermann, Ulrich; Rethans, Jan-Joost; van der Vleuten, Cees

    2012-01-27

    In the training of healthcare professionals, one of the advantages of communication training with simulated patients (SPs) is the SP's ability to provide direct feedback to students after a simulated clinical encounter. The quality of SP feedback must be monitored, especially because it is well known that feedback can have a profound effect on student performance. Due to the current lack of valid and reliable instruments to assess the quality of SP feedback, our study examined the validity and reliability of one potential instrument, the 'modified Quality of Simulated Patient Feedback Form' (mQSF). Content validity of the mQSF was assessed by inviting experts in the area of simulated clinical encounters to rate the importance of the mQSF items. Moreover, generalizability theory was used to examine the reliability of the mQSF. Our data came from videotapes of clinical encounters between six simulated patients and six students and the ensuing feedback from the SPs to the students. Ten faculty members judged the SP feedback according to the items on the mQSF. Three weeks later, this procedure was repeated with the same faculty members and recordings. All but two items of the mQSF received importance ratings of > 2.5 on a four-point rating scale. A generalizability coefficient of 0.77 was established with two judges observing one encounter. The findings for content validity and reliability with two judges suggest that the mQSF is a valid and reliable instrument to assess the quality of feedback provided by simulated patients.

  13. Content validity and reliability of the Copenhagen social relations questionnaire

    DEFF Research Database (Denmark)

    Lund, Rikke; Nielsen, Lene Snabe; Henriksen, Pia Wichmann

    2014-01-01

    OBJECTIVE: The aim of the present article is to describe the face and content validity as well as reliability of the Copenhagen Social Relations Questionnaire (CSRQ). METHOD: The face and content validity test was based on focus group discussions and individual interviews with 31 informants...... from the interviews. Two additional themes not covered by CSRQ on dynamics and reciprocity of social relations were identified. DISCUSSION: CSRQ holds satisfactory face and content validity as well as reliability, and is suitable for measuring structure and function of social relations including...

  14. [Reliability and validity of Driving Anger Scale in professional drivers in China].

    Science.gov (United States)

    Li, Z; Yang, Y M; Zhang, C; Li, Y; Hu, J; Gao, L W; Zhou, Y X; Zhang, X J

    2017-11-10

    Objective: To assess the reliability and validity of the Chinese version of Driving Anger Scale (DAS) in professional drivers in China and provide a scientific basis for the application of the scale in drivers in China. Methods: Professional drivers, including taxi drivers, bus drivers, truck drivers and school bus drivers, were selected to complete the questionnaire. Cronbach's α and split-half reliability were calculated to evaluate the reliability of DAS, and content, construct, discriminant and convergent validity were assessed to measure the validity of the scale. Results: The overall Cronbach's α of DAS was 0.934 and the split-half reliability was 0.874. The correlation coefficient of each subscale with the total scale was 0.639-0.922. The simplified version of DAS supported a presupposed six-factor structure, explaining 56.371% of the total variance revealed by exploratory factor analysis. The DAS had good convergent and discriminant validity, with the success rate of calibration experiment of 100%. Conclusion: DAS has a good reliability and validity in professional drivers in China, and the use of DAS is worth promoting in drivers.
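
Several records in this list, including the one above, report Cronbach's α as their internal-consistency statistic. As a reminder of what that coefficient measures, here is a minimal sketch of the standard formula, α = k/(k-1) · (1 - Σ item variances / variance of total scores). The function name and the toy responses are hypothetical and are not taken from any of the studies listed here.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Transpose so that each tuple holds all respondents' scores on one item.
    items = list(zip(*item_scores))
    item_var_sum = sum(variance(col) for col in items)
    # Variance of each respondent's total score across the k items.
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical toy data: 5 respondents answering 3 items (illustration only).
toy = [[3, 4, 3], [2, 2, 3], [5, 4, 5], [1, 2, 1], [4, 5, 4]]
print(cronbach_alpha(toy))
```

The sketch uses sample variances (n - 1 denominator) throughout; the ratio, and therefore α, is the same if population variances are used consistently instead.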

  15. Evaluation of the psychometric properties of the phlebitis and infiltration scales for the assessment of complications of peripheral vascular access devices.

    Science.gov (United States)

    Groll, Dianne; Davies, Barbara; Mac Donald, Joan; Nelson, Susanne; Virani, Tazim

    2010-01-01

    To prevent complications from peripheral vascular access device (PVAD) therapy, the Infusion Nurses Society (INS) developed 2 scales to measure the extent and severity of phlebitis and infiltration in PVADs. This study evaluated the psychometric properties of these scales to validate them with respect to their interrater reliability, concurrent validity, feasibility, and acceptability. A total of 182 patients at 2 sites were enrolled, and 416 observations of PVAD sites were made. Two nurses independently rated each PVAD site for the presence or absence of phlebitis and/or infiltration by using the INS scales. The interrater reliability was calculated, as was the agreement of the observed versus charted incidence of phlebitis and infiltration (concurrent validity) and the ease of use of the scales (feasibility, acceptability). Interrater reliability for both the Phlebitis and Infiltration scales and concurrent validity were found to be statistically significant (P Phlebitis and Infiltration scales have been shown to be easy to use, valid, and reliable scales.

  16. Evaluating trauma team performance in a Level I trauma center: Validation of the trauma team communication assessment (TTCA-24).

    Science.gov (United States)

    DeMoor, Stephanie; Abdel-Rehim, Shady; Olmsted, Richard; Myers, John G; Parker-Raley, Jessica

    2017-07-01

    Nontechnical skills (NTS), such as team communication, are well-recognized determinants of trauma team performance and good patient care. Measuring these competencies during trauma resuscitations is essential, yet few valid and reliable tools are available. We aimed to demonstrate that the Trauma Team Communication Assessment (TTCA-24) is a valid and reliable instrument that measures communication effectiveness during activations. Two tools with adequate psychometric strength (Trauma Nontechnical Skills Scale [T-NOTECHS], Team Emergency Assessment Measure [TEAM]) were identified during a systematic review of medical literature and compared with TTCA-24. Three coders used each tool to evaluate 35 stable and 35 unstable patient activations (defined according to Advanced Trauma Life Support criteria). Interrater reliability was calculated between coders using the intraclass correlation coefficient. Spearman rank correlation coefficient was used to establish concurrent validity between TTCA-24 and the other two validated tools. Coders achieved an intraclass correlation coefficient of 0.87 for stable patient activations and 0.78 for unstable activations, scoring excellent on the interrater agreement guidelines. The median score for each assessment showed good team communication for all 70 videos (TEAM, 39.8 of 54; T-NOTECHS, 17.4 of 25; and TTCA-24, 87.4 of 96). A significant correlation between TTCA-24 and T-NOTECHS was revealed (p = 0.029), but no significant correlation between TTCA-24 and TEAM (p = 0.77). Team communication was rated slightly better across all assessments for stable versus unstable patient activations, but the difference was not statistically significant. TTCA-24 correlated with T-NOTECHS, an instrument measuring nontechnical skills for trauma teams, but not TEAM, a tool that assesses communication in generic emergency settings. TTCA-24 is a reliable and valid assessment that can be a useful adjunct when evaluating interpersonal and team communication during trauma

  17. Validated assessment scales for the lower face.

    Science.gov (United States)

    Narins, Rhoda S; Carruthers, Jean; Flynn, Timothy C; Geister, Thorin L; Görtelmeyer, Roman; Hardas, Bhushan; Himmrich, Silvia; Jones, Derek; Kerscher, Martina; de Maio, Maurício; Mohrmann, Cornelia; Pooth, Rainer; Rzany, Berthold; Sattler, Gerhard; Buchner, Larry; Benter, Ursula; Breitscheidel, Lusine; Carruthers, Alastair

    2012-02-01

    Aging in the lower face leads to lines, wrinkles, depression of the corners of the mouth, and changes in lip volume and lip shape, with increased sagging of the skin of the jawline. Refined, easy-to-use, validated, objective standards assessing the severity of these changes are required in clinical research and practice. To establish the reliability of eight lower face scales assessing nasolabial folds, marionette lines, upper and lower lip fullness, lip wrinkles (at rest and dynamic), the oral commissure and jawline, aesthetic areas, and the lower face unit. Four 5-point rating scales were developed to objectively assess upper and lower lip wrinkles, oral commissures, and the jawline. Twelve experts rated identical lower face photographs of 50 subjects in two separate rating cycles using eight 5-point scales. Inter- and intrarater reliability of responses was assessed. Interrater reliability was substantial or almost perfect for all lower face scales, aesthetic areas, and the lower face unit. Intrarater reliability was high for all scales, areas and the lower face unit. Our rating scales are reliable tools for valid and reproducible assessment of the aging process in lower face areas. © 2012 by the American Society for Dermatologic Surgery, Inc. Published by Wiley Periodicals, Inc.

  18. Reliability and validity of a dual-probe personal computer-based muscle viewer for measuring the pennation angle of the medial gastrocnemius muscle in patients who have had a stroke.

    Science.gov (United States)

    Cho, Ji-Eun; Cho, Ki Hun; Yoo, Jun Sang; Lee, Su Jin; Lee, Wan-Hee

    2018-01-01

    Background A dual-probe personal computer-based muscle viewer (DPC-BMW) is advantageous in that it is relatively lightweight and easy to apply. Objective To investigate the reliability and validity of the DPC-BMW in comparison with those of a portable ultrasonography (P-US) device for measuring the pennation angle of the medial gastrocnemius (MG) muscle at rest and during contraction. Methods Twenty-four patients who had a stroke (18 men and 6 women) participated in this study. Using the DPC-BMW and P-US device, the pennation angle of the MG muscle on the affected side was randomly measured. Two examiners randomly obtained the images of all the participants in two separate test sessions, 7 days apart. Intraclass correlation coefficient (ICC), confidence interval, standard error of measurement, Bland-Altman plot, and Pearson correlation coefficient were used to estimate their reliability and validity. Results The ICC for the intrarater reliability of the MG muscle pennation angle measured using the DPC-BMW was > 0.916, indicating excellent reliability, and that for the interrater reliability ranged from 0.964 to 0.994. The P-US device also exhibited good reliability. A high correlation was found between the measurements of MG muscle pennation angle obtained using the DPC-BMW and that obtained using the P-US device (p < 0.01). Conclusion The DPC-BMW can provide clear images for accurate measurements, including measurements using dual probes. It has the advantage of rehabilitative US imaging for individuals who have had a stroke. More research studies are needed to evaluate the usefulness of the DPC-BMW in rehabilitation.

  19. Lower limb spasticity assessment using an inertial sensor: a reliability study

    International Nuclear Information System (INIS)

    Sterpi, I; Colombo, R; Caroli, A; Meazza, E; Maggioni, G; Pistarini, C

    2013-01-01

    Spasticity is a common motor impairment in patients with neurological disorders that can prevent functional recovery after rehabilitation. In the clinical setting, its assessment is carried out using standardized clinical scales. The aim of this study was to verify the applicability of inertial sensors for an objective measurement of quadriceps spasticity and evaluate its test–retest and inter-rater reliability during the implementation of the Wartenberg pendulum test. Ten healthy subjects and 11 patients in vegetative state with severe brain damage were enrolled in this study. Subjects were evaluated three times on three consecutive days. The test–retest reliability of measurement was assessed in the first two days. The third day was devoted to inter-rater reliability assessment. In addition, the lower limb muscle tone was bilaterally evaluated at the knee joint by the modified Ashworth scale. The factorial ANOVA analysis showed that the implemented method allowed us to discriminate between healthy and pathological conditions. The fairly low SEM and high ICC values obtained for the pendulum parameters indicated a good test–retest and inter-rater reliability of measurement. This study shows that an inertial sensor can be reliably used to characterize leg kinematics during the Wartenberg pendulum test and provide quantitative evaluation of quadriceps spasticity. (paper)

  20. Exploring the reliability and validity of the social-moral awareness test.

    Science.gov (United States)

    Livesey, Alexandra; Dodd, Karen; Pote, Helen; Marlow, Elizabeth

    2012-11-01

    The aim of the study was to explore the validity of the social-moral awareness test (SMAT) a measure designed for assessing socio-moral rule knowledge and reasoning in people with learning disabilities. Comparisons between Theory of Mind and socio-moral reasoning allowed the exploration of construct validity of the tool. Factor structure, reliability and discriminant validity were also assessed. Seventy-one participants with mild-moderate learning disabilities completed the two scales of the SMAT and two False Belief Tasks for Theory of Mind. Reliability of the SMAT was very good, and the scales were shown to be uni-dimensional in factor structure. There was a significant positive relationship between Theory of Mind and both SMAT scales. There is early evidence of the construct validity and reliability of the SMAT. Further assessment of the validity of the SMAT will be required. © 2012 Blackwell Publishing Ltd.

  1. Validity and Reliability of Farsi Version of Youth Sport Environment Questionnaire.

    Science.gov (United States)

    Eshghi, Mohammad Ali; Kordi, Ramin; Memari, Amir Hossein; Ghaziasgar, Ahmad; Mansournia, Mohammad-Ali; Zamani Sani, Seyed Hojjat

    2015-01-01

    The Youth Sport Environment Questionnaire (YSEQ) had been developed from Group Environment Questionnaire, a well-known measure of team cohesion. The aim of this study was to adapt and examine the reliability and validity of the Farsi version of the YSEQ. This version was completed by 455 athletes aged 13-17 years. Results of confirmatory factor analysis indicated that two-factor solution showed a good fit to the data. The results also revealed that the Farsi YSEQ showed high internal consistency, test-retest reliability, and good concurrent validity. This study indicated that the Farsi version of the YSEQ is a valid and reliable measure to assess team cohesion in sport setting.

  2. Reliability of levator scapulae index in subjects with and without scapular downward rotation syndrome.

    Science.gov (United States)

    Lee, Ji-Hyun; Cynn, Heon-Seock; Choi, Woo-Jeong; Jeong, Hyo-Jung; Yoon, Tae-Lim

    2016-05-01

    The objective of this study was to introduce levator scapulae (LS) measurement using a caliper and the levator scapulae index (LSI) and to investigate intra- and interrater reliability of the LSI in subjects with and without scapular downward rotation syndrome (SDRS). Two raters measured LS length twice in 38 subjects (19 with SDRS and 19 without SDRS). For reliability testing, intraclass correlation coefficients (ICCs), standard error of measurement (SEM), and minimal detectable change (MDC) were calculated. Intrarater reliability analysis resulted in ICCs ranging from 0.94 to 0.98 in subjects with SDRS and 0.96 to 0.98 in subjects without SDRS. These results indicated that intrarater reliability in both groups was excellent for measuring LS length with the LSI. Interrater reliability was good (ICC: 0.82) in subjects with SDRS; however, interrater reliability was moderate (ICC: 0.75) in subjects without SDRS. Additionally, SEM and MDC were 0.13% and 0.36% in subjects with SDRS and 0.35% and 0.97% in subjects without SDRS. In subjects with SDRS, low dispersion of the measurement errors and MDC were shown. This study suggested that the LSI is a reliable method to measure LS length and is more reliable for subjects with SDRS. Copyright © 2015 Elsevier Ltd. All rights reserved.
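
The SEM and MDC values in this record are related by simple arithmetic. The sketch below assumes the conventional definitions, SEM = SD · √(1 - ICC) and MDC at 95% confidence = 1.96 · √2 · SEM. The abstract does not state which formulas the authors used, so treat this as a plausibility check rather than a reproduction of their analysis. The function names are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a sample SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * math.sqrt(2.0) * sem


# SEM values (in %) as reported in the abstract for each group.
for label, sem in (("with SDRS", 0.13), ("without SDRS", 0.35)):
    print(label, round(mdc95(sem), 2))
# with SDRS 0.36
# without SDRS 0.97
```

Both results agree with the MDC values reported in the abstract (0.36% and 0.97%), which is consistent with the authors having used the 95% convention. The sample SDs needed for `sem_from_icc` are not given in the abstract, so that function is shown without data.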

  3. An Integrated Approach to Establish Validity and Reliability of Reading Tests

    Science.gov (United States)

    Razi, Salim

    2012-01-01

    This study presents the processes of developing and establishing reliability and validity of a reading test by administering an integrative approach, as conventional reliability and validity measures only superficially reveal the difficulty of a reading test. In this respect, analysing vocabulary frequency of the test is regarded as a more eligible way…

  4. A Valid and Reliable Tool to Assess Nursing Students' Clinical Performance

    OpenAIRE

    Mehrnoosh Pazargadi; Tahereh Ashktorab; Sharareh Khosravi; Hamid Alavi majd

    2013-01-01

    Background: The necessity of a valid and reliable assessment tool is one of the most repeated issues in nursing students' clinical evaluation. But it is believed that present tools are mostly not valid and cannot assess students' performance properly. Objectives: This study was conducted to design a valid and reliable assessment tool for evaluating nursing students' performance in clinical education. Methods: In this methodological study considering nursing students' performance definition; th...

  5. Reliability and validity of television food advertising questionnaire in Malaysia.

    Science.gov (United States)

    Zalma, Abdul Razak; Safiah, Md Yusof; Ajau, Danis; Khairil Anuar, Md Isa

    2015-09-01

    Interventions to counter the influence of television food advertising amongst children are important. Thus, a reliable and valid instrument to assess its effect is needed. The objective of this study was to determine the reliability and validity of such a questionnaire. The questionnaire was administered twice on 32 primary schoolchildren aged 10-11 years in Selangor, Malaysia. The interval between the first and second administration was 2 weeks. Test-retest method was used to examine the reliability of the questionnaire. Intra-rater reliability was determined by kappa coefficient and internal consistency by Cronbach's alpha coefficient. Construct validity was evaluated using factor analysis. The test-retest correlation showed moderate-to-high reliability for all scores (r = 0.40*, p = 0.02 to r = 0.95**, p = 0.00), with one exception, consumption of fast foods (r = 0.24, p = 0.20). Kappa coefficient showed acceptable-to-strong intra-rater reliability (K = 0.40-0.92), except for two items under knowledge on television food advertising (K = 0.26 and K = 0.21) and one item under preference for healthier foods (K = 0.33). Cronbach's alpha coefficient indicated acceptable internal consistency for all scores (0.45-0.60). After deleting two items under Consumption of Commonly Advertised Food, the items showed moderate-to-high loading (0.52, 0.84, 0.42 and 0.42) with the Scree plot showing that there was only one factor. The Kaiser-Meyer-Olkin was 0.60, showing that the sample was adequate for factor analysis. The questionnaire on television food advertising is reliable and valid to assess the effect of media literacy education on television food advertising on schoolchildren. © The Author (2013). Published by Oxford University Press. All rights reserved.

  6. STOPP (Screening Tool of Older Person's Prescriptions) and START (Screening Tool to Alert doctors to Right Treatment). Consensus validation.

    LENUS (Irish Health Repository)

    Gallagher, P

    2012-02-03

    OBJECTIVE: Older people experience more concurrent illnesses, are prescribed more medications and suffer more adverse drug events than younger people. Many drugs predispose older people to adverse events such as falls and cognitive impairment, thus increasing morbidity and health resource utilization. At the same time, older people are often denied potentially beneficial, clinically indicated medications without a valid reason. We aimed to validate a new screening tool of older persons' prescriptions incorporating criteria for potentially inappropriate drugs called STOPP (Screening Tool of Older Persons' Prescriptions) and criteria for potentially appropriate, indicated drugs called START (Screening Tool to Alert doctors to Right, i.e. appropriate, indicated Treatment). METHODS: A Delphi consensus technique was used to establish the content validity of STOPP/START. An 18-member expert panel from academic centers in Ireland and the United Kingdom completed two rounds of the Delphi process by mail survey. Inter-rater reliability was assessed by determining the kappa-statistic for measure of agreement on 100 data-sets. RESULTS: STOPP is comprised of 65 clinically significant criteria for potentially inappropriate prescribing in older people. Each criterion is accompanied by a concise explanation as to why the prescribing practice is potentially inappropriate. START consists of 22 evidence-based prescribing indicators for commonly encountered diseases in older people. Inter-rater reliability is favorable with a kappa-coefficient of 0.75 for STOPP and 0.68 for START. CONCLUSION: STOPP/START is a valid, reliable and comprehensive screening tool that enables the prescribing physician to appraise an older patient's prescription drugs in the context of his/her concurrent diagnoses.

  7. Health Service Quality Scale: Brazilian Portuguese translation, reliability and validity.

    Science.gov (United States)

    Rocha, Luiz Roberto Martins; Veiga, Daniela Francescato; e Oliveira, Paulo Rocha; Song, Elaine Horibe; Ferreira, Lydia Masako

    2013-01-17

    The Health Service Quality Scale is a multidimensional hierarchical scale that is based on an interdisciplinary approach. This instrument was specifically created for measuring health service quality based on marketing and health care concepts. The aim of this study was to translate and culturally adapt the Health Service Quality Scale into Brazilian Portuguese and to assess the validity and reliability of the Brazilian Portuguese version of the instrument. We conducted a cross-sectional, observational study with public health system patients in a Brazilian university hospital. Validity was assessed using Pearson's correlation coefficient to measure the strength of the association between the Brazilian Portuguese version of the instrument and the SERVQUAL scale. Internal consistency was evaluated using Cronbach's alpha coefficient; the intraclass (ICC) and Pearson's correlation coefficients were used for test-retest reliability. One hundred and sixteen consecutive postoperative patients completed the questionnaire. Pearson's correlation coefficient for validity was 0.20. Cronbach's alpha values for the first and second administrations of the final version of the instrument were 0.982 and 0.986, respectively. For test-retest reliability, Pearson's correlation coefficient was 0.89 and ICC was 0.90. The culturally adapted, Brazilian Portuguese version of the Health Service Quality Scale is a valid and reliable instrument to measure health service quality.

  8. Validity and reliability of the Turkish version of the Optimality Index-US (OI-US) to assess maternity care outcomes.

    Science.gov (United States)

    Yucel, Cigdem; Taskin, Lale; Low, Lisa Kane

    2015-12-01

    Although obstetrical interventions are used commonly in Turkey, there is no standardized evidence-based assessment tool to evaluate maternity care outcomes. The Optimality Index-US (OI-US) is an evidence-based tool that was developed for the purpose of measuring aggregate perinatal care processes and outcomes against an optimal or best possible standard. To date, this index has been validated and used in the Netherlands, the USA and the UK. The objective of this study was to adapt the OI-US to assess maternity care outcomes in Turkey. Translation and back translation were used to develop the Optimality Index-Turkey (OI-TR) version. To evaluate the content validity of the OI-TR, an expert panel group (n=10) reviewed the items and evidence-based quality of the OI-TR for application in Turkey. Following the content validity process, the OI-TR was used to assess 150 healthy and 150 high-risk pregnant women who gave birth at a high-volume, urban maternity hospital in Turkey. The scores between the two groups were compared to assess the discriminant validity of the OI-TR. The percentage of agreement between two raters and the Kappa statistic were calculated to evaluate the reliability. Content validity was established for the OI-TR by an expert group. Discriminant validity was confirmed by comparing the OI scores of healthy pregnant women (mean OI score=77.65%) and those of high-risk pregnant women (mean OI score=78.60%). The percentage of agreement between the two raters was 96.19, and inter-rater agreement was provided for each item in the OI-TR. OI-TR is a valid and reliable tool that can be used to assess maternity care outcomes in Turkey. The results of this study indicate that although the risk statuses of the women differed, the type of care they received was essentially the same, as measured by the OI-TR. Care was not individualised based on risk and for a majority of items was inconsistent with evidence-based practice, which is not optimal. Use of the OI-TR will help to

  9. The Children's Play Therapy Instrument (CPTI): Description, Development, and Reliability Studies

    Science.gov (United States)

    Kernberg, Paulina F.; Chazan, Saralea E.; Normandin, Lina

    1998-01-01

    The Children's Play Therapy Instrument (CPTI), its development, and reliability studies are described. The CPTI is a new instrument to examine a child's play activity in individual psychotherapy. Three independent raters used the CPTI to rate eight videotaped play therapy vignettes. Results were compared with the authors' consensual scores from a preliminary study. Generally good to excellent levels of interrater reliability were obtained for the independent raters on intraclass correlation coefficients for ordinal categories of the CPTI. Likewise, kappa levels were acceptable to excellent for nominal categories of the scale. The CPTI holds promise to become a reliable measure of play activity in child psychotherapy. Further research is needed to assess discriminant validity of the CPTI for use as a diagnostic tool and as a measure of process and outcome. (The Journal of Psychotherapy Practice and Research 1998; 7:196–207) PMID:9631341

  10. Construction and Evaluation of Reliability and Validity of Reasoning Ability Test

    Science.gov (United States)

    Bhat, Mehraj A.

    2014-01-01

    This paper is based on the construction and evaluation of the reliability and validity of a reasoning ability test for secondary school students. In this paper an attempt was made to evaluate validity and reliability and to determine the appropriate standards to interpret the results of the reasoning ability test. The test includes 45 items to measure six types…

  11. Validity evidence for the Fundamentals of Laparoscopic Surgery (FLS) program as an assessment tool: a systematic review.

    Science.gov (United States)

    Zendejas, Benjamin; Ruparel, Raaj K; Cook, David A

    2016-02-01

    The Fundamentals of Laparoscopic Surgery (FLS) program uses five simulation stations (peg transfer, precision cutting, loop ligation, and suturing with extracorporeal and intracorporeal knot tying) to teach and assess laparoscopic surgery skills. We sought to summarize evidence regarding the validity of scores from the FLS assessment. We systematically searched for studies evaluating the FLS as an assessment tool (last search update February 26, 2013). We classified validity evidence using the currently standard validity framework (content, response process, internal structure, relations with other variables, and consequences). From a pool of 11,628 studies, we identified 23 studies reporting validity evidence for FLS scores. Studies involved residents (n = 19), practicing physicians (n = 17), and medical students (n = 8), in specialties of general (n = 17), gynecologic (n = 4), urologic (n = 1), and veterinary (n = 1) surgery. Evidence was most common in the form of relations with other variables (n = 22, most often expert-novice differences). Only three studies reported internal structure evidence (inter-rater or inter-station reliability), two studies reported content evidence (i.e., derivation of assessment elements), and three studies reported consequences evidence (definition of pass/fail thresholds). Evidence nearly always supported the validity of FLS total scores. However, the loop ligation task lacks discriminatory ability. Validity evidence confirms expected relations with other variables and acceptable inter-rater reliability, but other validity evidence is sparse. Given the high-stakes use of this assessment (required for board eligibility), we suggest that more validity evidence is required, especially to support its content (selection of tasks and scoring rubric) and the consequences (favorable and unfavorable impact) of assessment.

  12. Validity and Reliability of Farsi Version of Youth Sport Environment Questionnaire

    Directory of Open Access Journals (Sweden)

    Mohammad Ali Eshghi

    2015-01-01

    The Youth Sport Environment Questionnaire (YSEQ) was developed from the Group Environment Questionnaire, a well-known measure of team cohesion. The aim of this study was to adapt and examine the reliability and validity of the Farsi version of the YSEQ. This version was completed by 455 athletes aged 13–17 years. Results of confirmatory factor analysis indicated that a two-factor solution showed a good fit to the data. The results also revealed that the Farsi YSEQ showed high internal consistency, test-retest reliability, and good concurrent validity. This study indicated that the Farsi version of the YSEQ is a valid and reliable measure to assess team cohesion in sport settings.

  13. Reliability of capturing foot parameters using digital scanning and the neutral suspension casting technique

    Science.gov (United States)

    2011-01-01

    Background: A clinical study was conducted to determine the intra- and inter-rater reliability of digital scanning and the neutral suspension casting technique to measure six foot parameters. The neutral suspension casting technique is a commonly utilised method for obtaining a negative impression of the foot prior to orthotic fabrication. Digital scanning offers an alternative to the traditional plaster of Paris techniques. Methods: Twenty-one healthy participants volunteered to take part in the study. Six casts and six digital scans were obtained from each participant by two raters of differing clinical experience. The foot parameters chosen for investigation were cast length (mm), forefoot width (mm), rearfoot width (mm), medial arch height (mm), lateral arch height (mm) and forefoot to rearfoot alignment (degrees). Intraclass correlation coefficients (ICC) with 95% confidence intervals (CI) were calculated to determine the intra- and inter-rater reliability. Measurement error was assessed through the calculation of the standard error of the measurement (SEM) and smallest real difference (SRD). Results: ICC values for all foot parameters using digital scanning ranged between 0.81-0.99 for both intra- and inter-rater reliability. For the neutral suspension casting technique, inter-rater reliability values ranged from 0.57-0.99 and intra-rater reliability values ranged from 0.36-0.99 for rater 1 and 0.49-0.99 for rater 2. Conclusions: The findings of this study indicate that digital scanning is a reliable technique, irrespective of clinical experience, with reduced measurement variability in all foot parameters investigated when compared to neutral suspension casting. PMID:21375757

  14. RELIABILITY AND VALIDITY OF SUBJECTIVE ASSESSMENT OF LUMBAR LORDOSIS IN CONVENTIONAL RADIOGRAPHY.

    Science.gov (United States)

    Ruhinda, E; Byanyima, R K; Mugerwa, H

    2014-10-01

    Reliability and validity studies of different lumbar curvature analysis and measurement techniques have been documented; however, there is limited literature on the reliability and validity of subjective visual analysis. Radiological assessment of the lumbar lordotic curve aids in early diagnosis of conditions even before neurologic changes set in. The objective was to ascertain the level of reliability and validity of subjective assessment of lumbar lordosis in conventional radiography. A blinded, repeated-measures diagnostic test was carried out on lumbar spine x-ray radiographs. The setting was the Radiology Department at the Joint Clinical Research Centre (JCRC), Mengo-Kampala, Uganda. Seventy (70) lateral lumbar x-ray films were used for this study and were obtained from the archive of the JCRC radiology department at Butikiro house, Mengo-Kampala. Poor observer agreement, both inter- and intra-observer, with kappa values of 0.16 was found. Inter-observer agreement was poorer than intra-observer agreement. Kappa values rose significantly when the lumbar lordosis was clustered into four categories without grading each abnormality. The results confirm that subjective assessment of lumbar lordosis has low reliability and validity. Film quality has limited influence on the observer reliability. This study further shows that fewer scale categories of lordosis abnormalities produce better observer reliability.

  15. The Danish anal sphincter rupture questionnaire: Validity and reliability

    DEFF Research Database (Denmark)

    Due, Ulla; Ottesen, Marianne

    2008-01-01

    Objective. To revise, validate and test for reliability an anal sphincter rupture questionnaire in relation to construct, content and face validity. Setting and background. Since 1996 women with anal sphincter rupture (ASR) at one of the public university hospitals in Copenhagen, Denmark have bee...

  16. [Reliability and Validity of the Scale for Homophobia in Medicine Students].

    Science.gov (United States)

    Campo-Arias, Adalberto; Lafaurie, María Mercedes; Gaitán-Duarte, Hernando G

    2012-12-01

    There are several scales to quantify homophobia in different populations. However, the reliability and validity of these instruments among Colombian students are unknown. Consequently, this work is intended to assess the reliability (internal consistency) as well as the validity of the Scale for Homophobia in medical students from a private university in Bogotá (Colombia). This was a methodological study with 199 medical students from the 1st to 5th semester who filled out the Homophobia Scale form, the general welfare questionnaire, the Attitude Towards Gays and Lesbians Scale (ATGL), WHO-5 (divergent validity) and the Francis Scale of Attitude Toward Christianity (nomologic validity). Pearson's correlations, Cronbach's alpha coefficient and the omega coefficient (construct reliability) were computed, and a confirmatory factor analysis was performed. The Scale for Homophobia showed a Cronbach's alpha coefficient of 0.785, an omega coefficient of 0.790 and a Pearson correlation of 0.844 with the ATGL, -0.059 with WHO-5, and 0.187 with the Francis Scale of Attitude Toward Christianity. The Scale for Homophobia exhibited a relevant factor accounting for 44.7% of the total variance. The Scale for Homophobia showed acceptable reliability and validity. New studies should investigate the stability of the scale and the nomologic validity regarding other constructs. Copyright © 2012 Asociación Colombiana de Psiquiatría. Publicado por Elsevier España. All rights reserved.

  17. Reliability and validity of the Turkish version of the Structured Clinical Interview for DSM-IV Dissociative Disorders (SCID-D): a preliminary study.

    Science.gov (United States)

    Kundakçi, Turgut; Sar, Vedat; Kiziltan, Emre; Yargiç, Ilhan L; Tutkun, Hamdi

    2014-01-01

    A total of 34 consecutive patients with dissociative identity disorder or dissociative disorder not otherwise specified were evaluated using the Turkish version of the Structured Clinical Interview for DSM-IV Dissociative Disorders (SCID-D). They were compared with a matched control group composed of 34 patients who had a nondissociative psychiatric disorder. Interrater reliability was evaluated by 3 clinicians who assessed videotaped interviews conducted with 5 dissociative and 5 nondissociative patients. All subjects who were previously diagnosed by clinicians as having a dissociative disorder were identified as positive, and all subjects who were previously diagnosed as not having a dissociative disorder were identified as negative. The scores of the main symptom clusters and the total score of the SCID-D differentiated dissociative patients from the nondissociative group. There were strong correlations between the SCID-D and the Dissociative Experiences Scale total and subscale scores. These results are promising for the validity and reliability of the Turkish version of the SCID-D. However, as the present study was conducted on a predominantly female sample with very severe dissociation, these findings should not be generalized to male patients, to dissociative disorders other than dissociative identity disorder, or to broader clinical or nonclinical populations.

  18. The reliability of physical examination tests for the diagnosis of anterior cruciate ligament rupture--A systematic review.

    Science.gov (United States)

    Lange, Toni; Freiberg, Alice; Dröge, Patrik; Lützner, Jörg; Schmitt, Jochen; Kopkow, Christian

    2015-06-01

    Systematic literature review. Despite their frequent application in routine care, a systematic review on the reliability of clinical examination tests to evaluate the integrity of the ACL is missing. To summarize and evaluate intra- and interrater reliability research on physical examination tests used for the diagnosis of ACL tears. A comprehensive systematic literature search was conducted in MEDLINE, EMBASE and AMED until May 30th 2013. Studies were included if they assessed the intra- and/or interrater reliability of physical examination tests for the integrity of the ACL. Methodological quality was evaluated with the Quality Appraisal of Reliability Studies (QAREL) tool by two independent reviewers. The search yielded 110 hits, of which seven articles finally met the inclusion criteria. These studies examined the reliability of four physical examination tests. Intrarater reliability was assessed in three studies and ranged from fair to almost perfect (Cohen's k = 0.22-1.00). Interrater reliability was assessed in all included studies and ranged from slight to almost perfect (Cohen's k = 0.02-0.81). The Lachman test is the physical test with the highest intrarater reliability (Cohen's k = 1.00); the Lachman test performed in prone position is the test with the highest interrater reliability (Cohen's k = 0.81). Included studies were partly of low methodological quality. A meta-analysis could not be performed due to the heterogeneity in study populations, reliability measures and methodological quality of included studies. Systematic investigations on the reliability of physical examination tests to assess the integrity of the ACL are scarce and of varying methodological quality. Copyright © 2014 Elsevier Ltd. All rights reserved.
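The verbal labels in this abstract ("slight", "fair", "almost perfect") match the widely used Landis and Koch bands for kappa. The sketch below encodes those bands; it assumes, without confirmation from the abstract, that the review applied this scheme, and the function name `landis_koch_label` is hypothetical.

```python
def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch (1977) agreement label."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1.0")
    if kappa < 0.0:
        return "poor"
    # Upper bound of each band, checked in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label


# Endpoints of the ranges reported in the abstract above.
for k in (0.02, 0.22, 0.81, 1.00):
    print(k, landis_koch_label(k))
```

Under this assumed scheme, the range endpoints 0.02, 0.22, 0.81 and 1.00 map to "slight", "fair", "almost perfect" and "almost perfect", which agrees with the wording of the abstract.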

  19. The Surgical Safety Checklist and Teamwork Coaching Tools: a study of inter-rater reliability.

    Science.gov (United States)

    Huang, Lyen C; Conley, Dante; Lipsitz, Stu; Wright, Christopher C; Diller, Thomas W; Edmondson, Lizabeth; Berry, William R; Singer, Sara J

    2014-08-01

    To assess the inter-rater reliability (IRR) of two novel observation tools for measuring surgical safety checklist performance and teamwork. Surgical safety checklists can promote adherence to standards of care and improve teamwork in the operating room. Their use has been associated with reductions in mortality and other postoperative complications. However, checklist effectiveness depends on how well they are performed. Authors from the Safe Surgery 2015 initiative developed a pair of novel observation tools through literature review, expert consultation and end-user testing. In one South Carolina hospital participating in the initiative, two observers jointly attended 50 surgical cases and independently rated surgical teams using both tools. We used descriptive statistics to measure checklist performance and teamwork at the hospital. We assessed IRR by measuring percent agreement, Cohen's κ, and weighted κ scores. The overall percent agreement and κ between the two observers were 93% and 0.74 (95% CI 0.66 to 0.79), respectively, for the Checklist Coaching Tool and 86% and 0.84 (95% CI 0.77 to 0.90) for the Surgical Teamwork Tool. Percent agreement for individual sections of both tools was 79% or higher. Additionally, κ scores for six of eight sections on the Checklist Coaching Tool and for two of five domains on the Surgical Teamwork Tool achieved the desired 0.7 threshold. However, teamwork scores were high and variation was limited. There were no significant changes in the percent agreement or κ scores between the first 10 and last 10 cases observed. Both tools demonstrated substantial IRR and required limited training to use. These instruments may be used to observe checklist performance and teamwork in the operating room. However, further refinement and calibration of observer expectations, particularly in rating teamwork, could improve the utility of the tools. Published by the BMJ Publishing Group Limited.
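To make the two agreement statistics in this abstract concrete, here is a minimal sketch of percent agreement and unweighted Cohen's κ for two raters, using κ = (p_o - p_e) / (1 - p_e). The function names and the ratings are hypothetical; the study's own data and its weighted κ are not reproduced here.

```python
from collections import Counter


def percent_agreement(rater_1, rater_2):
    """Observed proportion of cases on which the two raters agree (p_o)."""
    return sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)


def cohen_kappa(rater_1, rater_2):
    """Unweighted Cohen's kappa for two raters over the same cases."""
    n = len(rater_1)
    p_o = percent_agreement(rater_1, rater_2)
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    categories = counts_1.keys() | counts_2.keys()
    # Agreement expected by chance from each rater's marginal frequencies.
    p_e = sum(counts_1[c] * counts_2[c] for c in categories) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of 10 cases, dominated by one category.
rater_a = ["yes"] * 9 + ["no"]
rater_b = ["yes"] * 8 + ["no", "no"]
print(percent_agreement(rater_a, rater_b))
print(cohen_kappa(rater_a, rater_b))
```

In this made-up example the raters agree on 9 of 10 cases (90%), yet κ is only about 0.62, because chance agreement is already 0.74 when one category dominates. This is one reason a study may report both figures when variation in ratings is limited.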

  20. The Vocal Cord Dysfunction Questionnaire: Validity and Reliability of the Persian Version.

    Science.gov (United States)

    Ghaemi, Hamide; Khoddami, Seyyedeh Maryam; Soleymani, Zahra; Zandieh, Fariborz; Jalaie, Shohreh; Ahanchian, Hamid; Khadivi, Ehsan

    2017-12-25

    The aim of this study was to develop, validate, and assess the reliability of the Persian version of the Vocal Cord Dysfunction Questionnaire (VCDQ P). The study design was cross-sectional or cultural survey. Forty-four patients with vocal fold dysfunction (VFD) and 40 healthy volunteers were recruited for the study. To assess the content validity, the prefinal questions were given to 15 experts to comment on their essentiality. Ten patients with VFD rated the importance of VCDQ P in detecting face validity. Eighteen of the patients with VFD completed the VCDQ 1 week later for test-retest reliability. To detect absolute reliability, standard error of measurement and smallest detected change were calculated. Concurrent validity was assessed by completing the Persian Chronic Obstructive Pulmonary Disease (COPD) Assessment Test (CAT) by 34 patients with VFD. Discriminant validity was measured from 34 participants. The VCDQ was further validated by administering the questionnaire to 40 healthy volunteers. Validation of the VCDQ as a treatment outcome tool was conducted in 18 patients with VFD using pre- and posttreatment scores. The internal consistency was confirmed (Cronbach α = 0.78). The test-retest reliability was excellent (intraclass correlation coefficient = 0.97). The standard error of measurement and smallest detected change values were acceptable (0.39 and 1.08, respectively). There was a significant correlation between the VCDQ P and the CAT total scores (P validity was significantly different. The VCDQ scores in patients with VFD before and after treatment were significantly different (P valid and reliable self-administered questionnaire in Persian-speaking population. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
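The two absolute-reliability figures reported above (0.39 and 1.08) are consistent with the commonly used relation SDC = 1.96 × √2 × SEM, although the abstract does not state which formula the authors applied. The sketch below shows that relation, together with the usual SEM = SD × √(1 − ICC); the function names are hypothetical, and the standard deviation passed to `sem_from_icc` would have to come from the study data, which the abstract does not report.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a score SD and a reliability ICC."""
    return sd * math.sqrt(1 - icc)


def smallest_detectable_change(sem, z=1.96):
    """Smallest detectable change at the confidence level implied by z.

    The sqrt(2) factor reflects that a change score combines the
    measurement error of two administrations.
    """
    return z * math.sqrt(2) * sem


# SEM as reported in the abstract; the formula itself is an assumption.
print(round(smallest_detectable_change(0.39), 2))
```

With the reported SEM of 0.39, this relation gives approximately 1.08, matching the smallest detected change quoted in the abstract.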