WorldWideScience

Sample records for internal consistency inter-rater

  1. Effects of Rating Training on Inter-Rater Consistency for Developing a Dental Hygiene Clinical Rater Qualification System

    Directory of Open Access Journals (Sweden)

    Jeong Ran Park

    2007-12-01

    We aimed to develop itemized evaluation criteria and a clinical rater qualification system through rating training of inter-rater consistency for experienced clinical dental hygienists and dental hygiene clinical educators. A total of 15 clinical dental hygienists with 1-year careers participated as clinical examination candidates, while 5 dental hygienists with 3-year educations and clinical careers or longer participated as clinical raters. They all took the clinical examination as examinees. The results were compared, and the consistency of competence was measured. The comparison of clinical competence between candidates and clinical raters showed that the candidate group's mean clinical competence ranged from 2.96 to 3.55 on a 5-point system in a total of 3 instruments (Probe, Explorer, Curet), while the clinical rater group's mean clinical competence ranged from 4.05 to 4.29. There was a higher inter-rater consistency after education of raters in the following 4 items: Probe, Explorer, Curet, and insertion on distal surface. The mean score distribution of clinical raters ranged from 75% to 100%, which was more uniform in the competence to detect an artificial calculus than that of candidates (25% to 100%). These results indicate the need to operate a clinical rater qualification system for comprehensive dental hygiene clinicians. Furthermore, in order to execute the clinical rater qualification system, it will be necessary to keep conducting a series of studies on educational content, time, frequency, and educator level.

  2. International inter-rater agreement in scoring acne severity utilizing cloud-based image sharing of mobile phone photographs.

    Science.gov (United States)

    Foolad, Negar; Ornelas, Jennifer N; Clark, Ashley K; Ali, Ifrah; Sharon, Victoria R; Al Mubarak, Luluah; Lopez, Andrés; Alikhan, Ali; Al Dabagh, Bishr; Firooz, Alireza; Awasthi, Smita; Liu, Yu; Li, Chin-Shang; Sivamani, Raja K

    2017-09-01

    Cloud-based image sharing technology facilitates the sharing of images, but it has not been well studied for acne assessments or treatment preferences among international evaluators. We evaluated inter-rater variability of acne grading and treatment recommendations among an international group of dermatologists who assessed photographs. This is a prospective, single-visit photographic study to assess inter-rater agreement of acne photographs shared through an integrated mobile device, cloud-based, and HIPAA-compliant platform. Inter-rater agreements for global acne assessment and acne lesion counts were evaluated by Kendall's coefficient of concordance, while correlations between treatment recommendations and acne severity were calculated by Spearman's rank correlation coefficient. There was good agreement for the evaluation of inflammatory lesions (KCC = 0.62, P… cloud-based image sharing for acne assessment. Cloud-based sharing may facilitate acne care and research among international collaborators. © 2017 The International Society of Dermatology.
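
    The record above names Kendall's coefficient of concordance (KCC) as its agreement statistic. As a rough illustration only, the following Python sketch computes the untied form, W = 12S / (m^2 (n^3 - n)), for m raters ranking n items. The function name `kendalls_w` and the example ranks are hypothetical, and the abstract does not say whether a tie correction was applied.

```python
def kendalls_w(rank_rows):
    """Kendall's W for m raters (rows) ranking n items (columns), assuming no ties."""
    m = len(rank_rows)
    n = len(rank_rows[0])
    if m < 2 or n < 2 or any(len(row) != n for row in rank_rows):
        raise ValueError("need at least 2 raters and 2 items, with equal row lengths")
    # Sum of the ranks each item received across all raters.
    rank_sums = [sum(row[j] for row in rank_rows) for j in range(n)]
    # Under no agreement, every item is expected to receive this rank sum.
    mean_sum = m * (n + 1) / 2
    # S: squared deviations of the observed rank sums from that expectation.
    s = sum((total - mean_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical ranks of four photographs by three raters (1 = least severe).
ranks = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
]
w = kendalls_w(ranks)
print(round(w, 3))
```

    W runs from 0 (no concordance) to 1 (identical rankings), so it can be read on roughly the same scale as the KCC values quoted in the abstract.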

  3. Inter-rater and intra-rater reliability of a clinical protocol for measuring turnout in collegiate dancers.

    Science.gov (United States)

    Greene, Amanda; Lasner, Andrea; Deu, Rajwinder; Oliphant, Seth; Johnson, Kenneth

    2018-02-02

    Reliable methods of measuring turnout in dancers and comparing active turnout (used in class) with functional (uncompensated) turnout are needed. Authors have suggested measurement techniques but there is no clinically useful, easily reproducible technique with established inter-rater and intra-rater reliability. We adapted a technique based on previous research, which is easily reproducible. We hypothesized excellent inter-rater and intra-rater reliability between experienced physical therapists (PTs) and a briefly trained faculty member from a university's department of dance. Thirty-two participants were recruited from the same dance department. Dancers' active and functional turnout was measured by each rater. We found that our technique for measuring active and functional turnout has excellent inter-rater and intra-rater reliability when performed by two experienced PTs and by one briefly trained university-level dance faculty member. For active turnout, inter-rater reliability was 0.78 among all raters and 0.82 among only the PT raters; intra-rater reliability was 0.82 among all raters and 0.85 among only the PT raters. For functional turnout, inter-rater reliability was 0.86 among all raters and 0.88 among only the PT raters; intra-rater reliability was 0.87 among all raters and 0.88 among only the PT raters. The measurement technique described provides a standardized protocol with excellent inter-rater and intra-rater reliability when performed by experienced PTs or by a briefly trained university-level dance faculty member.

  4. Inter-rater and intra-rater reliability of the Bahasa Melayu version of Rose Angina Questionnaire.

    Science.gov (United States)

    Hassan, N B; Choudhury, S R; Naing, L; Conroy, R M; Rahman, A R A

    2007-01-01

    The objective of the study is to translate the Rose Questionnaire (RQ) into a Bahasa Melayu version and adapt it cross-culturally, and to measure its inter-rater and intra-rater reliability. This cross-sectional study was conducted in the respondents' homes or workplaces in Kelantan, Malaysia. One hundred respondents aged 30 and above with different socio-demographic status were interviewed for face validity. For each inter-rater and intra-rater reliability, a sample of 150 respondents was interviewed. Inter-rater and intra-rater reliabilities were assessed by Cohen's kappa. The overall inter-rater agreement by the five pairs of interviewers at points one and two was 0.86, and intra-rater reliability by the five interviewers on the seven-item questionnaire at points one and two was 0.88, as measured by kappa coefficient. The translated Malay version of RQ demonstrated an almost perfect inter-rater and intra-rater reliability, and further validation such as sensitivity and specificity analysis of this translated questionnaire is highly recommended.
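
    Several records in this list, including the one above, report Cohen's kappa. A minimal sketch of the unweighted statistic for two raters follows, assuming nominal categories. The function name `cohen_kappa` and the yes/no answers are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same subjects."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set of subjects")
    n = len(ratings_a)
    # Observed agreement: share of subjects given the same category by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal proportions per category.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical answers to one questionnaire item from two interviewers.
rater_1 = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "no"]
rater_2 = ["yes", "yes", "yes", "no", "no", "no", "no", "no", "no", "yes"]
kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

    In this made-up example the raters agree on 8 of 10 subjects, yet kappa is only about 0.58, because 52% agreement would already be expected by chance from the marginal totals.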

  5. Inter-rater and intra-rater reliability of a movement control test in shoulder.

    Science.gov (United States)

    Rajasekar, S; Bangera, Rakshith K; Sekaran, Padmanaban

    2017-07-01

    Movement faults are commonly observed in patients with musculoskeletal pain. The Kinetic Medial Rotation Test (KMRT) is a movement control test used to identify movement faults of the scapula and gleno-humeral joints during arm movement. Objective tests such as the KMRT need to be reliable and valid for the results to be applied across different clinical settings and patient populations. The primary objective of the present study was to determine the intra-rater and inter-rater reliability of KMRT in subjects with and without shoulder pain. Sixty subjects were included in this study based on specific inclusion and exclusion criteria. Two musculoskeletal physiotherapists with different levels of clinical experience performed the tests. The intra-rater reliability was tested in twenty asymptomatic subjects by a single assessor at two-week intervals. An equal number of subjects with and without shoulder pain were tested by both assessors to determine the inter-rater reliability. Both components of the KMRT, the Gleno-Humeral Anterior Translation (GHAT) and the Scapular Forward Tilt (SCFT), were tested. The Kappa values for inter-rater reliability of the GHAT and SCFT were K = 0.68 and K = 0.65 respectively in subjects with shoulder pain. In asymptomatic subjects, the inter-rater reliability of GHAT was K = 0.61 and SCFT was K = 0.85. Intra-rater reliability ranged from K = 0.66 for GHAT to K = 0.87 for SCFT. Our study found substantial agreement in inter-rater reliability of KMRT in subjects with shoulder pain, whereas substantial to near perfect agreement was found in intra-rater and inter-rater reliability of KMRT in subjects without shoulder pain. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Introducing a new definition of a near fall: intra-rater and inter-rater reliability.

    Science.gov (United States)

    Maidan, I; Freedman, T; Tzemah, R; Giladi, N; Mirelman, A; Hausdorff, J M

    2014-01-01

    Near falls (NFs) are more frequent than falls, and may occur before falls, potentially predicting fall risk. As such, identification of a NF is important. We aimed to assess intra- and inter-rater reliability of the traditional definition of a NF and to demonstrate the potential utility of a new definition. To this end, 10 older adults, 10 idiopathic elderly fallers, and 10 patients with Parkinson's disease (PD) walked in an obstacle course while wearing a safety harness. All walks were videotaped. Forty-nine video segments were extracted to create 2 clips each of 8.48 min. Four raters scored each event using the traditional definition and, two weeks later, using the new definition. A fifth rater used only the new definition. Intra-rater reliability was determined using Kappa (K) statistics and inter-rater reliability was determined using ICC. Using the traditional definition, three raters had poor intra-rater reliability (K0.137) and one rater had moderate intra-rater reliability (K=0.624, p… definition, inter-rater reliability between the four raters was moderate (ICC=0.667, p… definition showed high intra-rater (K>0.601, p… definition of NF is required. The results of the present study suggest that the proposed new definition increases intra- and inter-rater reliability, a critical step for using NFs to quantify fall risk. Copyright © 2013 Elsevier B.V. All rights reserved.

  7. Simulated patient training: Using inter-rater reliability to evaluate simulated patient consistency in nursing education.

    Science.gov (United States)

    MacLean, Sharon; Geddes, Fiona; Kelly, Michelle; Della, Phillip

    2018-03-01

    Simulated patients (SPs) are frequently used for training nursing students in communication skills. An acknowledged benefit of using SPs is the opportunity to provide a standardized approach by which participants can demonstrate and develop communication skills. However, relatively little evidence is available on how to best facilitate and evaluate the reliability and accuracy of SPs' performances. The aim of this study is to investigate the effectiveness of an evidenced based SP training framework to ensure standardization of SPs. The training framework was employed to improve inter-rater reliability of SPs. A quasi-experimental study was employed to assess SP post-training understanding of simulation scenario parameters using inter-rater reliability agreement indices. Two phases of data collection took place. Initially a trial phase including audio-visual (AV) recordings of two undergraduate nursing students completing a simulation scenario is rated by eight SPs using the Interpersonal Communication Assessments Scale (ICAS) and Quality of Discharge Teaching Scale (QDTS). In phase 2, eight SP raters and four nursing faculty raters independently evaluated students' (N=42) communication practices using the QDTS. Intraclass correlation coefficients (ICC) were >0.80 for both stages of the study in clinical communication skills. The results support the premise that if trained appropriately, SPs have a high degree of reliability and validity to both facilitate and evaluate student performance in nurse education. Crown Copyright © 2018. Published by Elsevier Ltd. All rights reserved.

  8. Measuring the quality of life in mild to very severe dementia: testing the inter-rater and intra-rater reliability of the German version of the QUALIDEM.

    Science.gov (United States)

    Dichter, Martin Nikolaus; Schwab, Christian G G; Meyer, Gabriele; Bartholomeyczik, Sabine; Dortmann, Olga; Halek, Margareta

    2014-05-01

    Quality of life (Qol) is an increasingly used outcome measure in dementia research. The QUALIDEM is a dementia-specific and proxy-rated Qol instrument. We aimed to determine the inter-rater and intra-rater reliability in residents with dementia in German nursing homes. The QUALIDEM consists of nine subscales that were applied to a sample of 108 people with mild to severe dementia and six consecutive subscales that were applied to a sample of 53 people with very severe dementia. The proxy raters were 49 registered nurses and nursing assistants. Inter-rater and intra-rater reliability scores were calculated on the subscale and item level. None of the QUALIDEM subscales showed strong inter-rater reliability based on the single-measure Intra-Class Correlation Coefficient (ICC) for absolute agreement ≥ 0.70. Based on the average-measure ICC for four raters, eight subscales for people with mild to severe dementia (care relationship, positive affect, negative affect, restless tense behavior, social relations, social isolation, feeling at home and having something to do) and five subscales for very severe dementia (care relationship, negative affect, restless tense behavior, social relations and social isolation) yielded a strong inter-rater agreement (ICC: 0.72-0.86). All of the QUALIDEM subscales, regardless of dementia severity, showed strong intra-rater agreement. The ICC values ranged between 0.70 and 0.79 for people with mild to severe dementia and between 0.75 and 0.87 for people with very severe dementia. This study demonstrated insufficient inter-rater reliability and sufficient intra-rater reliability for all subscales of both versions of the German QUALIDEM. The degree of inter-rater reliability can be improved by collaborative Qol rating by more than one nurse. The development of a measurement manual with accurate item definitions and a standardized education program for proxy raters is recommended.

  9. Intra-rater and inter-rater reliability of a medical record abstraction study on transition of care after childhood cancer.

    Directory of Open Access Journals (Sweden)

    Micòl E Gianinazzi

    The abstraction of data from medical records is a widespread practice in epidemiological research. However, studies using this means of data collection rarely report reliability. Within the Transition after Childhood Cancer Study (TaCC), which is based on a medical record abstraction, we conducted a second independent abstraction of data with the aim to assess (a) intra-rater reliability of one rater at two time points; (b) the possible learning effects between these two time points compared to a gold-standard; and (c) inter-rater reliability. Within the TaCC study we conducted a systematic medical record abstraction in the 9 Swiss clinics with pediatric oncology wards. In a second phase we selected a subsample of medical records in 3 clinics to conduct a second independent abstraction. We then assessed intra-rater reliability at two time points, the learning effect over time (comparing each rater at two time-points with a gold-standard) and the inter-rater reliability of a selected number of variables. We calculated percentage agreement and Cohen's kappa. For the assessment of the intra-rater reliability we included 154 records (80 for rater 1; 74 for rater 2). For the inter-rater reliability we could include 70 records. Intra-rater reliability was substantial to excellent (Cohen's kappa 0.6-0.8) with an observed percentage agreement of 75%-95%. In all variables learning effects were observed. Inter-rater reliability was substantial to excellent (Cohen's kappa 0.70-0.83) with high agreement ranging from 86% to 100%. Our study showed that data abstracted from medical records are reliable. Investigating intra-rater and inter-rater reliability can give confidence to draw conclusions from the abstracted data and increase data quality by minimizing systematic errors.

  10. Inter-rater agreement on PIVC-associated phlebitis signs, symptoms and scales.

    Science.gov (United States)

    Marsh, Nicole; Mihala, Gabor; Ray-Barruel, Gillian; Webster, Joan; Wallis, Marianne C; Rickard, Claire M

    2015-10-01

    Many peripheral intravenous catheter (PIVC) infusion phlebitis scales and definitions are used internationally, although no existing scale has demonstrated comprehensive reliability and validity. We examined inter-rater agreement between registered nurses on signs, symptoms and scales commonly used in phlebitis assessment. Seven PIVC-associated phlebitis signs/symptoms (pain, tenderness, swelling, erythema, palpable venous cord, purulent discharge and warmth) were observed daily by two raters (a research nurse and registered nurse). These data were modelled into phlebitis scores using 10 different tools. Proportions of agreement (e.g. positive, negative), observed and expected agreements, Cohen's kappa, the maximum achievable kappa, prevalence- and bias-adjusted kappa were calculated. Two hundred ten patients were recruited across three hospitals, with 247 sets of paired observations undertaken. The second rater was blinded to the first's findings. The Catney and Rittenberg scales were the most sensitive (phlebitis in >20% of observations), whereas the Curran, Lanbeck and Rickard scales were the most restrictive (≤2% phlebitis). Only tenderness and the Catney (one of pain, tenderness, erythema or palpable cord) and Rittenberg scales (one of erythema, swelling, tenderness or pain) had acceptable (more than two-thirds, 66.7%) levels of inter-rater agreement. Inter-rater agreement for phlebitis assessment signs/symptoms and scales is low. This likely contributes to the high degree of variability in phlebitis rates in literature. We recommend further research into assessment of infrequent signs/symptoms and the Catney or Rittenberg scales. New approaches to evaluating vein irritation that are valid, reliable and based on their ability to predict complications need exploration. © 2015 John Wiley & Sons, Ltd.
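
    The record above lists several agreement indices computed from paired binary observations. The sketch below shows how they are commonly derived from a 2x2 table; it is an illustration under stated assumptions, not the authors' analysis. The function name `agreement_summary`, the cell labels and the counts are hypothetical, and the prevalence- and bias-adjusted kappa is taken in its usual two-category form, 2 x observed agreement - 1.

```python
def agreement_summary(both_pos, only_a_pos, only_b_pos, both_neg):
    """Agreement indices for two raters making a binary (sign present/absent) call."""
    n = both_pos + only_a_pos + only_b_pos + both_neg
    disagreements = only_a_pos + only_b_pos
    observed = (both_pos + both_neg) / n
    # Chance agreement from each rater's marginal positive/negative totals.
    a_pos, b_pos = both_pos + only_a_pos, both_pos + only_b_pos
    a_neg, b_neg = n - a_pos, n - b_pos
    expected = (a_pos * b_pos + a_neg * b_neg) / (n * n)
    return {
        "observed": observed,
        "expected": expected,
        "kappa": (observed - expected) / (1 - expected),
        # Prevalence- and bias-adjusted kappa for two categories.
        "pabak": 2 * observed - 1,
        # Proportions of specific agreement on positive and negative calls.
        "positive_agreement": 2 * both_pos / (2 * both_pos + disagreements),
        "negative_agreement": 2 * both_neg / (2 * both_neg + disagreements),
    }


# Hypothetical counts for a rarely observed sign in 100 paired observations.
summary = agreement_summary(both_pos=2, only_a_pos=3, only_b_pos=3, both_neg=92)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

    With these made-up counts the raters agree on 94% of observations, but kappa is only about 0.37 and positive agreement is 0.40, which illustrates why a rare sign can show high raw agreement alongside a low kappa.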

  11. Computerized back postural assessment in physiotherapy practice: Intra-rater and inter-rater reliability of the MIDAS system.

    Science.gov (United States)

    McAlpine, R T; Bettany-Saltikov, J A; Warren, J G

    2009-01-01

    Assessment of spinal posture during physiotherapy practice is routine, yet few objective measures exist to this end. The Middlesbrough Integrated Digital Assessment System (MIDAS) is a low cost portable system able to record 3D information on posture. The purpose of this study was to assess both the intra-rater and inter-rater reliability of the MIDAS system. Twenty-five healthy subjects were recruited. A repeated measures design was used to record fifteen pre-palpated landmarks on the back of each subject. To limit the sources of variability, the principal researcher palpated the landmarks for each subject. Each of three raters took two measurements on each subject in a standardized upright posture. X (medio-lateral), Y (antero-posterior) and Z (height) landmark positions were recorded via a computer interface. Both intra-rater agreement (mean ICCs - rater 1 r=0.970, rater 2 r=0.965 and rater 3 r=0.965, p… MIDAS demonstrated both high inter-rater and intra-rater reliability and provides an objective method for the assessment of posture in physiotherapy practice.

  12. Can Physicians Identify Inappropriate Nuclear Stress Tests? An Examination of Inter-rater Reliability for the 2009 Appropriate Use Criteria for Radionuclide Imaging

    Science.gov (United States)

    Ye, Siqin; Rabbani, LeRoy E.; Kelly, Christopher R.; Kelly, Maureen R.; Lewis, Matthew; Paz, Yehuda; Peck, Clara L.; Rao, Shaline; Bokhari, Sabahat; Weiner, Shepard D.; Einstein, Andrew J.

    2014-01-01

    Background: We sought to determine inter-rater reliability of the 2009 Appropriate Use Criteria (AUC) for radionuclide imaging (RNI) and whether physicians at various levels of training can effectively identify nuclear stress tests with inappropriate indications. Methods and Results: Four hundred patients were randomly selected from a consecutive cohort of patients undergoing nuclear stress testing at an academic medical center. Raters with different levels of training (including cardiology attending physicians, cardiology fellows, internal medicine hospitalists, and internal medicine interns) classified individual nuclear stress tests using the 2009 AUC. Consensus classification by two cardiologists was considered the operational gold standard, and sensitivity and specificity of individual raters for identifying inappropriate tests were calculated. Inter-rater reliability of the AUC was assessed using Cohen’s kappa statistics for pairs of different raters. The mean age of patients was 61.5 years; 214 (54%) were female. The cardiologists rated 256 (64%) of 400 NSTs as appropriate, 68 (18%) as uncertain, 55 (14%) as inappropriate; 21 (5%) tests were unable to be classified. Inter-rater reliability for non-cardiologist raters was modest (unweighted Cohen’s kappa, 0.51, 95% confidence interval, 0.45 to 0.55). Sensitivity of individual raters for identifying inappropriate tests ranged from 47% to 82%, while specificity ranged from 85% to 97%. Conclusions: Inter-rater reliability for the 2009 AUC for RNI is modest, and there is considerable variation in the ability of raters at different levels of training to identify inappropriate tests. PMID:25563660

  13. Inter- and intra-rater reliability of nasal auscultation in daycare children.

    Science.gov (United States)

    Santos, Rita; Silva Alexandrino, Ana; Tomé, David; Melo, Cristina; Mesquita Montes, António; Costa, Daniel; Pinto Ferreira, João

    2018-02-01

    The aim of this study was to assess nasal auscultation's intra- and inter-rater reliability and to analyze ear and respiratory clinical condition according to nasal auscultation. Cross-sectional study performed in 125 children aged up to 3 years old attending daycare centers. Nasal auscultation, tympanometry and Paediatric Respiratory Severity Score (PRSS) were applied to all children. Nasal sounds were classified by an expert panel in order to determine nasal auscultation's intra- and inter-rater reliability. The classification of nasal sounds was assessed against tympanometric and PRSS values. Nasal auscultation revealed substantial inter-rater (K=0.75) and intra-rater (K=0.69; K=0.61 and K=0.72) reliability. Children with a "non-obstructed" classification revealed a lower peak pressure (t=-3.599, P… auscultation revealed substantial intra- and inter-rater reliability. Nasal auscultation exhibited important differences according to ear and respiratory clinical conditions. Nasal auscultation in pediatrics seems to be an original topic as well as a simple method that can be used to identify early signs of nasopharyngeal obstruction.

  14. Inter-rater reliability of shoulder measurements in middle-aged women.

    Science.gov (United States)

    De Groef, A; Van Kampen, M; Vervloesem, N; Clabau, E; Christiaens, M-R; Neven, P; Geraerts, I; Struyf, F; Devoogdt, N

    2017-06-01

    To investigate inter-rater reliability of a set of shoulder measurements including inclinometry [shoulder range of motion (ROM)], acromion-table distance and pectoralis minor muscle length (static scapular positioning), upward rotation with two inclinometers (scapular kinematics) and pain pressure thresholds (muscle tenderness) in middle-aged women. Observational study. Thirty symptom-free middle-aged women (first cohort) were measured by two raters. All measurements with an intraclass correlation coefficient (ICC) below 0.75 were retested after an additional training period in a second cohort of 30 symptom-free middle-aged women. Inter-rater reliability of all variables was measured with the ICC (95% confidence interval) and standard error of measurement (SEM). Acromion-table distance (ICC=0.91, SEM 0.22 to 0.28% of body length), pectoralis minor muscle length (ICC=0.91, SEM 0.16% of body length), pain pressure thresholds (ICC=0.78 to 0.85, SEM 0.39 to 0.70kg) and abduction ROM (ICC=0.77, SEM 5°) showed good to excellent inter-rater reliability in the first cohort. After an additional training period, forward flexion ROM showed good inter-rater reliability (ICC=0.83, SEM 5°), scapular upward rotation in resting position showed moderate reliability (ICC=0.52, SEM 2°), and other scaption angles showed weak reliability (ICC=0.26 to 0.43, SEM 3 to 8°). In a battery of clinical tools to evaluate factors contributing to shoulder pain, static scapular positioning and pressure pain thresholds were found to have good to excellent inter-rater reliability in middle-aged women. Additional training is recommended for measurements with a gravity inclinometer. Copyright © 2016 Chartered Society of Physiotherapy. Published by Elsevier Ltd. All rights reserved.
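
    The record above reports a standard error of measurement (SEM) next to each ICC. The abstract does not state how SEM was calculated; one widely used relation is SEM = SD x sqrt(1 - ICC), sketched below as an assumption. The function name and the standard deviation of 10 degrees are hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject standard deviation and a reliability coefficient."""
    if sd < 0:
        raise ValueError("standard deviation must be non-negative")
    if icc > 1:
        raise ValueError("ICC cannot exceed 1")
    return sd * math.sqrt(1 - icc)


# Hypothetical range-of-motion data: SD of 10 degrees and an ICC of 0.75.
sem_degrees = standard_error_of_measurement(sd=10.0, icc=0.75)
print(sem_degrees)
```

    Because SEM is expressed in the units of the measurement, it complements the unitless ICC: the same ICC implies a larger absolute error when scores vary more between subjects.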

  15. Inter-rater reliability of case-note audit: a systematic review.

    Science.gov (United States)

    Lilford, Richard; Edwards, Alex; Girling, Alan; Hofer, Timothy; Di Tanna, Gian Luca; Petty, Jane; Nicholl, Jon

    2007-07-01

    The quality of clinical care is often assessed by retrospective examination of case-notes (charts, medical records). Our objective was to determine the inter-rater reliability of case-note audit. We conducted a systematic review of the inter-rater reliability of case-note audit. Analysis was restricted to 26 papers reporting comparisons of two or three raters making independent judgements about the quality of care. Sixty-six separate comparisons were possible, since some papers reported more than one measurement of reliability. Mean kappa values ranged from 0.32 to 0.70. These may be inflated due to publication bias. Measured reliabilities were found to be higher for case-note reviews based on explicit, as opposed to implicit, criteria and for reviews that focused on outcome (including adverse effects) rather than process errors. We found an association between kappa and the prevalence of errors (poor quality care), suggesting alternatives such as tetrachoric and polychoric correlation coefficients be considered to assess inter-rater reliability. Comparative studies should take into account the relationship between kappa and the prevalence of the events being measured.

  16. Intra and inter-rater reliability study of pelvic floor muscle dynamometric measurements

    Directory of Open Access Journals (Sweden)

    Natalia M. Martinho

    2015-04-01

    OBJECTIVE: The aim of this study was to evaluate the intra- and inter-rater reliability of pelvic floor muscle (PFM) dynamometric measurements for maximum and average strengths, as well as endurance. METHOD: A convenience sample of 18 nulliparous women, without any urogynecological complaints, aged between 19 and 31 (mean age of 25.4±3.9) participated in this study. They were evaluated using a pelvic floor dynamometer based on load cell technology. The dynamometric evaluations were repeated in three successive sessions: two on the same day with a rest period of 30 minutes between them, and the third on the following day. All participants were evaluated twice in each session; first by examiner 1 followed by examiner 2. The vaginal dynamometry data were analyzed using three parameters: maximum strength, average strength, and endurance. The Intraclass Correlation Coefficient (ICC) was applied to estimate the PFM dynamometric measurement reliability, considering a good level as being above 0.75. RESULTS: The intra- and inter-rater analyses showed good reliability for maximum strength (ICCintra-rater1=0.96, ICCintra-rater2=0.95, and ICCinter-rater=0.96), average strength (ICCintra-rater1=0.96, ICCintra-rater2=0.94, and ICCinter-rater=0.97), and endurance (ICCintra-rater1=0.88, ICCintra-rater2=0.86, and ICCinter-rater=0.92) dynamometric measurements. CONCLUSIONS: The PFM dynamometric measurements showed good intra- and inter-rater reliability for maximum strength, average strength and endurance, which demonstrates that this is a reliable device that can be used in clinical practice.

  17. Inter-rater and test-retest reliability of quality assessments by novice student raters using the Jadad and Newcastle-Ottawa Scales.

    Science.gov (United States)

    Oremus, Mark; Oremus, Carolina; Hall, Geoffrey B C; McKinnon, Margaret C

    2012-01-01

    Quality assessment of included studies is an important component of systematic reviews. The authors investigated inter-rater and test-retest reliability for quality assessments conducted by inexperienced student raters. Student raters received a training session on quality assessment using the Jadad Scale for randomised controlled trials and the Newcastle-Ottawa Scale (NOS) for observational studies. Raters were randomly assigned into five pairs and they each independently rated the quality of 13-20 articles. These articles were drawn from a pool of 78 papers examining cognitive impairment following electroconvulsive therapy to treat major depressive disorder. The articles were randomly distributed to the raters. Two months later, each rater re-assessed the quality of half of their assigned articles. McMaster Integrative Neuroscience Discovery and Study Program. 10 students taking McMaster Integrative Neuroscience Discovery and Study Program courses. The authors measured inter-rater reliability using κ and the intraclass correlation coefficient type 2,1 or ICC(2,1). The authors measured test-retest reliability using ICC(2,1). Inter-rater reliability varied by scale question. For the six-item Jadad Scale, question-specific κs ranged from 0.13 (95% CI -0.11 to 0.37) to 0.56 (95% CI 0.29 to 0.83). The ranges were -0.14 (95% CI -0.28 to 0.00) to 0.39 (95% CI -0.02 to 0.81) for the NOS cohort and -0.20 (95% CI -0.49 to 0.09) to 1.00 (95% CI 1.00 to 1.00) for the NOS case-control. For overall scores on the six-item Jadad Scale, ICC(2,1)s for inter-rater and test-retest reliability (accounting for systematic differences between raters) were 0.32 (95% CI 0.08 to 0.52) and 0.55 (95% CI 0.41 to 0.67), respectively. Corresponding ICC(2,1)s for the NOS cohort were -0.19 (95% CI -0.67 to 0.35) and 0.62 (95% CI 0.25 to 0.83), and for the NOS case-control, the ICC(2,1)s were 0.46 (95% CI -0.13 to 0.92) and 0.83 (95% CI 0.48 to 0.95). Inter-rater reliability was generally poor
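
    The record above uses the intraclass correlation coefficient type 2,1, a two-way random-effects, absolute-agreement, single-rater ICC. The sketch below derives it from the mean squares of a subjects-by-raters table. The function name `icc_2_1` and the ratings are illustrative values, not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a complete table: one row per subject, one column per rater."""
    n = len(ratings)       # subjects
    k = len(ratings[0])    # raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least 2 subjects and 2 raters")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, raters, and residual.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Systematic rater differences (ms_cols) count against agreement here.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative scores: six subjects (rows) rated by four raters (columns).
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(scores)
print(round(icc, 2))
```

    In this example the raters rank the subjects fairly consistently but differ in their average level, so the coefficient stays low (about 0.29); this is the penalty for systematic differences between raters that the abstract mentions.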

  18. Inter-rater and intra-rater agreement of confocal microscopy imaging in diagnosing and subtyping basal cell carcinoma

    NARCIS (Netherlands)

    Kadouch, D. J.; van Haersma de With, A.; Elshot, Y. S.; Peppelman, M.; Bekkenk, M. W.; Wolkerstorfer, A.; Eekhout, I.; Prinsen, C. A. C.; de Rie, M. A.

    2017-01-01

    Reflectance confocal microscopy (RCM) imaging can be used to diagnose and subtype basal cell carcinoma (BCC) but relies on individual morphologic pattern recognition that might vary among users. We assessed the inter-rater and intra-rater agreement of RCM in correctly diagnosing and subtyping BCC.

  19. The Smile Esthetic Index (SEI): A method to measure the esthetics of the smile. An intra-rater and inter-rater agreement study.

    Science.gov (United States)

    Rotundo, Roberto; Nieri, Michele; Bonaccini, Daniele; Mori, Massimiliano; Lamberti, Elena; Massironi, Domenico; Giachetti, Luca; Franchi, Lorenzo; Venezia, Piero; Cavalcanti, Raffaele; Bondi, Elena; Farneti, Mauro; Pinchi, Vilma; Buti, Jacopo

    2015-01-01

    To propose a method to measure the esthetics of the smile and to report its validation by means of an intra-rater and inter-rater agreement analysis. Ten variables were chosen as determinants for the esthetics of a smile: smile line and facial midline, tooth alignment, tooth deformity, tooth dischromy, gingival dischromy, gingival recession, gingival excess, gingival scars and diastema/missing papillae. One examiner consecutively selected seventy smile pictures, which were in the frontal view. Ten examiners, with different levels of clinical experience and specialties, applied the proposed assessment method twice on the selected pictures, independently and blindly. Intraclass correlation coefficient (ICC) and Fleiss' kappa statistics were used to analyse the intra-rater and inter-rater agreement. Considering the cumulative assessment of the Smile Esthetic Index (SEI), the ICC value for the inter-rater agreement of the 10 examiners was 0.62 (95% CI: 0.51 to 0.72), representing a substantial agreement. Intra-rater agreement ranged from 0.86 to 0.99. Inter-rater agreement (Fleiss' kappa statistics) calculated for each variable ranged from 0.17 to 0.75. The SEI was a reproducible method to assess the esthetic component of the smile, useful for the diagnostic phase and for setting appropriate treatment plans.

  20. Exploring Differences in Measurement and Reporting of Classroom Observation Inter-Rater Reliability

    Science.gov (United States)

    Wilhelm, Anne Garrison; Gillespie Rouse, Amy; Jones, Francesca

    2018-01-01

    Although inter-rater reliability is an important aspect of using observational instruments, it has received little theoretical attention. In this article, we offer some guidance for practitioners and consumers of classroom observations so that they can make decisions about inter-rater reliability, both for study design and in the reporting of data…

  1. Inter-rater and intra-rater agreement on the Nordic Orofacial Test--Screening examination in children, adolescents and young adults with cerebral palsy.

    Science.gov (United States)

    Edvinsson, Siv Elisabet; Lundqvist, Lars-Olov

    2014-02-01

    To evaluate inter-rater and intra-rater agreement on the Nordic Orofacial Test-Screening (NOT-S) examination applied to children, adolescents and young adults with cerebral palsy (CP). Using the NOT-S examination, two speech and language pathologists independently assessed video recordings of 48 subjects with CP aged 5-22 years and representing all CP sub-diagnoses and levels of gross motor function and manual ability. Thirty-one subjects were reassessed. Fifteen out of 17 items in the NOT-S examination domains (1) Face at rest, (2) Nose breathing, (3) Facial expression, (4) Masticatory muscle and jaw function, (5) Oral motor function and (6) Speech were rated using a 'yes' (dysfunction observed)/'no' format, generating an overall score of 0-6. Inter-rater agreement: Twelve out of 15 items and five out of six domains showed acceptable unweighted kappa values (κ = 0.46-1.00). The lowest kappa value was found for domain 4 (κ = -0.04), although it had high inter-rater agreement (92%). The linear weighted kappa value for the overall NOT-S examination score was 0.65 (95% CI = 0.49-0.82). Intra-rater agreement: All items and domains showed acceptable unweighted kappa values (items 0.58-1.00 and 0.59-1.00, domains 0.81-1.00 and 0.62-0.89) for both raters. The linear weighted kappa value for the overall NOT-S examination score was 0.81 (95% CI = 0.63-0.99) for rater A and 0.54 (95% CI = 0.25-0.82) for rater B. The NOT-S examination has acceptable inter-rater and intra-rater agreement when used in young individuals with CP.
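
    The domain 4 result (κ = -0.04 despite 92% agreement) is a known property of kappa when one category is rare: chance-expected agreement is then almost as high as observed agreement. The sketch below shows the effect with Cohen's unweighted kappa; the 2x2 counts are hypothetical, chosen only to reproduce the pattern, and are not the study's data.

```python
import numpy as np


def cohen_kappa(table):
    """Return (observed agreement, Cohen's kappa) for a square contingency table.

    table[i][j] is the number of subjects rated category i by rater A
    and category j by rater B.
    """
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_o = np.trace(table) / n                                     # observed agreement
    p_e = np.sum(table.sum(axis=1) * table.sum(axis=0)) / n ** 2  # chance agreement
    return p_o, (p_o - p_e) / (1.0 - p_e)


# Hypothetical counts for 48 subjects; rows/columns are ('no', 'yes').
# Both raters say 'no' for 44 subjects, disagree on 4, never both say 'yes'.
p_o, kappa = cohen_kappa([[44, 2],
                          [2, 0]])
print(f"agreement = {p_o:.1%}, kappa = {kappa:.2f}")
```

    With these counts the raters agree on about 92% of subjects, yet kappa is slightly negative because nearly all of that agreement is expected by chance.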

  2. Inter-Rater Reliability of Cyclotorsion Measurements Using Fundus Photography.

    Science.gov (United States)

    Dysli, Muriel; Kanku, Madeleine; Traber, Ghislaine L

    2018-04-01

    The foveo-papillary angle (FPA) on fundus photographs is the accepted standard for the measurement of ocular cyclotorsion. We assessed the inter-rater reliability of this method in healthy subjects and in patients with trochlear nerve palsies. In this methodological study, fundus photographs of healthy subjects and of patients with trochlear nerve palsies were made with a fundus camera (Zeiss Fundus Camera FF 450 plus, Jena, Germany). Three independent observers measured the FPA on the fundus photographs of all subjects in synedra View (synedra View 16, Version 16.0.0.11, Innsbruck, Austria). One hundred and four eyes of 52 subjects (26 healthy controls and 26 patients) were assessed. The mean FPA of the healthy controls was 5.80 degrees (°) [± 0.44 standard error of the mean (SEM)] compared to 11.55° (± 0.80 SEM) for patients with trochlear nerve palsies. The inter-rater reliability of all measured FPAs showed an intraclass correlation coefficient (ICC) of 0.98 (95% CI 0.97 - 0.98). The inter-rater reliability of objective cyclotorsion measurements using fundus photographs was very high. Georg Thieme Verlag KG Stuttgart · New York.
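
    The abstract reports an ICC for three observers without naming the ICC form. As an illustration only, the sketch below computes the two-way random-effects, absolute-agreement, single-measure form, commonly written ICC(2,1); that choice, the function name `icc_2_1`, and the angle values are assumptions, not details taken from the study.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1) for a subjects x raters matrix of continuous scores."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape                      # n subjects, k raters
    grand = x.mean()

    # Two-way ANOVA sums of squares.
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between raters
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical angles (degrees) measured on 5 eyes by 3 observers.
angles = [
    [5.1, 5.3, 5.0],
    [11.8, 11.5, 12.0],
    [6.2, 6.0, 6.4],
    [9.7, 10.1, 9.9],
    [4.4, 4.6, 4.3],
]
print(round(icc_2_1(angles), 3))
```

    Because this form penalises systematic offsets between raters, it is a stricter choice than a consistency-type ICC.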

  3. Inter-rater and test–retest reliability of quality assessments by novice student raters using the Jadad and Newcastle–Ottawa Scales

    Science.gov (United States)

    Oremus, Carolina; Hall, Geoffrey B C; McKinnon, Margaret C

    2012-01-01

    Introduction: Quality assessment of included studies is an important component of systematic reviews. Objective: The authors investigated inter-rater and test–retest reliability for quality assessments conducted by inexperienced student raters. Design: Student raters received a training session on quality assessment using the Jadad Scale for randomised controlled trials and the Newcastle–Ottawa Scale (NOS) for observational studies. Raters were randomly assigned into five pairs and they each independently rated the quality of 13–20 articles. These articles were drawn from a pool of 78 papers examining cognitive impairment following electroconvulsive therapy to treat major depressive disorder. The articles were randomly distributed to the raters. Two months later, each rater re-assessed the quality of half of their assigned articles. Setting: McMaster Integrative Neuroscience Discovery and Study Program. Participants: 10 students taking McMaster Integrative Neuroscience Discovery and Study Program courses. Main outcome measures: The authors measured inter-rater reliability using κ and the intraclass correlation coefficient type 2,1 or ICC(2,1). The authors measured test–retest reliability using ICC(2,1). Results: Inter-rater reliability varied by scale question. For the six-item Jadad Scale, question-specific κs ranged from 0.13 (95% CI −0.11 to 0.37) to 0.56 (95% CI 0.29 to 0.83). The ranges were −0.14 (95% CI −0.28 to 0.00) to 0.39 (95% CI −0.02 to 0.81) for the NOS cohort and −0.20 (95% CI −0.49 to 0.09) to 1.00 (95% CI 1.00 to 1.00) for the NOS case–control. For overall scores on the six-item Jadad Scale, ICC(2,1)s for inter-rater and test–retest reliability (accounting for systematic differences between raters) were 0.32 (95% CI 0.08 to 0.52) and 0.55 (95% CI 0.41 to 0.67), respectively. Corresponding ICC(2,1)s for the NOS cohort were −0.19 (95% CI −0.67 to 0.35) and 0.62 (95% CI 0.25 to 0.83), and for the NOS case–control, the ICC(2

  4. Unfolding the phenomenon of inter-rater agreement

    DEFF Research Database (Denmark)

    Slaug, Bjørn; Schilling, Oliver; Helle, Tina

    2011-01-01

    Objective: The overall objective was to unfold the phenomenon of inter-rater agreement: to identify potential sources of variation in agreement data and to explore how they can be statistically accounted for. The ultimate aim was to propose recommendations for in-depth examination of agreement, i...

  5. Measurement of the Inter-Rater Reliability Rate Is Mandatory for Improving the Quality of a Medical Database: Experience with the Paulista Lung Cancer Registry.

    Science.gov (United States)

    Lauricella, Leticia L; Costa, Priscila B; Salati, Michele; Pego-Fernandes, Paulo M; Terra, Ricardo M

    2018-06-01

    Database quality measurement should be considered a mandatory step to ensure an adequate level of confidence in data used for research and quality improvement. Several metrics have been described in the literature, but no standardized approach has been established. We aimed to describe a methodological approach applied to measure the quality and inter-rater reliability of a regional multicentric thoracic surgical database (Paulista Lung Cancer Registry). Data from the first 3 years of the Paulista Lung Cancer Registry underwent an audit process with 3 metrics: completeness, consistency, and inter-rater reliability. The first 2 methods were applied to the whole data set, and the last method was calculated using 100 cases randomized for direct auditing. Inter-rater reliability was evaluated using percentage of agreement between the data collector and auditor and through calculation of Cohen's κ and intraclass correlation. The overall completeness per section ranged from 0.88 to 1.00, and the overall consistency was 0.96. Inter-rater reliability showed many variables with high disagreement (>10%). For numerical variables, intraclass correlation was a better metric than inter-rater reliability. Cohen's κ showed that most variables had moderate to substantial agreement. The methodological approach applied to the Paulista Lung Cancer Registry showed that completeness and consistency metrics did not sufficiently reflect the real quality status of a database. The inter-rater reliability associated with κ and intraclass correlation was a better quality metric than completeness and consistency metrics because it could determine the reliability of specific variables used in research or benchmark reports. This report can be a paradigm for future studies of data quality measurement. Copyright © 2018 American College of Surgeons. Published by Elsevier Inc. All rights reserved.

  6. Impact of educational intervention on the inter-rater agreement of nasal endoscopy interpretation

    Science.gov (United States)

    Colley, Patrick; Mace, Jess C.; Schaberg, Madeleine R.; Smith, Timothy L.; Tabaee, Abtin

    2015-01-01

    OBJECTIVE Nasal endoscopy is integral to the evaluation of sinonasal disorders. However, prior studies have shown significant variability in the inter-rater agreement of nasal endoscopy interpretation amongst practicing rhinologists. The objective of the current study is to evaluate the inter-rater agreement of nasal endoscopy amongst otolaryngology residents from a single training program at baseline and following an educational intervention. METHODS 11 otolaryngology residents completed nasal endoscopy grading forms for 8 digitally recorded nasal endoscopic examinations. An instructional lecture reviewing nasal endoscopy interpretation was subsequently provided. The residents then completed grading forms for 8 different nasal endoscopic examinations. Inter-rater agreement amongst residents for the pre- and post-lecture videos was calculated using the unweighted Fleiss’ kappa statistic (Kf) and intra-class correlation agreement (ICC). RESULTS Inter-rater agreement improved from a baseline level of fair (Kf range 0.268–0.383) to a post-educational level of moderate (Kf range 0.401–0.547) for nasal endoscopy findings of middle meatus mucosa, middle turbinate mucosa, middle meatus discharge, sphenoethmoid recess mucosa, sphenoethmoid recess discharge and atypical lesions (ICC, pendoscopy interpretation amongst otolaryngology residents. The inter-rater agreement for the majority of the characteristics that were evaluated improved after educational intervention. Further study is needed to improve nasal endoscopy interpretation. PMID:25781864

  7. Inter-rater reliability of diagnostic criteria for sacroiliac joint-, disc- and facet joint pain.

    Science.gov (United States)

    van Tilburg, Cornelis W J; Groeneweg, Johannes G; Stronks, Dirk L; Huygen, Frank J P M

    2017-01-01

    Several diagnostic criteria sets are described in the literature to identify low back pain subtypes, but very little is known about the inter-rater reliability of these criteria. We conducted a study to determine the reliability of diagnostic tests that point towards SI joint-, disc- or facet joint pain. Inter-rater reliability study alongside three randomized clinical trials. Multidisciplinary pain center of general hospital. Patients aged 18 or more with medical history and physical examination suggestive of sacroiliac joint-, disc- and facet joint pain on lumbar level. Using the most commonly used current diagnostic criteria, a physical examination was performed independently by three physicians (two pain physicians and one orthopedic surgeon). Inter-rater reliability (Kappa (κ) measure of agreement) and significance (p) between raters are presented. Strengths of agreement, indicated with κ values above 0.20, are presented in order of agreement. One hundred patients were included. None of the parameters from the physical investigation had κ values of more than 0.21 (fair) in all pairs of raters. Between two raters (C and D), there was an almost perfect agreement on three parameters, more specifically "Abnormal sensory and motor examination, hyperactive or diminished reflexes", "Sitting exam shows no reflex, motor or sensory signs in the legs" and "Straight leg raising (Lasègue) negative between 30 and 70 degrees of flexion". The "Drop test positive" parameters had moderate strength of agreement between raters A and D and fair strength between raters A and B. The "Digital interspinous pressure test positive" had moderate strength of agreement between raters C and D and fair strength of agreement between raters A and B as well as raters B and C. Three other parameters had a fair strength of agreement between two raters; all other parameters had a slight or poor strength of agreement. Inter-rater reliability, confidence intervals and significance of

  8. Inter-rater reliability of measures to characterize the tobacco retail environment in Mexico

    Directory of Open Access Journals (Sweden)

    Marissa G Hall

    2015-11-01

    Full Text Available Objective. To evaluate the inter-rater reliability of a data collection instrument to assess the tobacco retail environment in Mexico, after major marketing regulations were implemented. Materials and methods. In 2013, two data collectors independently evaluated 21 stores in two census tracts, through a data collection instrument that assessed the presence of price promotions, whether single cigarettes were sold, the number of visible advertisements, the presence of signage prohibiting the sale of cigarettes to minors, and characteristics of cigarette pack displays. We evaluated the inter-rater reliability of the collected data, through the calculation of metrics such as intraclass correlation coefficient, percent agreement, Cohen’s kappa and Krippendorff’s alpha. Results. Most measures demonstrated substantial or perfect inter-rater reliability. Conclusions. Our results indicate the potential utility of the data collection instrument for future point-of-sale research.

  9. Intra and Inter-Rater Reliability of Screening for Movement Impairments: Movement Control Tests from The Foundation Matrix

    Science.gov (United States)

    Mischiati, Carolina R.; Comerford, Mark; Gosford, Emma; Swart, Jacqueline; Ewings, Sean; Botha, Nadine; Stokes, Maria; Mottram, Sarah L.

    2015-01-01

    Pre-season screening is well established within the sporting arena, and aims to enhance performance and reduce injury risk. With the increasing need to identify potential injury with greater accuracy, a new risk assessment process has been produced; The Performance Matrix (battery of movement control tests). As with any new method of objective testing, it is fundamental to establish whether the same results can be reproduced between examiners and by the same examiner on consecutive occasions. This study aimed to determine the intra-rater test re-test and inter-rater reliability of tests from a component of The Performance Matrix, The Foundation Matrix. Twenty participants were screened by two experienced musculoskeletal therapists using nine tests to assess the ability to control movement during specific tasks. Movement evaluation criteria for each test were rated as pass or fail. The therapists observed participants real-time and tests were recorded on video to enable repeated ratings four months later to examine intra-rater reliability (videos rated two weeks apart). Overall test percentage agreement was 87% for inter-rater reliability; 98% Rater 1, 94% Rater 2 for test re-test reliability; and 75% for real-time versus video. Intraclass-correlation coefficients (ICCs) were excellent between raters (0.81) and within raters (Rater 1, 0.96; Rater 2, 0.88) but poor for real-time versus video (0.23). Reliability for individual components of each test was more variable: inter-rater, 68-100%; intra-rater, 88-100% Rater 1, 75-100% Rater 2; and real-time versus video 31-100%. Cohen’s Kappa values for inter-rater reliability were 0.0-1.0; intra-rater 0.6-1.0 for Rater 1; -0.1-1.0 for Rater 2; and -0.1-1 for real-time versus video. It is concluded that both inter and intra-rater reliability of tests in The Foundation Matrix are acceptable when rated by experienced therapists. Recommendations are made for modifying some of the criteria to improve reliability where

  10. Inter-Rater Reliability and Validity of the Australian Football League’s Kicking and Handball Tests

    Science.gov (United States)

    Cripps, Ashley J.; Hopper, Luke S.; Joyce, Christopher

    2015-01-01

    Talent identification tests used at the Australian Football League’s National Draft Combine assess the capacities of athletes to compete at a professional level. Tests created for the National Draft Combine are also commonly used for talent identification and athlete development in development pathways. The skills tests created by the Australian Football League required players to either handball (striking the ball with the hand) or kick to a series of 6 randomly generated targets. Assessors subjectively rate each skill execution giving a 0-5 score for each disposal. This study aimed to investigate the inter-rater reliability and validity of the skills tests at an adolescent sub-elite level. Male Australian footballers were recruited from sub-elite adolescent teams (n = 121, age = 15.7 ± 0.3 years, height = 1.77 ± 0.07 m, mass = 69.17 ± 8.08 kg). The coaches (n = 7) of each team were also recruited. Inter-rater reliability was assessed using Inter-class correlations (ICC) and Limits of Agreement statistics. Both the kicking (ICC = 0.96, p handball tests (ICC = 0.89, p handball test. Key points The skill tests created by the AFL demonstrated acceptable levels of relative and absolute inter-rater reliability. Both the AFL’s skills tests are able to differentiate between athletes’ dominant and non-dominant limbs. However, only the kicking test could consistently differentiate between score outcomes over a range of Australian Football specific disposal distances. Both tests demonstrated poor concurrent validity, with no correlation found between coaches’ perceptions of technical skills and actual skill outcomes measured. PMID:26336356
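
    The abstract pairs ICCs (relative reliability) with Limits of Agreement statistics (absolute reliability). A minimal sketch of Bland-Altman 95% limits of agreement for two raters follows, assuming that is the variant meant; the function name `limits_of_agreement` and the scores are hypothetical.

```python
import numpy as np


def limits_of_agreement(rater_a, rater_b):
    """Return (bias, lower limit, upper limit) for paired scores from two raters.

    The limits are bias +/- 1.96 sample standard deviations of the
    paired differences, i.e. the usual 95% limits of agreement.
    """
    diff = np.asarray(rater_a, dtype=float) - np.asarray(rater_b, dtype=float)
    bias = diff.mean()           # systematic difference between raters
    sd = diff.std(ddof=1)        # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical total scores (six disposals, 0-5 each) from two assessors.
assessor_1 = [22, 18, 25, 15, 20, 27, 19, 23]
assessor_2 = [21, 19, 24, 16, 20, 26, 17, 23]
bias, lower, upper = limits_of_agreement(assessor_1, assessor_2)
print(f"bias = {bias:.2f}, 95% limits = [{lower:.2f}, {upper:.2f}]")
```

    Narrow limits centred near zero suggest the two assessors can be used interchangeably for this score.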

  11. Inter-rater agreement in visual assessment of footpad dermatitis in Danish broiler chickens

    DEFF Research Database (Denmark)

    Oliveira, A.R.S.; Lund, Vibe Pedersen; Christensen, Jens Peter

    2017-01-01

    1. The performance of the scoring in the Danish footpad dermatitis (FPD) surveillance system was evaluated by determining inter-rater agreement in visual inspection of FPD in broilers between two independent raters (R1 and R2) and the official scoring at a Danish slaughterhouse. 2. FPD scores were...

  12. Systematic reviews need to consider applicability to disadvantaged populations: inter-rater agreement for a health equity plausibility algorithm.

    Science.gov (United States)

    Welch, Vivian; Brand, Kevin; Kristjansson, Elizabeth; Smylie, Janet; Wells, George; Tugwell, Peter

    2012-12-19

    Systematic reviews have been challenged to consider effects on disadvantaged groups. A priori specification of subgroup analyses is recommended to increase the credibility of these analyses. This study aimed to develop and assess inter-rater agreement for an algorithm for systematic review authors to predict whether differences in effect measures are likely for disadvantaged populations relative to advantaged populations (only relative effect measures were addressed). A health equity plausibility algorithm was developed using clinimetric methods with three items based on literature review, key informant interviews and methodology studies. The three items dealt with the plausibility of differences in relative effects across sex or socioeconomic status (SES) due to: 1) patient characteristics; 2) intervention delivery (i.e., implementation); and 3) comparators. Thirty-five respondents (consisting of clinicians, methodologists and research users) assessed the likelihood of differences across sex and SES for ten systematic reviews with these questions. We assessed inter-rater reliability using Fleiss multi-rater kappa. The proportion agreement was 66% for patient characteristics (95% confidence interval: 61%-71%), 67% for intervention delivery (95% confidence interval: 62% to 72%) and 55% for the comparator (95% confidence interval: 50% to 60%). Inter-rater kappa, assessed with Fleiss kappa, ranged from 0 to 0.199, representing very low agreement beyond chance. Users of systematic reviews rated that important differences in relative effects across sex and socioeconomic status were plausible for a range of individual and population-level interventions. However, there was very low inter-rater agreement for these assessments. There is an unmet need for discussion of plausibility of differential effects in systematic reviews. Increased consideration of external validity and applicability to different populations and settings is warranted in systematic reviews to meet this

  13. Intra-Rater, Inter-Rater and Test-Retest Reliability of an Instrumented Timed Up and Go (iTUG) Test in Patients with Parkinson's Disease.

    Directory of Open Access Journals (Sweden)

    Rob C van Lummel

    Full Text Available The "Timed Up and Go" (TUG) is a widely used measure of physical functioning in older people and in neurological populations, including Parkinson's Disease. When using an inertial sensor measurement system (instrumented TUG [iTUG]), the individual components of the iTUG and the trunk kinematics can be measured separately, which may provide relevant additional information. The aim of this study was to determine intra-rater, inter-rater and test-retest reliability of the iTUG in patients with Parkinson's Disease. Twenty eight PD patients, aged 50 years or older, were included. For the iTUG the DynaPort Hybrid (McRoberts, The Hague, The Netherlands) was worn at the lower back. The device measured acceleration and angular velocity in three directions at a rate of 100 samples/s. Patients performed the iTUG five times on two consecutive days. Repeated measurements by the same rater on the same day were used to calculate intra-rater reliability. Repeated measurements by different raters on the same day were used to calculate intra-rater and inter-rater reliability. Repeated measurements by the same rater on different days were used to calculate test-retest reliability. Nineteen ICC values (15%) were ≥ 0.9, which is considered excellent reliability. Sixty four ICC values (49%) were ≥ 0.70 and < 0.90, which is considered good reliability. Thirty one ICC values (24%) were ≥ 0.50 and < 0.70, indicating moderate reliability. Sixteen ICC values (12%) were ≥ 0.30 and < 0.50, indicating poor reliability. Two ICC values (2%) were < 0.30, indicating very poor reliability. In conclusion, in patients with Parkinson's disease the intra-rater, inter-rater, and test-retest reliability of the individual components of the instrumented TUG (iTUG) was excellent to good for total duration and for turning durations, and good to low for the sub durations and for the kinematics of the SiSt and StSi. The results of this fully automated analysis of instrumented TUG movements

  14. Feasibility and Inter-Rater Reliability of Physical Performance Measures in Acutely Admitted Older Medical Patients

    DEFF Research Database (Denmark)

    Bodilsen, Ann Christine; Juul-Larsen, Helle Gybel; Petersen, Janne

    2015-01-01

    OBJECTIVE: Physical performance measures can be used to predict functional decline and increased dependency in older persons. However, few studies have assessed the feasibility or reliability of such measures in hospitalized older patients. Here we assessed the feasibility and inter-rater reliability of four simple measures of physical performance in acutely admitted older medical patients. DESIGN: During the first 24 hours of hospitalization, the following were assessed twice by different raters in 52 (≥ 65 years) patients admitted for acute medical illness: isometric hand grip strength, 4......, and 30-s chair stand were 8%, 7%, and 18%, and the SRD95% values were 22%, 17%, and 49%. CONCLUSION: In acutely admitted older medical patients, grip strength, gait speed, and the Cumulated Ambulation Score measurements were feasible and showed high inter-rater reliability when administered by different...

  15. Ocular Motor Score (OMS): a clinical tool to evaluating ocular motor functions in children. Intrarater and inter-rater agreement.

    Science.gov (United States)

    Olsson, Monica; Teär Fahnehjelm, Kristina; Rydberg, Agneta; Ygge, Jan

    2015-08-01

    Ocular motor score (OMS) is a new clinical test protocol for evaluating ocular motor functions in children and young adults. OMS is a set of 15 important and relevant non-invasive ocular motor function parameters derived from clinical practice. The aim of the study was to evaluate OMS according to intrarater and inter-rater agreement. Forty children aged 4-10 years, 23 girls median age 6.5 (range 4.3-9.3) and 17 boys median age 5.8 (range 4.1-9.8) were included. The ocular motor functions were assessed and scored according to the OMS protocol. The examinations were videotaped. To obtain the intrarater agreement, the first author examined and scored the children twice, first in the clinic and 2 weeks later by watching the videotape. To obtain the inter-rater agreement, three other raters independently scored the ocular motor function of the children by watching the videotapes. The overall observed intrarater agreement was 88%, and the observed inter-rater agreement between the three raters was 80%. For none of the subtests was there an observed intrarater agreement lower than 65%. Three of the subtests had an observed inter-rater agreement of 65% or below. Overall there was high observed intra- and inter-rater agreement for the OMS test protocol. Subtests such as saccades and smooth pursuit were more difficult for raters to score similarly according to the clinical OMS test protocol. © 2015 Acta Ophthalmologica Scandinavica Foundation. Published by John Wiley & Sons Ltd.

  16. Inter-rater reliability of three standardized functional tests in patients with low back pain

    Science.gov (United States)

    Tidstrand, Johan; Horneij, Eva

    2009-01-01

    Background Of all patients with low back pain, 85% are diagnosed as "non-specific lumbar pain". Lumbar instability has been described as one specific diagnosis which several authors have described as delayed muscular responses, impaired postural control as well as impaired muscular coordination among these patients. This has mostly been measured and evaluated in a laboratory setting. There are few standardized and evaluated functional tests, examining functional muscular coordination which are also applicable in the non-laboratory setting. In ordinary clinical work, tests of functional muscular coordination should be easy to apply. The aim of this present study was to therefore standardize and examine the inter-rater reliability of three functional tests of muscular functional coordination of the lumbar spine in patients with low back pain. Methods Nineteen consecutive individuals, ten men and nine women were included. (Mean age 42 years, SD ± 12 yrs). Two independent examiners assessed three tests: "single limb stance", "sitting on a Bobath ball with one leg lifted" and "unilateral pelvic lift" on the same occasion. The standardization procedure took altered positions of the spine or pelvis and compensatory movements of the free extremities into account. The inter-rater reliability was analyzed by Cohen's kappa coefficient (κ) and by percentage agreement. Results The inter-rater reliability for the right and the left leg respectively was: for the single limb stance very good (κ: 0.88–1.0), for sitting on a Bobath ball good (κ: 0.79) and very good (κ: 0.88) and for the unilateral pelvic lift: good (κ: 0.61) and moderate (κ: 0.47). Conclusion The present study showed good to very good inter-rater reliability for two standardized tests, that is, the single-limb stance and sitting on a Bobath-ball with one leg lifted. Inter-rater reliability for the unilateral pelvic lift test was moderate to good. Validation of the tests in their ability to evaluate lumbar

  17. Inter-rater reliability of three standardized functional tests in patients with low back pain

    Directory of Open Access Journals (Sweden)

    Tidstrand Johan

    2009-06-01

    Full Text Available Abstract Background Of all patients with low back pain, 85% are diagnosed as "non-specific lumbar pain". Lumbar instability has been described as one specific diagnosis which several authors have described as delayed muscular responses, impaired postural control as well as impaired muscular coordination among these patients. This has mostly been measured and evaluated in a laboratory setting. There are few standardized and evaluated functional tests, examining functional muscular coordination which are also applicable in the non-laboratory setting. In ordinary clinical work, tests of functional muscular coordination should be easy to apply. The aim of this present study was to therefore standardize and examine the inter-rater reliability of three functional tests of muscular functional coordination of the lumbar spine in patients with low back pain. Methods Nineteen consecutive individuals, ten men and nine women were included. (Mean age 42 years, SD ± 12 yrs). Two independent examiners assessed three tests: "single limb stance", "sitting on a Bobath ball with one leg lifted" and "unilateral pelvic lift" on the same occasion. The standardization procedure took altered positions of the spine or pelvis and compensatory movements of the free extremities into account. The inter-rater reliability was analyzed by Cohen's kappa coefficient (κ) and by percentage agreement. Results The inter-rater reliability for the right and the left leg respectively was: for the single limb stance very good (κ: 0.88–1.0), for sitting on a Bobath ball good (κ: 0.79) and very good (κ: 0.88) and for the unilateral pelvic lift: good (κ: 0.61) and moderate (κ: 0.47). Conclusion The present study showed good to very good inter-rater reliability for two standardized tests, that is, the single-limb stance and sitting on a Bobath-ball with one leg lifted. Inter-rater reliability for the unilateral pelvic lift test was moderate to good. Validation of the tests in their

  18. Inter- and intra-rater reproducibility of semiautomatic determination of volume parameters in cardiac magnetic resonance imaging

    International Nuclear Information System (INIS)

    Trieb, Thomas; Glodny, Bernhard; Scheiblhofer, Martin; Wolf, Christian; Metzler, Bernhard; Pachinger, Otmar; Jaschke, Werner R.; Schocke, Michael F.H.

    2008-01-01

    Purpose: The purpose of this study was to evaluate inter- and intra-rater reproducibility in volume assessment using cardiac magnetic resonance imaging (CMRI). Methods: Twenty-five healthy volunteers and 106 patients were included in this retrospective study and underwent CMRI. The patients were divided into three groups (group I, 80 patients with arrhythmia; group II, 20 patients with cardiomyopathy; group III, 6 patients after correction of septum defects). The images were semiautomatically segmented by an experienced and an inexperienced radiologist. The analysis of end-diastolic volume (EDV), end-systolic volume (ESV) and stroke volume (SV) as well as ejection fraction (EF) and myocardial mass (MM) was performed twice by both the experienced and the inexperienced radiologist. The intra-class correlation coefficients (ICC) were determined for the evaluation of inter- and intra-rater variance. Results: The intra-rater reproducibility for determination of EF, ESV, EDV and MM was excellent with ICCs ranging from 0.88 to 0.99 (all p < 0.001). The inter-observer reproducibility for these parameters was also excellent with ICCs ranging from 0.91 to 0.98 (all p < 0.001). The assessment of the SV showed an excellent intra-rater agreement with ICCs of 0.96 and 0.92 (both p < 0.001), but only a moderate ICC for the inter-rater reproducibility (0.54, p < 0.001). Conclusions: Our study shows that assessment of cardiac volumes can be performed on CMRIs with an excellent reproducibility by both experienced and inexperienced investigators

  19. THE INTRA- AND INTER-RATER RELIABILITY OF THE SOCCER INJURY MOVEMENT SCREEN (SIMS).

    Science.gov (United States)

    McCunn, Robert; Aus der Fünten, Karen; Govus, Andrew; Julian, Ross; Schimpchen, Jan; Meyer, Tim

    2017-02-01

    The growing volume of movement screening research reveals a belief among practitioners and researchers alike that movement quality may have an association with injury risk. However, existing movement screening tools have not considered the sport-specific movement and injury patterns relevant to soccer. The present study introduces the Soccer Injury Movement Screen (SIMS), which has been designed specifically for use within soccer. Furthermore, the purpose of the present study was to assess the intra- and inter-rater reliability of the SIMS and determine its suitability for use in further research. The study utilized a test-retest design to discern reliability. Twenty-five (11 males, 14 females) healthy, recreationally active university students (age 25.5 ± 4.0 years, height 171 ± 9 cm, weight 64.7 ± 12.6 kg) agreed to participate. The SIMS contains five sub-tests: the anterior reach, single-leg deadlift, in-line lunge, single-leg hop for distance and tuck jump. Each movement was scored out of 10 points and summed to produce a composite score out of 50. The anterior reach and single-leg hop for distance were scored in real-time while the remaining tests were filmed and scored retrospectively. Three raters conducted the SIMS with each participant on three occasions separated by an average of three and a half days (minimum one day, maximum seven days). Rater 1 re-scored the filmed movements for all participants on all occasions six months later to establish the 'pure' intra-rater (intra-occasion) reliability for those movements. Intraclass correlation coefficient (ICC) values for intra- and inter-rater composite score reliability ranged from 0.66-0.72 and 0.79-0.86 respectively. Weighted kappa values representing the intra- and inter-rater reliability of the individual sub-tests ranged from 0.35-0.91, indicating fair to almost perfect agreement. Establishing the reliability of the SIMS is a prerequisite for further research seeking to investigate

  20. Inter-rater reliability of AMSTAR is dependent on the pair of reviewers.

    Science.gov (United States)

    Pieper, Dawid; Jacobs, Anja; Weikert, Beate; Fishta, Alba; Wegewitz, Uta

    2017-07-11

    Inter-rater reliability (IRR) is mainly assessed based on only two reviewers of unknown expertise. The aim of this paper is to examine differences in the IRR of the Assessment of Multiple Systematic Reviews (AMSTAR) and R(evised)-AMSTAR depending on the pair of reviewers. Five reviewers independently applied AMSTAR and R-AMSTAR to 16 systematic reviews (eight Cochrane reviews and eight non-Cochrane reviews) from the field of occupational health. Responses were dichotomized and reliability measures were calculated by applying Holsti's method (r) and Cohen's kappa (κ) to all potential pairs of reviewers. Given that five reviewers participated in the study, there were ten possible pairs of reviewers. Inter-rater reliability varied for AMSTAR between r = 0.82 and r = 0.98 (median r = 0.88) using Holsti's method and between κ = 0.41 and κ = 0.69 (median κ = 0.52) using Cohen's kappa, and for R-AMSTAR between r = 0.77 and r = 0.89 (median r = 0.82) and between κ = 0.32 and κ = 0.67 (median κ = 0.45), depending on the pair of reviewers. The same pair of reviewers yielded the highest IRR for both instruments. Pairwise Cohen's kappa reliability measures showed a moderate correlation between AMSTAR and R-AMSTAR (Spearman's ρ = 0.50). The mean inter-rater reliability for AMSTAR was highest for item 1 (κ = 1.00) and item 5 (κ = 0.78), while lowest values were found for items 3, 8, 9 and 11, which showed only fair agreement. Inter-rater reliability varies widely depending on the pair of reviewers. There may be some shortcomings associated with conducting reliability studies with only two reviewers. Further studies should include additional reviewers and should probably also take account of their level of expertise.
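
The pairwise design described above (five reviewers, hence ten pairs) can be sketched as follows. This is a minimal illustration, assuming Holsti's coefficient in its usual form 2M / (N1 + N2), where M is the number of matching decisions and N1, N2 are the numbers of decisions made by each coder. The reviewer labels and the dichotomized answers are invented for the example and are not the study's data.

```python
from itertools import combinations
from statistics import median


def holsti(a, b):
    """Holsti's coefficient 2M / (N1 + N2) for two coders' decisions."""
    agreements = sum(x == y for x, y in zip(a, b))
    return 2 * agreements / (len(a) + len(b))


def pairwise_agreement(ratings_by_reviewer):
    """Return {(reviewer_i, reviewer_j): Holsti r} for every reviewer pair."""
    return {
        (r1, r2): holsti(ratings_by_reviewer[r1], ratings_by_reviewer[r2])
        for r1, r2 in combinations(sorted(ratings_by_reviewer), 2)
    }


# Hypothetical dichotomized answers (1 = "yes", 0 = anything else)
ratings = {
    "A": [1, 1, 0, 1, 0, 1, 1, 0],
    "B": [1, 1, 0, 1, 1, 1, 1, 0],
    "C": [1, 0, 0, 1, 0, 1, 1, 0],
    "D": [1, 1, 0, 0, 0, 1, 1, 1],
    "E": [1, 1, 1, 1, 0, 1, 0, 0],
}
pairs = pairwise_agreement(ratings)
values = sorted(pairs.values())
print(len(pairs), min(values), median(values), max(values))
```

When both coders rate the same items, Holsti's coefficient reduces to simple proportion agreement, so it does not correct for chance; that is one reason the r values in the record sit well above the corresponding κ values.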

  1. Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial

    Directory of Open Access Journals (Sweden)

    Kevin A. Hallgren

    2012-02-01

    Full Text Available Many research designs require the assessment of inter-rater reliability (IRR) to demonstrate consistency among observational ratings provided by multiple coders. However, many studies use incorrect statistical procedures, fail to fully report the information necessary to interpret their results, or do not address how IRR affects the power of their subsequent analyses for hypothesis testing. This paper provides an overview of methodological issues related to the assessment of IRR with a focus on study design, selection of appropriate statistics, and the computation, interpretation, and reporting of some commonly-used IRR statistics. Computational examples include SPSS and R syntax for computing Cohen’s kappa and intra-class correlations to assess IRR.
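
The tutorial's own examples are in SPSS and R and are not reproduced in this record. For orientation only, here is a small Python sketch of unweighted Cohen's kappa for two coders, using the standard definition κ = (p_o − p_e) / (1 − p_e). The function name and the example codes are assumptions made for this illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters coding the same items."""
    n = len(rater_a)
    # Observed proportion of items on which the raters agree
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's own marginal frequencies
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    # Undefined (division by zero) if both raters use one single category
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes from two coders for ten observations
coder_1 = ["on", "on", "off", "on", "off", "off", "on", "on", "off", "on"]
coder_2 = ["on", "off", "off", "on", "off", "on", "on", "on", "off", "on"]
print(round(cohens_kappa(coder_1, coder_2), 3))
```

Kappa depends on the raters' marginal distributions, so two studies with the same raw agreement can report quite different κ values.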

  2. High inter-rater reliability, agreement, and convergent validity of Constant score in patients with clavicle fractures

    DEFF Research Database (Denmark)

    Ban, Ilija; Troelsen, Anders; Kristensen, Morten Tange

    2016-01-01

    BACKGROUND: The Constant score (CS) has been the primary endpoint in most studies on clavicle fractures. However, the CS was not developed to assess patients with clavicle fractures. Our aim was to examine inter-rater reliability and agreement of the CS in patients with clavicle fractures...... standardized CS assessment at a mean of 6.8 weeks (SD, 1.0 weeks) after injury. Reliability and agreement of the CS were determined by 2 raters. The interclass correlation coefficient (ICC2,1), standard error of measurement, minimal detectable change, Cronbach α coefficient, and Pearson correlation coefficient...... were estimated. RESULTS: Inter-rater reliability of the total CS was excellent (interclass correlation coefficient, 0.94; 95% confidence interval, 0.88-0.97), with no systematic difference between the 2 raters (P = .75). The standard error of measurement (measurement error at the group level) was 4...

  3. SU-E-T-511: Inter-Rater Variability in Classification of Incidents in a New Incident Reporting System

    International Nuclear Information System (INIS)

    Pappas, D; Reis, S; Ali, A; Kapur, A

    2015-01-01

    Purpose: To determine how consistent the results of different raters are when reviewing the same cases within the Radiation Oncology Incident Learning System (ROILS). Methods: Three second-year medical physics graduate students filled out incident reports in spreadsheets set up to mimic ROILS. All students studied the same 33 cases and independently entered their assessments, for a total of 99 reviewed cases. The narratives for these cases were obtained from a published International Commission on Radiological Protection (ICRP) report which included shorter narratives selected from the Radiation Oncology Safety Information System (ROSIS) database. Each category of questions was reviewed to see how consistent the results were by utilizing free-marginal multirater kappa analysis. The percentage of cases where all raters shared full agreement or full disagreement was recorded to show which questions were answered consistently by multiple raters for a given case. The consistency among the raters was analyzed between ICRP and ROSIS cases to see if either group led to more reliable results. Results: The categories where all raters agreed 100 percent in their choices were the event type (93.94 percent of cases, kappa 0.946) and the likelihood of the event being harmful to the patient (42.42 percent of cases, kappa 0.409). The categories where all raters disagreed 100 percent in their choices were the dosimetric severity scale (39.39 percent of cases, kappa 0.139) and the potential future toxicity (48.48 percent of cases, kappa 0.205). ROSIS had more cases where all raters disagreed than ICRP (23.06 percent of cases compared to 15.58 percent, respectively). Conclusion: Despite reviewing the same cases, the results among the three raters varied widely. ROSIS narratives were shorter than ICRP narratives, which suggests that longer narratives lead to more consistent results. This study shows that the incident reporting system can be optimized to yield more consistent results
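
The abstract names free-marginal multirater kappa and the share of cases with full agreement, but gives no formulas. The sketch below assumes the usual free-marginal form, κ_free = (P_o − 1/k) / (1 − 1/k), where P_o is the mean proportion of agreeing rater pairs per case and k is the number of categories available to the raters. The function names and the sample classifications are hypothetical.

```python
from collections import Counter


def free_marginal_kappa(cases, n_categories):
    """Free-marginal multirater kappa.

    cases: one list per case, holding the category chosen by each rater.
    n_categories: number of categories the raters could choose from.
    """
    n_raters = len(cases[0])
    per_case = []
    for choices in cases:
        counts = Counter(choices)
        # Ordered rater pairs that picked the same category for this case
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        per_case.append(agreeing_pairs / (n_raters * (n_raters - 1)))
    p_o = sum(per_case) / len(cases)
    p_e = 1 / n_categories  # chance agreement with unconstrained marginals
    return (p_o - p_e) / (1 - p_e)


def full_agreement_rate(cases):
    """Fraction of cases in which every rater chose the same category."""
    return sum(len(set(choices)) == 1 for choices in cases) / len(cases)


# Hypothetical event-type classifications by three raters for four cases
event_type = [
    ["treatment", "treatment", "treatment"],
    ["planning", "planning", "planning"],
    ["treatment", "planning", "treatment"],
    ["other", "other", "other"],
]
print(free_marginal_kappa(event_type, n_categories=3))
print(full_agreement_rate(event_type))
```

The number of categories has to be supplied explicitly, because the chance term depends on the options the form offers rather than on the options the raters happened to use.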

  4. SU-E-T-511: Inter-Rater Variability in Classification of Incidents in a New Incident Reporting System

    Energy Technology Data Exchange (ETDEWEB)

    Pappas, D; Reis, S; Ali, A [Hofstra University, Hempstead, NY (United States); Kapur, A [Long Island Jewish Medical Center, New Hyde Park, NY (United States)

    2015-06-15

    Purpose: To determine how consistent the results of different raters are when reviewing the same cases within the Radiation Oncology Incident Learning System (ROILS). Methods: Three second-year medical physics graduate students filled out incident reports in spreadsheets set up to mimic ROILS. All students studied the same 33 cases and independently entered their assessments, for a total of 99 reviewed cases. The narratives for these cases were obtained from a published International Commission on Radiological Protection (ICRP) report which included shorter narratives selected from the Radiation Oncology Safety Information System (ROSIS) database. Each category of questions was reviewed to see how consistent the results were by utilizing free-marginal multirater kappa analysis. The percentage of cases where all raters shared full agreement or full disagreement was recorded to show which questions were answered consistently by multiple raters for a given case. The consistency among the raters was analyzed between ICRP and ROSIS cases to see if either group led to more reliable results. Results: The categories where all raters agreed 100 percent in their choices were the event type (93.94 percent of cases, kappa 0.946) and the likelihood of the event being harmful to the patient (42.42 percent of cases, kappa 0.409). The categories where all raters disagreed 100 percent in their choices were the dosimetric severity scale (39.39 percent of cases, kappa 0.139) and the potential future toxicity (48.48 percent of cases, kappa 0.205). ROSIS had more cases where all raters disagreed than ICRP (23.06 percent of cases compared to 15.58 percent, respectively). Conclusion: Despite reviewing the same cases, the results among the three raters varied widely. ROSIS narratives were shorter than ICRP narratives, which suggests that longer narratives lead to more consistent results. This study shows that the incident reporting system can be optimized to yield more consistent results.

  5. Grant Peer Review: Improving Inter-Rater Reliability with Training.

    Science.gov (United States)

    Sattler, David N; McKnight, Patrick E; Naney, Linda; Mathis, Randy

    2015-01-01

    This study developed and evaluated a brief training program for grant reviewers that aimed to increase inter-rater reliability, rating scale knowledge, and effort to read the grant review criteria. Enhancing reviewer training may improve the reliability and accuracy of research grant proposal scoring and funding recommendations. Seventy-five Public Health professors from U.S. research universities watched the training video we produced and assigned scores to the National Institutes of Health scoring criteria proposal summary descriptions. For both novice and experienced reviewers, the training video increased scoring accuracy (the percentage of scores that reflect the true rating scale values), inter-rater reliability, and the amount of time reading the review criteria compared to the no video condition. The increase in reliability for experienced reviewers is notable because it is commonly assumed that reviewers--especially those with experience--have good understanding of the grant review rating scale. The findings suggest that both experienced and novice reviewers who had not received the type of training developed in this study may not have appropriate understanding of the definitions and meaning for each value of the rating scale and that experienced reviewers may overestimate their knowledge of the rating scale. The results underscore the benefits of and need for specialized peer reviewer training.

  6. Intra-rater and inter-rater reliability of the standardized ultrasound protocol for assessing subacromial structures

    DEFF Research Database (Denmark)

    Hougs Kjær, Birgitte; Ellegaard, Karen; Wieland, Ina

    2017-01-01

    BACKGROUND: US-examinations related to shoulder impingement (SI) often vary due to methodological differences, examiner positions, transducers, and recording parameters. Reliable US protocols for examination of different structures related to shoulder impingement are therefore needed. OBJECTIVES...... of the supraspinatus tendon (SUPRA) and subacromial subdeltoid (SASD) bursa in two imaging positions, and the acromial humeral distance (AHD) in one position. Additionally, agreement on dynamic impingement (DI) examination was performed. The intra- and inter-rater reliability was carried out on the same day...

  7. Inter-Rater Agreement of Auscultation, Palpable Fremitus, and Ventilator Waveform Sawtooth Patterns Between Clinicians.

    Science.gov (United States)

    Berry, Marc P; Martí, Joan-Daniel; Ntoumenopoulos, George

    2016-10-01

    Clinicians often use numerous bedside assessments for secretion retention in participants who are receiving invasive mechanical ventilation. This study aimed to evaluate inter-rater agreement between clinicians when using standard clinical assessments of secretion retention and whether differences in clinician experience influenced inter-rater agreement. Seventy-one mechanically ventilated participants were assessed by a research clinician and by one of 13 ICU clinicians. Each clinician conducted a standardized assessment of lung auscultation, palpation for chest-wall (rhonchal) fremitus, and ventilator inspiratory/expiratory flow-time waveforms for the sawtooth pattern. On the presence of breath sounds, agreement ranged from absolute to moderate in the upper zones and the lower zones, respectively. Kappa values for abnormal and adventitious lung sounds achieved moderate agreement in the upper zones, less than chance agreement to substantial agreement in the middle zones, and moderate agreement to almost perfect agreement in the lower zones. Moderate to almost perfect agreement was established for palpable fremitus in the upper zones, moderate to substantial agreement in the middle zones, and less than chance to moderate agreement in the lower zones. Inter-rater agreement on the presence of expiratory sawtooth pattern identification showed moderate agreement. The level of percentage agreement between the research and ICU clinicians for each respiratory assessment studied did not relate directly to level of clinical experience. Inter-rater agreement for all assessments showed variability between lung regions but maintained reasonable percentage agreement in mechanically ventilated participants. The level of percentage agreement achieved between clinicians did not directly relate to clinical experience for all respiratory assessments. Therefore, these respiratory assessments should not necessarily be viewed in isolation but interpreted within the context of a full

  8. Inter-rater reliability of three musculoskeletal physical examination techniques used to assess motion in three planes while standing.

    Science.gov (United States)

    Prather, Heidi; Hunt, Devyani; Steger-May, Karen; Hayes, Marcie Harris; Knaus, Evan; Clohisy, John

    2009-07-01

    .81-0.95), respectively. The inter-rater reliability for 3 basic maneuvers of the Total Body Functional Profile is good among musculoskeletal health care providers of different disciplines. These 3 maneuvers may be used consistently as part of the musculoskeletal physical examination.

  9. [Inter-rater concordance of the "Nursing Activities Score" in intensive care].

    Science.gov (United States)

    Valls-Matarín, Josefa; Salamero-Amorós, Maria; Roldán-Gil, Carmen; Quintana-Riera, Salvador

    2015-01-01

    To evaluate inter-rater concordance in the valuation of the "Nursing Activities Score". Cross-sectional descriptive study conducted from December 2012 until June 2013 in a general intensive care unit with twelve beds. Three evaluator nurses, simultaneously and independently, through the patient daily charts, scored the nursing workload using the Nursing Activities Score scale in all admitted patients over 18 years old. Three hundred and thirty-nine records were collected. The intra-class correlation coefficient (ICC) between evaluators was 0.92 (0.89-0.94). A perfect concordance was obtained in 39.1% of the items, with 52.2% having a high, and 8.7% having lower concordance, corresponding to two of the items with multiple scoring options. Significant differences between two of the evaluators (P=.049) were found. Although the inter-rater concordance was high, more accurate records are needed to reduce the variability of the items with multiple options and to allow more accuracy in the interpretation and measurement of the data regarding nursing workload. Copyright © 2015 Elsevier España, S.L.U. All rights reserved.

  10. Development and inter-rater reliability of a standardized verbal instruction manual for the Chinese Geriatric Depression Scale-short form.

    Science.gov (United States)

    Wong, M T P; Ho, T P; Ho, M Y; Yu, C S; Wong, Y H; Lee, S Y

    2002-05-01

    The Geriatric Depression Scale (GDS) is a common screening tool for elderly depression in Hong Kong. This study aimed at (1) developing a standardized manual for the verbal administration and scoring of the GDS-SF, and (2) comparing the inter-rater reliability between the standardized and non-standardized verbal administration of GDS-SF. Two studies were reported. In Study 1, the process of developing the manual was described. In Study 2, we compared the inter-rater reliabilities of GDS-SF scores using the standardized verbal instructions and the traditional non-standardized administration. Results of Study 2 indicated that the standardized procedure in verbal administration and scoring improved the inter-rater reliabilities of GDS-SF. Copyright 2002 John Wiley & Sons, Ltd.

  11. Inter-rater reliability in the classification of supraspinatus tendon tears using 3D ultrasound – a question of experience?

    Directory of Open Access Journals (Sweden)

    Giorgio Tamborrini

    2016-09-01

    Full Text Available Background: Three-dimensional (3D) ultrasound of the shoulder is characterized by a comparable accuracy to two-dimensional (2D) ultrasound. No studies investigating 2D versus 3D inter-rater reliability in the detection of supraspinatus tendon tears taking into account the level of experience of the raters have been carried out so far. Objectives: The aim of this study was to determine the inter-rater reliability in the analysis of 3D ultrasound image sets of the supraspinatus tendon between sonographers with different levels of experience. Patients and methods: Non-interventional, prospective, observational pilot study of 2309 images of 127 adult patients suffering from unilateral shoulder pain. 3D ultrasound image sets were scored by three raters independently. The intra- and inter-rater reliabilities were calculated. Results: There was an excellent intra-rater reliability of rater A in the overall classification of supraspinatus tendon tears (2D vs 3D: κ = 0.892, pairwise reliability 93.81%; 3D scoring round 1 vs 3D scoring round 2: κ = 0.875, pairwise reliability 92.857%). The inter-rater reliability was only moderate compared to rater B on 3D (κ = 0.497, pairwise reliability 70.95%) and fair compared to rater C (κ = 0.238, pairwise reliability 42.38%). Conclusions: The reliability of 3D ultrasound of the supraspinatus tendon depends on the level of experience of the sonographer. Experience in 2D ultrasound does not seem to be sufficient for the analysis of 3D ultrasound imaging sets. Therefore, for a 3D ultrasound analysis new diagnostic criteria have to be established and taught even to experienced 2D sonographers to improve reproducibility.

  12. Approaches to describing inter-rater reliability of the overall clinical appearance of febrile infants and toddlers in the emergency department

    Directory of Open Access Journals (Sweden)

    Paul Walsh

    2014-11-01

    Full Text Available Objectives. To measure inter-rater agreement of overall clinical appearance of febrile children aged less than 24 months and to compare methods for doing so. Study Design and Setting. We performed an observational study of inter-rater reliability of the assessment of febrile children in a county hospital emergency department serving a mixed urban and rural population. Two emergency medicine healthcare providers independently evaluated the overall clinical appearance of children less than 24 months of age who had presented for fever. They recorded the initial ‘gestalt’ assessment of whether or not the child was ill appearing or if they were unsure. They then repeated this assessment after examining the child. Each rater was blinded to the other’s assessment. Our primary analysis was graphical. We also calculated Cohen’s κ, Gwet’s agreement coefficient and other measures of agreement and weighted variants of these. We examined the effect of time between exams and patient and provider characteristics on inter-rater agreement. Results. We analyzed 159 of the 173 patients enrolled. Median age was 9.5 months (lower and upper quartiles 4.9–14.6), 99/159 (62%) were boys and 22/159 (14%) were admitted. Overall 118/159 (74%) and 119/159 (75%) were classified as well appearing on initial ‘gestalt’ impression by both examiners. Summary statistics varied from 0.223 for weighted κ to 0.635 for Gwet’s AC2. Inter-rater agreement was affected by the time interval between the evaluations and the age of the child but not by the experience levels of the rater pairs. Classifications of ‘not ill appearing’ were more reliable than others. Conclusion. The inter-rater reliability of emergency providers’ assessment of overall clinical appearance was adequate when described graphically and by Gwet’s AC. Different summary statistics yield different results for the same dataset.
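
To illustrate why summary statistics can diverge on the same data, the sketch below computes Gwet's first-order agreement coefficient (AC1) for two raters. Note that this is the unweighted AC1, not the weighted AC2 reported in the record, and that the two-category data are invented to show the effect of a very skewed distribution; the study itself used three response options.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b, categories):
    """Gwet's AC1 for two raters and a fixed list of categories."""
    n = len(rater_a)
    q = len(categories)
    # Observed proportion of agreement
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Average share of each category across both raters
    counts = Counter(rater_a) + Counter(rater_b)
    pi = [counts[c] / (2 * n) for c in categories]
    # Chance agreement as defined for AC1
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical impressions: 90 children rated "well" by both raters,
# 5 rated "ill" only by rater B, 5 rated "ill" only by rater A
rater_a = ["well"] * 90 + ["well"] * 5 + ["ill"] * 5
rater_b = ["well"] * 90 + ["ill"] * 5 + ["well"] * 5
print(round(gwet_ac1(rater_a, rater_b, ["well", "ill"]), 3))
```

In this invented table the raters agree on 90% of children, yet Cohen's κ computed from the same counts is slightly negative (about −0.05), because its chance term is dominated by the prevalent "well" category. AC1 stays high because its chance term shrinks as one category dominates.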

  13. Intra- and inter-rater reliability of the Knee Society Knee Score when used by two physiotherapists in patients post total knee arthroplasty

    Directory of Open Access Journals (Sweden)

    S. Gopal

    2010-01-01

    Full Text Available Background and Purpose: It has yet to be shown whether routine physiotherapy plays a role in the rehabilitation of patients post total knee arthroplasty (Rajan et al 2004). Physiotherapists should be using valid outcome measures to provide evidence of the benefit of their intervention. The aim of this study was to establish the intra- and inter-rater reliability of the Knee Society Knee Score, a scoring system developed by Insall et al (1989). The Knee Society Knee Score can be used to assess the integrity of the knee joint of patients undergoing total knee arthroplasty. Since the score involves clinical testing, the intra-rater reliability of the clinician should be established prior to using the scores as data in clinical research. Where multiple clinicians are involved, inter-rater reliability should also be established. Design: This was a correlation study. Subjects: A sample of thirty patients post total knee arthroplasty attending the arthroplasty clinic at Johannesburg Hospital between six weeks and twelve months postoperatively. Method: Recruited patients were evaluated twice with a time interval of one hour between each assessment. Statistical Analysis: The intra- and inter-rater reliability were estimated using the Intraclass Correlation Coefficient (ICC). Results: The intra-rater reliability showed excellent reliability (h = 0.95) for Examiner A and good reliability (h = 0.71) for Examiner B. The inter-rater reliability showed moderate reliability (h = 0.67 during test one and h = 0.66 during test two). Conclusion: The KSKS has good intra-rater reliability when tested within a period of one hour. The KSKS demonstrated moderate agreement for inter-rater reliability.

  14. Inter-rater reliability and stability of diagnoses of autism spectrum disorder in children identified through screening at a very young age

    NARCIS (Netherlands)

    van Daalen, Emma; Kemner, Chantal; Dietz, Claudine; Swinkels, Sophie H. N.; Buitelaar, Jan K.; van Engeland, Herman

    2009-01-01

    To examine the inter-rater reliability and stability of autism spectrum disorder (ASD) diagnoses made at a very early age in children identified through a screening procedure around 14 months of age. In a prospective design, preschoolers were recruited from a screening study for ASD. The inter-rater

  15. Inter-rater reliability and stability of diagnoses of autism spectrum disorder in children identified through screening at a very young age.

    NARCIS (Netherlands)

    Daalen, E. van; Kemner, C.; Dietz, C.; Swinkels, S.H.N.; Buitelaar, J.K.; Engeland, H.M. van

    2009-01-01

    To examine the inter-rater reliability and stability of autism spectrum disorder (ASD) diagnoses made at a very early age in children identified through a screening procedure around 14 months of age. In a prospective design, preschoolers were recruited from a screening study for ASD. The inter-rater

  16. Inter-rater reliability of an observation-based ergonomics assessment checklist for office workers.

    Science.gov (United States)

    Pereira, Michelle Jessica; Straker, Leon Melville; Comans, Tracy Anne; Johnston, Venerina

    2016-12-01

    To establish the inter-rater reliability of an observation-based ergonomics assessment checklist for computer workers. A 37-item (38-item if a laptop was part of the workstation) comprehensive observational ergonomics assessment checklist comparable to government guidelines and up to date with empirical evidence was developed. Two trained practitioners assessed full-time office workers performing their usual computer-based work and evaluated the suitability of workstations used. Practitioners assessed each participant consecutively. The order of assessors was randomised, and the second assessor was blinded to the findings of the first. Unadjusted kappa coefficients between the raters were obtained for the overall checklist and subsections that were formed from question-items relevant to specific workstation equipment. Twenty-seven office workers were recruited. The inter-rater reliability between two trained practitioners achieved moderate to good reliability for all except one checklist component. This checklist has mostly moderate to good reliability between two trained practitioners. Practitioner Summary: This reliable ergonomics assessment checklist for computer workers was designed using accessible government guidelines and supplemented with up-to-date evidence. Employers in Queensland (Australia) can fulfil legislative requirements by using this reliable checklist to identify and subsequently address potential risk factors for work-related injury to provide a safe working environment.

  17. Test-re-test reliability and inter-rater reliability of a digital pelvic inclinometer in young, healthy males and females.

    Science.gov (United States)

    Beardsley, Chris; Egerton, Tim; Skinner, Brendon

    2016-01-01

    Objective. The purpose of this study was to investigate the reliability of a digital pelvic inclinometer (DPI) for measuring sagittal plane pelvic tilt in 18 young, healthy males and females. Method. The inter-rater reliability and test-re-test reliabilities of the DPI for measuring pelvic tilt in standing on both the right and left sides of the pelvis were measured by two raters carrying out two rating sessions of the same subjects, three weeks apart. Results. For measuring pelvic tilt, inter-rater reliability was designated as good on both sides (ICC = 0.81-0.88), test-re-test reliability within a single rating session was designated as good on both sides (ICC = 0.88-0.95), and test-re-test reliability between two rating sessions was designated as moderate on the left side (ICC = 0.65) and good on the right side (ICC = 0.85). Conclusion. Inter-rater reliability and test-re-test reliability within a single rating session of the DPI in measuring pelvic tilt were both good, while test-re-test reliability between rating sessions was moderate-to-good. Caution is required regarding the interpretation of the test-re-test reliability within a single rating session, as the raters were not blinded. Further research is required to establish validity.

  18. Inter-rater reliability of healthcare professional skills' portfolio assessments: The Andalusian Agency for Healthcare Quality model

    Directory of Open Access Journals (Sweden)

    Antonio Almuedo-Paz

    2014-07-01

    Full Text Available This study aims to determine the reliability of assessment criteria used for a portfolio at the Andalusian Agency for Healthcare Quality (ACSA). Data: all competences certification processes, regardless of their discipline. Period: 2010-2011. Three types of tests are used: 368 certificates, 17,895 reports and 22,642 clinical practice reports (N = 3,010 candidates). The tests were evaluated in pairs by the ACSA team of raters using two categories: valid and invalid. Results: The percentage agreement in assessments of certificates was 89.9%, while for the reports of clinical practice it was 85.1% and for clinical practice reports it was 81.7%. The inter-rater agreement coefficients (kappa) ranged from 0.468 to 0.711. Discussion: The results of this study show that the inter-rater reliability of assessments varies from fair to good. Compared with other similar studies, the results put the reliability of the model in a comfortable position. Among the improvements incorporated, progressive automation of evaluations must be highlighted.

  19. Assessment of disabilities in stroke patients with apraxia : Internal consistency and inter-observer reliability

    NARCIS (Netherlands)

    van Heugten, CM; Dekker, J; Deelman, BG; Stehmann-Saris, JC; Kinebanian, A

    1999-01-01

    In this paper the internal consistency and inter-observer reliability of the assessment of disabilities in stroke patients with apraxia is presented. Disabilities were assessed by means of observation of activities of daily living (ADL). The study was conducted at occupational therapy departments in

  20. Assessment of disabilities in stroke patients with apraxia: internal consistency and inter-observer reliability.

    NARCIS (Netherlands)

    Heugten, C.M. van; Dekker, J.; Deelman, B.G.; Stehmann-Saris, J.C.; Kinebanian, A.

    1999-01-01

    In this paper the internal consistency and inter-observer reliability of the assessment of disabilities in stroke patients with apraxia is presented. Disabilities were assessed by means of observation of activities of daily living (ADL). The study was conducted at occupational therapy departments in

  1. Test–re-test reliability and inter-rater reliability of a digital pelvic inclinometer in young, healthy males and females

    Directory of Open Access Journals (Sweden)

    Chris Beardsley

    2016-03-01

    Full Text Available Objective. The purpose of this study was to investigate the reliability of a digital pelvic inclinometer (DPI) for measuring sagittal plane pelvic tilt in 18 young, healthy males and females. Method. The inter-rater reliability and test–re-test reliabilities of the DPI for measuring pelvic tilt in standing on both the right and left sides of the pelvis were measured by two raters carrying out two rating sessions of the same subjects, three weeks apart. Results. For measuring pelvic tilt, inter-rater reliability was designated as good on both sides (ICC = 0.81–0.88), test–re-test reliability within a single rating session was designated as good on both sides (ICC = 0.88–0.95), and test–re-test reliability between two rating sessions was designated as moderate on the left side (ICC = 0.65) and good on the right side (ICC = 0.85). Conclusion. Inter-rater reliability and test–re-test reliability within a single rating session of the DPI in measuring pelvic tilt were both good, while test–re-test reliability between rating sessions was moderate-to-good. Caution is required regarding the interpretation of the test–re-test reliability within a single rating session, as the raters were not blinded. Further research is required to establish validity.

  2. Inter-rater reliability of a modified version of Delitto et al.’s classification-based system for low back pain : a pilot study

    NARCIS (Netherlands)

    Apeldoorn, Adri T.; van Helvoirt, Hans; Ostelo, Raymond W.; Meihuizen, Hanneke; Kamper, Steven J.; van Tulder, Maurits W.; de Vet, Henrica C W

    2016-01-01

    Study design: Observational inter-rater reliability study. Objectives: To examine: (1) the inter-rater reliability of a modified version of Delitto et al.’s classification-based algorithm for patients with low back pain; (2) the influence of different levels of familiarity with the system; and (3)

  3. The inter-rater reliability and prognostic value of coma scales in Nepali children with acute encephalitis syndrome.

    Science.gov (United States)

    Ray, Stephen; Rayamajhi, Ajit; Bonnett, Laura J; Solomon, Tom; Kneen, Rachel; Griffiths, Michael J

    2018-02-01

    Background Acute encephalitis syndrome (AES) is a common cause of coma in Nepali children. The Glasgow coma scale (GCS) is used to assess the level of coma in these patients and predict outcome. Alternative coma scales may have better inter-rater reliability and prognostic value in encephalitis in Nepali children, but this has not been studied. The Adelaide coma scale (ACS), Blantyre coma scale (BCS) and the Alert, Verbal, Pain, Unresponsive scale (AVPU) are alternatives to the GCS which can be used. Methods Children aged 1-14 years who presented to Kanti Children's Hospital, Kathmandu with AES between September 2010 and November 2011 were recruited. All four coma scales (GCS, ACS, BCS and AVPU) were applied on admission, 48 h later and on discharge. Inter-rater reliability (unweighted kappa) was measured for each. Correlation and agreement between total coma score and outcome (Liverpool outcome score) was measured by Spearman's rank and Bland-Altman plot. The prognostic value of coma scales alone and in combination with physiological variables was investigated in a subgroup (n = 22). A multivariable logistic regression model was fitted by backward stepwise selection. Results Fifty children were recruited. Inter-rater reliability using the various scales was fair to moderate. However, the scales poorly predicted clinical outcome. Combining the scales with physiological parameters such as systolic blood pressure improved outcome prediction. Conclusion This is the first study to compare four coma scales in Nepali children with AES. The scales exhibited fair to moderate inter-rater reliability. However, the study is inadequately powered to answer the question on the relationship between coma scales and outcome. Further larger studies are required.

  4. Chest computed tomography-based scoring of thoracic sarcoidosis: Inter-rater reliability of CT abnormalities

    Energy Technology Data Exchange (ETDEWEB)

    Heuvel, D.A.V. den; Es, H.W. van; Heesewijk, J.P. van; Spee, M. [St. Antonius Hospital Nieuwegein, Department of Radiology, Nieuwegein (Netherlands); Jong, P.A. de [University Medical Center Utrecht, Department of Radiology, Utrecht (Netherlands); Zanen, P.; Grutters, J.C. [University Medical Center Utrecht, Division Heart and Lungs, Utrecht (Netherlands); St. Antonius Hospital Nieuwegein, Center of Interstitial Lung Diseases, Department of Pulmonology, Nieuwegein (Netherlands)

    2015-09-15

    To determine inter-rater reliability of sarcoidosis-related computed tomography (CT) findings that can be used for scoring of thoracic sarcoidosis. CT images of 51 patients with sarcoidosis were scored by five chest radiologists for various abnormal CT findings (22 in total) encountered in thoracic sarcoidosis. Using intra-class correlation coefficient (ICC) analysis, inter-rater reliability was analysed and reported according to the Guidelines for Reporting Reliability and Agreement Studies (GRRAS) criteria. A pre-specified sub-analysis was performed to investigate the effect of training. Scoring was trained in a distinct set of 15 scans in which all abnormal CT findings were represented. Median age of the 51 patients (36 men, 70 %) was 43 years (range 26 - 64 years). All radiographic stages were present in this group. ICC ranged from 0.91 for honeycombing to 0.11 for nodular margin (sharp versus ill-defined). The ICC was above 0.60 in 13 of the 22 abnormal findings. Sub-analysis for the best-trained observers demonstrated an ICC improvement for all abnormal findings and values above 0.60 for 16 of the 22 abnormalities. In our cohort, reliability between raters was acceptable for 16 thoracic sarcoidosis-related abnormal CT findings. (orig.)

  5. Chest computed tomography-based scoring of thoracic sarcoidosis: Inter-rater reliability of CT abnormalities

    International Nuclear Information System (INIS)

    Heuvel, D.A.V. den; Es, H.W. van; Heesewijk, J.P. van; Spee, M.; Jong, P.A. de; Zanen, P.; Grutters, J.C.

    2015-01-01

    To determine inter-rater reliability of sarcoidosis-related computed tomography (CT) findings that can be used for scoring of thoracic sarcoidosis. CT images of 51 patients with sarcoidosis were scored by five chest radiologists for various abnormal CT findings (22 in total) encountered in thoracic sarcoidosis. Using intra-class correlation coefficient (ICC) analysis, inter-rater reliability was analysed and reported according to the Guidelines for Reporting Reliability and Agreement Studies (GRRAS) criteria. A pre-specified sub-analysis was performed to investigate the effect of training. Scoring was trained in a distinct set of 15 scans in which all abnormal CT findings were represented. Median age of the 51 patients (36 men, 70 %) was 43 years (range 26 - 64 years). All radiographic stages were present in this group. ICC ranged from 0.91 for honeycombing to 0.11 for nodular margin (sharp versus ill-defined). The ICC was above 0.60 in 13 of the 22 abnormal findings. Sub-analysis for the best-trained observers demonstrated an ICC improvement for all abnormal findings and values above 0.60 for 16 of the 22 abnormalities. In our cohort, reliability between raters was acceptable for 16 thoracic sarcoidosis-related abnormal CT findings. (orig.)

  6. Translation, adaptation and inter-rater reliability of the administration manual for the Fugl-Meyer assessment.

    Science.gov (United States)

    Michaelsen, Stella M; Rocha, André S; Knabben, Rodrigo J; Rodrigues, Luciano P; Fernandes, Claudia G C

    2011-01-01

    Recently, the reliability of the Brazilian version of the Fugl-Meyer Assessment (FMA) was assessed through the scoring given according to observations made by a single evaluator who applied the test. When different raters apply the scale, the reliability may depend on the interpretation given to the assessment sheet. In such cases, a clear administration manual is essential for ensuring homogeneity of application. To translate and adapt the French Canadian version of the FMA administration manual into Brazilian Portuguese and to evaluate the inter-rater reliability when different evaluators apply the FMA on the basis of the information contained in the manual. Eighteen adults (59±10 years) with chronic hemiparesis (38±35 months after a stroke) took part in this study. Eight patients participated in the first part of the study and 10 in the second part. Based on analyzing the results from part 1, an adapted version was developed, in which information and photos were added to illustrate the positions of the patient and evaluator. The inter-rater reliability was assessed using the intraclass correlation coefficient (ICC). The reliability of the FMA based on the adapted version of the manual was excellent for the total motor scores for the upper limbs (ICC=0.98) and lower limbs (ICC=0.90), as well as for movement sense (ICC=0.98) and upper and lower-limb passive range of motion (ICC=0.84 and 0.90, respectively). The reliability was moderate for tactile sensitivity (0.75). The joint pain assessment presented low reliability. The results showed that, except for pain assessment, application of the FMA based on the adapted version of the application manual for Brazilian Portuguese presented adequate inter-rater reliability.

  7. IRR (Inter-Rater Reliability) of a COP (Classroom Observation Protocol)--A Critical Appraisal

    Science.gov (United States)

    Rui, Ning; Feldman, Jill M.

    2012-01-01

    Notwithstanding broad utility of COPs (classroom observation protocols), there has been limited documentation of the psychometric properties of even the most popular COPs. This study attempted to fill this void by closely examining the item and domain-level IRR (inter-rater reliability) of a COP that was used in a federally funded striving readers…

  8. The Use of Bayesian Networks to Assess the Quality of Evidence from Research Synthesis: 2. Inter-Rater Reliability and Comparison with Standard GRADE Assessment.

    Directory of Open Access Journals (Sweden)

    Alexis Llewellyn

    The grades of recommendation, assessment, development and evaluation (GRADE) approach is widely implemented in systematic reviews, health technology assessment and guideline development organisations throughout the world. We have previously reported on the development of the Semi-Automated Quality Assessment Tool (SAQAT), which enables a semi-automated validity assessment based on GRADE criteria. The main advantage to our approach is the potential to improve inter-rater agreement of GRADE assessments particularly when used by less experienced researchers, because such judgements can be complex and challenging to apply without training. This is the first study examining the inter-rater agreement of the SAQAT. We conducted two studies to compare: (a) the inter-rater agreement of two researchers using the SAQAT independently on 28 meta-analyses and (b) the inter-rater agreement between a researcher using the SAQAT (who had no experience of using GRADE) and an experienced member of the GRADE working group conducting a standard GRADE assessment on 15 meta-analyses. There was substantial agreement between independent researchers using the Quality Assessment Tool for all domains (for example, overall GRADE rating: weighted kappa 0.79; 95% CI 0.65 to 0.93). Comparison between the SAQAT and a standard GRADE assessment suggested that inconsistency was parameterised too conservatively by the SAQAT. Therefore the tool was amended. Following amendment we found fair-to-moderate agreement between the standard GRADE assessment and the SAQAT (for example, overall GRADE rating: weighted kappa 0.35; 95% CI 0.09 to 0.87). Despite a need for further research, the SAQAT may aid consistent application of GRADE, particularly by less experienced researchers.

  9. Inter-rater reliability of h-index scores calculated by Web of Science and Scopus for clinical epidemiology scientists.

    Science.gov (United States)

    Walker, Benjamin; Alavifard, Sepand; Roberts, Surain; Lanes, Andrea; Ramsay, Tim; Boet, Sylvain

    2016-06-01

    We investigated the inter-rater reliability of Web of Science (WoS) and Scopus when calculating the h-index of 25 senior scientists in the Clinical Epidemiology Program of the Ottawa Hospital Research Institute. Bibliometric information and the h-indices for the subjects were computed by four raters using the automatic calculators in WoS and Scopus. Correlation and agreement between ratings was assessed using Spearman's correlation coefficient and a Bland-Altman plot, respectively. Data could not be gathered from Google Scholar due to feasibility constraints. The Spearman's rank correlation between the h-index of scientists calculated with WoS was 0.81 (95% CI 0.72-0.92) and with Scopus was 0.95 (95% CI 0.92-0.99). The Bland-Altman plot showed no significant rater bias in WoS and Scopus; however, the agreement between ratings is higher in Scopus compared to WoS. Our results showed a stronger relationship and increased agreement between raters when calculating the h-index of a scientist using Scopus compared to WoS. The higher inter-rater reliability and simple user interface used in Scopus may render it the more effective database when calculating the h-index of senior scientists in epidemiology. © 2016 Health Libraries Group.
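    The Bland-Altman agreement analysis named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name `bland_altman_limits` and the sample h-index values are hypothetical, and the conventional 95% limits of agreement (mean difference ± 1.96 standard deviations of the differences) are assumed.

```python
import statistics


def bland_altman_limits(rater_a, rater_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the pairwise differences (a - b); the limits are
    bias -/+ 1.96 * sample standard deviation of those differences.
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same subjects")
    if len(rater_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(rater_a, rater_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical h-index values for five scientists, computed by two raters.
rater_1 = [20, 35, 18, 42, 27]
rater_2 = [22, 34, 18, 40, 29]

bias, lower, upper = bland_altman_limits(rater_1, rater_2)
print(f"bias = {bias:.2f}, limits of agreement = [{lower:.2f}, {upper:.2f}]")
```

    A bias near zero suggests no systematic difference between raters, and narrower limits indicate closer agreement; in a full Bland-Altman plot the differences are also drawn against the pairwise means.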

  10. Ultrasound assessment for grading structural tendon changes in supraspinatus tendinopathy: an inter-rater reliability study

    DEFF Research Database (Denmark)

    Ingwersen, Kim Gordon; Hjarbæk, John; Eshøj, Henrik

    2016-01-01

    Aim To evaluate the inter-rater reliability of measuring structural changes in the tendon of patients, clinically diagnosed with supraspinatus tendinopathy (cases) and healthy participants (controls), on ultrasound (US) images captured by standardised procedures. Methods A total of 40 participant...

  11. Inter-rater agreement among orthodontists in a blocked experiment.

    Science.gov (United States)

    Korn, E L; Baumrind, S

    1985-01-01

    Five orthodontists were asked to predict for 64 patients a particular dichotomous outcome of treatment based on pre-treatment X-ray films. The orthodontists rated the cases in blocks of size 4-6 with the knowledge of the number of positive outcomes in each block. We discuss the reasons why this blocked design is appropriate whenever clinicians are asked to rate cases which have not been randomly selected from a clinical practice similar to their own. We give a simple description of the inter-rater agreement for this type of blocked experiment as well as a procedure to test that the agreement is no better than that expected by random independent assignment.

  12. Analyses of inter-rater reliability between professionals, medical students and trained school children as assessors of basic life support skills.

    Science.gov (United States)

    Beck, Stefanie; Ruhnke, Bjarne; Issleib, Malte; Daubmann, Anne; Harendza, Sigrid; Zöllner, Christian

    2016-10-07

    Training of lay-rescuers is essential to improve survival-rates after cardiac arrest. Multiple campaigns emphasise the importance of basic life support (BLS) training for school children. Trainings require a valid assessment to give feedback to school children and to compare the outcomes of different training formats. Considering these requirements, we developed an assessment of BLS skills using MiniAnne and tested the inter-rater reliability between professionals, medical students and trained school children as assessors. Fifteen professional assessors, 10 medical students and 111-trained school children (peers) assessed 1087 school children at the end of a CPR-training event using the new assessment format. Analyses of inter-rater reliability (intraclass correlation coefficient; ICC) were performed. Overall inter-rater reliability of the summative assessment was high (ICC = 0.84, 95 %-CI: 0.84 to 0.86, n = 889). The number of comparisons between peer-peer assessors (n = 303), peer-professional assessors (n = 339), and peer-student assessors (n = 191) was adequate to demonstrate high inter-rater reliability between peer- and professional-assessors (ICC: 0.76), peer- and student-assessors (ICC: 0.88) and peer- and other peer-assessors (ICC: 0.91). Systematic variation in rating of specific items was observed for three items between professional- and peer-assessors. Using this assessment and integrating peers and medical students as assessors gives the opportunity to assess hands-on skills of school children with high reliability.

  13. Intra- and Inter-rater Agreement of Superior Vena Cava Flow and Right Ventricular Outflow Measurements in Late Preterm and Term Neonates.

    Science.gov (United States)

    Mahoney, Liam; Fernandez-Alvarez, Jose R; Rojas-Anaya, Hector; Aiton, Neil; Wertheim, David; Seddon, Paul; Rabe, Heike

    2018-02-24

    To explore the intra- and inter-rater agreement of superior vena cava (SVC) flow and right ventricular (RV) outflow in healthy and unwell late preterm neonates (33-37 weeks' gestational age), term neonates (≥37 weeks' gestational age), and neonates receiving total-body cooling. The intra- and inter-rater agreement (n = 25 and 41 neonates, respectively) rates for SVC flow and RV outflow were determined by echocardiography in healthy and unwell late preterm and term neonates with the use of Bland-Altman plots, the repeatability coefficient, the repeatability index, and intraclass correlation coefficients. The intra-rater repeatability index values were 41% for SVC flow and 31% for RV outflow, with intraclass correlation coefficients indicating good agreement for both measures. The inter-rater repeatability index values for SVC flow and RV outflow were 63% and 51%, respectively, with intraclass correlation coefficients indicating moderate agreement for both measures. If SVC flow or RV outflow is used in the hemodynamic treatment of neonates, sequential measurements should ideally be performed by the same clinician to reduce potential variability. © 2018 by the American Institute of Ultrasound in Medicine.

  14. Examining Design and Inter-Rater Reliability of a Rubric Measuring Research Quality across Multiple Disciplines

    Directory of Open Access Journals (Sweden)

    Marilee J. Bresciani

    2009-05-01

    The paper presents a rubric to help evaluate the quality of research projects. The rubric was applied in a competition across a variety of disciplines during a two-day research symposium at one institution in the southwest region of the United States of America. It was collaboratively designed by a faculty committee at the institution and was administered to 204 undergraduate, master, and doctoral oral presentations by approximately 167 different evaluators. No training or norming of the rubric was given to 147 of the evaluators prior to the competition. The findings of the inter-rater reliability analysis reveal substantial agreement among the judges, which contradicts literature describing the fact that formal norming must occur prior to seeing substantial levels of inter-rater reliability. By presenting the rubric along with the methodology used in its design and evaluation, it is hoped that others will find this to be a useful tool for evaluating documents and for teaching research methods.

  15. Inter-rater reliability of nursing home quality indicators in the U.S

    Directory of Open Access Journals (Sweden)

    Roy Jason

    2003-11-01

    Background In the US, Quality Indicators (QI's) profiling and comparing the performance of hospitals, health plans, nursing homes and physicians are routinely published for consumer review. We report the results of the largest study of inter-rater reliability done on nursing home assessments which generate the data used to derive publicly reported nursing home quality indicators. Methods We sampled nursing homes in 6 states, selecting up to 30 residents per facility who were observed and assessed by research nurses on 100 clinical assessment elements contained in the Minimum Data Set (MDS) and compared these with the most recent assessment in the record done by facility nurses. Kappa statistics were generated for all data items and derived for 22 QI's over the entire sample and for each facility. Finally, facilities with many QI's with poor Kappa levels were compared to those with many QI's with excellent Kappa levels on selected characteristics. Results A total of 462 facilities in 6 states were approached and 219 agreed to participate, yielding a response rate of 47.4%. A total of 5758 residents were included in the inter-rater reliability analyses, around 27.5 per facility. Patients resembled the traditional nursing home resident, only 43.9% were continent of urine and only 25.2% were rated as likely to be discharged within the next 30 days. Results of resident level comparative analyses reveal high inter-rater reliability levels (most items >.75). Using the research nurses as the "gold standard", we compared composite quality indicators based on their ratings with those based on facility nurses. All but two QI's have adequate Kappa levels and 4 QI's have average Kappa values in excess of .80. We found that 16% of participating facilities performed poorly (Kappa .75 on 12 or more QI's. No facility characteristics were related to reliability of the data on which QI's are based. Conclusion While a few QI's being used for public reporting

  16. Inter-rater reliability and stability of diagnoses of autism spectrum disorder in children identified through screening at a very young age.

    Science.gov (United States)

    van Daalen, Emma; Kemner, Chantal; Dietz, Claudine; Swinkels, Sophie H N; Buitelaar, Jan K; van Engeland, Herman

    2009-11-01

    To examine the inter-rater reliability and stability of autism spectrum disorder (ASD) diagnoses made at a very early age in children identified through a screening procedure around 14 months of age. In a prospective design, preschoolers were recruited from a screening study for ASD. The inter-rater reliability of the diagnosis of ASD was measured through an independent assessment of a randomly selected subsample of 38 patients by two other psychiatrists. The diagnoses at 23 months and 42 months of 131 patients, based on the clinical assessment and the diagnostic classifications of standardised instruments, were compared to evaluate stability of the diagnosis of ASD. Inter-rater reliability on a diagnosis of ASD versus non-ASD at 23 months was 87% with a weighted kappa of 0.74 (SE 0.11). The stability of the different diagnoses in the autism spectrum was 63% for autistic disorder, 54% for pervasive developmental disorder, not otherwise specified (PDD-NOS), and 91% for the whole category of ASD. Most diagnostic changes at 42 months were within the autism spectrum from autistic disorder to PDD-NOS and were mainly due to diminished symptom severity. Children who moved outside the ASD category at 42 months made significantly larger gains in cognitive and language skills than children with a stable ASD diagnosis. In conclusion, the inter-rater reliability and stability of the diagnoses of ASD established at 23 months in this population-based sample of very young children are good.

  17. Inter-rater reliability and agreement of the 6-minute walk test in females with hip fractures

    DEFF Research Database (Denmark)

    Overgaard, Jan; Larsen, Camilla Marie; Tange Kristensen, Morten

    physiotherapy students independently examined (randomized order) a convenient sample of 20 participants; their assessments were separated by two days, and testing followed instructions from the American Thoracic Society. Hip pain was assessed with the Verbal Ranking Scale. Participants (all women) with a mean...... (SD) age of 78.1 ± 5.9 years performed the test within a mean of 31.5 ± 5.8 days post-surgery; 10 had a cervical and 10 a trochanteric fracture. Excellent inter-rater reliability; ICC2.1 = 0.92 (95% CI, 0.81 - 0.97) was found, and the standard error of measurement (SEM) and smallest real difference.......6 meters longer, at the second trial (P = 0.002). Participants with moderate hip fracture-related pain walked a shorter distance than those with no or light pain during the first test (P = 0.04), while this was not the case during the second (P = 0.25). Excellent inter-rater reliability was found...

  18. Intra- and inter-rater reliability of 3D passive intervertebral motion in subjects with nonspecific neck pain assessed by physical therapy students: A pilot study.

    Science.gov (United States)

    Rossettini, Giacomo; Rondoni, Angie; Lovato, Tommaso; Strobe, Marco; Verzè, Elisa; Vicentini, Marco; Testa, Marco

    2016-06-03

    Passive Intervertebral Movements (PIVMs) are commonly used to assess and treat patients with nonspecific neck pain. Only very few studies have investigated 3D movements until now. This study assessed intra- and inter-rater reliability of three-dimensional (3D) cervical PIVMs performed by physical therapy students in patients with nonspecific neck pain. Thirty-one patients, mean age 47.2 ± 7.2 years, were independently evaluated by 2 physical therapy students. The raters (A and B) assessed mobility, end-feel and pain provocation performing bilaterally the 3D cervical segmental side-bending test (3D CSSB) from levels C2-C3 to C6-C7. Percentage agreement (raw, positive and negative), Cohen's kappa (95% CI), prevalence index and bias index were calculated to estimate intra- and inter-reliability. Intra-rater reliability showed kappa values ranging between fair and substantial (k 0.29-0.80) for pain provocation, mobility and end-feel, with percentage agreements between 61%-90%. Inter-rater reliability presented kappa values ranging between fair and substantial (k 0.22-0.62) for pain provocation, mobility and end-feel, with percentage agreements between 61% and 80%. Intra-rater reliability of 3D PIVMs was superior to inter-rater reliability in patients with nonspecific neck pain. The most repeatable evaluation parameter was pain. However overall poor reliability suggests avoiding the use of these techniques alone to examine patients and measure their outcome. Further studies are needed to investigate PIVMs reliability in combination with other assessment procedure in symptomatic patients.
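    The agreement statistics listed in the abstract above (raw, positive and negative percentage agreement, Cohen's kappa, prevalence index and bias index) can all be derived from one 2×2 table of paired dichotomous ratings. The sketch below assumes the standard definitions; the function name `two_by_two_agreement` and the cell counts are hypothetical and are not data from the study.

```python
def two_by_two_agreement(a, b, c, d):
    """Agreement statistics for two raters and a dichotomous rating.

    a = both raters positive      b = rater A positive, rater B negative
    c = rater A negative, B pos.  d = both raters negative
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table is empty")
    p_o = (a + d) / n                      # observed (raw) agreement
    pos_a = (a + b) / n                    # share rated positive by A
    pos_b = (a + c) / n                    # share rated positive by B
    p_e = pos_a * pos_b + (1 - pos_a) * (1 - pos_b)  # chance agreement
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return {
        "raw_agreement": p_o,
        # These two divide by zero if a cell pattern leaves no positive
        # (or no negative) ratings at all; not handled in this sketch.
        "positive_agreement": 2 * a / (2 * a + b + c),
        "negative_agreement": 2 * d / (2 * d + b + c),
        "kappa": (p_o - p_e) / (1 - p_e),
        "prevalence_index": abs(a - d) / n,
        "bias_index": abs(b - c) / n,
    }


# Hypothetical counts for 40 paired ratings.
stats = two_by_two_agreement(a=15, b=5, c=3, d=17)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

    A high prevalence index or bias index is the usual explanation when kappa looks low despite high raw agreement, which is why such studies tend to report the indices alongside kappa.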

  19. Inter-rater agreement of the PEWS tools used in Central Denmark Region

    DEFF Research Database (Denmark)

    Jensen, Claus Sixtus; Aagaard, Hanne; Olesen, Hanne Vebert

    2017-01-01

    BACKGROUND: Paediatric early warning score (PEWS) assessment tools can assist healthcare providers in the timely detection and recognition of subtle patient condition changes signalling clinical deterioration. However, PEWS tools instrument data are only as reliable and accurate as the caregivers...... agreement. The nurses assigned the exact same aggregated score for both PEWS models in 76% of the cases. In 98% of the PEWS assessments, the aggregated PEWS scores assigned by the nurses were equal to or below 1 point in both models. CONCLUSION: The study showed good to very good inter-rater reliability...

  20. Inter-rater Agreement of Clinicians' Treatment Recommendations Based on Modified Barium Swallow Study Reports.

    Science.gov (United States)

    Slovarp, Laurie; Danielson, Jennifer; Liss, Julie

    2018-06-07

    The modified barium swallow study (MBSS) is a commonly used radiographic procedure for diagnosis and treatment of swallowing disorders. Despite attempts by dysphagia specialists to standardize the MBSS, most institutions have not adopted such standardized procedures. High variability of assessment patterns arguably contribute to variability of treatment recommendations made from diagnostic information derived from the MBSS report. An online survey was distributed to speech-language pathologists (SLPs) participating in American Speech Language Hearing Association (ASHA) listservs. Sixty-three SLPs who treat swallowing disorders participated. Participating SLPs reviewed two MBSS reports and chose physiologic treatment targets (e.g., tongue base retraction) based on each report. One report primarily contained symptomatology (e.g., aspiration, pharyngeal residue) with minimal information on impaired physiology (e.g., laryngeal incompetence, reduced hyolaryngeal elevation/excursion). In contrast, the second report contained a clear description of impaired physiology to explain the dysphagia symptoms. Fleiss kappa coefficients were used to analyze inter-rater agreement across the high and low physiology report types. Results revealed significantly higher inter-rater agreement across clinicians when reviewing reports with clear explanation(s) of physiologic impairment relative to reports that primarily focused on symptomatology. Clinicians also reported significantly greater satisfaction and treatment confidence following review of reports with clear description(s) of impaired physiology.
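    Fleiss' kappa, used in the abstract above to summarise agreement among many raters, can be sketched as follows. This assumes the standard formulation in which every subject receives the same number of ratings; the function name `fleiss_kappa` and the count tables are hypothetical illustrations, not the survey data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects x categories table of rating counts.

    counts[i][j] is the number of raters who assigned subject i to
    category j. Every row must sum to the same number of raters.
    """
    if not counts:
        raise ValueError("need at least one subject")
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    n_subjects = len(counts)
    n_categories = len(counts[0])

    # Agreement within each subject: share of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall share of ratings per category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("all ratings fall in one category; kappa is undefined")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical tables: 4 subjects, 3 raters, 2 categories.
perfect = [[3, 0], [0, 3], [3, 0], [0, 3]]
split = [[2, 1], [1, 2], [2, 1], [1, 2]]
kappa_perfect = fleiss_kappa(perfect)
kappa_split = fleiss_kappa(split)
print(kappa_perfect, kappa_split)
```

    Unanimous ratings give a kappa of 1, while the split table falls below 0, meaning the raters agree less often than chance alone would predict.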

  1. Inter-Rater Reliability of Provider Interpretations of Irritable Bowel Syndrome Food and Symptom Journals.

    Science.gov (United States)

    Zia, Jasmine; Chung, Chia-Fang; Xu, Kaiyuan; Dong, Yi; Schenk, Jeanette M; Cain, Kevin; Munson, Sean; Heitkemper, Margaret M

    2017-11-04

    There are currently no standardized methods for identifying trigger food(s) from irritable bowel syndrome (IBS) food and symptom journals. The primary aim of this study was to assess the inter-rater reliability of providers' interpretations of IBS journals. A second aim was to describe whether these interpretations varied for each patient. Eight providers reviewed 17 IBS journals and rated how likely key food groups (fermentable oligo-di-monosaccharides and polyols, high-calorie, gluten, caffeine, high-fiber) were to trigger IBS symptoms for each patient. Agreement of trigger food ratings was calculated using Krippendorff's α-reliability estimate. Providers were also asked to write down recommendations they would give to each patient. Estimates of agreement of trigger food likelihood ratings were poor (average α = 0.07). Most providers gave similar trigger food likelihood ratings for over half the food groups. Four providers gave the exact same written recommendation(s) (range 3-7) to over half the patients. Inter-rater reliability of provider interpretations of IBS food and symptom journals was poor. Providers favored certain trigger food likelihood ratings and written recommendations. This supports the need for a more standardized method for interpreting these journals and/or more rigorous techniques to accurately identify personalized IBS food triggers.

  2. Inter-Rater Reliability of Provider Interpretations of Irritable Bowel Syndrome Food and Symptom Journals

    Directory of Open Access Journals (Sweden)

    Jasmine Zia

    2017-11-01

    There are currently no standardized methods for identifying trigger food(s) from irritable bowel syndrome (IBS) food and symptom journals. The primary aim of this study was to assess the inter-rater reliability of providers’ interpretations of IBS journals. A second aim was to describe whether these interpretations varied for each patient. Eight providers reviewed 17 IBS journals and rated how likely key food groups (fermentable oligo-di-monosaccharides and polyols, high-calorie, gluten, caffeine, high-fiber) were to trigger IBS symptoms for each patient. Agreement of trigger food ratings was calculated using Krippendorff’s α-reliability estimate. Providers were also asked to write down recommendations they would give to each patient. Estimates of agreement of trigger food likelihood ratings were poor (average α = 0.07). Most providers gave similar trigger food likelihood ratings for over half the food groups. Four providers gave the exact same written recommendation(s) (range 3–7) to over half the patients. Inter-rater reliability of provider interpretations of IBS food and symptom journals was poor. Providers favored certain trigger food likelihood ratings and written recommendations. This supports the need for a more standardized method for interpreting these journals and/or more rigorous techniques to accurately identify personalized IBS food triggers.

  3. INTER-RATER RELIABILITY FOR MOVEMENT PATTERN ANALYSIS (MPA): MEASURING PATTERNING OF BEHAVIORS VERSUS DISCRETE BEHAVIOR COUNTS AS INDICATORS OF DECISION-MAKING STYLE

    Directory of Open Access Journals (Sweden)

    Brenda L Connors

    2014-06-01

    The unique yield of collecting observational data on human movement has received increasing attention in a number of domains, including the study of decision-making style. As such, interest has grown in the nuances of core methodological issues, including the best ways of assessing inter-rater reliability. In this paper we focus on one key topic – the distinction between establishing reliability for the patterning of behaviors as opposed to the computation of raw counts – and suggest that reliability for each be compared empirically rather than determined a priori. We illustrate by assessing inter-rater reliability for key outcome measures derived from Movement Pattern Analysis (MPA), an observational methodology that records body movements as indicators of decision-making style with demonstrated predictive validity. While reliability ranged from moderate to good for raw counts of behaviors reflecting each of two Overall Factors generated within MPA (Assertion and Perspective), inter-rater reliability for patterning (proportional indicators of each factor) was significantly higher and excellent (ICC = .89). Furthermore, patterning, as compared to raw counts, provided better prediction of observable decision-making process assessed in the laboratory. These analyses support the utility of using an empirical approach to inform the consideration of measuring discrete behavioral counts versus patterning of behaviors when determining inter-rater reliability of observable behavior. They also speak to the substantial reliability that may be achieved via application of theoretically grounded observational systems such as MPA that reveal thinking and action motivations via visible movement patterns.

  4. The impact of revised DSM-5 criteria on the relative distribution and inter-rater reliability of eating disorder diagnoses in a residential treatment setting.

    Science.gov (United States)

    Thomas, Jennifer J; Eddy, Kamryn T; Murray, Helen B; Tromp, Marilou D P; Hartmann, Andrea S; Stone, Melissa T; Levendusky, Philip G; Becker, Anne E

    2015-09-30

    This study evaluated the relative distribution and inter-rater reliability of revised DSM-5 criteria for eating disorders in a residential treatment program. Consecutive adolescent and young adult females (N=150) admitted to a residential eating disorder treatment facility were assigned both DSM-IV and DSM-5 diagnoses by a clinician (n=14) via routine clinical interview and a research assessor (n=4) via structured interview. We compared the frequency of diagnostic assignments under each taxonomy and by type of assessor. We evaluated concordance between clinician and researcher assignment through inter-rater reliability kappa and percent agreement. Significantly fewer patients received either clinician or researcher diagnoses of a residual eating disorder under DSM-5 (clinician-12.0%; researcher-31.3%) versus DSM-IV (clinician-28.7%; researcher-59.3%), with the majority of reassigned DSM-IV residual cases reclassified as DSM-5 anorexia nervosa. Researcher and clinician diagnoses showed moderate inter-rater reliability under DSM-IV (κ=.48) and DSM-5 (κ=.57), though agreement for specific DSM-5 other specified feeding or eating disorder (OSFED) presentations was poor (κ=.05). DSM-5 revisions were associated with significantly less frequent residual eating disorder diagnoses, but not with reduced inter-rater reliability. Findings support specific dimensions of clinical utility for revised DSM-5 criteria for eating disorders. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  5. Inter- and intra-rater reliability of 3D kinematics during maximum mouth opening of asymptomatic subjects.

    Science.gov (United States)

    Calixtre, Leticia Bojikian; Nakagawa, Theresa Helissa; Alburquerque-Sendín, Francisco; da Silva Grüninger, Bruno Leonardo; de Sena Rosa, Lianna Ramalho; Oliveira, Ana Beatriz

    2017-11-07

    Previous studies evaluated 3D human jaw movements using kinematic analysis systems during mouth opening, but information on the reliability of such measurements is still scarce. The purpose of this study was to analyze within- and between-session reliabilities, inter-rater reliability, standard error of measurement (SEM), minimum detectable change (MDC) and consistency of agreement across raters and sessions of 3D kinematic variables during maximum mouth opening (MMO). Thirty-six asymptomatic subjects of both genders were evaluated on two different days, five to seven days apart. Subjects performed three MMO movements while kinematic data were collected. Intraclass correlation coefficient (ICC), SEM and MDC were calculated for all variables, and Bland-Altman plots were constructed. Jaw radius and width were the most reproducible variables (ICC>0.81) and demonstrated minor error. Incisor displacement during MMO and angular movements in the sagittal plane presented good reliability (ICC from 0.61 to 0.8) and small errors and, consequently, could be used in future studies with the same methodology and population. The variables with smaller amplitudes (condylar translations during mouth opening and closing and mandibular movements on the frontal and transversal planes) were less reliable (ICC...) ...mandibular movements in the frontal and transversal planes. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Inter-Rater Agreement of Pressure Ulcer Risk and Prevention Measures in the National Database of Nursing Quality Indicators(®) (NDNQI).

    Science.gov (United States)

    Waugh, Shirley Moore; Bergquist-Beringer, Sandra

    2016-06-01

    In this descriptive multi-site study, we examined inter-rater agreement on 11 National Database of Nursing Quality Indicators(®) (NDNQI(®)) pressure ulcer (PrU) risk and prevention measures. One hundred twenty raters at 36 hospitals captured data from 1,637 patient records. At each hospital, agreement between the most experienced rater and each other team rater was calculated for each measure. In the ratings studied, 528 patients were rated as "at risk" for PrU and, therefore, were included in calculations of agreement for the prevention measures. Prevalence-adjusted kappa (PAK) was used to interpret inter-rater agreement because prevalence of single responses was high. The PAK values for eight measures indicated "substantial" to "near perfect" agreement between most experienced and other team raters: Skin assessment on admission (.977, 95% CI [.966-.989]), PrU risk assessment on admission (.978, 95% CI [.964-.993]), Time since last risk assessment (.790, 95% CI [.729-.852]), Risk assessment method (.997, 95% CI [.991-1.0]), Risk status (.877, 95% CI [.838-.917]), Any prevention (.856, 95% CI [.76-.943]), Skin assessment (.956, 95% CI [.904-1.0]), and Pressure-redistribution surface use (.839, 95% CI [.763-.916]). For three intervention measures, PAK values fell below the recommended value of ≥.610: Routine repositioning (.577, 95% CI [.494-.661]), Nutritional support (.500, 95% CI [.418-.581]), and Moisture management (.556, 95% CI [.469-.643]). Areas of disagreement were identified. Findings provide support for the reliability of 8 of the 11 measures. Further clarification of data collection procedures is needed to improve reliability for the less reliable measures. © 2016 Wiley Periodicals, Inc.

  7. The inter-rater reliability of the incontinence-associated dermatitis intervention tool-D (IADIT-D) between two independent registered nurses of nursing home residents in long-term care facilities.

    Science.gov (United States)

    Braunschmidt, Brigitte; Müller, Gerhard; Jukic-Puntigam, Margareta; Steininger, Alfred

    2013-01-01

    Incontinence-associated dermatitis (IAD) is the clinical manifestation of moisture-related skin damage (Beeckman, Woodward, & Gray, 2011). Valid assessment instruments are needed for risk assessment and classification of IAD. The aim of this quantitative-descriptive cross-sectional study was to determine the inter-rater reliability of the item scores of the German Incontinence Associated Dermatitis Intervention Tool (IADIT-D) between two independent assessors of nursing home residents (n = 381) in long-term care facilities. The 19 pairs of assessors consisted of registered nurses. The data analysis first calculated the total percentage of agreement. Because this value is not adjusted for chance agreement, kappa coefficients and the AC1 statistic were calculated as well. The total percentage of inter-rater agreement was 84% (n = 319). In a second step, the analysis across all items showed high (kappa = .70) and very high (AC1 = .83) agreement levels, respectively. For the risk assessment (kappa = .82; AC1 = .94), the values amounted to very high agreement levels, and for the classification (kappa(w) = .70; AC1 = .76) to high agreement levels. The high to very high agreement values of IADIT-D demonstrate that the items can be regarded as stable with regard to inter-rater reliability for use in long-term care facilities. In addition, further validation studies are needed.
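    The difference between raw agreement, Cohen's kappa and Gwet's AC1 described in this abstract can be illustrated with a minimal sketch. It assumes two raters, nominal categories and unweighted coefficients; the function name `agreement_stats` and the ratings are hypothetical and are not the study's data.

```python
from collections import Counter


def agreement_stats(rater_a, rater_b):
    """Return (observed agreement, Cohen's kappa, Gwet's AC1) for two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of items")
    n = len(rater_a)
    categories = sorted(set(rater_a) | set(rater_b))
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories must be used")

    # Observed proportion of items on which the raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    count_a = Counter(rater_a)
    count_b = Counter(rater_b)

    # Cohen's kappa: chance agreement from the product of the raters' marginals.
    p_e_kappa = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    kappa = (p_o - p_e_kappa) / (1 - p_e_kappa)

    # Gwet's AC1: chance agreement from the average marginal of each category.
    pi = {c: (count_a[c] + count_b[c]) / (2 * n) for c in categories}
    p_e_ac1 = sum(pi[c] * (1 - pi[c]) for c in categories) / (q - 1)
    ac1 = (p_o - p_e_ac1) / (1 - p_e_ac1)

    return p_o, kappa, ac1


# Hypothetical ratings (1 = finding present, 0 = absent) with a skewed prevalence.
nurse_1 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
nurse_2 = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]
p_o, kappa, ac1 = agreement_stats(nurse_1, nurse_2)
print(f"agreement={p_o:.2f} kappa={kappa:.3f} AC1={ac1:.3f}")
```

    With one category dominating, the same raw agreement yields a much lower kappa than AC1, which is one reason a study may report both coefficients side by side.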

  8. The Berg Balance Scale has high intra- and inter-rater reliability but absolute reliability varies across the scale: a systematic review.

    Science.gov (United States)

    Downs, Stephen; Marquez, Jodie; Chiarelli, Pauline

    2013-06-01

    What is the intra-rater and inter-rater relative reliability of the Berg Balance Scale? What is the absolute reliability of the Berg Balance Scale? Does the absolute reliability of the Berg Balance Scale vary across the scale? Systematic review with meta-analysis of reliability studies. Any clinical population that has undergone assessment with the Berg Balance Scale. Relative intra-rater reliability, relative inter-rater reliability, and absolute reliability. Eleven studies involving 668 participants were included in the review. The relative intra-rater reliability of the Berg Balance Scale was high, with a pooled estimate of 0.98 (95% CI 0.97 to 0.99). Relative inter-rater reliability was also high, with a pooled estimate of 0.97 (95% CI 0.96 to 0.98). A ceiling effect of the Berg Balance Scale was evident for some participants. In the analysis of absolute reliability, all of the relevant studies had an average score of 20 or above on the 0 to 56 point Berg Balance Scale. The absolute reliability across this part of the scale, as measured by the minimal detectable change with 95% confidence, varied between 2.8 points and 6.6 points. The Berg Balance Scale has a higher absolute reliability when close to 56 points due to the ceiling effect. We identified no data that estimated the absolute reliability of the Berg Balance Scale among participants with a mean score below 20 out of 56. The Berg Balance Scale has acceptable reliability, although it might not detect modest, clinically important changes in balance in individual subjects. The review was only able to comment on the absolute reliability of the Berg Balance Scale among people with moderately poor to normal balance. Copyright © 2013 Australian Physiotherapy Association. All rights reserved.
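    The link between relative reliability (ICC) and absolute reliability (minimal detectable change) can be sketched with the conventional formulas SEM = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEM. Whether the reviewed studies computed their values exactly this way is an assumption, and the standard deviation used below is hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a between-subject SD and an ICC."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at 95% confidence for a test-retest difference."""
    return 1.96 * math.sqrt(2.0) * sem


def sem_from_mdc95(mdc):
    """Invert mdc95() to recover the SEM implied by a reported MDC95."""
    return mdc / (1.96 * math.sqrt(2.0))


# Hypothetical sample: SD of 5 scale points with the pooled inter-rater ICC of 0.97.
sem = sem_from_icc(sd=5.0, icc=0.97)
print(f"SEM={sem:.2f} points, MDC95={mdc95(sem):.2f} points")

# SEMs implied by the reported MDC95 range of 2.8 to 6.6 points.
print(f"{sem_from_mdc95(2.8):.2f} to {sem_from_mdc95(6.6):.2f} points")
```

    The sketch shows why a very high ICC can coexist with an MDC of several points: the MDC scales with the spread of scores in the sample, not only with the ICC.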

  9. Test–re-test reliability and inter-rater reliability of a digital pelvic inclinometer in young, healthy males and females

    OpenAIRE

    Chris Beardsley; Tim Egerton; Brendon Skinner

    2016-01-01

    Objective. The purpose of this study was to investigate the reliability of a digital pelvic inclinometer (DPI) for measuring sagittal plane pelvic tilt in 18 young, healthy males and females. Method. The inter-rater reliability and test–re-test reliabilities of the DPI for measuring pelvic tilt in standing on both the right and left sides of the pelvis were measured by two raters carrying out two rating sessions of the same subjects, three weeks apart. Results. For measuring pel...

  10. BurnCase 3D software validation study: Burn size measurement accuracy and inter-rater reliability.

    Science.gov (United States)

    Parvizi, Daryousch; Giretzlehner, Michael; Wurzer, Paul; Klein, Limor Dinur; Shoham, Yaron; Bohanon, Fredrick J; Haller, Herbert L; Tuca, Alexandru; Branski, Ludwik K; Lumenta, David B; Herndon, David N; Kamolz, Lars-P

    2016-03-01

    The aim of this study was to compare the accuracy of burn size estimation using the computer-assisted software BurnCase 3D (RISC Software GmbH, Hagenberg, Austria) with that using a 2D scan, considered to be the actual burn size. Thirty artificial burn areas were preplanned and prepared on three mannequins (one child, one female, and one male). Five trained physicians (raters) were asked to assess the size of all wound areas using BurnCase 3D software. The results were then compared with the real wound areas, as determined by 2D planimetry imaging. To examine inter-rater reliability, we performed an intraclass correlation analysis with a 95% confidence interval. The mean wound area estimations of the five raters using BurnCase 3D were in total 20.7±0.9% for the child, 27.2±1.5% for the female and 16.5±0.1% for the male mannequin. Our analysis showed relative overestimations of 0.4%, 2.8% and 1.5% for the child, female and male mannequins respectively, compared to the 2D scan. The intraclass correlation between the single raters for mean percentage of the artificial burn areas was 98.6%. There was also a high intraclass correlation between the single raters and the 2D scan. BurnCase 3D is a valid and reliable tool for the determination of total body surface area burned in standard models. Further clinical studies including different pediatric and overweight adult mannequins are warranted. Copyright © 2016 Elsevier Ltd and ISBI. All rights reserved.

  11. Quality of nursing intensity data: inter-rater reliability of the patient classification after two decades in clinical use.

    Science.gov (United States)

    Liljamo, Pia; Kinnunen, Ulla-Mari; Ohtonen, Pasi; Saranto, Kaija

    2017-09-01

    The aim of this study was to measure the inter-rater reliability of the Oulu Patient Classification and to discuss existing methods of reliability testing. The Oulu Patient Classification, part of the RAFAELA® System, has been developed to assist nursing managers with the proper allocation of nursing resources. Due to the increased intensity of inpatient care during recent years, there is a need for the reliability testing of the classification, which has been in clinical use for 20 years. Retrospective statistical study. To test inter-rater reliability, a pair of nurses classified the same patients, without knowledge of each other's ratings, as a part of annually conducted standardization. Data on the parallel classifications (n = 19,997) were obtained from inpatient units (n = 32) with different specialties at a university hospital in Finland during 2010-2015. Parallel classification practices were also analysed. The reliability of the overall classification and its subareas was calculated using suitable statistical coefficients. Inter-rater reliability coefficients indicated reliable to almost perfect agreement for the nursing intensity category and various practices, but there were detectable differences between subareas. The lowest agreement levels occurred in the subareas 'Planning and Coordination of Nursing Care' and 'Guiding of Care/Continued Care and Emotional Support'. There is a need to develop the descriptions of subareas and to clarify the related concepts. Precise nursing documentation can promote a high level of agreement and reliable results. The traditional overall proportion of agreement does not provide an adequate picture of reliability - weighted kappa coefficients should be used instead. © 2017 John Wiley & Sons Ltd.

  12. Inter-Rater Reliability of Neck Reflex Points in Women with Chronic Neck Pain.

    Science.gov (United States)

    Weinschenk, Stefan; Göllner, Richard; Hollmann, Markus W; Hotz, Lorenz; Picardi, Susanne; Hubbert, Katharina; Strowitzki, Thomas; Meuser, Thomas

    2016-01-01

    Neck reflex points (NRP) are tender soft tissue areas of the cervical region that display reflectory changes in response to chronic inflammations of correlated regions in the visceral cranium. Six bilateral areas, NRP C0, C1, C2, C3, C4 and C7, are detectable by palpating the lateral neck. We investigated the inter-rater reliability of NRP to assess their potential clinical relevance. 32 consecutive patients with chronic neck pain were examined for NRP tenderness by an experienced physician and an inexperienced medical student in a blinded design. A detailed description of the palpation technique is included in this section. Absence of pain was defined as pain index (PI) = 0, slight tenderness = 1, and marked pain = 2. Findings were evaluated either by pair-wise Cohen's kappa (κ) or by percentage of agreement (PA). Examiners identified 40% and 41% of positive NRP, respectively (PI > 0, physician: 155, student: 157) with a slight preference for the left side (1.2:1). The number of patients identified with >6 positive NRP by the examiners was similar (13 vs. 12 patients). κ values ranged from 0.52 to 0.95. The overall kappa was κ = 0.80 for the left and κ = 0.74 for the right side. PA varied from 78.1% to 96.9% with strongest agreement at NRP C0, NRP C2, and NRP C7. Inter-rater agreement was independent of patients' age, gender, body mass index and examiner's experience. The high reproducibility suggests the clinical relevance of NRP in women. © 2016 S. Karger GmbH, Freiburg.

  13. Transcultural Adaptation of GRID Hamilton Rating Scale For Depression (GRID-HAMD) to Brazilian Portuguese and Evaluation of the Impact of Training Upon Inter-Rater Reliability.

    Science.gov (United States)

    Henrique-Araújo, Ricardo; Osório, Flávia L; Gonçalves Ribeiro, Mônica; Soares Monteiro, Ivandro; Williams, Janet B W; Kalali, Amir; Alexandre Crippa, José; Oliveira, Irismar Reis De

    2014-07-01

    GRID-HAMD is a semi-structured interview guide developed to overcome flaws in HAM-D, and has been incorporated into an increasing number of studies. The aims were to carry out the transcultural adaptation of GRID-HAMD into Brazilian Portuguese, to evaluate the inter-rater reliability of the instrument and the impact of training on this measure, and to ascertain the raters' opinions of the instrument. The transcultural adaptation followed an appropriate methodology. Inter-rater reliability was measured using videos that were evaluated by 85 professionals before and after training in the use of the instrument. The intraclass correlation coefficient (ICC) remained between 0.76 and 0.90 for GRID-HAMD-21 and between 0.72 and 0.91 for GRID-HAMD-17. The training did not have an impact on the ICC, except for a few groups of participants with a lower level of experience. Most of the participants showed high acceptance of GRID-HAMD when compared to other versions of HAM-D. The scale presented adequate inter-rater reliability even before training began. Training did not have an impact on this measure, except for a few groups with less experience. GRID-HAMD received favorable opinions from most of the participants.
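    The abstract does not state which form of the ICC was used. As an illustration only, the sketch below computes the two-way consistency forms ICC(3,1) and ICC(3,k), in Shrout and Fleiss notation, from a complete subjects-by-raters table. The function name `icc3` and the ratings matrix are hypothetical and are not taken from the study.

```python
def icc3(ratings):
    """Return (ICC(3,1), ICC(3,k)) for a complete subjects x raters table.

    ratings: list of rows, one row per subject, one column per rater.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two subjects")
    n = len(ratings)
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least two raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA decomposition without replication.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    single = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)  # ICC(3,1)
    average = (ms_rows - ms_error) / ms_rows                        # ICC(3,k)
    return single, average


# Hypothetical scores: 6 videotaped interviews (rows) rated by 4 raters (columns).
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
single, average = icc3(scores)
print(f"ICC(3,1)={single:.2f} ICC(3,k)={average:.2f}")
```

    The single-rater form is always the lower of the two, so a reported ICC is hard to interpret unless the form is named alongside it.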

  14. Inter-rater reliability of PATH observations for assessment of ergonomic risk factors in hospital work.

    Science.gov (United States)

    Park, Jung-Keun; Boyer, Jon; Tessler, Jamie; Casey, Jeffrey; Schemm, Linda; Gore, Rebecca; Punnett, Laura

    2009-07-01

    This study examined the inter-rater reliability of expert observations of ergonomic risk factors by four analysts. Ten jobs were observed at a hospital using a newly expanded version of the PATH method (Buchholz et al. 1996), to which selected upper extremity exposures had been added. Two of the four raters simultaneously observed each worker onsite for a total of 443 observation pairs containing 18 categorical exposure items each. For most exposure items, kappa coefficients were 0.4 or higher. For some items, agreement was higher both for the jobs with less rapid hand activity and for the analysts with a higher level of ergonomic job analysis experience. These upper extremity exposures could be characterised reliably with real-time observation, given adequate experience and training of the observers. The revised version of PATH is applicable to the analysis of jobs where upper extremity musculoskeletal strain is of concern.

  15. Inter-rater reliability of kinesthetic measurements with the KINARM robotic exoskeleton.

    Science.gov (United States)

    Semrau, Jennifer A; Herter, Troy M; Scott, Stephen H; Dukelow, Sean P

    2017-05-22

    Kinesthesia (sense of limb movement) has been extremely difficult to measure objectively, especially in individuals who have survived a stroke. The development of valid and reliable measurements for proprioception is important to developing a better understanding of proprioceptive impairments after stroke and their impact on the ability to perform daily activities. We recently developed a robotic task to evaluate kinesthetic deficits after stroke and found that the majority (~60%) of stroke survivors exhibit significant deficits in kinesthesia within the first 10 days post-stroke. Here we aim to determine the inter-rater reliability of this robotic kinesthetic matching task. Twenty-five neurologically intact control subjects and 15 individuals with first-time stroke were evaluated on a robotic kinesthetic matching task (KIN). Subjects sat in a robotic exoskeleton with their arms supported against gravity. In the KIN task, the robot moved the subjects' stroke-affected arm at a preset speed, direction and distance. As soon as subjects felt the robot begin to move their affected arm, they matched the robot movement with the unaffected arm. Subjects were tested in two sessions on the KIN task: initial session and then a second session (within an average of 18.2 ± 13.8 h of the initial session for stroke subjects), which were supervised by different technicians. The task was performed both with and without the use of vision in both sessions. We evaluated intra-class correlations of spatial and temporal parameters derived from the KIN task to determine the reliability of the robotic task. We evaluated 8 spatial and temporal parameters that quantify kinesthetic behavior. We found that the parameters exhibited moderate to high intra-class correlations between the initial and retest conditions (Range, r-value = [0.53-0.97]). The robotic KIN task exhibited good inter-rater reliability. This validates the KIN task as a reliable, objective method for quantifying

  16. [Quality assurance in coding expertise of hospital cases in the German DRG system. Evaluation of inter-rater reliability in MDK expertise].

    Science.gov (United States)

    Huber, H; Brambrink, M; Funk, R; Rieger, M

    2012-10-01

    The purpose of this study was to evaluate differences in the G-DRG results of a hospital case by 2 independently coding MDK raters. Inter-rater reliability between the 2 raters was calculated by examining the coding of individual hospital cases. The reasons for the non-agreement of the expert evaluations and suggestions to improve the process are discussed. From the expert evaluation pool of the MDK-WL a random sample of 0.7% of the 57,375 expertises was taken. Equality of distribution with the total population was tested by the χ² test or Fisher's exact test, respectively. For the total of 402 individual hospital cases, the G-DRG case sums of 2 experts of the MDK were determined independently and the results checked for each individual case for agreement or non-agreement. The corresponding confidence intervals with standard errors were analysed to test if certain major diagnosis categories (MDC) were statistically significantly more affected by differing expertise results than others. In 280 of the total 402 tested hospital cases, the 2 MDK raters independently reached the same G-DRG results; in 122 cases the G-DRG case sums determined by the 2 raters differed (agreement 70%; CI 65.2-74.1). Different DRG results between the 2 experts occurred regularly in the entire MDC spectrum. No MDC chapter in which significant differences between the 2 raters arose could be identified. The results of our study demonstrate an almost 70% agreement in the evaluation of hospital cost accounts by 2 independently operating MDK raters. This result leaves room for improvement. Optimisation potentials can be recognised on the basis of the results. Potential for improvement was established in combination with regular further training and the expansion of binding internal code recommendations as well as exchange of code-relevant information among experts in internal forums. The presented model is in principle suitable for cross-border examinations within the MDK system with the advantage that...
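    The reported agreement and its confidence interval can be checked from the counts given in the abstract (280 agreeing cases out of 402). The sketch below assumes a 95% normal-approximation (Wald) interval; the abstract does not say which interval method or confidence level the authors used, and the function name `wald_interval` is hypothetical.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Proportion with a normal-approximation (Wald) confidence interval."""
    p = successes / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Counts taken from the abstract: 280 of 402 cases coded identically.
p, lower, upper = wald_interval(280, 402)
print(f"agreement {p:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

    Under this assumption the bounds round to 65.2% and 74.1%, matching the interval quoted in the abstract.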

  17. Qualitative soil moisture assessment in semi-arid Africa - the role of experience and training on inter-rater reliability

    Science.gov (United States)

    Rinderer, M.; Komakech, H. C.; Müller, D.; Wiesenberg, G. L. B.; Seibert, J.

    2015-08-01

    Soil and water management is particularly relevant in semi-arid regions to enhance agricultural productivity. During periods of water scarcity, soil moisture differences are important indicators of the soil water deficit and are traditionally used for allocating water resources among farmers of a village community. Here we present a simple, inexpensive soil wetness classification scheme based on qualitative indicators which one can see or touch on the soil surface. It incorporates the local farmers' knowledge on the best soil moisture conditions for seeding and brick making in the semi-arid environment of the study site near Arusha, Tanzania. The scheme was tested twice in 2014 with farmers, students and experts (April: 40 persons, June: 25 persons) for inter-rater reliability, bias of individuals and functional relation between qualitative and quantitative soil moisture values. During the test in April farmers assigned the same wetness class in 46 % of all cases, while students and experts agreed on about 60 % of all cases. Students who had been trained in how to apply the method gained higher inter-rater reliability than their colleagues with only a basic introduction. When repeating the test in June, participants were given improved instructions, organized in small subgroups, which resulted in a higher inter-rater reliability among farmers. In 66 % of all classifications, farmers assigned the same wetness class and the spread of class assignments was smaller. This study demonstrates that a wetness classification scheme based on qualitative indicators is a robust tool and can be applied successfully regardless of experience in crop growing and education level when an in-depth introduction and training is provided. The use of a simple and clear layout of the assessment form is important for reliable wetness class assignments.

  18. Qualitative soil moisture assessment in semi-arid Africa: the role of experience and training on inter-rater reliability

    Science.gov (United States)

    Rinderer, M.; Komakech, H.; Müller, D.; Seibert, J.

    2015-03-01

    Soil and water management is particularly relevant in semi-arid regions to enhance agricultural productivity. During periods of water scarcity soil moisture differences are important indicators of the soil water deficit and are traditionally used for allocating water resources among farmers of a village community. Here we present a simple, inexpensive soil wetness classification scheme based on qualitative indicators which one can see or touch on the soil surface. It incorporates the local farmers' knowledge on the best soil moisture conditions for seeding and brick making in the semi-arid environment of the study site near Arusha, Tanzania. The scheme was tested twice in 2014 with farmers, students and experts (April: 40 persons, June: 25 persons) for inter-rater reliability, bias of individuals and functional relation between qualitative and quantitative soil moisture values. During the test in April farmers assigned the same wetness class in 46% of all cases while students and experts agreed in about 60% of all cases. Students who had been trained in how to apply the method gained higher inter-rater reliability than their colleagues with only a basic introduction. When repeating the test in June, participants were given improved instructions, organized in small sub-groups, which resulted in a higher inter-rater reliability among farmers. In 66% of all classifications farmers assigned the same wetness class and the spread of class assignments was smaller. This study demonstrates that a wetness classification scheme based on qualitative indicators is a robust tool and can be applied successfully regardless of experience in crop growing and education level when an in-depth introduction and training is provided. The use of a simple and clear layout of the assessment form is important for reliable wetness class assignments.

  19. Reproducibility of tender point examination in chronic low back pain patients as measured by intrarater and inter-rater reliability and agreement

    DEFF Research Database (Denmark)

    Jensen, Ole Kudsk; Callesen, Jacob; Nielsen, Merete Graakjaer

    2013-01-01

    ...back examination and return-to-work intervention, 43 and 39 patients, respectively (18 women, 46%) entered and completed the study. MAIN OUTCOME MEASURES: The reliability was estimated by the intraclass correlation coefficient (ICC), and agreement was calculated for up to ±3 TPs. Furthermore, the smallest detectable difference was calculated. RESULTS: TP examination was performed twice by two consultants in rheumatology and rehabilitation at 20 min intervals and repeated 1 week later. Intrarater reliability in the more and less experienced rater was ICC 0.84 (95% CI 0.69 to 0.98) and 0.72 (95% CI 0.49 to 0.95), respectively. The figures for inter-rater reliability were intermediate between these figures. In more than 70% of the cases, the raters agreed within ±3 TPs in both men and women and between test days. The smallest detectable difference between raters was 5, and for the more and less...

  20. The intra- and inter-rater reliability of five clinical muscle performance tests in patients with and without neck pain

    Science.gov (United States)

    2013-01-01

    Background This study investigates the reliability of muscle performance tests using cost- and time-effective methods similar to those used in clinical practice. When conducting reliability studies, great effort goes into standardising test procedures to facilitate a stable outcome. Therefore, several test trials are often performed. However, when muscle performance tests are applied in the clinical setting, clinicians often only conduct a muscle performance test once as repeated testing may produce fatigue and pain, thus variation in test results. We aimed to investigate whether cervical muscle performance tests, which have shown promising psychometric properties, would remain reliable when examined under conditions similar to those of daily clinical practice. Methods The intra-rater (between-day) and inter-rater (within-day) reliability was assessed for five cervical muscle performance tests in patients with (n = 33) and without neck pain (n = 30). The five tests were joint position error, the cranio-cervical flexion test, the neck flexor muscle endurance test performed in supine and in a 45°-upright position and a new neck extensor test. Results Intra-rater reliability ranged from moderate to almost perfect agreement for joint position error (ICC ≥ 0.48-0.82), the cranio-cervical flexion test (ICC ≥ 0.69), the neck flexor muscle endurance test performed in supine (ICC ≥ 0.68) and in a 45°-upright position (ICC ≥ 0.41) with the exception of a new test (neck extensor test), which ranged from slight to moderate agreement (ICC = 0.14-0.41). Likewise, inter-rater reliability ranged from moderate to almost perfect agreement for joint position error (ICC ≥ 0.51-0.75), the cranio-cervical flexion test (ICC ≥ 0.85), the neck flexor muscle endurance test performed in supine (ICC ≥ 0.70) and in a 45°-upright position (ICC ≥ 0.56). However, only slight to fair agreement was found for the neck extensor test (ICC

  1. Inter-Rater and Test-Retest (Between-Sessions) Reliability of the 4-Skills Scan for Dutch Elementary School Children

    Science.gov (United States)

    van Kernebeek, Willem G.; de Schipper, Antoine W.; Savelsbergh, Geert J. P.; Toussaint, Huub M.

    2018-01-01

    In The Netherlands, the 4-Skills Scan is an instrument for physical education teachers to assess gross motor skills of elementary school children. Little is known about its reliability. Therefore, in this study the test-retest and inter-rater reliability was determined. Respectively, 624 and 557 Dutch 6- to 12-year-old children were analyzed for…

  2. The Bath metrology index as assessed by a trained and an untrained rater in patients with spondylarthropathy: a study of intra- and inter-rater agreements

    DEFF Research Database (Denmark)

    Madsen, O R; Hansen, L B; Rytter, A

    2008-01-01

    The Bath ankylosing spondylitis metrology index (BASMI; range 0-10) has gained widespread use in daily clinical practice as an objective measure of spinal stiffness not only in patients with ankylosing spondylitis (AS) but also in patients with other spondylarthropathies (SpA). We examined intra-rater and inter-rater reproducibility of BASMI scoring in 30 Danish patients with SpA (median age 40 years, range 22-56 years) fulfilling the European Spondylarthropathy Study Group criteria, 25 of them satisfying the modified New York Criteria for AS. Measurements were performed twice on two different days (median interval 7 days, range 4-11) by a trained physiotherapist (PT) and by an untrained nurse who had undergone a single 1-h training session with the PT. The median BASMI score obtained by the PT on the two test days was 3.5 (range 1-8) and 3.0 (range 1-8), respectively (NS). Test-retest BASMI scores...

  3. The Pooling-score (P-score): inter- and intra-rater reliability in endoscopic assessment of the severity of dysphagia.

    Science.gov (United States)

    Farneti, D; Fattori, B; Nacci, A; Mancini, V; Simonelli, M; Ruoppolo, G; Genovese, E

    2014-04-01

    This study evaluated the intra- and inter-rater reliability of the Pooling score (P-score) in clinical endoscopic evaluation of severity of swallowing disorder, considering excess residue in the pharynx and larynx. The score (minimum 4 - maximum 11) is obtained by the sum of the scores given to the site of the bolus, the amount and ability to control residue/bolus pooling, the latter assessed on the basis of cough, raclage, number of dry voluntary or reflex swallowing acts ( 5). Four judges evaluated 30 short films of pharyngeal transit of 10 solid (1/4 of a cracker), 11 creamy (1 tablespoon of jam) and 9 liquid (1 tablespoon of 5 cc of water coloured with methylene blue, 1 ml in 100 ml) boluses in 23 subjects (10 M/13 F, age from 31 to 76 yrs, mean age 58.56±11.76 years) with different pathologies. The films were randomly distributed on two CDs, which differed in terms of the sequence of the films, and were given to judges (after an explanatory session) at time 0, 24 hours later (time 1) and after 7 days (time 2). The inter- and intra-rater reliability of the P-score was calculated using the intra-class correlation coefficient (ICC; 3,k). The possibility that consistency of boluses could affect the scoring of the films was considered. The ICC for site, amount, management and the P-score total was found to be, respectively, 0.999, 0.997, 1.00 and 0.999. Clinical evaluation of a criterion of severity of a swallowing disorder remains a crucial point in the management of patients with pathologies that predispose to complications. The P-score, derived from static and dynamic parameters, yielded a very high correlation among the scores attributed by the four judges during observations carried out at different times. Bolus consistencies did not affect the outcome of the test: the analysis of variance, performed to verify if the scores attributed by the four judges to the parameters selected, might be influenced by the different consistencies of the boluses, was not

  4. Dental examiners consistency in applying the ICDAS criteria for a caries prevention community trial.

    Science.gov (United States)

    Nelson, S; Eggertsson, H; Powell, B; Mandelaris, J; Ntragatakis, M; Richardson, T; Ferretti, G

    2011-09-01

    To examine dental examiners' one-year consistency in utilizing the International Caries Detection and Assessment System (ICDAS) criteria after baseline training and calibration. A total of three examiners received baseline training/calibration by a "gold standard" examiner, and one year later re-calibration was conducted. For the baseline training/calibration, subjects aged 8-16 years, and for the re-calibration subjects aged five to six years were recruited for the study. The ICDAS criteria were used to classify visual caries lesion severity (0-6 scale), lesion activity (active/inactive), and presence of filling material (0-9 scale) of all available tooth surfaces of permanent and primary teeth. The examination used a clinical light, mirror and air syringe. Kappa (weighted: Wkappa, unweighted: Kappa) statistics were used to determine inter- and intra-examiner reliability at baseline and re-calibration. For lesion severity and filling criteria, the baseline calibration on 35 subjects indicated an inter-rater Wkappa ranging from 0.69-0.92 and intra-rater Wkappa ranging from 0.81-0.92. Re-calibration on 22 subjects indicated an inter-rater Wkappa of 0.77-0.98 and intra-rater Wkappa ranged from 0.93-1.00. The Wkappa for filling was consistently in the excellent range, while lesion severity was in the good to excellent range. Activity kappa was in the poor to good range. All examiners improved with time. The baseline training/calibration in ICDAS was crucial to maintain the stability of the examiners' reliability over a one-year period. The ICDAS can be an effective assessment tool for community-based clinical trials.
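
    The weighted and unweighted kappa statistics named in this abstract can be illustrated with a short Python sketch. The ratings below are hypothetical (the abstract reports no raw scores), and `weighted_kappa` is an illustrative helper, not code from the study. It assumes disagreement weights of 0/1 for unweighted kappa, |i - j|/(k - 1) for linear weights, and the square of that for quadratic weights; the abstract does not say which weighting scheme the authors used.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's kappa for two raters on an ordered scale.

    weights: "unweighted", "linear" or "quadratic" (disagreement weights).
    Undefined (division by zero) if both raters use a single category.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions p[i][j]: rater A chose i, rater B chose j.
    p = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        p[index[a]][index[b]] += 1.0 / n

    row = [sum(p[i]) for i in range(k)]                       # rater A marginals
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]  # rater B marginals

    def w(i, j):
        if weights == "unweighted":
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = sum(w(i, j) * p[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - observed / expected


# Hypothetical severity codes (0-2) from two examiners on six tooth surfaces.
examiner_1 = [0, 1, 2, 2, 1, 0]
examiner_2 = [0, 1, 2, 1, 1, 0]
for scheme in ("unweighted", "linear", "quadratic"):
    value = weighted_kappa(examiner_1, examiner_2, [0, 1, 2], scheme)
    print(scheme, round(value, 3))
```

    On an ordered scale the weighted variants penalise a one-step disagreement less than a two-step one, which is why weighted kappa is usually at least as high as the unweighted value on the same data.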

  5. Validity and inter-rater reliability of medio-lateral knee motion observed during a single-limb mini squat

    Directory of Open Access Journals (Sweden)

    Simic Milena

    2010-11-01

    Full Text Available Abstract Background Muscle function may influence the risk of knee injury and outcomes following injury. Clinical tests, such as a single-limb mini squat, resemble conditions of daily life and are easy to administer. Fewer squats per 30 seconds indicate poorer function. However, the quality of movement, such as the medio-lateral knee motion, may also be important. The aim was to validate an observational clinical test of assessing the medio-lateral knee motion, using a three-dimensional (3-D) motion analysis system. In addition, the inter-rater reliability was evaluated. Methods Twenty-five (17 women) non-injured participants (mean age 25.6 years, range 18-37) were included. Visual analysis of the medio-lateral knee motion, scored as knee-over-foot or knee-medial-to-foot by two raters, and 3-D kinematic data were collected simultaneously during a single-limb mini squat. Frontal plane 2-D peak tibial, thigh, and knee varus-valgus angles, and 3-D peak hip internal-external rotation, and knee varus-valgus angles were calculated. Results Ten subjects were scored as having a knee-medial-to-foot position and 15 subjects a knee-over-foot position assessed by visual inspection. In 2-D, the peak tibial angle (mean 89.0 (SE 0.7) vs mean 86.3 (SE 0.4) degrees, p = 0.001) and peak thigh angle (mean 77.4 (SE 1.0) vs mean 81.2 (SE 0.5) degrees, p = 0.001) with respect to the horizontal, indicated that the knee was more medially placed than the ankle and thigh, respectively. Thus, the knee was in more valgus (mean 11.6 (SE 1.5) vs 5.0 (SE 0.8) degrees, p 0.90 and 96 between raters. Conclusions Medio-lateral motion of the knee can reliably be assessed during a single-leg mini-squat. The test is valid in 2-D, while the actual movement, in 3-D, is mainly exhibited as increased internal hip rotation. The single-limb mini squat is feasible and easy to administer in the clinical setting and in research to address lower extremity movement quality.

  6. Face validity and inter-rater reliability of the Danish version of the modified-Yale Preoperative Anxiety Scale

    DEFF Research Database (Denmark)

    Skovby, Pernille; Rask, Charlotte Ulrikka; Dall, Rolf

    2014-01-01

    -YPAS to Danish cultural and linguistic conditions and to test face validity and inter-reliability in a clinical setting. Materials and methods The translation was performed in accordance with WHO guidelines. Face validity as well as linguistic difficulties of the Danish version was tested and solved in a focus... of the m-YPAS as suitable and relevant, i.e. the face validity satisfactory. Inter-rater reliability analysis revealed that inter-observer agreement at induction 1 was good to very good (kw: 0.63–0.98) and at induction 2, the agreement was good to very good (kw: 0.72–0.96). ICC for the overall weighted anxiety score was 0.92 at induction 1 and 0.92 at induction 2. Conclusion Standardized and validated assessment tools are needed to evaluate interventions aiming to reduce preoperative anxiety in children. The Danish m-YPAS had a satisfactory face validity and inter-reliability, based on a minor empirical...

  7. Inter-rater agreement of comorbid DSM-IV personality disorders in substance abusers

    Directory of Open Access Journals (Sweden)

    Thylstrup Birgitte

    2008-05-01

    Full Text Available Abstract Background Little is known about the inter-rater agreement of personality disorders in clinical settings. Methods Clinicians rated 75 patients with substance use disorders on the DSM-IV criteria of personality disorders in random order, and on rating scales representing the severity of each. Results Convergent validity agreement was moderate (range for r = 0.55, 0.67) for cluster B disorders rated with DSM-IV criteria, and discriminant validity was moderate for eight of the ten personality disorders. Convergent validity of the rating scales was only moderate for antisocial and narcissistic personality disorder. Discussion Dimensional ratings may be used in research studies and clinical practice with some caution, and may be collected as one of several sources of information to describe the personality of a patient.

  8. Inter-rater and intrarater reliability of the South African Triage Scale in low-resource settings of Haiti and Afghanistan.

    Science.gov (United States)

    Dalwai, Mohammed; Tayler-Smith, Katie; Twomey, Michèle; Nasim, Masood; Popal, Abdul Qayum; Haqdost, Waliul Haq; Gayraud, Olivia; Cheréstal, Sophia; Wallis, Lee; Valles, Pola

    2018-03-16

    The South African Triage Scale (SATS) has demonstrated good validity in the EDs of Médecins Sans Frontières (MSF)-supported sites in Afghanistan and Haiti; however, corresponding reliability in these settings has not yet been reported on. This study set out to assess the inter-rater and intrarater reliability of the SATS in four MSF-supported EDs in Afghanistan and Haiti (two trauma-only EDs and two mixed (including both medical and trauma cases) EDs). Under classroom conditions between December 2013 and February 2014, ED nurses at each site assigned triage ratings to a set of context-specific vignettes (written case reports of ED patients). Inter-rater reliability was assessed by comparing triage ratings among nurses; intrarater reliability was assessed by asking the nurses to retriage 10 random vignettes from the original set and comparing these duplicate ratings. Inter-rater reliability was calculated using the unweighted kappa, linearly weighted kappa and quadratically weighted kappa (QWK) statistics, and the intraclass correlation coefficient (ICC). Intrarater reliability was calculated according to the percentage of exact agreement and the percentage of agreement allowing for one level of discrepancy in triage ratings. The correlation between years of nursing experience and reliability of the SATS was assessed based on comparison of ICCs and the respective 95% CIs. A total of 67 nurses agreed to participate in the study: In Afghanistan there were 19 nurses from Kunduz Trauma Centre and nine from Ahmed Shah Baba; in Haiti, there were 20 nurses from Martissant Emergency Centre and 19 from Tabarre Surgical and Trauma Centre. Inter-rater agreement was moderate across all sites (ICC range: 0.50-0.60; QWK range: 0.50-0.59) apart from the trauma ED in Haiti where it was moderate to substantial (ICC: 0.58; QWK: 0.61). Intrarater agreement was similar across the four sites (68%-74% exact agreement); when allowing for a one-level discrepancy in triage ratings

  9. Nonspecialist Raters Can Provide Reliable Assessments of Procedural Skills

    DEFF Research Database (Denmark)

    Mahmood, Oria; Dagnæs, Julia; Bube, Sarah

    2018-01-01

    ... was chosen as it is a simple procedural skill that is crucial to master in a resident urology program. RESULTS: The internal consistency of assessments was high, Cronbach's α = 0.93 and 0.95 for nonspecialist and specialist raters, respectively (p correlations). The interrater reliability... was significant (p Pearson's correlation of 0.77 for the nonspecialists and 0.75 for the specialists. The test-retest reliability showed the biggest difference between the 2 groups, 0.59 and 0.38 for the nonspecialist raters and the specialist raters, respectively (p ...

  10. Six of one, half a dozen of the other: A measure of multidisciplinary inter/intra-rater reliability of the society for fetal urology and urinary tract dilation grading systems for hydronephrosis.

    Science.gov (United States)

    Rickard, Mandy; Easterbrook, Bethany; Kim, Soojin; Farrokhyar, Forough; Stein, Nina; Arora, Steven; Belostotsky, Vladamir; DeMaria, Jorge; Lorenzo, Armando J; Braga, Luis H

    2017-02-01

    The urinary tract dilation (UTD) classification system was introduced to standardize terminology in the reporting of hydronephrosis (HN), and bridge a gap between pre- and postnatal classification such as the Society for Fetal Urology (SFU) grading system. Herein we compare the intra/inter-rater reliability of both grading systems. SFU (I-IV) and UTD (I-III) grades were independently assigned by 13 raters (9 pediatric urology staff, 2 nephrologists, 2 radiologists), twice, 3 weeks apart, to 50 sagittal postnatal ultrasonographic views of hydronephrotic kidneys. Data regarding ureteral measurements and bladder abnormalities were included to allow proper UTD categorization. Ten images were repeated to assess intra-rater reliability. Krippendorff's alpha coefficient was used to measure overall and by grade intra/inter-rater reliability. Reliability between specialties and training levels were also analyzed. Overall inter-rater reliability was slightly higher for SFU (α = 0.842, 95% CI 0.812-0.879, in session 1; and α = 0.808, 95% CI 0.775-0.839, in session 2) than for UTD (α = 0.774, 95% CI 0.715-0.827, in session 1; and α = 0.679, 95% CI 0.605-0.750, in session 2). Reliability for intermediate grades (SFU II/III and UTD 2) of HN was poor regardless of the system. Reliabilities for SFU and UTD classifications among Urology, Nephrology, and Radiology, as well as between training levels were not significantly different. Despite the introduction of HN grading systems to standardize the interpretation and reporting of renal ultrasound in infants with HN, none have been proven superior in allowing clinicians to distinguish between "moderate" grades. While this study demonstrated high reliability in distinguishing between "mild" (SFU I/II and UTD 1) and "severe" (SFU IV and UTD 3) grades of HN, the overall reliability between specialties was poor. This is in keeping with a previous report of modest inter-rater reliability of the SFU system. This drawback is

  11. Virtual Raters for Reproducible and Objective Assessments in Radiology

    Science.gov (United States)

    Kleesiek, Jens; Petersen, Jens; Döring, Markus; Maier-Hein, Klaus; Köthe, Ullrich; Wick, Wolfgang; Hamprecht, Fred A.; Bendszus, Martin; Biller, Armin

    2016-04-01

    Volumetric measurements in radiologic images are important for monitoring tumor growth and treatment response. To make these more reproducible and objective we introduce the concept of virtual raters (VRs). A virtual rater is obtained by combining knowledge of machine-learning algorithms trained with past annotations of multiple human raters with the instantaneous rating of one human expert. Thus, he is virtually guided by several experts. To evaluate the approach we perform experiments with multi-channel magnetic resonance imaging (MRI) data sets. Next to gross tumor volume (GTV) we also investigate subcategories like edema, contrast-enhancing and non-enhancing tumor. The first data set consists of N = 71 longitudinal follow-up scans of 15 patients suffering from glioblastoma (GB). The second data set comprises N = 30 scans of low- and high-grade gliomas. For comparison we computed Pearson Correlation, Intra-class Correlation Coefficient (ICC) and Dice score. Virtual raters always lead to an improvement w.r.t. inter- and intra-rater agreement. Comparing the 2D Response Assessment in Neuro-Oncology (RANO) measurements to the volumetric measurements of the virtual raters results in one-third of the cases in a deviating rating. Hence, we believe that our approach will have an impact on the evaluation of clinical studies as well as on routine imaging diagnostics.

  12. Inter-rater reliability of assessment of levator ani muscle strength and attachment to the pubic bone in nulliparous women.

    Science.gov (United States)

    van Delft, K; Schwertner-Tiepelmann, N; Thakar, R; Sultan, A H

    2013-09-01

    The modified Oxford scale (MOS) has been found previously to have poor inter-rater reliability, whereas digital assessment of levator ani muscle (LAM) attachment to the pubic bone has been shown to have acceptable reliability. Our aim was to evaluate inter-rater reliability of the validated MOS and to develop a reliable classification system for digital assessment of LAM attachment, correlating this to findings on transperineal ultrasound (TPUS) examination. Evaluation of the MOS by palpation was performed in nulliparous women by two investigators. LAM attachment was evaluated using digital palpation, for which a novel classification system was developed with four grades based on the position of the attachment and presence of discernible muscle. Findings were compared with those on TPUS examination. Inter-rater reliability was assessed using Cohen's kappa statistic. Twenty-five nulliparous women were examined. There was agreement in MOS scores between the investigators in 64% of women (n = 16), with a kappa of 0.66 (indicating substantial agreement). There was agreement in palpation of LAM attachment using the new grading system in 96% of women (n = 24), with a kappa of 0.90 (indicating almost perfect agreement). TPUS examination did not show LAM avulsion in any woman, with the exception of one with a partial avulsion. In this group of nulliparous patients, there was substantial agreement between the two investigators in evaluation of the MOS and there was good agreement between grades of LAM attachment using the new classification system, which correlated with findings on TPUS examination. It therefore appears that these results are reproducible in nulliparous women and the techniques can be readily learned and reliably incorporated into clinical practice and research after appropriate training. Further research is required to establish clinical utility of the grading system for LAM attachment in postpartum women and in women with symptomatic pelvic organ

  13. Validation and inter-rater reliability of a three item falls risk screening tool

    Directory of Open Access Journals (Sweden)

    Catherine Maree Said

    2017-11-01

    Full Text Available Abstract Background Falls screening tools are routinely used in hospital settings and the psychometric properties of tools should be examined in the setting in which they are used. The aim of this study was to explore the concurrent and predictive validity of the Austin Health Falls Risk Screening Tool (AHFRST), compared with The Northern Hospital Modified St Thomas’s Risk Assessment Tool (TNH-STRATIFY), and the inter-rater reliability of the AHFRST. Methods A research physiotherapist used the AHFRST and TNH-STRATIFY to classify 130 participants admitted to Austin Health (five acute wards, n = 115; two subacute wards, n = 15; median length of stay 6 days, IQR 3–12) as ‘High’ or ‘Low’ falls risk. The AHFRST was also completed by nursing staff on patient admission. Falls data was collected from the hospital incident reporting system. Results Six falls occurred during the study period (fall rate of 4.6 falls per 1000 bed days). There was substantial agreement between the AHFRST and the TNH-STRATIFY (Kappa = 0.68, 95% CI 0.52–0.78). Both tools had poor predictive validity, with low specificity (AHFRST 46.0%, 95% CI 37.0–55.1; TNH-STRATIFY 34.7%, 95% CI 26.4–43.7) and positive predictive values (AHFRST 5.6%, 95% CI 1.6–13.8; TNH-STRATIFY 6.9%, 95% CI 2.6–14.4). The AHFRST showed moderate inter-rater reliability (Kappa = 0.54, 95% CI = 0.36–0.67, p < 0.001), although 18 patients did not have the AHFRST completed by nursing staff. Conclusions There was an acceptable level of agreement between the 3 item AHFRST classification of falls risk and the longer, 9 item TNH-STRATIFY classification. However, both tools demonstrated limited predictive validity in the Austin Health population. The results highlight the importance of evaluating the validity of falls screening tools, and the clinical utility of these tools should be reconsidered.
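
    The specificity and positive predictive value quoted in this abstract are simple ratios from a 2x2 screening table, sketched below in Python. The abstract does not report the underlying table, so the counts are an assumption: they were chosen to be compatible with 130 participants and with the AHFRST percentages, and they further assume that each of the six falls involved a different patient. `screening_metrics` is an illustrative helper name.

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard accuracy ratios for a binary screening tool.

    tp/fp: classified 'High' risk and did / did not fall
    fn/tn: classified 'Low' risk and did / did not fall
    """
    return {
        "sensitivity": tp / (tp + fn),  # fallers correctly flagged
        "specificity": tn / (tn + fp),  # non-fallers correctly cleared
        "ppv": tp / (tp + fp),          # flagged patients who fell
        "npv": tn / (tn + fn),          # cleared patients who did not fall
    }


# Hypothetical counts (NOT reported in the abstract); they sum to 130.
metrics = screening_metrics(tp=4, fp=67, fn=2, tn=57)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

    With so few fallers, even a tool that flags most of them will have a low positive predictive value, because false positives dominate the 'High' risk group.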

  14. Rater reliability and concurrent validity of the Keyboard Personal Computer Style instrument (K-PeCS).

    Science.gov (United States)

    Baker, Nancy A; Cook, James R; Redfern, Mark S

    2009-01-01

    This paper describes the inter-rater and intra-rater reliability, and the concurrent validity of an observational instrument, the Keyboard Personal Computer Style instrument (K-PeCS), which assesses stereotypical postures and movements associated with computer keyboard use. Three trained raters independently rated the video clips of 45 computer keyboard users to ascertain inter-rater reliability, and then re-rated a sub-sample of 15 video clips to ascertain intra-rater reliability. Concurrent validity was assessed by comparing the ratings obtained using the K-PeCS to scores developed from a 3D motion analysis system. The overall K-PeCS had excellent reliability [inter-rater: intra-class correlation coefficients (ICC)=.90; intra-rater: ICC=.92]. Most individual items on the K-PeCS had from good to excellent reliability, although six items fell below ICC=.75. Those K-PeCS items that were assessed for concurrent validity compared favorably to the motion analysis data for all but two items. These results suggest that most items on the K-PeCS can be used to reliably document computer keyboarding style.
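
    The intra-class correlation coefficients reported here can be computed from a subjects-by-raters table with a two-way analysis of variance. The Python sketch below is only an illustration: the abstract does not state which ICC form was used, so the consistency forms ICC(3,1) and ICC(3,k) are assumed, and the ratings are illustrative values rather than K-PeCS data. `icc_consistency` is a hypothetical helper name.

```python
def icc_consistency(ratings):
    """Return (ICC(3,1), ICC(3,k)) for a complete subjects x raters table.

    ratings: list of rows, one row per subject, one column per rater.
    """
    n = len(ratings)      # subjects
    k = len(ratings[0])   # raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)

    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))

    single = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    average = (ms_rows - ms_error) / ms_rows
    return single, average


# Illustrative ratings: 6 subjects scored by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_single, icc_average = icc_consistency(example)
print(round(icc_single, 3), round(icc_average, 3))
```

    The single-rater form describes the reliability of one rater's score, while the average form describes the reliability of the mean across all k raters and is therefore higher on the same data.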

  15. Inter-rater reliability of data elements from a prototype of the Paul Coverdell National Acute Stroke Registry

    Directory of Open Access Journals (Sweden)

    Wehner Susan

    2008-06-01

    Full Text Available Abstract Background The Paul Coverdell National Acute Stroke Registry (PCNASR) is a U.S. based national registry designed to monitor and improve the quality of acute stroke care delivered by hospitals. The registry monitors care through specific performance measures, the accuracy of which depends in part on the reliability of the individual data elements used to construct them. This study describes the inter-rater reliability of data elements collected in Michigan's state-based prototype of the PCNASR. Methods Over a 6-month period, 15 hospitals participating in the Michigan PCNASR prototype submitted data on 2566 acute stroke admissions. Trained hospital staff prospectively identified acute stroke admissions, abstracted chart information, and submitted data to the registry. At each hospital 8 randomly selected cases were re-abstracted by an experienced research nurse. Inter-rater reliability was estimated by the kappa statistic for nominal variables, and intraclass correlation coefficient (ICC) for ordinal and continuous variables. Factors that can negatively impact the kappa statistic (i.e., trait prevalence and rater bias) were also evaluated. Results A total of 104 charts were available for re-abstraction. Excellent reliability (kappa or ICC > 0.75) was observed for many registry variables including age, gender, black race, hemorrhagic stroke, discharge medications, and modified Rankin Score. Agreement was at least moderate (i.e., 0.75 > kappa ≥ 0.40) for ischemic stroke, TIA, white race, non-ambulance arrival, hospital transfer and direct admit. However, several variables had poor reliability (kappa Conclusion The excellent reliability of many of the data elements supports the use of the PCNASR to monitor and improve care. However, the poor reliability for several variables, particularly time-related events in the emergency department, indicates the need for concerted efforts to improve the quality of data collection. Specific recommendations

  16. Validity and inter-rater reliability of medio-lateral knee motion observed during a single-limb mini squat

    DEFF Research Database (Denmark)

    Ageberg, Eva; Bennell, Kim L; Hunt, Michael A

    2010-01-01

    Muscle function may influence the risk of knee injury and outcomes following injury. Clinical tests, such as a single-limb mini squat, resemble conditions of daily life and are easy to administer. Fewer squats per 30 seconds indicate poorer function. However, the quality of movement, such as the medio-lateral knee motion, may also be important. The aim was to validate an observational clinical test of assessing the medio-lateral knee motion, using a three-dimensional (3-D) motion analysis system. In addition, the inter-rater reliability was evaluated....

  17. Inter-rater reliability of the evaluation of muscular chains associated with posture alterations in scoliosis

    Directory of Open Access Journals (Sweden)

    Fortin Carole

    2012-05-01

    Full Text Available Abstract Background In the Global postural re-education (GPR) evaluation, posture alterations are associated with anterior or posterior muscular chain impairments. Our goal was to assess the reliability of the GPR muscular chain evaluation. Methods Design: Inter-rater reliability study. Fifty physical therapists (PTs) and two experts trained in GPR assessed the standing posture from photographs of five youths with idiopathic scoliosis using a posture analysis grid with 23 posture indices (PI). The PTs and experts indicated the muscular chain associated with posture alterations. The PTs were also divided into three groups according to their experience in GPR. Experts’ results (after consensus) were used to verify agreement between PTs and experts for muscular chain and posture assessments. We used Kappa coefficients (K) and the percentage of agreement (%A) to assess inter-rater reliability and intra-class coefficients (ICC) for determining agreement between PTs and experts. Results For the muscular chain evaluation, reliability was moderate to substantial for 12 PI for the PTs (%A: 56 to 82; K: 0.42 to 0.76) and perfect for 19 PI for the experts. For posture assessment, reliability was moderate to substantial for 12 PI for the PTs (%A > 60%; K: 0.42 to 0.75) and moderate to perfect for 18 PI for the experts (%A: 80 to 100; K: 0.55 to 1.00). The agreement between PTs and experts was good for most muscular chain evaluations (18 PI; ICC: 0.82 to 0.99) and PI (19 PI; ICC: 0.78 to 1.00). Conclusions The GPR muscular chain evaluation has good reliability for most posture indices. GPR evaluation should help guide physical therapists in targeting affected muscles for treatment of abnormal posture patterns.

  18. Inter-rater reliability of the South African Triage Scale: Assessing two different cadres of health care workers in a real time environment

    Directory of Open Access Journals (Sweden)

    Michèle Twomey

    2011-09-01

    Conclusion: The inter-rater reliability of SATS ratings is excellent within individual HCWs, but significantly lower between different HCWs. This confirms previous reliability studies of the SATS using vignettes and if validated by larger studies would support the feasibility of further implementation of the SATS in primary health care settings across the Western Cape.

  19. Workplace-based assessment: raters' performance theories and constructs.

    Science.gov (United States)

    Govaerts, M J B; Van de Wiel, M W J; Schuwirth, L W T; Van der Vleuten, C P M; Muijtjens, A M M

    2013-08-01

    Weaknesses in the nature of rater judgments are generally considered to compromise the utility of workplace-based assessment (WBA). In order to gain insight into the underpinnings of rater behaviours, we investigated how raters form impressions of and make judgments on trainee performance. Using theoretical frameworks of social cognition and person perception, we explored raters' implicit performance theories, use of task-specific performance schemas and the formation of person schemas during WBA. We used think-aloud procedures and verbal protocol analysis to investigate schema-based processing by experienced (N = 18) and inexperienced (N = 16) raters (supervisor-raters in general practice residency training). Qualitative data analysis was used to explore schema content and usage. We quantitatively assessed rater idiosyncrasy in the use of performance schemas and we investigated effects of rater expertise on the use of (task-specific) performance schemas. Raters used different schemas in judging trainee performance. We developed a normative performance theory comprising seventeen inter-related performance dimensions. Levels of rater idiosyncrasy were substantial and unrelated to rater expertise. Experienced raters made significantly more use of task-specific performance schemas compared to inexperienced raters, suggesting more differentiated performance schemas in experienced raters. Most raters started to develop person schemas the moment they began to observe trainee performance. The findings further our understanding of processes underpinning judgment and decision making in WBA. Raters make and justify judgments based on personal theories and performance constructs. Raters' information processing seems to be affected by differences in rater expertise. The results of this study can help to improve rater training, the design of assessment instruments and decision making in WBA.

  20. How reliable are Functional Movement Screening scores? A systematic review of rater reliability.

    Science.gov (United States)

    Moran, Robert W; Schneiders, Anthony G; Major, Katherine M; Sullivan, S John

    2016-05-01

    Several physical assessment protocols to identify intrinsic risk factors for injury aetiology related to movement quality have been described. The Functional Movement Screen (FMS) is a standardised, field-expedient test battery intended to assess movement quality and has been used clinically in preparticipation screening and in sports injury research. To critically appraise and summarise research investigating the reliability of scores obtained using the FMS battery. Systematic literature review. Systematic search of Google Scholar, Scopus (including ScienceDirect and PubMed), EBSCO (including Academic Search Complete, AMED, CINAHL, Health Source: Nursing/Academic Edition), MEDLINE and SPORTDiscus. Studies meeting eligibility criteria were assessed by 2 reviewers for risk of bias using the Quality Appraisal of Reliability Studies checklist. Overall quality of evidence was determined using van Tulder's levels of evidence approach. 12 studies were appraised. Overall, there was a 'moderate' level of evidence in favour of 'acceptable' (intraclass correlation coefficient ≥0.6) inter-rater and intra-rater reliability for composite scores derived from live scoring. For inter-rater reliability of composite scores derived from video recordings there was 'conflicting' evidence, and 'limited' evidence for intra-rater reliability. For inter-rater reliability based on live scoring of individual subtests there was 'moderate' evidence of 'acceptable' reliability (κ≥0.4) for 4 subtests (Deep Squat, Shoulder Mobility, Active Straight-leg Raise, Trunk Stability Push-up) and 'conflicting' evidence for the remaining 3 (Hurdle Step, In-line Lunge, Rotary Stability). This review found 'moderate' evidence that raters can achieve acceptable levels of inter-rater and intra-rater reliability of composite FMS scores when using live ratings. Overall, there were few high-quality studies, and the quality of several studies was impacted by poor study reporting particularly in relation to

  1. Validation of the Falls Efficacy Scale – International in a sample of Portuguese elderly

    Directory of Open Access Journals (Sweden)

    Cristina Maria Alves Marques-Vieira

    Full Text Available ABSTRACT Objective: to translate and adapt the Falls Efficacy Scale – International (FES-I) and to analyze the psychometric properties of the FES-I Portugal version. Method: psychometric study. Sample consisting of 170 elderly people residing in the Autonomous Region of Madeira. A two-part form was used (sociodemographic characterization and FES-I Portugal). The cross-cultural adaptation was performed and the following psychometric properties were evaluated: validity (construct, predictive, and discriminant), reliability (Cronbach’s alpha), and inter-rater reliability. Results: the results allow us to verify a dimension of less demanding physical activities and another of more demanding physical activities. The inter-rater reliability study was 0.62, with an interclass correlation coefficient of 0.859, for a 95% confidence interval. The internal consistency of the Portuguese version was 0.962. Conclusion: the validity and reliability of the FES-I Portugal are consistent with the original version and proved to be appropriate instruments for evaluating the “impaired walking” and “risk of falls” nursing diagnoses in the older people.
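
    Internal consistency of the kind reported here is conventionally summarised with Cronbach's alpha, which compares the sum of the item variances with the variance of the total score. The Python sketch below shows the calculation on hypothetical responses to three items; the study's own item-level data are not given in the abstract, and `cronbach_alpha` is an illustrative helper name.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha from item scores.

    items: one list of scores per item; every list covers the same
    respondents in the same order. Uses sample variances (n - 1).
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses: 3 items answered by 4 respondents.
responses = [
    [2, 4, 3, 5],  # item 1
    [3, 4, 2, 5],  # item 2
    [1, 5, 3, 4],  # item 3
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

    Alpha approaches 1 when respondents who score high on one item also score high on the others, so that most of the total-score variance comes from shared rather than item-specific variation.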

  2. Effects of questionnaire-based diagnosis and training on inter-rater reliability among practitioners of traditional Chinese medicine.

    Science.gov (United States)

    Mist, Scott; Ritenbaugh, Cheryl; Aickin, Mikel

    2009-07-01

    To investigate whether a training process that focused on a questionnaire-based diagnosis in Traditional Chinese Medicine (TCM), and developing diagnostic consensus, would improve the agreement of TCM diagnoses among 10 TCM practitioners evaluating patients with temporomandibular joint disorder (TMJD). Evaluation of a diagnostic training program at the Department of Family and Community Medicine, University of Arizona, Tucson, Arizona, and the Oregon College of Oriental Medicine, Portland, Oregon. Screened participants for a study of TCM for TMJD. PRACTITIONERS: Ten (10) licensed acupuncturists with a minimum of 5 years' licensure and education in Chinese herbs. A training session using a questionnaire-based diagnostic form was conducted, followed by waves of diagnostic sessions. Between sessions, practitioners discussed the results of the previous round of participants with a focus on reducing variability in primary diagnosis and severity rating of each diagnosis: 3 waves of 5 patients were assessed by 4 practitioner pairs for a total of 120 diagnoses. At 18 months, practitioners completed a recalibration exercise with a similar format with a total of 32 diagnoses. These diagnoses were then examined with respect to the rate of agreement among the 10 practitioners using inter-rater correlations and kappas. The inter-rater correlation with respect to the TCM diagnoses among the 10 practitioners increased from 0.112 to 0.618 with training. Statistically significant improvements were found between the baseline and 18-month exercises (p …). … reliability of TCM diagnosis may be improved through a training process and a questionnaire-based diagnosis process. The improvements varied by diagnosis, with the greatest congruence among primary and more severe diagnoses. Future TCM studies should consider including calibration training to improve the validity of results.

  3. Internal consistency and construct validity of the Quality of Life in Alzheimer's Disease (QoL-AD) proxy – a secondary data analysis

    Science.gov (United States)

    Hylla, Jonas; Schwab, Christian G G; Isfort, Michael; Halek, Margareta; Dichter, Martin N

    2016-07-01

    Background: The maintenance and promotion of Quality of Life (QoL) of people with dementia is a major outcome in intervention studies and health care. The Quality of Life Alzheimer's Disease (QoL-AD) is an internationally recommended QoL measurement that is also available in German. Until now, only a few results on the psychometric properties of the German QoL-AD were available. Objective: Evaluation of internal consistency and construct validity of the QoL-AD proxy. Method: A principal component analysis (secondary data analysis) of the 13 QoL-AD items was carried out based on the total sample of 234 people with dementia from nine nursing homes in Germany. Subsequently, the internal consistency of the identified factors was examined using Cronbach's alpha. Results: Two factors, "physical and mental health" and "social network", were determined. Together, the two factors explain 53% of the total variance. The stability of both factors was validated in two sensitivity analyses. The internal consistency is good for both factors, with a Cronbach's alpha of 0.88 (physical and mental health) and 0.75 (social network). Conclusion: The QoL-AD proxy allows the assessment of two relevant health-related QoL domains of people with dementia. However, future studies should examine the inter-rater reliability of the QoL-AD proxy in particular.
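
    As an illustration of the internal-consistency statistic named in this record, the following is a minimal sketch of Cronbach's alpha computed from a respondents-by-items score table. The function name and the toy scores are hypothetical and are not taken from the study; the formula is the standard one, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Uses sample variances (n - 1 denominator) for the items and the totals.
    """
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose: one tuple per item
    item_variances = [variance(col) for col in items]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy data: 4 respondents answering 3 items (not study data).
toy_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [3, 3, 4],
]
print(round(cronbach_alpha(toy_scores), 3))
```

    In a two-factor solution such as the one reported here, alpha would be computed separately on the items loading on each factor.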

  4. Intra- and inter-rater reliabilities of measurement of ultrasound imaging for muscle thickness and pennation angle of tibialis anterior muscle in stroke patients.

    Science.gov (United States)

    Cho, Ki Hun; Lee, Hwang Jae; Lee, Wan Hee

    2017-07-01

    Dysfunction of skeletal muscle has been commonly reported in stroke patients. The purpose of this study was to investigate the intra- and inter-rater reliabilities of measurement of ultrasound imaging (USI) for pennation angle (PA) and muscle thickness (MT) of tibialis anterior muscle in stroke patients. Thirty-four stroke patients (19 men) participated in this study. USI was used for measurement of PA and MT of the tibialis anterior muscles at rest and during maximum voluntary contraction (MVC). Two examiners acquired images from all participants during two separate testing sessions, seven days apart. Intra-class correlation coefficients (ICCs), confidence interval (CI), standard error of measurement, minimal detectable change, and Bland-Altman plots were used for estimation of reliability. In the intra-rater reliability between measures, for all variables (PA and MT of the paretic and non-paretic sides of tibialis anterior muscles at rest and during MVC), the ICCs ranged between 0.639 and 0.998 and the CI was within an acceptable range of 0.388-0.999. In inter-rater reliability between examiners for the two tests, for all variables, the ICCs ranged between 0.690 and 0.995 and the CI was within an acceptable range of 0.463-0.997. In addition, significant difference was observed between the paretic and non-paretic sides of the tibialis anterior muscle architecture (p …). … stroke patients. In addition, objective and quantitative measurements of tibialis anterior muscle using USI may provide appropriate management for the walking recovery of stroke patients.
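
    The record lists the standard error of measurement (SEM) and minimal detectable change (MDC) alongside the ICC. A minimal sketch of how these are commonly derived from an ICC is given below, assuming the widely used formulas SEM = SD * sqrt(1 - ICC) and MDC95 = 1.96 * sqrt(2) * SEM. The abstract does not state which variant the authors used, and the standard deviation in the example is a hypothetical value, not a study result.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), with SD the between-subject standard deviation."""
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change_95(sem):
    """MDC at the 95% level: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical inputs: SD of 2.0 (in the unit of the measure) and ICC of 0.75.
sem = standard_error_of_measurement(sd=2.0, icc=0.75)
mdc = minimal_detectable_change_95(sem)
print(f"SEM = {sem:.2f}, MDC95 = {mdc:.2f}")
```

    Under these assumptions, a change smaller than the MDC95 cannot be distinguished from measurement error at the 95% level.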

  5. The Outdoor MEDIA DOT: The development and inter-rater reliability of a tool designed to measure food and beverage outlets and outdoor advertising.

    Science.gov (United States)

    Poulos, Natalie S; Pasch, Keryn E

    2015-07-01

    Few studies of the food environment have collected primary data, and even fewer have reported reliability of the tool used. This study focused on the development of an innovative electronic data collection tool used to document outdoor food and beverage (FB) advertising and establishments near 43 middle and high schools in the Outdoor MEDIA Study. Tool development used GIS-based mapping, an electronic data collection form on handheld devices, and an easily adaptable interface to efficiently collect primary data within the food environment. For the reliability study, two teams of data collectors documented all FB advertising and establishments within one half-mile of six middle schools. Inter-rater reliability was calculated overall and by advertisement or establishment category using percent agreement. A total of 824 items were documented: advertisements (n=233), establishment advertisements (n=499), and establishments (n=92) (range=8-229 per school). Overall inter-rater reliability of the developed tool ranged from 69% to 89% for advertisements and establishments. Results suggest that the developed tool is highly reliable and effective for documenting the outdoor FB environment. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. Rater reliability and construct validity of a mobile application for posture analysis.

    Science.gov (United States)

    Szucs, Kimberly A; Brown, Elena V Donoso

    2018-01-01

    [Purpose] Measurement of posture is important for those with a clinical diagnosis as well as researchers aiming to understand the impact of faulty postures on the development of musculoskeletal disorders. A reliable, cost-effective and low tech posture measure may be beneficial for research and clinical applications. The purpose of this study was to determine rater reliability and construct validity of a posture screening mobile application in healthy young adults. [Subjects and Methods] Pictures of subjects were taken in three standing positions. Two raters independently digitized the static standing posture image twice. The app calculated posture variables, including sagittal and coronal plane translations and angulations. Intra- and inter-rater reliability were calculated using the appropriate ICC models for complete agreement. Construct validity was determined through comparison of known groups using repeated measures ANOVA. [Results] Intra-rater reliability ranged from 0.71 to 0.99. Inter-rater reliability was good to excellent for all translations. ICCs were stronger for translations versus angulations. The construct validity analysis found that the app was able to detect the change in the four variables selected. [Conclusion] The posture mobile application has demonstrated strong rater reliability and preliminary evidence of construct validity. This application may have utility in clinical and research settings.

  7. Construct validity and inter-rater reliability of the Dutch activity measure for post-acute care "6-clicks" basic mobility form to assess the mobility of hospitalized patients.

    Science.gov (United States)

    Geelen, Sven Jacobus Gertruda; Valkenet, Karin; Veenhof, Cindy

    2018-05-12

    To evaluate the construct validity and the inter-rater reliability of the Dutch Activity Measure for Post-Acute Care "6-clicks" Basic Mobility short form measuring the patient's mobility in Dutch hospital care. First, the "6-clicks" was translated by using a forward-backward translation protocol. Next, 64 patients were assessed by the physiotherapist to determine the validity while being admitted to the Internal Medicine wards of a university medical center. Six hypotheses were tested regarding the construct "mobility" which showed that: Better "6-clicks" scores were related to less restrictive pre-admission living situations (p = 0.011), less restrictive discharge locations (p = 0.001), more independence in activities of daily living (p = 0.001) and fewer physiotherapy visits (p …). … Dutch "6-clicks" shows a good construct validity and moderate-to-excellent inter-rater reliability when used to assess the mobility of hospitalized patients. Implications for Rehabilitation: Even though various measurement tools have been developed, it appears the majority of physiotherapists working in a hospital currently do not use these tools as a standard part of their care. The Activity Measure for Post-Acute Care "6-clicks" Basic Mobility is the only tool which is designed to be short, easy to use within usual care and has been validated in the entire hospital population. This study shows that the Dutch version of the Activity Measure for Post-Acute Care "6-clicks" Basic Mobility form is a valid, easy to use, quick tool to assess the basic mobility of Dutch hospitalized patients.

  8. Inter-rater variability in motor function assessment in Parkinson's disease between experts in movement disorders and nurses specialising in PD management.

    Science.gov (United States)

    de Deus Fonticoba, T; Santos García, D; Macías Arribí, M

    2017-05-23

    In clinical practice, assessing patients with Parkinson's disease (PD) is a complex, time-consuming task. Our purpose is to provide a rigorous and objective evaluation of how motor function in PD patients is assessed by neurologists specialising in movement disorders, on the one hand, and by nurses specialising in PD management, on the other. We conducted an observational, cross-sectional, single-centre study of 50 patients with PD (52% men; mean age: 64.7 ± 8.7 years) who were assessed between 5 January 2016 and 20 July 2016. A neurologist and a nurse evaluated motor function in the early morning hours using the Unified Parkinson's Disease Rating Scale (UPDRS) parts III and IV and Hoehn & Yahr (H&Y) scale. Tests were administered in the same PD periods (in 48 patients during the 'off' time and in 2 patients during the 'on' time). Inter-rater variability was estimated with the intraclass correlation coefficient (ICC). Forty-nine patients (98%) were classified in the same H&Y stage by both raters. Assessment times were similar for both raters. ICC for UPDRS-IV and UPDRS-III total scores were 0.955 (P …) … de Neurología. Publicado por Elsevier España, S.L.U. All rights reserved.

  9. Daily Behavior Report Cards: An Investigation of the Consistency of On-Task Data across Raters and Methods

    Science.gov (United States)

    Chafouleas, Sandra M.; Riley-Tillman, T. Chris; Sassu, Kari A.; LaFrance, Mary J.; Patwa, Shamim S.

    2007-01-01

    In this study, the consistency of on-task data collected across raters using either a Daily Behavior Report Card (DBRC) or systematic direct observation was examined to begin to understand the decision reliability of using DBRCs to monitor student behavior. Results suggested very similar conclusions might be drawn when visually examining data…

  10. Inter-rater Reliability for Metrics Scored in a Binary Fashion-Performance Assessment for an Arthroscopic Bankart Repair.

    Science.gov (United States)

    Gallagher, Anthony G; Ryu, Richard K N; Pedowitz, Robert A; Henn, Patrick; Angelo, Richard L

    2018-05-02

    To determine the inter-rater reliability (IRR) of a procedure-specific checklist scored in a binary fashion for the evaluation of surgical skill and whether it meets a minimum level of agreement (≥0.8 between 2 raters) required for high-stakes assessment. In a prospective randomized and blinded fashion, and after detailed assessment training, 10 Arthroscopy Association of North America Master/Associate Master faculty arthroscopic surgeons (in 5 pairs) with an average of 21 years of surgical experience assessed the video-recorded 3-anchor arthroscopic Bankart repair performance of 44 postgraduate year 4 or 5 residents from 21 Accreditation Council for Graduate Medical Education orthopaedic residency training programs from across the United States. No paired scores of resident surgeon performance evaluated by the 5 teams of faculty assessors dropped below the 0.8 IRR level (mean = 0.93; range 0.84-0.99; standard deviation = 0.035). A comparison between the 5 assessor groups with 1 factor analysis of variance showed that there was no significant difference between the groups (P = .205). Pearson's product-moment correlation coefficient revealed a strong and statistically significant negative correlation, that is, -0.856 (P …). … fashion meet the need and can show a high (>80%) IRR. Copyright © 2018 Arthroscopy Association of North America. Published by Elsevier Inc. All rights reserved.

  11. Medizinbibliotheken: inter:disziplinär – inter:national – inter:aktiv

    Directory of Open Access Journals (Sweden)

    Bauer, Bruno

    2017-12-01

    Full Text Available The focus of the current issue 3/2017 of GMS Medizin – Bibliothek – Information is the annual conference 2017 of the German Medical Libraries Association in Vienna. The motto of the conference was “Medical Libraries: inter:disciplinary – inter:national – inter:active”. The authors in this issue are Bruno Bauer (Austrian Transition to Open Access 2017–2020), Beata Górczynska (Development and structure of Polish veterinary school system and its libraries), Katharina Heldt, Henriette Senst & Jessica Riedel (Salon on the institute’s history: outstanding artifacts. 28.01.2016 to 15.12.2016), Jutta Matrisciano, Martina Semmler-Schmetz & Saskia Rohmer (Advice – From info snack to special menu: Solutions of the MedMA-Bib), Stefan Nortmann (The ‘Ersti-Café’ of the Medical Branch Library Münster), Sandra Rümmele (Toolbox: The new teaching library project of the Central Medical Library of the University Medical Center Hamburg-Eppendorf), Eva Seidlmayer & Christoph Poley (One Health – Transdisciplinarity at ZB MED) and Heike Andermann (“Medical Libraries: inter:disciplinary – inter:national – inter:active”. Annual Meeting of the German Medical Library Association (AGMB), September 25 to 27, 2017 in Vienna). Furthermore, this focus issue features articles from Stefan Grün & Christoph Poley (Statistical evaluation of semantic entities from metadata and full text on German Medical Science corpora) and Iris Reimann (German MLA News; Competition of the German MLA Pioneer projects in medical libraries 2017: Introduction of the winners; Competition of the German MLA (AGMB) Pioneer projects in medical libraries 2018 – Announcement).

  12. Inter-rater reliability of the Sødring Motor Evaluation of Stroke patients (SMES).

    Science.gov (United States)

    Halsaa, K E; Sødring, K M; Bjelland, E; Finsrud, K; Bautz-Holter, E

    1999-12-01

    The Sødring Motor Evaluation of Stroke patients is an instrument for physiotherapists to evaluate motor function and activities in stroke patients. The rating reflects quality as well as quantity of the patient's unassisted performance within three domains: leg, arm and gross function. The inter-rater reliability of the method was studied in a sample of 30 patients admitted to a stroke rehabilitation unit. Three therapists were involved in the study; two therapists assessed the same patient on two consecutive days in a balanced design. Cohen's weighted kappa and McNemar's test of symmetry were used as measures of item reliability, and the intraclass correlation coefficient was used to express the reliability of the sumscores. For 24 out of 32 items the weighted kappa statistic was excellent (0.75-0.98), while 7 items had a kappa statistic within the range 0.53-0.74 (fair to good). The reliability of one item was poor (0.13). The intraclass correlation coefficient for the three sumscores was 0.97, 0.91 and 0.97. We conclude that the Sødring Motor Evaluation of Stroke patients is a reliable measure of motor function in stroke patients undergoing rehabilitation.
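
    For readers unfamiliar with the item-level statistic used in this record, here is a minimal sketch of Cohen's weighted kappa for two raters scoring ordinal items. The abstract does not say which weighting scheme was used, so both linear and quadratic disagreement weights are offered as assumptions; the function name and the toy ratings are hypothetical.

```python
def weighted_kappa(rater1, rater2, categories, weighting="linear"):
    """Cohen's weighted kappa for two raters on an ordered category scale.

    kappa_w = 1 - (sum of w_ij * observed_ij) / (sum of w_ij * expected_ij),
    where w_ij is a disagreement weight that grows with the distance |i - j|.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater1)

    # Observed joint proportions of (rater1 category, rater2 category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    num = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    den = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - num / den


# Hypothetical ratings of 8 patients on a 0-3 ordinal item (not study data).
scale = [0, 1, 2, 3]
therapist_a = [0, 1, 2, 3, 3, 2, 1, 0]
therapist_b = [0, 1, 2, 3, 2, 2, 1, 1]
print(round(weighted_kappa(therapist_a, therapist_b, scale), 3))
```

    Weighting penalises near-misses (adjacent categories) less than distant disagreements, which is why it suits ordinal scales better than unweighted kappa.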

  13. Improving QST Reliability – More Raters, Tests or Occasions? A Multivariate Generalizability Study

    DEFF Research Database (Denmark)

    O'Neill, Søren; O'Neill, Lotte

    2015-01-01

    The reliability of quantitative sensory testing (QST) is affected by the error attributable to both test occasion and rater (examiner) as well as interactions between them. Most reliability studies only account for one source of error. The present study employed a fully-crossed, multivariate...... threshold, intensity, tolerance and modulation with mechanical, thermal and chemical stimuli. The classical test-retest and inter-rater reliability (0.19... procedures. Reliability was improved more by repeated testing on separate occasions opposed to repeated testing by different raters....

  14. Inter-rater variability of visual interpretation and comparison with quantitative evaluation of 11C-PiB PET amyloid images of the Japanese Alzheimer's Disease Neuroimaging Initiative (J-ADNI) multicenter study

    International Nuclear Information System (INIS)

    Yamane, Tomohiko; Ishii, Kenji; Sakata, Muneyuki; Ikari, Yasuhiko; Nishio, Tomoyuki; Ishii, Kazunari; Kato, Takashi; Ito, Kengo; Senda, Michio

    2017-01-01

    The aim of this study was to assess the inter-rater variability of the visual interpretation of 11C-PiB PET images regarding the positivity/negativity of amyloid deposition that were obtained in a multicenter clinical research project, Japanese Alzheimer's Disease Neuroimaging Initiative (J-ADNI). The results of visual interpretation were also compared with a semi-automatic quantitative analysis using mean cortical standardized uptake value ratio to the cerebellar cortex (mcSUVR). A total of 162 11C-PiB PET scans, including 45 mild Alzheimer's disease, 60 mild cognitive impairment, and 57 normal cognitive control cases that had been acquired as J-ADNI baseline scans were analyzed. Based on visual interpretation by three independent raters followed by consensus read, each case was classified into positive, equivocal, and negative deposition (ternary criteria) and further dichotomized by merging the former two (binary criteria). Complete agreement of visual interpretation by the three raters was observed for 91.3% of the cases (Cohen κ = 0.88 on average) in ternary criteria and for 92.3% (κ = 0.89) in binary criteria. Cases that were interpreted as visually positive in the consensus read showed significantly higher mcSUVR than those visually negative (2.21 ± 0.37 vs. 1.27 ± 0.09, p < 0.001), and positive or negative decision by visual interpretation was dichotomized by a cut-off value of mcSUVR = 1.5. Significant positive/negative associations were observed between mcSUVR and the number of raters who evaluated as positive (ρ = 0.87, p < 0.0001) and negative (ρ = -0.85, p < 0.0001) interpretation. Cases of disagreement among raters showed generally low mcSUVR. Inter-rater agreement was almost perfect in 11C-PiB PET scans. Positive or negative decision by visual interpretation was dichotomized by a cut-off value of mcSUVR = 1.5. As some cases of disagreement among raters tended to show low mcSUVR, referring to quantitative method may facilitate

  15. Inter-rater reliability of the German version of the Nurses' Global Assessment of Suicide Risk scale.

    Science.gov (United States)

    Kozel, Bernd; Grieser, Manuela; Abderhalden, Christoph; Cutcliffe, John R

    2016-10-01

    In comparison to the general population, the suicide rates of psychiatric inpatient populations in Germany and Switzerland are very high. An important preventive contribution to the lowering of the suicide rates in mental health care is to ensure that the risk of suicide of psychiatric inpatients is assessed as accurately as possible. While risk-assessment instruments can serve an important function in determining such risk, very few have been translated to German. Therefore, in the present study, we reported on the German version of Nurses' Global Assessment of Suicide Risk (NGASR) scale. After translating the original instrument into German and pretesting the German version, we tested the inter-rater reliability of the instrument. Twelve video case studies were evaluated by 13 raters with the NGASR scale in a 'laboratory' trial. In each case, the observer's agreement was calculated for the single items, the overall scale, the risk levels, and the sum scores. The statistical data analysis was conducted with kappa and AC1 statistics for dichotomous (items, scale) scales. A high-to-very high observers' agreement (AC1: 0.62-1.00, kappa: 0.00-1.00) was determined for 16 items of the German version of the NGASR scale. We conclude that the German version of the NGASR scale is a reliable instrument for evaluating risk factors for suicide. A reliable application in the clinical practise appears to be enhanced by training in the use of the instrument and the right implementation instructions. © 2016 Australian College of Mental Health Nurses Inc.
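
    The wide kappa range reported here (0.00-1.00) next to a much narrower AC1 range (0.62-1.00) is typical of items where almost all ratings fall in one category. The sketch below, for two raters and dichotomous ratings only, compares Cohen's kappa with Gwet's AC1 on deliberately skewed toy data. The study itself had 13 raters, so this two-rater version is a simplification; the function name and the data are hypothetical.

```python
def kappa_and_ac1(rater1, rater2):
    """Return (percent agreement, Cohen's kappa, Gwet's AC1) for 0/1 ratings.

    Both coefficients have the form (pa - pe) / (1 - pe) and differ only in
    how the chance-agreement term pe is defined.
    """
    n = len(rater1)
    pa = sum(a == b for a, b in zip(rater1, rater2)) / n

    p1 = sum(rater1) / n                    # share of "1" ratings, rater 1
    p2 = sum(rater2) / n                    # share of "1" ratings, rater 2

    # Cohen: chance agreement from the product of each rater's marginals.
    pe_kappa = p1 * p2 + (1 - p1) * (1 - p2)
    kappa = (pa - pe_kappa) / (1 - pe_kappa) if pe_kappa < 1 else float("nan")

    # Gwet: chance agreement from the average prevalence pi across raters.
    pi = (p1 + p2) / 2
    pe_ac1 = 2 * pi * (1 - pi)
    ac1 = (pa - pe_ac1) / (1 - pe_ac1)
    return pa, kappa, ac1


# Hypothetical skewed item: rater A never scores "1", rater B does so once.
rater_a = [0] * 20
rater_b = [0] * 19 + [1]
pa, kappa, ac1 = kappa_and_ac1(rater_a, rater_b)
print(f"agreement={pa:.2f}  kappa={kappa:.2f}  AC1={ac1:.2f}")
```

    On this toy item the raters agree on 19 of 20 cases, yet kappa is essentially 0 because the expected chance agreement is just as high, while AC1 stays close to the raw agreement.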

  16. Vibration Response Imaging: evaluation of rater agreement in healthy subjects and subjects with pneumonia

    International Nuclear Information System (INIS)

    Bartziokas, Konstantinos; Daenas, Christos; Preau, Sebastien; Zygoulis, Paris; Triantaris, Apostolos; Kerenidi, Theodora; Makris, Demosthenes; Gourgoulianis, Konstantinos I; Daniil, Zoe

    2010-01-01

    We evaluated pulmonologists' variability in the interpretation of Vibration response imaging (VRI) obtained from healthy subjects and patients hospitalized for community acquired pneumonia. This is a prospective study conducted in a tertiary university hospital. Twenty healthy subjects and twenty-three pneumonia cases were included in this study. Six pulmonologists blindly analyzed images of normal subjects and pneumonia cases and evaluated different aspects of VRI images related to the quality of data acquisition, synchronization of the progression of breath sound distribution and agreement between the maximal energy frame (MEF) of VRI (which is the maximal geographical area of lung vibrations produced at maximal inspiration) and chest radiography. For qualitative assessment of VRI images, the raters' evaluations were analyzed by degree of consistency and agreement. The average value for overall identical evaluations of twelve features of the VRI image evaluation ranged from 87% to 95% per rater (94% to 97% in control cases and from 79% to 93% per rater in pneumonia cases). Inter-rater median (IQR) agreement was 91% (82-96). The level of agreement according to VRI feature evaluated was in most cases over 80%; intra-class correlation (ICC) obtained by using a model of subject/rater for the averaged features was overall 0.86 (0.92 in normal and 0.73 in pneumonia cases). Our findings suggest good agreement in the interpretation of VRI data between different raters. In this respect, VRI might be helpful as a radiation-free diagnostic tool for the management of pneumonia.

  17. Vibration Response Imaging: evaluation of rater agreement in healthy subjects and subjects with pneumonia

    Directory of Open Access Journals (Sweden)

    Makris Demosthenes

    2010-03-01

    Full Text Available Abstract Background: We evaluated pulmonologists' variability in the interpretation of Vibration response imaging (VRI) obtained from healthy subjects and patients hospitalized for community acquired pneumonia. Methods: This is a prospective study conducted in a tertiary university hospital. Twenty healthy subjects and twenty-three pneumonia cases were included in this study. Six pulmonologists blindly analyzed images of normal subjects and pneumonia cases and evaluated different aspects of VRI images related to the quality of data acquisition, synchronization of the progression of breath sound distribution and agreement between the maximal energy frame (MEF) of VRI (which is the maximal geographical area of lung vibrations produced at maximal inspiration) and chest radiography. For qualitative assessment of VRI images, the raters' evaluations were analyzed by degree of consistency and agreement. Results: The average value for overall identical evaluations of twelve features of the VRI image evaluation ranged from 87% to 95% per rater (94% to 97% in control cases and from 79% to 93% per rater in pneumonia cases). Inter-rater median (IQR) agreement was 91% (82-96). The level of agreement according to VRI feature evaluated was in most cases over 80%; intra-class correlation (ICC) obtained by using a model of subject/rater for the averaged features was overall 0.86 (0.92 in normal and 0.73 in pneumonia cases). Conclusions: Our findings suggest good agreement in the interpretation of VRI data between different raters. In this respect, VRI might be helpful as a radiation-free diagnostic tool for the management of pneumonia.

  18. Assessing and quantifying inter-rater variation for dichotomous ratings using a Rasch model

    DEFF Research Database (Denmark)

    Petersen, Jørgen Holm; Larsen, Klaus; Kreiner, Svend

    2010-01-01

    quantifying the rater variation as a suitable measure of the variation of the rater odds ratios. An important example that will serve to motivate and illustrate the proposed model, is the study of Umbilical artery Doppler velocimetry used by obstetricians to assess the status of a foetus. The purpose...... of the assessment is to improve the foetus' chance of survival by choosing the optimal time of elective delivery. In the study, data related to 139 perinatal deaths were sent to 32 experts who were asked whether the use of Doppler velocimetry might have prevented each death....

  19. Inter-Rater Reliability and Agreement of the 6-Minute Walk Test in Women With Hip Fracture

    DEFF Research Database (Denmark)

    Larsen, Camilla Marie; Overgaard, Jan; Tange Kristensen, Morten

    MWT in individuals with hip fractures. Methods: Two senior physiotherapy students independently examined (randomized order) a convenient sample of 20 participants; their assessments were separated by two days, and testing followed instructions from the American Thoracic Society(1). Hip pain...... was assessed with the Verbal Ranking Scale. Results: Participants (all women) with a mean (SD) age of 78.1 ± 5.9 years performed the test within a mean of 31.5 ± 5.8 days post-surgery; 10 had a cervical and 10 a trochanteric fracture. Excellent inter-rater reliability; ICC2.1 =0.92 (95% CI, 0.81 - 0...... = -0.196, P = 0.41). On the contrary, participants walked a mean of 21.7 ± 22.6 meters longer, at the second trial (P = 0.002). Participants with moderate hip fracture- related pain walked a shorter distance than those with no or light pain during the first test (P = 0.04), while this was not the case...

  20. How to assess and compare inter-rater reliability, agreement and correlation of ratings: an exemplary analysis of mother-father and parent-teacher expressive vocabulary rating pairs.

    Science.gov (United States)

    Stolarova, Margarita; Wolf, Corinna; Rinker, Tanja; Brielmann, Aenne

    2014-01-01

    This report has two main purposes. First, we combine well-known analytical approaches to conduct a comprehensive assessment of agreement and correlation of rating-pairs and to dis-entangle these often confused concepts, providing a best-practice example on concrete data and a tutorial for future reference. Second, we explore whether a screening questionnaire developed for use with parents can be reliably employed with daycare teachers when assessing early expressive vocabulary. A total of 53 vocabulary rating pairs (34 parent-teacher and 19 mother-father pairs) collected for two-year-old children (12 bilingual) are evaluated. First, inter-rater reliability both within and across subgroups is assessed using the intra-class correlation coefficient (ICC). Next, based on this analysis of reliability and on the test-retest reliability of the employed tool, inter-rater agreement is analyzed, magnitude and direction of rating differences are considered. Finally, Pearson correlation coefficients of standardized vocabulary scores are calculated and compared across subgroups. The results underline the necessity to distinguish between reliability measures, agreement and correlation. They also demonstrate the impact of the employed reliability on agreement evaluations. This study provides evidence that parent-teacher ratings of children's early vocabulary can achieve agreement and correlation comparable to those of mother-father ratings on the assessed vocabulary scale. Bilingualism of the evaluated child decreased the likelihood of raters' agreement. We conclude that future reports of agreement, correlation and reliability of ratings will benefit from better definition of terms and stricter methodological approaches. The methodological tutorial provided here holds the potential to increase comparability across empirical reports and can help improve research practices and knowledge transfer to educational and therapeutic settings.

  1. How to assess and compare inter-rater reliability, agreement and correlation of ratings: an exemplary analysis of mother-father and parent-teacher expressive vocabulary rating pairs

    Science.gov (United States)

    Stolarova, Margarita; Wolf, Corinna; Rinker, Tanja; Brielmann, Aenne

    2014-01-01

    This report has two main purposes. First, we combine well-known analytical approaches to conduct a comprehensive assessment of agreement and correlation of rating-pairs and to dis-entangle these often confused concepts, providing a best-practice example on concrete data and a tutorial for future reference. Second, we explore whether a screening questionnaire developed for use with parents can be reliably employed with daycare teachers when assessing early expressive vocabulary. A total of 53 vocabulary rating pairs (34 parent–teacher and 19 mother–father pairs) collected for two-year-old children (12 bilingual) are evaluated. First, inter-rater reliability both within and across subgroups is assessed using the intra-class correlation coefficient (ICC). Next, based on this analysis of reliability and on the test-retest reliability of the employed tool, inter-rater agreement is analyzed, magnitude and direction of rating differences are considered. Finally, Pearson correlation coefficients of standardized vocabulary scores are calculated and compared across subgroups. The results underline the necessity to distinguish between reliability measures, agreement and correlation. They also demonstrate the impact of the employed reliability on agreement evaluations. This study provides evidence that parent–teacher ratings of children's early vocabulary can achieve agreement and correlation comparable to those of mother–father ratings on the assessed vocabulary scale. Bilingualism of the evaluated child decreased the likelihood of raters' agreement. We conclude that future reports of agreement, correlation and reliability of ratings will benefit from better definition of terms and stricter methodological approaches. The methodological tutorial provided here holds the potential to increase comparability across empirical reports and can help improve research practices and knowledge transfer to educational and therapeutic settings. PMID:24994985

  2. Inter-Rater Reliability of Historical Data Collected by Non-Medical Research Assistants and Physicians in Patients with Acute Abdominal Pain

    Directory of Open Access Journals (Sweden)

    Mills, Angela M

    2009-02-01

    Full Text Available OBJECTIVES: In many academic emergency departments (ED), physicians are asked to record clinical data for research, which may be time consuming and distracting from patient care. We hypothesized that non-medical research assistants (RAs) could obtain historical information from patients with acute abdominal pain as accurately as physicians. METHODS: Prospective comparative study conducted in an academic ED comparing 29 RAs with 32 resident physicians (RPs) to assess inter-rater reliability in obtaining historical information in abdominal pain patients. Historical features were independently recorded on standardized data forms by an RA and an RP blinded to each other's answers. Discrepancies were resolved by a third person (RA) who asked the patient to state the correct answer on a third questionnaire, constituting the "criterion standard." Inter-rater reliability was assessed using kappa statistics (kappa) and percent crude agreement (CrA). RESULTS: Sixty-five patients were enrolled (mean age 43). Of 43 historical variables assessed, the median agreement was moderate (kappa 0.59 [interquartile range 0.37-0.69]; CrA 85.9%) and varied across data categories: initial pain location (kappa 0.61 [0.59-0.73]; CrA 87.7%), current pain location (kappa 0.60 [0.47-0.67]; CrA 82.8%), past medical history (kappa 0.60 [0.48-0.74]; CrA 93.8%), associated symptoms (kappa 0.38 [0.37-0.74]; CrA 87.7%), and aggravating/alleviating factors (kappa 0.09 [-0.01-0.21]; CrA 61.5%). When there was disagreement between the RP and the RA, the RA more often agreed with the criterion standard (64% [55-71%]) than the RP (36% [29-45%]). CONCLUSION: Non-medical research assistants who focus on clinical research are often more accurate than physicians, who may be distracted by patient care responsibilities, at obtaining historical information from ED patients with abdominal pain.
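    The two statistics reported in this record, crude agreement and kappa, can be sketched for a single yes/no historical item as follows. This is a minimal illustration assuming Cohen's unweighted kappa for two raters; the abstract does not state which kappa variant was used, and the answers below are invented for the example.

```python
from collections import Counter


def crude_agreement(rater_a, rater_b):
    """Proportion of items on which both raters gave the same answer."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's unweighted kappa: agreement corrected for chance agreement.

    Undefined (division by zero) when chance agreement equals 1, i.e. when
    both raters give the same single answer for every item.
    """
    n = len(rater_a)
    p_observed = crude_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement: product of each rater's marginal proportions per category.
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical answers to one yes/no question for ten patients.
assistant = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "no"]
physician = ["yes", "yes", "yes", "no", "no", "no", "no", "no", "no", "yes"]

cra = crude_agreement(assistant, physician)
kappa = cohen_kappa(assistant, physician)
print(f"CrA = {cra:.1%}, kappa = {kappa:.2f}")
```

    Here the raters agree on 8 of 10 patients, but because half of that agreement would be expected by chance given their answer frequencies, kappa comes out near 0.58. The same mechanism explains why the abstract can report high crude agreement alongside only moderate kappa.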

  3. Inter-rater reliability of postnatal ultrasound interpretation in infants with congenital hydronephrosis.

    Science.gov (United States)

    Vemulakonda, V M; Wilcox, D T; Torok, M R; Hou, A; Campbell, J B; Kempe, A

    2015-09-01

    The most common measurements of hydronephrosis are the anterior-posterior (AP) diameter and the Society for Fetal Urology (SFU) grading systems. To date, the inter-rater reliability (IRR) of these measures has not been compared in the postnatal period. The objectives of this study were to compare the IRR of the AP diameter and the SFU grading system in infants and to determine whether ultrasound findings other than pelvicalyceal dilation are associated with higher SFU grades. Initial postnatal ultrasounds of infants seen from February 1, 2011, to January 31, 2012, with a primary diagnosis of congenital hydronephrosis were included for review. Ultrasound images were de-identified and reviewed by four pediatric urologists. IRR was calculated using the intraclass correlation (ICC) measure. A paired t test was used to compare ICCs. Associations between SFU grade and other ultrasound findings were tested using Chi-square or Fisher's exact tests. A total of 112 kidneys in 56 patients were reviewed. IRR of the SFU grading system was high (right kidney ICC = 0.83, left kidney ICC = 0.85); however, IRR of AP diameter measurement was higher (right kidney ICC = 0.97, left kidney ICC = 0.98; p [...] hydronephrosis on bivariable and multivariable analysis. The SFU grading system is associated with excellent IRR, although the AP diameter appears to have higher IRR. Physicians may consider ultrasound findings that are not explicitly included in the SFU system when assigning hydronephrosis grade, which may lead to variability in use of this classification system.
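    For readers unfamiliar with the intraclass correlation used in this and several neighbouring records, the following is a minimal sketch of one common form, ICC(2,1) (two-way random effects, absolute agreement, single rater). The abstract does not say which ICC form the authors computed, so the choice of form, the function name and the measurements are all assumptions made for illustration.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a subjects-by-raters table (list of equal-length rows).

    Uses the two-way ANOVA mean squares: rows are subjects, columns are raters.
    Undefined (division by zero) if every rating in the table is identical.
    """
    n = len(ratings)          # number of subjects
    k = len(ratings[0])       # number of raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical diameters (mm) for five kidneys, each measured by three raters.
diameters = [
    [8.0, 8.5, 8.2],
    [12.1, 12.0, 12.4],
    [15.3, 15.9, 15.5],
    [20.2, 19.8, 20.5],
    [6.1, 6.4, 6.0],
]
print(f"ICC(2,1) = {icc_2_1(diameters):.3f}")
```

    Because this form measures absolute agreement, a rater who is consistently higher than the others lowers the coefficient even when the rank order of subjects is preserved. Consistency-type forms such as ICC(3,1) would not penalise that offset.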

  4. Rater agreement in lung scintigraphy

    International Nuclear Information System (INIS)

    Christiansen, F.; Andersson, T.; Rydman, H.; Qvarner, N.; Maare, K.

    1996-01-01

    Purpose: The PIOPED criteria in their original and revised forms are today's standards in the interpretation of ventilation-perfusion scintigraphy. When the PIOPED criteria are used by experienced raters with training in consensus interpretation, the agreement rates have been demonstrated to be excellent. Our purpose was to investigate the rates of agreement between 2 experienced raters from different hospitals who had no training in consensus interpretation. Material and Methods: The 2 raters investigated a population of 195 patients. This group included 72 patients from a previous study who had an intermediate probability of pulmonary embolism and who had also been examined by pulmonary angiography. Results: The results demonstrated moderate agreement rates with a kappa value of 0.54 (0.45-0.63 in a 95% confidence interval), which is similar to the kappa value of the PIOPED study but significantly lower than the kappa values of agreement rates among consensus-trained raters. There was a low consistency in the intermediate probability category, with a proportional agreement rate of 0.39 between the experienced raters. Conclusion: The moderate agreement rates between raters from different hospitals make it difficult to compare study populations of a certain scintigraphic category in different hospitals. Further investigations are mandatory for accurate diagnosis when the scintigrams are in the category of intermediate probability of pulmonary embolism. (orig.)

  5. Validating the Danish adaptation of the World Health Organization's International Classification for Patient Safety classification of patient safety incident types

    DEFF Research Database (Denmark)

    Mikkelsen, Kim Lyngby; Thommesen, Jacob; Andersen, Henning Boje

    2013-01-01

    Objectives Validation of a Danish patient safety incident classification adapted from the World Health Organization's International Classification for Patient Safety (ICPS-WHO). Design Thirty-three hospital safety management experts classified 58 safety incident cases selected to represent all types.......513 (range: 0.193–0.804). Kappa and ICC showed high correlation (r = 0.99). An inverse correlation was found between the prevalence of type and inter-rater reliability. Results are discussed according to four factors known to determine the inter-rater agreement: skill and motivation of raters; clarity...

  6. Intra and inter-rater reliability of infrared image analysis of masticatory and upper trapezius muscles in women with and without temporomandibular disorder

    Directory of Open Access Journals (Sweden)

    Ana C. S Costa

    2013-02-01

    Full Text Available BACKGROUND: Infrared thermography is an auxiliary tool that can be used to evaluate several pathologies given its efficiency in analyzing the distribution of skin surface temperature. OBJECTIVES: To propose two forms of infrared image analysis of the masticatory and upper trapezius muscles, and to determine the intra and inter-rater reliability of both forms of analysis. METHOD: Infrared images of masticatory and upper trapezius muscles of 64 female volunteers with and without temporomandibular disorder (TMD) were collected. Two raters performed the infrared image analysis, which occurred in two ways: temperature measurement along the muscle length and in the central portion of the muscle. The Intraclass Correlation Coefficient (ICC) was used to determine the intra and inter-rater reliability. RESULTS: The ICC showed excellent intra and inter-rater values for both measurements: temperature measurement of the muscle length (TMD group: intra-rater ICC ranged from 0.996 to 0.999, inter-rater ICC ranged from 0.992 to 0.999; control group: intra-rater ICC ranged from 0.993 to 0.998, inter-rater ICC ranged from 0.990 to 0.998) and temperature measurement of the central portion of the muscle (TMD group: intra-rater ICC ranged from 0.981 to 0.998, inter-rater ICC ranged from 0.971 to 0.998; control group: intra-rater ICC ranged from 0.887 to 0.996, inter-rater ICC ranged from 0.852 to 0.996). CONCLUSION: The results indicated that temperature measurements of the masticatory and upper trapezius muscles carried out by the analysis of the muscle length and central portion yielded excellent intra and inter-rater reliability.

  7. Inter-rater variability of visual interpretation and comparison with quantitative evaluation of ¹¹C-PiB PET amyloid images of the Japanese Alzheimer's Disease Neuroimaging Initiative (J-ADNI) multicenter study

    Energy Technology Data Exchange (ETDEWEB)

    Yamane, Tomohiko [Saitama Medical University Saitama International Center, Department of Nuclear Medicine, Hidaka (Japan); Institute of Biomedical Research and Innovation, Division of Molecular Imaging, Kobe (Japan); Tokyo Metropolitan Institute of Gerontology, Team for Neuroimaging Research, Tokyo (Japan); Ishii, Kenji; Sakata, Muneyuki [Tokyo Metropolitan Institute of Gerontology, Team for Neuroimaging Research, Tokyo (Japan); Ikari, Yasuhiko; Nishio, Tomoyuki [Institute of Biomedical Research and Innovation, Division of Molecular Imaging, Kobe (Japan); Research Association for Biotechnology, Tokyo (Japan); Ishii, Kazunari [Kinki University Hospital, Department of Radiology, Osaka, Sayama (Japan); Kato, Takashi; Ito, Kengo [National Center for Geriatrics and Gerontology, Department of Brain Science and Molecular Imaging, Obu (Japan); Senda, Michio [Institute of Biomedical Research and Innovation, Division of Molecular Imaging, Kobe (Japan); Collaboration: J-ADNI Study Group

    2017-05-15

    The aim of this study was to assess the inter-rater variability of the visual interpretation of ¹¹C-PiB PET images regarding the positivity/negativity of amyloid deposition that were obtained in a multicenter clinical research project, Japanese Alzheimer's Disease Neuroimaging Initiative (J-ADNI). The results of visual interpretation were also compared with a semi-automatic quantitative analysis using mean cortical standardized uptake value ratio to the cerebellar cortex (mcSUVR). A total of 162 ¹¹C-PiB PET scans, including 45 mild Alzheimer's disease, 60 mild cognitive impairment, and 57 normal cognitive control cases that had been acquired as J-ADNI baseline scans were analyzed. Based on visual interpretation by three independent raters followed by consensus read, each case was classified into positive, equivocal, and negative deposition (ternary criteria) and further dichotomized by merging the former two (binary criteria). Complete agreement of visual interpretation by the three raters was observed for 91.3% of the cases (Cohen κ = 0.88 on average) in ternary criteria and for 92.3% (κ = 0.89) in binary criteria. Cases that were interpreted as visually positive in the consensus read showed significantly higher mcSUVR than those visually negative (2.21 ± 0.37 vs. 1.27 ± 0.09, p < 0.001), and positive or negative decision by visual interpretation was dichotomized by a cut-off value of mcSUVR = 1.5. Significant positive/negative associations were observed between mcSUVR and the number of raters who evaluated as positive (ρ = 0.87, p < 0.0001) and negative (ρ = -0.85, p < 0.0001) interpretation. Cases of disagreement among raters showed generally low mcSUVR. Inter-rater agreement was almost perfect in ¹¹C-PiB PET scans. Positive or negative decision by visual interpretation was dichotomized by a cut-off value of mcSUVR = 1.5. As some cases of disagreement among raters tended to show low mcSUVR, referring to quantitative method may

  8. Is the Parkinson Anxiety Scale comparable across raters?

    Science.gov (United States)

    Forjaz, Maria João; Ayala, Alba; Martinez-Martin, Pablo; Dujardin, Kathy; Pontone, Gregory M; Starkstein, Sergio E; Weintraub, Daniel; Leentjens, Albert F G

    2015-04-01

    The Parkinson Anxiety Scale is a new scale developed to measure anxiety severity in Parkinson's disease specifically. It consists of three dimensions: persistent anxiety, episodic anxiety, and avoidance behavior. This study aimed to assess the measurement properties of the scale while controlling for the rater (self- vs. clinician-rated) effect. The Parkinson Anxiety Scale was administered to a cross-sectional multicenter international sample of 362 Parkinson's disease patients. Both patients and clinicians rated the patient's anxiety independently. A many-facet Rasch model design was applied to estimate and remove the rater effect. The following measurement properties were assessed: fit to the Rasch model, unidimensionality, reliability, differential item functioning, item local independency, interrater reliability (self or clinician), and scale targeting. In addition, test-retest stability, construct validity, precision, and diagnostic properties of the Parkinson Anxiety Scale were also analyzed. A good fit to the Rasch model was obtained for Parkinson Anxiety Scale dimensions A and B, after the removal of one item and rescoring of the response scale for certain items, whereas dimension C showed marginal fit. Self versus clinician rating differences were of small magnitude, with patients reporting higher anxiety levels than clinicians. The linear measure for Parkinson Anxiety Scale dimensions A and B showed good convergent construct with other anxiety measures and good diagnostic properties. Parkinson Anxiety Scale modified dimensions A and B provide valid and reliable measures of anxiety in Parkinson's disease that are comparable across raters. Further studies are needed with dimension C. © 2014 International Parkinson and Movement Disorder Society.

  9. Reliability of the Cooking Task in adults with acquired brain injury.

    Science.gov (United States)

    Poncet, Frédérique; Swaine, Bonnie; Taillefer, Chantal; Lamoureux, Julie; Pradat-Diehl, Pascale; Chevignard, Mathilde

    2015-01-01

    Acquired brain injury (ABI) often leads to deficits in executive functioning (EF) responsible for severe and long-standing disabilities in daily life activities. The Cooking Task is an ecological and valid test of EF involving multi-tasking in a real environment. Given its complex scoring system, it is important to establish the tool's reliability. The objective of the study was to examine the reliability of the Cooking Task (internal consistency, inter-rater and test-retest reliability). A total of 160 patients with ABI (113 men, mean age 37 years, SD = 14.3) were tested using the Cooking Task. For test-retest reliability, patients were assessed by the same rater on two occasions (mean interval 11 days), while two raters independently and simultaneously observed and scored patients' performances to estimate inter-rater reliability. Internal consistency was high for the global scale (Cronbach α = .74). Inter-rater reliability (n = 66) for total errors was also high (ICC = .93); however, the test-retest reliability (n = 11) was poor (ICC = .36). In general the Cooking Task appears to be a reliable tool. The low test-retest results were expected given the importance of EF in the performance of novel tasks.

  10. Assessment of apraxia: inter-rater reliability of a new apraxia test, association between apraxia and other cognitive deficits and prevalence of apraxia in a rehabilitation setting.

    Science.gov (United States)

    Zwinkels, Angeliek; Geusgens, Chantal; van de Sande, Peter; Van Heugten, Caroline

    2004-11-01

    To investigate the inter-rater reliability of a new apraxia test. Furthermore to examine the association of apraxia with other neuropsychological impairments and the prevalence of apraxia in a rehabilitation setting on the basis of the new test. Cross-sectional cohort study, involving 100 patients with a first stroke admitted to a rehabilitation centre in the Netherlands. General patient characteristics and stroke-related aspects. Cognitive screening involving apraxia, visuospatial scanning, abstract thinking and reasoning, memory, attention, planning and aphasia. The indices for inter-rater agreement range from excellent to poor. Significant correlations are found between apraxia and visuospatial scanning, memory, attention, planning and aphasia. The patients with apraxia perform significantly worse than the patients without apraxia on memory, the time needed to complete the tests for scanning and attention, and aphasia. The prevalence of apraxia is 25.3% in the total group, 51.3% in the left hemisphere stroke patients and 6.0% in the right hemisphere stroke patients. Patients with and without apraxia do not differ significantly concerning age, gender and type of stroke. The apraxia test has been shown to be a reliable instrument. Apraxia is often associated with aphasia, memory problems and mental slowness. This study shows that on the basis of the apraxia test, the prevalence of apraxia among patients in the rehabilitation centre is high, especially among patients with left hemisphere lesions.

  11. Reliability of the Quality of Upper Extremity Skills Test for Children with Cerebral Palsy Aged 2 to 12 Years

    Science.gov (United States)

    Thorley, Megan; Lannin, Natasha; Cusick, Anne; Novak, Iona; Boyd, Roslyn

    2012-01-01

    Aim: To investigate reliability of the Quality of Upper Extremity Skills Test (QUEST) scores for children with cerebral palsy (CP) aged 2-12 years. Method: Thirty-one QUESTs from 24 children with CP were rated once by two raters and twice by one rater. Internal consistency of total scores, inter- and intra-rater reliability findings for total,…

  12. Factors Influencing Mini-CEX Rater Judgments and Their Practical Implications: A Systematic Literature Review.

    Science.gov (United States)

    Lee, Victor; Brain, Keira; Martin, Jenepher

    2017-06-01

    At present, little is known about how mini-clinical evaluation exercise (mini-CEX) raters translate their observations into judgments and ratings. The authors of this systematic literature review aim both to identify the factors influencing mini-CEX rater judgments in the medical education setting and to translate these findings into practical implications for clinician assessors. The authors searched for internal and external factors influencing mini-CEX rater judgments in the medical education setting from 1980 to 2015 using the Ovid MEDLINE, PsycINFO, ERIC, PubMed, and Scopus databases. They extracted the following information from each study: country of origin, educational level, study design and setting, type of observation, occurrence of rater training, provision of feedback to the trainee, research question, and identified factors influencing rater judgments. The authors also conducted a quality assessment for each study. Seventeen articles met the inclusion criteria. The authors identified both internal and external factors that influence mini-CEX rater judgments. They subcategorized the internal factors into intrinsic rater factors, judgment-making factors (conceptualization, interpretation, attention, and impressions), and scoring factors (scoring integration and domain differentiation). The current theories of rater-based judgment have not helped clinicians resolve the issues of rater idiosyncrasy, bias, gestalt, and conflicting contextual factors; therefore, the authors believe the most important solution is to increase the justification of rater judgments through the use of specific narrative and contextual comments, which are more informative for trainees. Finally, more real-world research is required to bridge the gap between the theory and practice of rater cognition.

  13. RELIABILITY OF THE DYNAMIC OCCUPATIONAL THERAPY COGNITIVE ASSESSMENT FOR CHILDREN (DOTCA-CH): THAI VERSION OF ORIENTATION, SPATIAL PERCEPTION, AND THINKING OPERATIONS SUBTESTS

    Directory of Open Access Journals (Sweden)

    Suchitporn Lersilp

    2014-06-01

    Full Text Available The Dynamic Occupational Therapy Cognitive Assessment for Children (DOTCA-Ch) is a tool for identifying cognitive problems in school-aged children. However, the DOTCA-Ch was developed in English for Western children, so differences of culture and language make it inappropriate for Thai children. The objectives of this study were to translate the Orientation, Spatial Perception, and Thinking Operations subtests of the DOTCA-Ch into Thai using a World Health Organization back-translation process, and to examine the internal consistency, inter-rater reliability and test-retest reliability of the translated subtests. The participants consisted of 38 intellectually impaired and learning disabled individuals between the ages of 6 and 12. Results from this study revealed high internal consistency in the Orientation subtest (α=.83), Spatial Perception subtest (α=.82) and Thinking Operations subtest (α=.82); high inter-rater reliability in the Orientation subtest (ICC =.83), Spatial Perception subtest (ICC =.84) and Thinking Operations subtest (ICC =.74); and high test-retest reliability in the Orientation subtest (ICC =.84), Spatial Perception subtest (ICC =.86) and Thinking Operations subtest (ICC =.85). These results indicate that the Thai version of the DOTCA-Ch in Orientation, Spatial Perception, and Thinking Operations subtests might be used as an appropriate assessment tool for Thai children, based on psychometric evidence including internal consistency, inter-rater reliability and test-retest reliability. However, additional study of other psychometric properties, including predictive validity, concurrent reliability, and inter-rater reliability during the mediation process of this assessment tool, needs to be carried out.

  14. The reliability of a modified Kalamazoo Consensus Statement Checklist for assessing the communication skills of multidisciplinary clinicians in the simulated environment.

    Science.gov (United States)

    Peterson, Eleanor B; Calhoun, Aaron W; Rider, Elizabeth A

    2014-09-01

    With increased recognition of the importance of sound communication skills and communication skills education, reliable assessment tools are essential. This study reports on the psychometric properties of an assessment tool based on the Kalamazoo Consensus Statement Essential Elements Communication Checklist. The Gap-Kalamazoo Communication Skills Assessment Form (GKCSAF), a modified version of an existing communication skills assessment tool, the Kalamazoo Essential Elements Communication Checklist-Adapted, was used to assess learners in a multidisciplinary, simulation-based communication skills educational program using multiple raters. 118 simulated conversations were available for analysis. Internal consistency and inter-rater reliability were determined by calculating a Cronbach's alpha score and intra-class correlation coefficients (ICC), respectively. The GKCSAF demonstrated high internal consistency with a Cronbach's alpha score of 0.844 (faculty raters) and 0.880 (peer observer raters), and high inter-rater reliability with an ICC of 0.830 (faculty raters) and 0.89 (peer observer raters). The Gap-Kalamazoo Communication Skills Assessment Form is a reliable method of assessing the communication skills of multidisciplinary learners using multi-rater methods within the learning environment. The Gap-Kalamazoo Communication Skills Assessment Form can be used by educational programs that wish to implement a reliable assessment and feedback system for a variety of learners. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
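    Internal consistency in this and several other records is summarised with Cronbach's alpha. A minimal sketch of the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), is given below. The item scores are hypothetical and the function name is illustrative; nothing here reproduces the study's data or software.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items table of numeric scores.

    `scores` is a list of rows, one row per rated conversation or respondent,
    one column per checklist item. Sample variances (n - 1 denominator) are used.
    """
    k = len(scores[0])  # number of items
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 1-5 ratings on four checklist items for six conversations.
checklist_scores = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 2],
    [4, 4, 3, 4],
    [3, 4, 3, 3],
]
alpha = cronbach_alpha(checklist_scores)
print(f"Cronbach's alpha = {alpha:.3f}")
```

    Alpha rises when items vary together across respondents, so it reflects how coherently the items measure one construct rather than how well different raters agree. That is why the abstracts report it alongside, not instead of, an inter-rater statistic such as the ICC.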

  15. Psychometric evaluation of a motor control test battery of the craniofacial region.

    Science.gov (United States)

    von Piekartz, H; Stotz, E; Both, A; Bahn, G; Armijo-Olivo, S; Ballenberger, N

    2017-12-01

    The primary objective of this study was to determine the structural and known-group validity as well as the inter-rater reliability of a test battery to evaluate the motor control of the craniofacial region. Seventy volunteers without TMD and 25 subjects with TMD (Axes I) per the DC/TMD were asked to execute a test battery consisting of eight tests. The tests were video-taped in the same sequence in a standardised manner. Two experienced physical therapists participated in this study as blinded assessors. We used exploratory factor analysis to identify the underlying component structure of the eight tests. Internal consistency (Cronbach's α), inter-rater reliability (intra-class correlation coefficient) and construct validity (ie, hypothesis testing-known-group validity) (receiver operating curves) were also explored for the test battery. The structural validity showed the presence of one factor underlying the construct of the test battery. The internal consistency was excellent (0.90) as well as the inter-rater reliability. All values of reliability were close to 0.9 or above indicating very high inter-rater reliability. The area under the curve (AUC) was 0.93 for rater 1 and 0.94 for rater 2, indicating excellent discrimination between subjects with TMD and healthy controls. The results of the present study support the psychometric properties of the test battery to measure motor control of the craniofacial region when evaluated through videotaping. This test battery could be used to differentiate between healthy subjects and subjects with musculoskeletal impairments in the cervical and oro-facial regions. In addition, this test battery could be used to assess the effectiveness of management strategies in the craniofacial region. © 2017 John Wiley & Sons Ltd.

  16. Psychometric Evaluation of the D-Catch, an Instrument to Measure the Accuracy of Nursing Documentation.

    Science.gov (United States)

    D'Agostino, Fabio; Barbaranelli, Claudio; Paans, Wolter; Belsito, Romina; Juarez Vela, Raul; Alvaro, Rosaria; Vellone, Ercole

    2017-07-01

    To evaluate the psychometric properties of the D-Catch instrument. A cross-sectional methodological study. Validity and reliability were estimated with confirmatory factor analysis (CFA) and internal consistency and inter-rater reliability, respectively. A sample of 250 nursing documentations was selected. CFA showed the adequacy of a 1-factor model (chronologically descriptive accuracy) with an outlier item (nursing diagnosis accuracy). Internal consistency and inter-rater reliability were adequate. The D-Catch is a valid and reliable instrument for measuring the accuracy of nursing documentation. Caution is needed when measuring diagnostic accuracy since only one item measures this dimension. The D-Catch can be used as an indicator of the accuracy of nursing documentation and the quality of nursing care. © 2015 NANDA International, Inc.

  17. Inter-rater Reliability of the Dysphagia Outcome and Severity Scale (DOSS): Effects of Clinical Experience, Audio-Recording and Training.

    Science.gov (United States)

    Zarkada, Angeliki; Regan, Julie

    2017-10-19

    The Dysphagia Outcome and Severity Scale (DOSS) is widely used to measure dysphagia severity based on videofluoroscopy (VFSS). This study investigated inter-rater reliability (IRR) of the DOSS. It also determined the effect of clinical experience, VFSS audio-recording and training on DOSS IRR. A quantitative prospective research design was used. Seventeen speech and language pathologists (SLPs) were recruited from an acute teaching hospital, Dublin (> 3 years' VFSS experience, n = 10) and from a postgraduate dysphagia programme in a university setting ([...] training session on DOSS rating after which DOSS IRR was re-tested. Cohen's kappa coefficient was used to establish IRR. IRR of the DOSS presented only fair agreement (κ = 0.36, p [...] training (κ = 0.328) was significantly better compared with post-training (κ = 0.218) (p < 0.05). Findings raise concerns as the DOSS is frequently used in clinical practice to capture dysphagia severity and to monitor changes.

  18. Reliability and validity of the de Morton Mobility Index in individuals with sub-acute stroke.

    Science.gov (United States)

    Braun, Tobias; Marks, Detlef; Thiel, Christian; Grüneberg, Christian

    2018-02-04

    To establish the validity and reliability of the de Morton Mobility Index (DEMMI) in patients with sub-acute stroke. This cross-sectional study was performed in a neurological rehabilitation hospital. We assessed unidimensionality, construct validity, internal consistency reliability, inter-rater reliability, minimal detectable change and possible floor and ceiling effects of the DEMMI in adult patients with sub-acute stroke. The study included a total sample of 121 patients with sub-acute stroke. We analysed validity (n = 109) and reliability (n = 51) in two sub-samples. Rasch analysis indicated unidimensionality with an overall fit to the model (chi-square = 12.37, p = 0.577). All hypotheses on construct validity were confirmed. Internal consistency reliability (Cronbach's alpha = 0.94) and inter-rater reliability (intraclass correlation coefficient = 0.95; 95% confidence interval: 0.92-0.97) were excellent. The minimal detectable change with 90% confidence was 13 points. No floor or ceiling effects were evident. These results indicate unidimensionality, sufficient internal consistency reliability, inter-rater reliability, and construct validity of the DEMMI in patients with a sub-acute stroke. Advantages of the DEMMI in clinical application are the short administration time, no need for special equipment and interval level data. The de Morton Mobility Index, therefore, may be a useful performance-based bedside test to measure mobility in individuals with a sub-acute stroke across the whole mobility spectrum. Implications for Rehabilitation: The de Morton Mobility Index (DEMMI) is a unidimensional measurement instrument of mobility in individuals with sub-acute stroke. The DEMMI has excellent internal consistency and inter-rater reliability, and sufficient construct validity. The minimal detectable change of the DEMMI with 90% confidence in stroke rehabilitation is 13 points. The lack of any floor or ceiling effects on hospital admission indicates

  19. The relative reliability of actively participating and passively observing raters in a simulation-based assessment for selection to specialty training in anaesthesia.

    Science.gov (United States)

    Roberts, M J; Gale, T C E; Sice, P J A; Anderson, I R

    2013-06-01

    Selection to specialty training is a high-stakes assessment demanding valuable consultant time. In one initial entry level and two higher level anaesthesia selection centres, we investigated the feasibility of using staff participating in simulation scenarios, rather than observing consultants, to rate candidate performance. We compared participant and observer scores using four different outcomes: inter-rater reliability; score distributions; correlation of candidate rankings; and percentage of candidates whose selection might be affected by substituting participants' for observers' ratings. Inter-rater reliability between observers was good (correlation coefficient 0.73-0.96) but lower between participants (correlation coefficient 0.39-0.92), particularly at higher level where participants also rated candidates more favourably than did observers. Station rank orderings were strongly correlated between the rater groups at entry level (rho 0.81, p [...] training posts available. We conclude that using participating raters is feasible at initial entry level only. Anaesthesia © 2013 The Association of Anaesthetists of Great Britain and Ireland.

  20. Reliability of the International Spinal Cord Injury Musculoskeletal Basic Data Set

    DEFF Research Database (Denmark)

    Baunsgaard, C B; Chhabra, H S; Harvey, L A

    2016-01-01

    STUDY DESIGN: Psychometric study. OBJECTIVES: To determine the intra- and inter-rater reliability and content validity of the International Spinal Cord Injury (SCI) Musculoskeletal Basic Data Set (ISCIMSBDS). SETTING: Four centers with one in each of the countries in Australia, England, India and...

  1. The reliability of the Brazilian version of the Composite International Diagnostic Interview (CIDI 2.1)

    Directory of Open Access Journals (Sweden)

    Quintana M.I.

    2004-01-01

    Full Text Available The objective of the present study was to determine the reliability of the Brazilian version of the Composite International Diagnostic Interview 2.1 (CIDI 2.1) in clinical psychiatry. The CIDI 2.1 was translated into Portuguese using WHO guidelines and reliability was studied using the inter-rater reliability method. The study sample consisted of 186 subjects from psychiatric hospitals and clinics, primary care centers and community services. The interviewers consisted of a group of 13 lay and three non-lay interviewers submitted to the CIDI training. The average interview time was 2 h and 30 min. General reliability ranged from kappa 0.50 to 1. For lifetime diagnoses the reliability ranged from kappa 0.77 (Bipolar Affective Disorder) to 1 (Substance-Related Disorder, Alcohol-Related Disorder, Eating Disorders). Previous year reliability ranged from kappa 0.66 (Obsessive-Compulsive Disorder) to 1 (Dissociative Disorders, Maniac Disorders, Eating Disorders). The poorest reliability rate was found for Mild Depressive Episode (kappa = 0.50) during the previous year. Training proved to be a fundamental factor for maintaining good reliability. Technical knowledge of the questionnaire compensated for the lack of psychiatric knowledge of the lay personnel. Inter-rater reliability was good to excellent for persons in psychiatric practice.

  2. Rating the raters in a mixed model: An approach to deciphering the rater reliability

    Science.gov (United States)

    Shang, Junfeng; Wang, Yougui

    2013-05-01

    Rating the raters has attracted extensive attention in recent years. Ratings are quite complex in that the subjective assessment and a number of criteria are involved in a rating system. Whenever the human judgment is a part of ratings, the inconsistency of ratings is the source of variance in scores, and it is therefore quite natural for people to verify the trustworthiness of ratings. Accordingly, estimation of the rater reliability will be of great interest and an appealing issue. To facilitate the evaluation of the rater reliability in a rating system, we propose a mixed model where the scores of the ratees offered by a rater are described with the fixed effects determined by the ability of the ratees and the random effects produced by the disagreement of the raters. In such a mixed model, for the rater random effects, we derive its posterior distribution for the prediction of random effects. To quantitatively make a decision in revealing the unreliable raters, the predictive influence function (PIF) serves as a criterion which compares the posterior distributions of random effects between the full data and rater-deleted data sets. The benchmark for this criterion is also discussed. This proposed methodology of deciphering the rater reliability is investigated in the multiple simulated and two real data sets.

  3. Affect Consciousness in children with internalizing problems: Assessment of affect integration.

    Science.gov (United States)

    Taarvig, Eva; Solbakken, Ole André; Grova, Bjørg; Monsen, Jon T

    2015-10-01

    Affect integration was operationalized through the Affect Consciousness (AC) construct as degrees of awareness, tolerance, nonverbal expression and conceptual expression of 11 affects. These aspects are assessed through a semi-structured Affect Consciousness Interview (ACI) and separate rating scales (Affect Consciousness Scales (ACSs)) developed for use in research and clinical work with adults with psychopathological disorders. Age-adjusted changes were made in the interview and rating system. This study explored the applicability of the adjusted ACI to a sample of 11-year-old children with internalizing problems through examining inter-rater reliability of the adjusted ACI, along with relationships between the AC aspects and aspects of mental health as symptoms of depression, symptoms of anxiety, social competence, besides general intelligence. Satisfactory inter-rater reliability was found, as well as consistent relationships between the AC aspects and the various aspects of mental health, a finding which coincides with previous research. The finding indicates that the attainment of the capacity to deal adaptively with affect is probably an important contributor to the development of adequate social competence and maybe in the prevention of psychopathology in children. The results indicate that the adjusted ACI and rating scales are useful tools in treatment planning with children at least from the age of 11 years. © The Author(s) 2014.

  4. Content Validity Index and Intra- and Inter-Rater Reliability of a New Muscle Strength/Endurance Test Battery for Swedish Soldiers.

    Directory of Open Access Journals (Sweden)

    Helena Larsson

    Full Text Available The objective of this study was to examine the content validity of commonly used muscle performance tests in military personnel and to investigate the reliability of a proposed test battery. For the content validity investigation, thirty selected tests were those described in the literature and/or commonly used in the Nordic and North Atlantic Treaty Organization (NATO) countries. Nine selected experts rated, on a four-point Likert scale, the relevance of these tests in relation to five different work tasks: lifting, carrying equipment on the body or in the hands, climbing, and digging. Thereafter, a content validity index (CVI) was calculated for each work task. The result showed excellent CVI (≥0.78) for sixteen tests, which comprised of one or more of the military work tasks. Three of the tests; the functional lower-limb loading test (the Ranger test), dead-lift with kettlebells, and back extension, showed excellent content validity for four of the work tasks. For the development of a new muscle strength/endurance test battery, these three tests were further supplemented with two other tests, namely, the chins and side-bridge test. The inter-rater reliability was high (intraclass correlation coefficient, ICC2,1 0.99) for all five tests. The intra-rater reliability was good to high (ICC3,1 0.82-0.96) with an acceptable standard error of mean (SEM), except for the side-bridge test (SEM%>15). Thus, the final suggested test battery for a valid and reliable evaluation of soldiers' muscle performance comprised the following four tests; the Ranger test, dead-lift with kettlebells, chins, and back extension test. The criterion-related validity of the test battery should be further evaluated for soldiers exposed to varying physical workload.
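
    The content validity index reported in the record above can be illustrated with a small calculation. The sketch below assumes the commonly used item-level definition (the proportion of experts who rate an item 3 or 4 on a four-point relevance scale); the abstract does not state the exact formula the authors applied, and the function name and ratings are hypothetical.

```python
from typing import Sequence


def item_cvi(ratings: Sequence[int], relevant_from: int = 3) -> float:
    """Return the share of experts whose rating is at least `relevant_from`.

    Assumes a four-point relevance scale where 3 and 4 count as "relevant".
    """
    if not ratings:
        raise ValueError("at least one expert rating is required")
    return sum(r >= relevant_from for r in ratings) / len(ratings)


# Hypothetical ratings from nine experts for one test and one work task.
example_ratings = [4, 4, 3, 4, 3, 3, 4, 2, 4]
cvi = item_cvi(example_ratings)
print(f"CVI = {cvi:.2f}, meets the 0.78 threshold: {cvi >= 0.78}")
```

    With nine experts, a threshold of 0.78 means that at most one expert may rate the item as not relevant (8/9 is roughly 0.89, while 7/9 is roughly 0.78 only after rounding).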

  5. How to assess and compare inter-rater reliability, agreement and correlation of ratings: an exemplary analysis of mother-father and parent-teacher expressive vocabulary rating pairs

    Directory of Open Access Journals (Sweden)

    Margarita Stolarova

    2014-06-01

    Full Text Available This report has two main purposes. First, we combine well-known analytical approaches to conduct a comprehensive assessment of agreement and correlation of rating-pairs and to disentangle these often confused concepts, providing a best-practice example on concrete data and a tutorial for future reference. Second, we explore whether a screening questionnaire developed for use with parents can be reliably employed with daycare teachers when assessing early expressive vocabulary. A total of 53 vocabulary rating pairs (34 parent-teacher and 19 mother-father pairs) collected for two-year-old children (12 bilingual) are evaluated. First, inter-rater reliability both within and across subgroups is assessed using the intra-class correlation coefficient (ICC). Next, based on this analysis of reliability and on the test-retest reliability of the employed tool, inter-rater agreement is analyzed, magnitude and direction of rating differences are considered. Finally, Pearson correlation coefficients of standardized vocabulary scores are calculated and compared across subgroups. The results underline the necessity to distinguish between reliability measures, agreement and correlation. They also demonstrate the impact of the employed reliability on agreement evaluations. This study provides evidence that parent-teacher ratings of children’s early vocabulary can achieve agreement and correlation comparable to those of mother-father ratings on the assessed vocabulary scale. Bilingualism of the evaluated child decreased the likelihood of raters’ agreement. We conclude that future reports of agreement, correlation and reliability of ratings will benefit from better definition of terms and stricter methodological approaches. The methodological tutorial provided here holds the potential to increase comparability across empirical reports and can help improve research practices and knowledge transfer to educational and therapeutic settings.
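
    The distinction between correlation and agreement drawn in the record above can be shown with a toy calculation. This is a minimal sketch, not the authors' analysis: the scores are invented, and the helper names are hypothetical. Two raters whose scores differ by a constant offset correlate perfectly yet never give the same score.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_difference(x, y):
    """Average signed difference (second rater minus first rater)."""
    return sum(b - a for a, b in zip(x, y)) / len(x)


# Hypothetical vocabulary scores: rater 2 is always 10 points above rater 1.
rater_1 = [40, 45, 50, 55, 60]
rater_2 = [score + 10 for score in rater_1]

print("Pearson r:", pearson_r(rater_1, rater_2))
print("Mean difference:", mean_difference(rater_1, rater_2))
print("Exact agreements:", sum(a == b for a, b in zip(rater_1, rater_2)))
```

    A correlation coefficient alone would suggest the two raters are interchangeable here, whereas the systematic 10-point difference shows they are not, which is why the record treats reliability, agreement and correlation as separate questions.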

  6. Adaptation and testing of psychosocial assessment instruments for cross-cultural use: an example from the Thailand Burma border.

    Science.gov (United States)

    Haroz, Emily E; Bass, Judith K; Lee, Catherine; Murray, Laura K; Robinson, Courtland; Bolton, Paul

    2014-01-01

    The purpose of this study was to develop valid and reliable instruments to assess priority psychosocial problems and functioning among adult survivors of systematic violence from Burma living in Thailand. The process involved four steps: 1) instrument drafting and piloting; 2) reliability and validity testing; 3) instrument revision; and 4) retesting revised instrument. A total of N = 158 interviews were completed. Overall subscales showed good internal consistency (0.73-0.92) and satisfactory combined test-retest/inter-rater reliability (0.63-0.84). Criterion validity was not demonstrated for any scale. The alcohol and functioning scales underperformed and were revised (step 3) and retested (step 4). Upon retesting, the function scale showed good internal consistency reliability (0.91-0.92), and the alcohol scale showed acceptable internal consistency (0.79) and strong test-retest/inter-rater reliability (0.86-0.89). This paper describes the importance and process of adaptation and testing, illustrated by the experiences and results for selected instruments in this population.

  7. Intra-Rater Reproducibility and Validity of Nintendo Wii Balance Testing in Community-Dwelling Older Adults

    DEFF Research Database (Denmark)

    Jørgensen, Martin Grønbech; Laessoe, Uffe; Hendriksen, Carsten

    2014-01-01

    The aims of the current study were to (1) examine the intra-rater inter-session reproducibility of the Nintendo Wii Agility and Stillness tests and (2) explore the concurrent validity in relation to 'gold-standard' force plate analysis. Within-day inter-session reproducibility was examined in 30 ...... older adults (age 71.8±5.1 yrs.). No systematic test-retest differences were found for the Wii Stillness test, however, the Wii Agility test scores differed systematically between test sessions (p...

  8. The politics of inter-regionalism: relations between international regional organizations

    NARCIS (Netherlands)

    Vleuten, J.M. van der; Ribeiro Hoffman, A.; Reinalda, B.

    2013-01-01

    As the development of relations between international regional organizations, inter-regionalism denotes a relatively recent phenomenon. Largely due to systemic bipolarity, inter-regional relations remained limited to 'dialogue partnerships' between the European Community (EC) and other regional

  9. The Italian version of the Mayo-Portland Adaptability Inventory-4. A new measure of brain injury outcome.

    Science.gov (United States)

    Cattelani, R; Corsini, D; Posteraro, L; Agosti, M; Saccavini, M

    2009-12-01

    The assessment of major obstacles to community integration which may result from an acquired brain injury (ABI) is needed for rational planning and effective management of ABI patients' social adjustment. Currently, such a generally acceptable measure is not available for the Italian population. This paper reports the translation process, the internal consistency, and the inter-rater reliability data for the Italian version of the Mayo-Portland Adaptability Inventory-4 (MPAI-4), a useful measure with highly developed and well documented psychometric properties. The MPAI-4 is specifically designed to assess socially relevant aspects of physical status and cognitive-behavioural competence following ABI. It is a 29-item inventory which is divided into three subdomains (Abilities, Adjustment, and Participation indices) covering a reasonably representative range Twenty ABI patients with at least one-year discharge from the rehabilitation facilities were submitted to the Italian MPAI-4. They were independently rated by two different rehabilitation professionals and a family member/significant other serving as informant (SO). Internal consistency was assessed by calculating the Cronbach's alpha values. Inter-rater agreement for individual items was statistically examined by determining the interclass correlation coefficient (ICC). In addition to the 8% of perfectly correspondent sentences, a clear prevalence (75.5%) of minor semantic variations and formal variations with no semantic value at the sentence-to-sentence matching was found. Full-scale Cronbach's alpha was 0.951 and 0.947 for the two professionals (rater #1 and rater #2, respectively), and was 0.957 for the family member serving as informant (rater #3). Full-Scale ICC (2.1) between professionals and SOs was 0.804 (CI=95%; lower-upper bound=0.688-0.901). The Italian MPAI-4 shares many psychometric features with the original English version, demonstrates both good internal consistency and good inter-rater

  10. Internal Consistency of Reliability Assessment of the Persian version of the ‘Home Falls and Accident Screening Tool’

    Directory of Open Access Journals (Sweden)

    Afsoon Hassani Mehraban

    2013-10-01

    Full Text Available Objectives: Falling is a common problem among the elderly. Falling indoors and outdoors is highly prevalent among the Iranian elderly. Therefore, identification of the contributing factors at home and their modification can reduce falls and subsequent injuries in the elderly. The goal of this study was to identify the elderly at risk of fall, using the ‘Home Falls and Accident Screening Tool’ (HOME FAST), and to determine the reliability of this tool. Methods: Sixty old people were selected from five geographical regions of Tehran through the Local Town Councils. Participants were aged 60 to 65 years, and HOME FAST was used to assess inter-rater and test-retest reliability. Results: Test-retest reliability in the study showed that agreement between the items of the Persian version of HOME FAST was over 0.8, which is a very good reliability. The agreement between the domains was 0.65-1.00, indicative of moderate to high reliability. Moreover, the inter-rater reliability of the items was over 0.8, which is also very good. The correlation of each item between the domains was 0.01-1.00, which shows poor to high reliability. Discussion: This study showed that the reliability of the Persian version of HOME FAST is high. This tool can therefore be used as an appropriate screening tool by professionals to take necessary preventive measures for the Iranian elderly population.

  11. The Surgical Safety Checklist and Teamwork Coaching Tools: a study of inter-rater reliability.

    Science.gov (United States)

    Huang, Lyen C; Conley, Dante; Lipsitz, Stu; Wright, Christopher C; Diller, Thomas W; Edmondson, Lizabeth; Berry, William R; Singer, Sara J

    2014-08-01

    To assess the inter-rater reliability (IRR) of two novel observation tools for measuring surgical safety checklist performance and teamwork. Data surgical safety checklists can promote adherence to standards of care and improve teamwork in the operating room. Their use has been associated with reductions in mortality and other postoperative complications. However, checklist effectiveness depends on how well they are performed. Authors from the Safe Surgery 2015 initiative developed a pair of novel observation tools through literature review, expert consultation and end-user testing. In one South Carolina hospital participating in the initiative, two observers jointly attended 50 surgical cases and independently rated surgical teams using both tools. We used descriptive statistics to measure checklist performance and teamwork at the hospital. We assessed IRR by measuring percent agreement, Cohen's κ, and weighted κ scores. The overall percent agreement and κ between the two observers was 93% and 0.74 (95% CI 0.66 to 0.79), respectively, for the Checklist Coaching Tool and 86% and 0.84 (95% CI 0.77 to 0.90) for the Surgical Teamwork Tool. Percent agreement for individual sections of both tools was 79% or higher. Additionally, κ scores for six of eight sections on the Checklist Coaching Tool and for two of five domains on the Surgical Teamwork Tool achieved the desired 0.7 threshold. However, teamwork scores were high and variation was limited. There were no significant changes in the percent agreement or κ scores between the first 10 and last 10 cases observed. Both tools demonstrated substantial IRR and required limited training to use. These instruments may be used to observe checklist performance and teamwork in the operating room. However, further refinement and calibration of observer expectations, particularly in rating teamwork, could improve the utility of the tools. Published by the BMJ Publishing Group Limited. For permission to use (where not already

  12. A Brazilian-Portuguese version of the Kinesthetic and Visual Motor Imagery Questionnaire.

    Science.gov (United States)

    Demanboro, Alan; Sterr, Annette; Anjos, Sarah Monteiro Dos; Conforto, Adriana Bastos

    2018-01-01

    Motor imagery has emerged as a potential rehabilitation tool in stroke. The goals of this study were: 1) to develop a translated and culturally-adapted Brazilian-Portuguese version of the Kinesthetic and Visual Motor Imagery Questionnaire (KVIQ20-P); 2) to evaluate the psychometric characteristics of the scale in a group of patients with stroke and in an age-matched control group; 3) to compare the KVIQ20 performance between the two groups. Test-retest, inter-rater reliabilities, and internal consistencies were evaluated in 40 patients with stroke and 31 healthy participants. In the stroke group, ICC confidence intervals showed excellent test-retest and inter-rater reliabilities. Cronbach's alpha also indicated excellent internal consistency. Results for controls were comparable to those obtained in persons with stroke. The excellent psychometric properties of the KVIQ20-P should be considered during the design of studies of motor imagery interventions for stroke rehabilitation.

  13. Choice, internal consistency, and rationality

    OpenAIRE

    Aditi Bhattacharyya; Prasanta K. Pattanaik; Yongsheng Xu

    2010-01-01

    The classical theory of rational choice is built on several important internal consistency conditions. In recent years, the reasonableness of those internal consistency conditions has been questioned and criticized, and several responses to accommodate such criticisms have been proposed in the literature. This paper develops a general framework to accommodate the issues raised by the criticisms of classical rational choice theory, and examines the broad impact of these criticisms from both no...

  14. Inter-rater and test-retest reliability, internal consistency, and factorial structure of the instrument for forensic treatment evaluation

    NARCIS (Netherlands)

    Schuringa, E.; Spreen, M.; Bogaerts, S.

    2014-01-01

    In this study, the Instrument for Forensic Treatment Evaluation (IFTE) is introduced. The IFTE includes 14 dynamic items of the risk assessment scheme HKT-R and eight items specifically related to the treatment of forensic psychiatric patients. The items are divided over three factors: protective

  15. Modified Tuck Jump Assessment: Reliability and Training of Raters

    Directory of Open Access Journals (Sweden)

    Craig A. Smith, Nicole J. Chimera, Monica R. Lininger, Meghan Warren

    2017-09-01

    Full Text Available We are writing with regard to “Intra- and inter-rater reliability of the modified tuck jump assessment,” by Fort-Vanmeerhaeghe et al. (2017) published in the Journal of Sports Science & Medicine. The authors reported on the reliability of the modified Tuck Jump Assessment (TJA). The purpose of the article was twofold: to introduce a new scoring methodology and to report on the interrater and intrarater reliability. The authors found the modified TJA to have excellent interrater reliability (ICC = 0.94, 95% CI = 0.88-0.97) and intrarater reliability (rater 1 ICC = 0.94, 95% CI = 0.88-0.9; rater 2 ICC = 0.96, 95% CI = 0.92-0.98) with experienced raters (n = 2) in a sample of 24 elite volleyball athletes. Overall, we found the study to be well conducted and valuable to the field of injury screening; however, the study did not adequately explain how the raters were trained in the modified TJA to improve consistency of scoring, or the modifications of the individual flaw “excessive contact noise at landing.” This information is necessary to improve the clinical utility of the TJA and direct future reliability studies. The TJA has been changed at least three times in the literature: from the initial introduction (Myer et al., 2006) to the most referenced and detailed protocol (Myer et al., 2011) to the publication under discussion (Fort-Vanmeerhaeghe et al., 2017). The initial test protocol was based upon clinical expertise and has evolved over time as new research emerged and problems arose with the original TJA. Initially, the TJA was scored on a visual analog scale (Myer et al., 2006), changed to a dichotomous scale (0 for no flaw or 1 for flaw present) (Myer et al., 2011) and most recently modified using an ordinal scale (Fort-Vanmeerhaeghe et al., 2017). A significant disparity in the reported interrater and intrarater reliability arose with the dichotomously scored TJA, between those involved in the development of the TJA (Herrington et al., 2013

  16. How much do family physicians involve pregnant women in decisions about prenatal screening for Down syndrome?

    Science.gov (United States)

    Gagnon, Susie; Labrecque, Michel; Njoya, Merlin; Rousseau, François; St-Jacques, Sylvie; Légaré, France

    2010-02-01

    To assess the extent to which family physicians (FPs) involve women in decisions about prenatal screening for Down syndrome. Based on transcripts of consultations between 41 FPs and 128 women, two raters independently assessed clinicians' efforts to involve women in decisions about prenatal screening for Down syndrome using the French-language version of OPTION. Descriptive statistics of OPTION scores were calculated. Construct validity was assessed by performing a principal factor analysis and by measuring association with consultation duration and FPs' sociodemographics. Internal consistency was assessed with Cronbach's alpha and inter-rater reliability with the intraclass correlation coefficient. The overall mean OPTION score was low: 19 +/- 7 (range = 0 [no involvement] to 100 [high involvement]). One factor accounted for 80% of the variance. Both internal consistency and inter-rater reliability were very good (Cronbach's alpha = 0.73; ICC = 0.76). OPTION scores were lower for residents than for licensed FPs (17 +/- 5 vs 21 +/- 4; p = 0.02) and were positively associated with duration of consultation (r = 0.56; p women in decisions about prenatal screening for Down syndrome. (c) 2009 John Wiley & Sons, Ltd.
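
    Several records in this listing, including the one above, report internal consistency as Cronbach's alpha. The sketch below shows the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), applied to invented data; the function name and the score matrix are hypothetical and are not taken from any of the studies.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 in the denominator) throughout.
    """
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(scores[0])
    item_columns = list(zip(*scores))  # one tuple of scores per item
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical data: five respondents scored on three items.
example_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [3, 3, 4],
    [5, 5, 5],
]
alpha = cronbach_alpha(example_scores)
print(f"Cronbach's alpha = {alpha:.3f}")
```

    Alpha rises when items vary together across respondents, so the invented items above, which track each other closely, yield a high value.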

  17. DeuteRater: a tool for quantifying peptide isotope precision and kinetic proteomics.

    Science.gov (United States)

    Naylor, Bradley C; Porter, Michael T; Wilson, Elise; Herring, Adam; Lofthouse, Spencer; Hannemann, Austin; Piccolo, Stephen R; Rockwood, Alan L; Price, John C

    2017-05-15

    Using mass spectrometry to measure the concentration and turnover of the individual proteins in a proteome enables the calculation of individual synthesis and degradation rates for each protein. Software to analyze concentration is readily available, but software to analyze turnover is lacking. Data analysis workflows typically don't access the full breadth of information about instrument precision and accuracy that is present in each peptide isotopic envelope measurement. This method utilizes both isotope distribution and changes in neutromer spacing, which benefits the analysis of both concentration and turnover. We have developed a data analysis tool, DeuteRater, to measure protein turnover from metabolic D2O labeling. DeuteRater uses theoretical predictions for label-dependent change in isotope abundance and inter-peak (neutromer) spacing within the isotope envelope to calculate protein turnover rate. We have also used these metrics to evaluate the accuracy and precision of peptide measurements and thereby determined the optimal data acquisition parameters of different instruments, as well as the effect of data processing steps. We show that these combined measurements can be used to remove noise and increase confidence in the protein turnover measurement for each protein. Source code and ReadMe for Python 2 and 3 versions of DeuteRater are available at https://github.com/JC-Price/DeuteRater . Data is at https://chorusproject.org/pages/index.html project number 1147. Critical Intermediate calculation files provided as Tables S3 and S4. Software has only been tested on Windows machines. jcprice@chem.byu.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  18. The health preoccupation diagnostic interview: inter-rater reliability of a structured interview for diagnostic assessment of DSM-5 somatic symptom disorder and illness anxiety disorder.

    Science.gov (United States)

    Axelsson, Erland; Andersson, Erik; Ljótsson, Brjánn; Wallhed Finn, Daniel; Hedman, Erik

    2016-06-01

    Somatic symptom disorder (SSD) and illness anxiety disorder (IAD) are two new diagnoses introduced in the DSM-5. There is a need for reliable instruments to facilitate the assessment of these disorders. We therefore developed a structured diagnostic interview, the Health Preoccupation Diagnostic Interview (HPDI), which we hypothesized would reliably differentiate between SSD, IAD, and no diagnosis. Persons with clinically significant health anxiety (n = 52) and healthy controls (n = 52) were interviewed using the HPDI. Diagnoses were then compared with those made by an independent assessor, who listened to audio recordings of the interviews. Ratings generally indicated moderate to almost perfect inter-rater agreement, as illustrated by an overall Cohen's κ of .85. Disagreements primarily concerned (a) the severity of somatic symptoms, (b) the differential diagnosis of panic disorder, and (c) SSD specifiers. We conclude that the HPDI can be used to reliably diagnose DSM-5 SSD and IAD.
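
    Cohen's kappa, the agreement statistic reported in the record above, corrects the observed proportion of agreement for the agreement expected by chance. The following is a minimal sketch for two raters assigning one category per case; the function name and the diagnoses are invented for illustration and do not reproduce the study's data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one category per case."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical diagnoses from an interviewer and an independent assessor.
interviewer = ["SSD", "SSD", "IAD", "none", "none", "IAD", "SSD", "none"]
assessor = ["SSD", "IAD", "IAD", "none", "none", "IAD", "SSD", "none"]
print(f"kappa = {cohen_kappa(interviewer, assessor):.2f}")
```

    In this invented example the raters agree on seven of eight cases (87.5%), but kappa is lower than the raw percentage because part of that agreement would be expected by chance alone.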

  19. Lessons learnt from participation in international inter-comparison exercise for environmental radioactivity measurement

    International Nuclear Information System (INIS)

    Jha, S.K.; Pulhani, Vandana; Sartandel, Sangeeta

    2016-06-01

    Environmental Radioactivity Measurement Section of Health Physics Division is regularly carrying out surveillance of the radioactivity concentration in the environment. The laboratory participates in the inter-comparison exercises conducted by various international agencies for quality assurance and quality control of analytical estimations. This report summarizes the results of the analysis of radioactivity in environmental matrices of the inter-comparison exercises. The participation in inter-comparison exercises has demonstrated competence in radionuclide identification and estimations, equivalence with the results of other participating laboratories, validated adopted analytical methods, introduced traceability to measurement etc. at national and international level. (author)

  20. OC10 - Inter-rater agreement of the Paediatric Early Warning Score tools used in the central Denmark region

    DEFF Research Database (Denmark)

    Jensen, Claus Sixtus; Aagaard, Hanne; Vebert Olesen, Hanne

    2016-01-01

    through simultaneous blinded PEWS assessment on the same patients by two nurses. Fleiss' kappa was utilized to determine the level of agreement among the raters. CONCLUSION: With a paucity of published reliability testing studies, this research attempts to address identified research gaps and will thus...

  1. The accuracy and consistency of rural, remote and outpost triage nurse decision making in one Western Australia Country Health Service Region.

    Science.gov (United States)

    Ekins, Kylie; Morphet, Julia

    2015-11-01

    The Australasian Triage Scale aims to ensure that the triage category allocated reflects the urgency with which the patient needs medical assistance. This is dependent on triage nurse accuracy in decision making. The Australasian Triage Scale also aims to facilitate triage decision consistency between individuals and organisations. Various studies have explored the accuracy and consistency of triage decisions throughout Australia, yet no studies have specifically focussed on triage decision making in rural health services. Further, no standard has been identified by which accuracy or consistency should be measured. Australian emergency departments are measured against a set of standard performance indicators, including time from triage to patient review, and patient length of stay. There are currently no performance indicators for triage consistency. An online questionnaire was developed to collect demographic data and measure triage accuracy and consistency. The questionnaire utilised previously validated triage scenarios. Triage decision accuracy was measured, and consistency was compared by health site type using Fleiss' kappa. Forty-six triage nurses participated in this study. The accuracy of participants' triage decision-making decreased with each less urgent triage category. Post-graduate qualifications had no bearing on triage accuracy. There was no significant difference in the consistency of decision-making between paediatric and adult scenarios. Overall inter-rater agreement using Fleiss' kappa coefficient was 0.4. This represents a fair-to-good level of inter-rater agreement. A standard definition of accuracy and consistency in triage nurse decision making is required. Inaccurate triage decisions can result in increased morbidity and mortality. It is recommended that emergency department performance indicator thresholds be utilised as a benchmark for national triage consistency. Crown Copyright © 2015. Published by Elsevier Ltd. All rights

  2. Reliability, validity and description of timed performance of the Jebsen-Taylor Test in patients with muscular dystrophies.

    Science.gov (United States)

    Artilheiro, Mariana Cunha; Fávero, Francis Meire; Caromano, Fátima Aparecida; Oliveira, Acary de Souza Bulle; Carvas, Nelson; Voos, Mariana Callil; Sá, Cristina Dos Santos Cardoso de

    2017-12-08

    The Jebsen-Taylor Test evaluates upper limb function by measuring timed performance on everyday activities. The test is used to assess and monitor the progression of patients with Parkinson disease, cerebral palsy, stroke and brain injury. To analyze the reliability, internal consistency and validity of the Jebsen-Taylor Test in people with Muscular Dystrophy and to describe and classify upper limb timed performance of people with Muscular Dystrophy. Fifty patients with Muscular Dystrophy were assessed. Non-dominant and dominant upper limb performances on the Jebsen-Taylor Test were filmed. Two raters evaluated timed performance for inter-rater reliability analysis. Test-retest reliability was investigated by using intraclass correlation coefficients. Internal consistency was assessed using the Cronbach alpha. Construct validity was assessed by comparing the Jebsen-Taylor Test with the Performance of Upper Limb. The internal consistency of the Jebsen-Taylor Test was good (Cronbach's α=0.98). Inter-rater reliability was very high (0.903-0.999), except for writing, with an intraclass correlation coefficient of 0.772-1.000. Strong correlations between the Jebsen-Taylor Test and the Performance of Upper Limb Module were found (rho=-0.712). The Jebsen-Taylor Test is a reliable and valid measure of timed performance for people with Muscular Dystrophy. Copyright © 2017 Associação Brasileira de Pesquisa e Pós-Graduação em Fisioterapia. Publicado por Elsevier Editora Ltda. All rights reserved.
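
Several records in this list report internal consistency as Cronbach's alpha. The sketch below shows the usual formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), on hypothetical data. The function name `cronbach_alpha` and the score matrix are assumptions for illustration and are not taken from any of the studies.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha, where scores[i][j] is respondent i's score on item j.

    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical data: 5 respondents, 3 items.
example = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
print(round(cronbach_alpha(example), 3))
```

Alpha rises as items covary more strongly with one another, which is why it is read as a measure of how consistently the items tap the same construct.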

  3. On individual differences in person perception: raters' personality traits relate to their psychopathy checklist-revised scoring tendencies.

    Science.gov (United States)

    Miller, Audrey K; Rufino, Katrina A; Boccaccini, Marcus T; Jackson, Rebecca L; Murrie, Daniel C

    2011-06-01

    This study investigated raters' personality traits in relation to scores they assigned to offenders using the Psychopathy Checklist-Revised (PCL-R). A total of 22 participants, including graduate students and faculty members in clinical psychology programs, completed a PCL-R training session, independently scored four criminal offenders using the PCL-R, and completed a comprehensive measure of their own personality traits. A priori hypotheses specified that raters' personality traits, and their similarity to psychopathy characteristics, would relate to raters' PCL-R scoring tendencies. As hypothesized, some raters assigned consistently higher scores on the PCL-R than others, especially on PCL-R Facets 1 and 2. Also as hypothesized, raters' scoring tendencies related to their own personality traits (e.g., higher rater Agreeableness was associated with lower PCL-R Interpersonal facet scoring). Overall, findings underscore the need for future research to examine the role of evaluator characteristics on evaluation results and the need for clinical training to address evaluators' personality influences on their ostensibly objective evaluations.

  4. Reliability of the Matson Evaluation of Social Skills with Youngsters (MESSY) for Children with Autism Spectrum Disorders

    Science.gov (United States)

    Matson, Johnny L.; Horovitz, Max; Mahan, Sara; Fodstad, Jill

    2013-01-01

    The purpose of this paper was to update the psychometrics of the "Matson Evaluation of Social Skills for Youngsters" ("MESSY") with children with Autism Spectrum Disorders (ASD), specifically with respect to internal consistency, split-half reliability, and inter-rater reliability. In Study 1, 114 children with ASD (Autistic Disorder, Asperger's…

  5. Reliability, Construct Validity and Interpretability of the Brazilian version of the Rapid Upper Limb Assessment (RULA) and Strain Index (SI).

    Science.gov (United States)

    Valentim, Daniela Pereira; Sato, Tatiana de Oliveira; Comper, Maria Luiza Caíres; Silva, Anderson Martins da; Boas, Cristiana Villas; Padula, Rosimeire Simprini

    There are very few observational methods for analysis of biomechanical exposure available in Brazilian-Portuguese. This study aimed to cross-culturally adapt and test the measurement properties of the Rapid Upper Limb Assessment (RULA) and Strain Index (SI). The cross-cultural adaptation and the testing of measurement properties followed Beaton et al. and the COSMIN guidelines, respectively. Several tasks that required static posture and/or repetitive motion of the upper limbs were evaluated (n>100). Intra-rater reliability ranged from poor to almost perfect for the RULA (k: 0.00-0.93) and from poor to excellent for the SI (ICC(2,1): 0.05-0.99). Inter-rater reliability was very poor for the RULA (k: -0.12 to 0.13) and ranged from very poor to moderate for the SI (ICC(2,1): 0.00-0.53). Agreement was good for the RULA (75-100% intra-rater and 42.24-100% inter-rater) and for the SI (EPM: -1.03% to 1.97% intra-rater and -0.17% to 1.51% inter-rater). The internal consistency was appropriate for the RULA (α=0.88) and low for the SI (α=0.65). Moderate construct validity was observed between the RULA and SI for wrist/hand-wrist posture (rho: 0.61) and strength/intensity of exertion (rho: 0.39). The adapted versions of the RULA and SI showed semantic and cultural equivalence for Brazilian Portuguese. Reliability estimates for the RULA and SI ranged from very poor to almost perfect. The internal consistency of the RULA was better than that of the SI. The correlation between the methods was moderate only for muscle request/movement repetition. Prior training is mandatory for the use of observational methods for biomechanical exposure assessment, although it does not guarantee good reproducibility of these measures. Copyright © 2017 Associação Brasileira de Pesquisa e Pós-Graduação em Fisioterapia. Publicado por Elsevier Editora Ltda. All rights reserved.
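
The "ICC 2.1" notation in this record most likely refers to the two-way random-effects, absolute-agreement, single-rater intraclass correlation, usually written ICC(2,1). Assuming that reading, a minimal sketch of the computation from the two-way ANOVA mean squares follows. The function name `icc_2_1` and the ratings are hypothetical and do not reproduce the study's data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score given to subject i by rater j.
    """
    n = len(ratings)         # number of subjects
    k = len(ratings[0])      # number of raters
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(col) / n for col in zip(*ratings)]

    # Sums of squares for subjects (rows), raters (columns) and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical data: 5 tasks scored by 3 raters.
example = [
    [4, 5, 4],
    [2, 3, 2],
    [6, 6, 7],
    [3, 4, 3],
    [5, 5, 6],
]
print(round(icc_2_1(example), 3))
```

Because this form penalises systematic differences between raters (the column term), it can be much lower than a consistency-type ICC computed on the same ratings.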

  6. Workplace-based assessment: raters' performance theories and constructs

    NARCIS (Netherlands)

    Govaerts, M.J.; Wiel, M.W.J. van de; Schuwirth, L.W.; Vleuten, C.P.M. van der; Muijtjens, A.M.M.

    2013-01-01

    Weaknesses in the nature of rater judgments are generally considered to compromise the utility of workplace-based assessment (WBA). In order to gain insight into the underpinnings of rater behaviours, we investigated how raters form impressions of and make judgments on trainee performance. Using

  7. Analyzing rater agreement manifest variable methods

    CERN Document Server

    von Eye, Alexander

    2014-01-01

    Agreement among raters is of great importance in many domains. For example, in medicine, diagnoses are often provided by more than one doctor to make sure the proposed treatment is optimal. In criminal trials, sentencing depends, among other things, on the complete agreement among the jurors. In observational studies, researchers increase reliability by examining discrepant ratings. This book is intended to help researchers statistically examine rater agreement by reviewing four different approaches to the technique. The first approach introduces readers to calculating coefficients that allow one to summarize agreements in a single score. The second approach involves estimating log-linear models that allow one to test specific hypotheses about the structure of a cross-classification of two or more raters' judgments. The third approach explores cross-classifications or raters' agreement for indicators of agreement or disagreement, and for indicators of such characteristics as trends. The fourth approach compa...

  8. Cross-cultural adaptation and reproducibility of the Brazilian-Portuguese version of the modified FRESNO Test to evaluate the competence in evidence based practice by physical therapists.

    Science.gov (United States)

    Silva, Anderson M; Costa, Lucíola C M; Comper, Maria L; Padula, Rosimeire S

    2016-01-01

    The Modified Fresno Test was developed to assess knowledge and skills of both physical therapy (PT) professionals and students to use evidence-based practice (EBP). To translate the Modified Fresno Test into Brazilian-Portuguese and to evaluate the test's reproducibility. The first step consisted of adapting the instrument into the Brazilian-Portuguese language. Then, a total of 57 participants, including PT students, PT professors and PT practitioners, completed the translated instrument. The responses from the participants were used to evaluate reproducibility of the translated instrument. Internal consistency was calculated using Cronbach's alpha. Reliability was calculated using the intraclass correlation coefficient (ICC) for continuous variables, and the Kappa coefficient (K) for categorical variables. The agreement was assessed using the standard error of the measurement (SEM). The cross-cultural adaptation process was appropriate, providing an adequate Brazilian-Portuguese version of the instrument. The internal consistency was good (α=0.769). Inter-rater reliability was ICC=0.89 (95% CI 0.82 to 0.93); intra-rater reliability was ICC=0.85 (95% CI 0.57 to 0.93) for evaluator 1 and ICC=0.98 (95% CI 0.97 to 0.99) for evaluator 2. The SEM was 13.04 points for inter-rater assessment, 12.57 points for rater 1 and 4.59 points for rater 2. The Brazilian-Portuguese language version of the Modified Fresno Test showed satisfactory results in terms of reproducibility. The Modified Fresno Test will allow physical therapy professionals and students to be evaluated on their use and understanding of EBP.

  9. Inter-operator and inter-device agreement and reliability of the SEM Scanner.

    Science.gov (United States)

    Clendenin, Marta; Jaradeh, Kindah; Shamirian, Anasheh; Rhodes, Shannon L

    2015-02-01

    The SEM Scanner is a medical device designed for use by healthcare providers as part of pressure ulcer prevention programs. The objective of this study was to evaluate the inter-rater and inter-device agreement and reliability of the SEM Scanner. Thirty-one (31) volunteers free of pressure ulcers or broken skin at the sternum, sacrum, and heels were assessed with the SEM Scanner. Each of three operators utilized each of three devices to collect readings from four anatomical sites (sternum, sacrum, left and right heels) on each subject for a total of 108 readings per subject collected over approximately 30 min. For each combination of operator-device-anatomical site, three SEM readings were collected. Inter-operator and inter-device agreement and reliability were estimated. Over the course of this study, more than 3000 SEM Scanner readings were collected. Agreement between operators was good with mean differences ranging from -0.01 to 0.11. Inter-operator and inter-device reliability exceeded 0.80 at all anatomical sites assessed. The results of this study demonstrate the high reliability and good agreement of the SEM Scanner across different operators and different devices. Given the limitations of current methods to prevent and detect pressure ulcers, the SEM Scanner shows promise as an objective, reliable tool for assessing the presence or absence of pressure-induced tissue damage such as pressure ulcers. Copyright © 2015 Bruin Biometrics, LLC. Published by Elsevier Ltd. All rights reserved.

  10. Measurement of respiratory rate by multiple raters in a clinical setting is unreliable

    DEFF Research Database (Denmark)

    Brabrand, Mikkel; Hallas, Peter; Folkestad, Lars

    2018-01-01

    OBJECTIVE: To evaluate the inter-observer reliability of nurses assessing respiratory rate. METHODS: We presented seven videos, each at least 60 seconds long, of the thoraces of non-identifiable breathing patients to experienced nurses from several Danish emergency departments. Two videos were assessed by 50... raters while five were reviewed by eight. The videos were shown using an online system that also recorded the counted respiratory rate. RESULTS: A total of 140 nurses participated, with a median of 15 years' experience. The range of counted respiratory rates was at least 10 on each video. For videos......

  11. An evaluation of the predictive validity and inter-rater reliability of clinical diagnostic criteria for senile dementia of Lewy body type.

    Science.gov (United States)

    McKeith, I G; Fairbairn, A F; Bothwell, R A; Moore, P B; Ferrier, I N; Thompson, P; Perry, R H

    1994-05-01

    Several recent autopsy studies suggest that senile dementia of Lewy body type (SDLT) may be the second most common neuropathologic cause of dementia in the elderly, accounting for 7 to 30% of all cases. Operational criteria for the antemortem clinical diagnosis of SDLT have already been proposed by our group. The performance of these is now examined by randomizing the case notes from a new series of SDLT, Alzheimer, and multi-infarct dementia patients for psychiatric assessment by four raters of varying clinical experience and blind to pathologic diagnosis. Using the SDLT criteria, the two most experienced raters agreed in 94% of cases (kappa = 0.87), with the least experienced rater agreeing in 78% (kappa = 0.50). Diagnostic specificity for SDLT was uniformly high (90.0 to 97.0%), with a mean sensitivity of detection of 74%, and was greater by the experienced (90.0%) than the least experienced (55%) clinician. The antemortem identification of SDLT patients can therefore be achieved with a high degree of diagnostic specificity using such operationalized criteria, although there remains a minority of patients who present with either "typical" Alzheimer-type symptoms or with paranoid or delusional symptoms in the absence of substantial cognitive impairment. Sensitivity to neuroleptics may be a useful diagnostic pointer in these patients.
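
This record reports both raw percentage agreement and kappa for pairs of raters. To show how the two relate, here is a minimal sketch of Cohen's kappa for two raters, which corrects the observed agreement for the agreement expected by chance. The function name `cohen_kappa` and the diagnosis labels are hypothetical, and the record does not state which kappa variant the authors used.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must label the same cases")
    n = len(rater_a)
    # Observed proportion of cases on which the raters agree.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_exp = sum(freq_a[label] * freq_b[label] for label in labels) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical diagnoses for 8 case notes.
a = ["SDLT", "SDLT", "AD", "AD", "MID", "SDLT", "AD", "MID"]
b = ["SDLT", "AD", "AD", "AD", "MID", "SDLT", "AD", "SDLT"]
print(round(cohen_kappa(a, b), 3))
```

The same raw agreement can yield quite different kappa values depending on how unevenly the diagnoses are distributed, which is why the two figures are usually reported together.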

  12. Validity and Reliability of the Clinical Competency Evaluation Instrument for Use among Physiotherapy Students: Pilot study.

    Science.gov (United States)

    Muhamad, Zailani; Ramli, Ayiesah; Amat, Salleh

    2015-05-01

    The aim of this study was to determine the content validity, internal consistency, test-retest reliability and inter-rater reliability of the Clinical Competency Evaluation Instrument (CCEVI) in assessing the clinical performance of physiotherapy students. This study was carried out between June and September 2013 at University Kebangsaan Malaysia (UKM), Kuala Lumpur, Malaysia. A panel of 10 experts was identified to establish content validity by evaluating and rating each of the items used in the CCEVI with regard to their relevance in measuring students' clinical competency. A total of 50 UKM undergraduate physiotherapy students were assessed throughout their clinical placement to determine the construct validity of these items. The instrument's reliability was determined through a cross-sectional study involving a clinical performance assessment of 14 final-year undergraduate physiotherapy students. The content validity index of the entire CCEVI was 0.91, while the proportion of agreement on the content validity indices ranged from 0.83-1.00. The CCEVI construct validity was established with factor loading of ≥0.6, while internal consistency (Cronbach's alpha) overall was 0.97. Test-retest reliability of the CCEVI was confirmed with a Pearson's correlation range of 0.91-0.97 and an intraclass correlation coefficient range of 0.95-0.98. Inter-rater reliability of the CCEVI domains ranged from 0.59 to 0.97 on initial and subsequent assessments. This pilot study confirmed the content validity of the CCEVI. It showed high internal consistency, thereby providing evidence that the CCEVI has moderate to excellent inter-rater reliability. However, additional refinement in the wording of the CCEVI items, particularly in the domains of safety and documentation, is recommended to further improve the validity and reliability of the instrument.

  13. Assessment of the severity of dementia: validity and reliability of the Chinese (Cantonese) version of the Hierarchic Dementia Scale (CV-HDS).

    Science.gov (United States)

    Poon, Vickie Wan-kei; Lam, Linda Chiu-wa; Wong, Samuel Yeung-shan

    2008-09-01

    With the rapid growth of the older population, early detection of cognitive deficits is crucial in slowing down functional deterioration of elderly persons. To examine the validity and reliability of the Chinese (Cantonese) version of the Hierarchic Dementia Scale (CV-HDS) for Chinese older persons in Hong Kong. The HDS was translated into Cantonese Chinese. The content and cultural validity were evaluated by six expert panel members. Sixty-two participants with a diagnosis of dementia were recruited for evaluation. Inter-rater reliability, test-retest reliability, internal consistency and concurrent validity were examined. The CV-HDS demonstrated satisfactory psychometric properties. Inter-rater reliability and test-retest reliability were high (alpha=0.89 and alpha=0.94 respectively). A high value of Cronbach's alpha (alpha=0.94) demonstrated good internal consistency. The concurrent validity of the CV-HDS, through correlation of its scores with those of the Chinese version of the Mini Mental Status Examination, was established (ranging from r=0.58 to r=0.78, p… Cantonese-speaking Chinese people with dementia. It facilitates treatment planning to optimize the effects of functional training and rehabilitation.

  14. Effects of measurement method and transcript availability on inexperienced raters' stuttering frequency scores.

    Science.gov (United States)

    Chakraborty, Nalanda; Logan, Kenneth J

    To examine the effects of measurement method and transcript availability on the accuracy, reliability, and efficiency of inexperienced raters' stuttering frequency measurements. 44 adults, all inexperienced at evaluating stuttered speech, underwent 20 min of preliminary training in stuttering measurement and then analyzed a series of sentences, with and without access to transcripts of sentence stimuli, using either a syllable-based analysis (SBA) or an utterance-based analysis (UBA). Participants' analyses were compared between groups and to a composite analysis from two experienced evaluators. Stuttering frequency scores from the SBA and UBA groups differed significantly from the experienced evaluators' scores; however, UBA scores were significantly closer to the experienced evaluators' scores and were completed significantly faster than the SBA scores. Transcript availability facilitated scoring accuracy and efficiency in both groups. The internal reliability of stuttering frequency scores was acceptable for the SBA and UBA groups; however, the SBA group demonstrated only modest point-by-point agreement with ratings from the experienced evaluators. Given its accuracy and efficiency advantages over syllable-based analysis, utterance-based fluency analysis appears to be an appropriate context for introducing stuttering frequency measurement to raters who have limited experience in stuttering measurement. To address accuracy gaps between experienced and inexperienced raters, however, use of either analysis must be supplemented with training activities that expose inexperienced raters to the decision-making processes used by experienced raters when identifying stuttered syllables. Copyright © 2018 Elsevier Inc. All rights reserved.

  15. Development of the music therapy assessment Tool for advanced Huntington’s disease: A pilot validation study

    DEFF Research Database (Denmark)

    O'Kelly, Julian; Bodak, Rebeka

    2016-01-01

    Background: Case studies of people with Huntington's disease (HD) report that music therapy provides a range of benefits that may improve quality of life; however, no robust music therapy assessment tools exist for this population. Objective: Develop and conduct preliminary psychometric testing...... of its construct validity, internal consistency, and inter-rater and intra-rater reliability over 10 group music therapy sessions with 19 patients. Results: The resulting MATA-HD included a total of 15 items across six subscales (Arousal/Attention, Physical Presentation, Communication, Musical, Cognition......, and Psychological/Behavioral). We found good construct validity (r ≥ 0.7) for Mood, Communication Level, Communication Effectiveness, Choice, Social Behavior, Arousal, and Attention items. Cronbach's α of 0.825 indicated good internal consistency across 11 items with a common focus of engagement in therapy...

  16. Assessing adherence to the evidence base in the management of poststroke dysphagia.

    Science.gov (United States)

    Burton, Christopher; Pennington, Lindsay; Roddam, Hazel; Russell, Ian; Russell, Daphne; Krawczyk, Karen; Smith, Hilary A

    2006-01-01

    To evaluate the reliability and responsiveness to change of an audit tool to assess adherence to evidence of effectiveness in the speech and language therapy (SLT) management of poststroke dysphagia. The tool was used to review SLT practice as part of a randomized study of different education strategies. Medical records were audited before and after delivery of the trial intervention. Seventeen SLT departments in the north-west of England participated in the study. The assessment tool was used to assess the medical records of 753 patients before and 717 patients after delivery of the trial intervention across the 17 departments. A target of 10 records per department per month was sought, using systematic sampling with a random start. Inter- and intra-rater reliability were explored, together with the tool's internal consistency and responsiveness to change. The assessment tool had high face validity, although internal consistency was low (ra = 0.37). Composite scores on the tool were however responsive to differences between SLT departments. Both inter- and intra-rater reliability ranged from 'substantial' to 'near perfect' across all items. The audit tool has high face validity and measurement reliability. The use of a composite adherence score should, however, proceed with caution as internal consistency is low.

  17. A Longitudinal Examination of Rater and Ratee Effects in Performance Ratings.

    Science.gov (United States)

    Vance, Robert J.; And Others

    1983-01-01

    Investigated the consistency and loci of leniency, halo, and range restriction effects in performance ratings in a longitudinal study. Police supervisors (N=90) rated 350 subordinates on five occasions. Concluded that reliable variance in mean ratings is partly attributable to ratees, but mainly introduced by raters. (JAC)

  18. Rater cognition: review and integration of research findings.

    Science.gov (United States)

    Gauthier, Geneviève; St-Onge, Christina; Tavares, Walter

    2016-05-01

    Given the complexity of competency frameworks, associated skills and abilities, and contexts in which they are to be assessed in competency-based education (CBE), there is an increased reliance on rater judgements when considering trainee performance. This increased dependence on rater-based assessment has led to the emergence of rater cognition as a field of research in health professions education. The topic, however, is often conceptualised and ultimately investigated using many different perspectives and theoretical frameworks. Critically analysing how researchers think about, study and discuss rater cognition or the judgement processes in assessment frameworks may provide meaningful and efficient directions in how the field continues to explore the topic. We conducted a critical and integrative review of the literature to explore common conceptualisations and unified terminology associated with rater cognition research. We identified 1045 articles on rater-based assessment in health professions education using Scopus, Medline and ERIC, and 78 articles were included in our review. We propose a three-phase framework of observation, processing and integration. We situate nine specific mechanisms and sub-mechanisms described across the literature within these phases: (i) generating automatic impressions about the person; (ii) formulating high-level inferences; (iii) focusing on different dimensions of competencies; (iv) categorising through well-developed schemata based on (a) personal concept of competence, (b) comparison with various exemplars and (c) task and context specificity; (v) weighting and synthesising information differently; (vi) producing narrative judgements; and (vii) translating narrative judgements into scales. Our review has allowed us to identify common underlying conceptualisations of observed rater mechanisms and subsequently propose a comprehensive, although complex, framework for the dynamic and contextual nature of the rating process.

  19. Does a Rater's Professional Background Influence Communication Skills Assessment?

    Science.gov (United States)

    Artemiou, Elpida; Hecker, Kent G; Adams, Cindy L; Coe, Jason B

    2015-01-01

    There is increasing pressure in veterinary education to teach and assess communication skills, with the Objective Structured Clinical Examination (OSCE) being the most common assessment method. Previous research reveals that raters are a large source of variance in OSCEs. This study focused on examining the effect of raters' professional background as a source of variance when assessing students' communication skills. Twenty-three raters were categorized according to their professional background: clinical sciences (n=11), basic sciences (n=4), clinical communication (n=5), or hospital administrator/clinical skills technicians (n=3). Raters from each professional background were assigned to the same station and assessed the same students during two four-station OSCEs. Students were in year 2 of their pre-clinical program. Repeated-measures ANOVA results showed that OSCE scores awarded by the rater groups differed significantly at matched station 1 (F[2,91]=6.97, p=.002), matched station 2 (F[3,90]=13.95, p=.001), matched station 3 (F[3,90]=8.76, p=.001), and matched station 4 (F[2,91]=30.60, p=.001). A significant time effect between the two OSCEs was calculated for matched stations 1, 2, and 4, indicating improved student performances. Raters with a clinical communication skills background assigned scores that were significantly lower compared to the other rater groups. Analysis of written feedback provided by the clinical sciences raters showed that they were influenced by the students' clinical knowledge of the case and that they did not rely solely on the communication checklist items. This study shows that it is important to consider rater background both in recruitment and training programs for communication skills assessment.

  20. Reliability and validity of the international dementia alliance schedule for the assessment and staging of care in China.

    Science.gov (United States)

    Wang, Xiao; Sun, Zhenghai; Xiong, Lingchuan; Semrau, Maya; He, Jianhua; Li, Yang; Zhu, Jianzhong; Zhang, Nan; Wang, Aimin; Jiang, Qinpu; Mu, Nan; Zhao, Yuping; Chen, Wei; Wu, Donghui; Zheng, Zhanjie; Sun, Yongan; Zhang, Jing; Xu, Jun; Meng, Xue; Zhao, Mei; Zhang, Haifeng; Lv, Xiaozhen; Sartorius, Norman; Li, Tao; Yu, Xin; Wang, Huali

    2017-11-21

    Clinical and social services are both important for dementia care. The International Dementia Alliance (IDEAL) Schedule for the Assessment and Staging of Care was developed to guide clinical and social care for dementia. Our study aimed to assess the validity and reliability of the IDEAL schedule in China. Two hundred eighty-two dementia patients and their caregivers were recruited from 15 hospitals in China. Each patient-caregiver dyad was assessed with the IDEAL schedule by a rater and an observer simultaneously. The Clinical Dementia Rating (CDR), Mini-Mental Status Examination (MMSE), and Caregiver Burden Inventory (CBI) were assessed for criterion validity. IDEAL repeated assessment was conducted 7-10 days after the initial interview for 62 dyads. Two hundred seventy-seven patient-caregiver dyads completed the IDEAL assessment. Inter-rater reliability for the total score of the IDEAL schedule was 0.93 (95%CI = 0.92-0.95). The inter-class coefficient for the total score of IDEAL was 0.95 for the interviewers and 0.93 for the silent raters. The IDEAL total score correlated with the global CDR score (ρ = 0.72, p… valid and reliable tool for the staging of care for dementia in the Chinese population.

  1. Cross-cultural adaptation and reproducibility of the Brazilian-Portuguese version of the modified FRESNO Test to evaluate the competence in evidence based practice by physical therapists

    Science.gov (United States)

    Silva, Anderson M.; Costa, Lucíola C. M.; Comper, Maria L.; Padula, Rosimeire S.

    2016-01-01

    BACKGROUND: The Modified Fresno Test was developed to assess knowledge and skills of both physical therapy (PT) professionals and students to use evidence-based practice (EBP). OBJECTIVES: To translate the Modified Fresno Test into Brazilian-Portuguese and to evaluate the test's reproducibility. METHOD: The first step consisted of adapting the instrument into the Brazilian-Portuguese language. Then, a total of 57 participants, including PT students, PT professors and PT practitioners, completed the translated instrument. The responses from the participants were used to evaluate reproducibility of the translated instrument. Internal consistency was calculated using Cronbach's alpha. Reliability was calculated using the intraclass correlation coefficient (ICC) for continuous variables, and the Kappa coefficient (K) for categorical variables. The agreement was assessed using the standard error of the measurement (SEM). RESULTS: The cross-cultural adaptation process was appropriate, providing an adequate Brazilian-Portuguese version of the instrument. The internal consistency was good (α=0.769). Inter-rater reliability was ICC=0.89 (95% CI 0.82 to 0.93); intra-rater reliability was ICC=0.85 (95% CI 0.57 to 0.93) for evaluator 1 and ICC=0.98 (95% CI 0.97 to 0.99) for evaluator 2. The SEM was 13.04 points for inter-rater assessment, 12.57 points for rater 1 and 4.59 points for rater 2. CONCLUSION: The Brazilian-Portuguese language version of the Modified Fresno Test showed satisfactory results in terms of reproducibility. The Modified Fresno Test will allow physical therapy professionals and students to be evaluated on their use and understanding of EBP. PMID:26786079

  2. Cross-cultural adaptation and reproducibility of the Brazilian-Portuguese version of the modified FRESNO Test to evaluate the competence in evidence based practice by physical therapists

    Directory of Open Access Journals (Sweden)

    Anderson M. Silva

    2016-02-01

    Full Text Available BACKGROUND: The Modified Fresno Test was developed to assess knowledge and skills of both physical therapy (PT) professionals and students to use evidence-based practice (EBP). OBJECTIVES: To translate the Modified Fresno Test into Brazilian-Portuguese and to evaluate the test's reproducibility. METHOD: The first step consisted of adapting the instrument into the Brazilian-Portuguese language. Then, a total of 57 participants, including PT students, PT professors and PT practitioners, completed the translated instrument. The responses from the participants were used to evaluate reproducibility of the translated instrument. Internal consistency was calculated using Cronbach's alpha. Reliability was calculated using the intraclass correlation coefficient (ICC) for continuous variables, and the Kappa coefficient (K) for categorical variables. The agreement was assessed using the standard error of the measurement (SEM). RESULTS: The cross-cultural adaptation process was appropriate, providing an adequate Brazilian-Portuguese version of the instrument. The internal consistency was good (α=0.769). Inter-rater reliability was ICC=0.89 (95% CI 0.82 to 0.93); intra-rater reliability was ICC=0.85 (95% CI 0.57 to 0.93) for evaluator 1 and ICC=0.98 (95% CI 0.97 to 0.99) for evaluator 2. The SEM was 13.04 points for inter-rater assessment, 12.57 points for rater 1 and 4.59 points for rater 2. CONCLUSION: The Brazilian-Portuguese language version of the Modified Fresno Test showed satisfactory results in terms of reproducibility. The Modified Fresno Test will allow physical therapy professionals and students to be evaluated on their use and understanding of EBP.

  3. On the Creation of Hypertext Links in Full-Text Documents: Measurement of Inter-Linker Consistency.

    Science.gov (United States)

    Ellis, David; And Others

    1994-01-01

    Describes a study in which several different sets of hypertext links are inserted by different people in full-text documents. The degree of similarity between the sets is measured using coefficients and topological indices. As in comparable studies of inter-indexer consistency, the sets of links used by different people showed little similarity.…

  4. Segmentation editing improves efficiency while reducing inter-expert variation and maintaining accuracy for normal brain tissues in the presence of space-occupying lesions

    International Nuclear Information System (INIS)

    Deeley, M A; Chen, A; Cmelak, A; Malcolm, A; Jaboin, J; Niermann, K; Yang, Eddy S; Yu, David S; Datteri, R D; Noble, J; Dawant, B M; Donnelly, E; Moretti, L

    2013-01-01

    Image segmentation has become a vital and often rate-limiting step in modern radiotherapy treatment planning. In recent years, the pace and scope of algorithm development, and even introduction into the clinic, have far exceeded evaluative studies. In this work we build upon our previous evaluation of a registration driven segmentation algorithm in the context of 8 expert raters and 20 patients who underwent radiotherapy for large space-occupying tumours in the brain. In this work we tested four hypotheses concerning the impact of manual segmentation editing in a randomized single-blinded study. We tested these hypotheses on the normal structures of the brainstem, optic chiasm, eyes and optic nerves using the Dice similarity coefficient, volume, and signed Euclidean distance error to evaluate the impact of editing on inter-rater variance and accuracy. Accuracy analyses relied on two simulated ground truth estimation methods: simultaneous truth and performance level estimation and a novel implementation of probability maps. The experts were presented with automatic, their own, and their peers’ segmentations from our previous study to edit. We found, independent of source, editing reduced inter-rater variance while maintaining or improving accuracy and improving efficiency with at least 60% reduction in contouring time. In areas where raters performed poorly contouring from scratch, editing of the automatic segmentations reduced the prevalence of total anatomical miss from approximately 16% to 8% of the total slices contained within the ground truth estimations. These findings suggest that contour editing could be useful for consensus building such as in developing delineation standards, and that both automated methods and even perhaps less sophisticated atlases could improve efficiency, inter-rater variance, and accuracy. (paper)
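
    The Dice similarity coefficient used in the abstract above to compare segmentations can be sketched as follows. This is a minimal illustration and not the authors' evaluation pipeline: the function name `dice_coefficient` and the representation of each contour as a set of voxel indices are assumptions made here.

```python
from typing import Hashable, Iterable


def dice_coefficient(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Dice similarity: 2 * |A ∩ B| / (|A| + |B|).

    Each argument is a collection of voxel identifiers (for example
    (x, y, z) tuples) that belong to one rater's segmentation.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        # Two empty segmentations are treated as identical by convention.
        return 1.0
    return 2.0 * len(set_a & set_b) / (len(set_a) + len(set_b))


if __name__ == "__main__":
    # Hypothetical toy contours from two raters on a single slice.
    rater_1 = {(0, 0), (0, 1), (1, 0), (1, 1)}
    rater_2 = {(0, 1), (1, 1), (1, 2), (2, 1)}
    print(dice_coefficient(rater_1, rater_2))
```

    A value of 1 means the two contours coincide exactly and 0 means they share no voxels, so lower pairwise values across raters indicate greater inter-rater variation.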

  5. Internal consistency & validity of Indian Disability Evaluation and Assessment Scale (IDEAS) in patients with schizophrenia

    Directory of Open Access Journals (Sweden)

    Sandeep Grover

    2014-01-01

    Full Text Available Background & objectives: The Indian Disability Evaluation and Assessment Scale (IDEAS) has been recommended for assessment and certification of disability by the Government of India (GOI). However, the psychometric properties of IDEAS as adopted by GOI remain understudied. Our aim, thus, was to study the internal consistency and validity of IDEAS in patients with schizophrenia. Methods: A total of 103 consenting patients with residual schizophrenia were assessed for disability, quality of life (QOL) and psychopathology using the IDEAS, WHOQOL-100 and Positive and Negative symptom scale (PANSS), respectively. Internal consistency was calculated using Cronbach's alpha. For construct validity, relations between IDEAS, and psychopathology and QOL were studied. Results: The inter-item correlations for IDEAS were significant with a Cronbach's alpha of 0.721. All item scores other than the score on communication and understanding, as well as the total and global IDEAS scores, correlated significantly with the positive, negative and general sub-scales, and total PANSS scores. Communication and understanding was significantly related to negative sub-scale score only. Total and global disability scores correlated negatively with all the domains of WHOQOL-100 (ρ<0.01). The individual IDEAS item scores correlated negatively with various WHOQOL-100 domains (ρ<0.01). Interpretation & conclusions: The study findings showed that the GOI-modified IDEAS had good internal consistency and construct validity as tested in patients with residual schizophrenia. Similar studies need to be done with other groups of patients.
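
    Several records in this listing, including the one above, report internal consistency as Cronbach's alpha. A minimal sketch of the standard formula, alpha = k/(k − 1) × (1 − Σ item variances / variance of total scores), is given below. The function name `cronbach_alpha` and the toy scores are hypothetical; none of the data come from the studies listed here.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores[i][j] is the score of respondent i on item j. Sample
    variances (n - 1 denominator) are used throughout.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every respondent must answer every item")

    # Variance of each item across respondents (column-wise).
    item_variances = [variance(col) for col in zip(*scores)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical 3 respondents x 2 items.
    print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))
```

    Alpha rises as items covary more strongly relative to their individual variances, which is why it is read as a measure of internal consistency rather than of agreement between raters.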

  6. Delimiting Coefficient α from Internal Consistency and Unidimensionality

    Science.gov (United States)

    Sijtsma, Klaas

    2015-01-01

    I discuss the contribution by Davenport, Davison, Liou, & Love (2015) in which they relate reliability represented by coefficient α to formal definitions of internal consistency and unidimensionality, both proposed by Cronbach (1951). I argue that coefficient α is a lower bound to reliability and that concepts of internal consistency and…

  7. Assessment of the nursing care product (APROCENF): a reliability and construct validity study

    Directory of Open Access Journals (Sweden)

    Danielle Fabiana Cucolo

    Full Text Available ABSTRACT Objectives: to verify the reliability and construct validity estimates of the "Assessment of nursing care product" scale (APROCENF) and its applicability. Methods: this validation study included a sample of 40 (inter-rater reliability) and 172 (construct validity) assessments performed by nurses at the end of the work shift at nine inpatient services of a teaching hospital in the Brazilian Southeast. The data were collected between February and September/2014 with interruptions. Cronbach's alpha and Spearman's correlation coefficients were calculated, as well as the intraclass correlation and the weighted kappa index (inter-rater reliability). Exploratory factor analysis was used with principal component extraction and varimax rotation (construct validity). Results: the internal consistency revealed an alpha coefficient of 0.85, item-item correlation ranging between 0.13 and 0.61 and item-total correlation between 0.43 and 0.69. Inter-rater equivalence was obtained and all items evidenced significant factor loadings. Conclusion: this research evidenced the reliability and construct validity of the scale to assess the nursing care product. Its application in nursing practice permits identifying improvements needed in the production process, contributing to management and care decisions.

  8. Precisão de avaliadores na avaliação da criatividade por meio da produção de metáforas / Inter rater reliability in the creativity assessment using metaphor production

    Directory of Open Access Journals (Sweden)

    Ricardo Primi

    2007-12-01

    for a creativity test based on metaphor creation, making use of items like "The camel is the ______ of the desert". The participants were 19 people and nine raters. The metaphor test is made of nine items, to which the participants gave 513 answers. Each answer was independently measured by the raters using a scale from 0 to 3, indicating the metaphor's elaboration. Reliability was calculated by a Rasch model assuming every idea as a case and each judge as an item of a hypothetical test, in a procedure called judge-linking network. The inter-rater reliability varied from .52 to .83 with a mean of .74 (SD=.08), resulting in an acceptable inter-rater reliability.

  9. Reliability of visual and instrumental color matching.

    Science.gov (United States)

    Igiel, Christopher; Lehmann, Karl Martin; Ghinea, Razvan; Weyhrauch, Michael; Hangx, Ysbrand; Scheller, Herbert; Paravina, Rade D

    2017-09-01

    The aim of this investigation was to evaluate intra-rater and inter-rater reliability of visual and instrumental shade matching. Forty individuals with normal color perception participated in this study. The right maxillary central incisor of a teaching model was prepared and restored with 10 feldspathic all-ceramic crowns of different shades. A shade matching session consisted of the observer (rater) visually selecting the best match by using VITA classical A1-D4 (VC) and VITA Toothguide 3D Master (3D) shade guides and the VITA Easyshade Advance intraoral spectrophotometer (ES) to obtain both VC and 3D matches. Three shade matching sessions were held with 4 to 6 weeks between sessions. Intra-rater reliability was assessed based on the percentage of agreement for the three sessions for the same observer, whereas the inter-rater reliability was calculated as mean percentage of agreement between different observers. The Fleiss' Kappa statistical analysis was used to evaluate visual inter-rater reliability. The mean intra-rater reliability for the visual shade selection was 64(11) for VC and 48(10) for 3D. The corresponding ES values were 96(4) for both VC and 3D. The percentages of observers who matched the same shade with VC and 3D were 55(10) and 43(12), respectively, while corresponding ES values were 88(8) for VC and 92(4) for 3D. The results for visual shade matching exhibited a high to moderate level of inconsistency for both intra-rater and inter-rater comparisons. The VITA Easyshade Advance intraoral spectrophotometer exhibited significantly better reliability compared with visual shade selection. This study evaluates the ability of observers to consistently match the same shade visually and with a dental spectrophotometer in different sessions. The intra-rater and inter-rater reliability (agreement of repeated shade matching) of visual and instrumental tooth color matching strongly suggest the use of color matching instruments as a supplementary tool in

  10. The translation and psychometric assessment of the persian version of the sheehan disability scale.

    Directory of Open Access Journals (Sweden)

    Masoumeh Amin-Esmaeili

    2014-09-01

    Full Text Available The Sheehan Disability Scale (SDS) assesses disability in four domains of home management, work responsibilities, close relationships and social life. The main objective of this study was to develop the Persian version of the SDS. Two steps of field work followed the Persian translation and cultural adaptation of the tool: First, the internal consistency and convergent validity were examined in 104 clinical cases recruited from inpatient and outpatient psychiatric services, using the 36-item Short Form Health Survey (SF-36) and Global Assessment of Functioning (GAF). Then 88 individuals were randomly selected from the adult general population to assess internal consistency, inter-rater reliability and known group validity. In the clinical settings, Cronbach's α coefficient was 0.88 and item-total correlation ranged from 0.71 to 0.78 in various domains. The correlation between SDS and SF-36 (P<0.001) was significant in all the areas of the performance, and none of the correlations was statistically significant when SDS and GAF were compared. In the general population study, the SDS showed good internal consistency (α = 0.81) and known group validity, and the inter-rater reliability was perfect for "school/work responsibility". The Persian translation of the SDS is a simple and short scale, and it seems to be a valid scale for the measurement of disability in clinical settings and in the Iranian general population.

  11. Inter-arm blood pressure difference in hospitalized elderly patients--is it consistent?

    Science.gov (United States)

    Grossman, Alon; Weiss, Avraham; Beloosesky, Yichayaou; Morag-Koren, Nira; Green, Hefziba; Grossman, Ehud

    2014-07-01

    Inter-arm blood pressure difference (IAD) is recognized as a risk factor for cardiovascular mortality. Its reproducibility in the elderly is unknown. The authors determined the prevalence and reproducibility of IAD in hospitalized elderly patients. Blood pressure was measured simultaneously in both arms on two different days in elderly individuals hospitalized in a geriatric ward. The study included 364 elderly patients (mean age, 85±5 years). Eighty-four patients (23%) had systolic IAD >10 and 62 patients (17%) had diastolic IAD >10 mm Hg. A total of 319 patients had two blood pressure measurements. Systolic and diastolic IAD remained in the same category in 203 (64%) and 231 (72%) patients, respectively. Correlations of systolic and diastolic IAD between the two measurements were poor. Consistency was not affected by age, body mass index, comorbidities, or treatment. IAD is extremely common in hospitalized elderly patients, but, because of poor consistency, its clinical significance in this population is uncertain. ©2014 Wiley Periodicals, Inc.

  12. Inter-radiologist agreement for CT scoring of pediatric splenic injuries and effect on an established clinical practice guideline.

    Science.gov (United States)

    Leschied, Jessica R; Mazza, Michael B; Davenport, Matthew; Chong, Suzanne T; Smith, Ethan A; Hoff, Carrie N; Ladino-Torres, Maria F; Khalatbari, Shokoufeh; Ehrlich, Peter F; Dillman, Jonathan R

    2016-02-01

    The American Pediatric Surgical Association (APSA) advocates for the use of a clinical practice guideline to direct management of hemodynamically stable pediatric spleen injuries. The clinical practice guideline is based on the CT score of the spleen injury according to the American Association for the Surgery of Trauma (AAST) CT scoring system. To determine the potential effect of radiologist agreement for CT scoring of pediatric spleen injuries on an established APSA clinical practice guideline. We retrospectively analyzed blunt splenic injuries occurring in children from January 2007 to January 2012 at a single level 1 trauma center (n = 90). Abdominal CT exams performed at clinical presentation were reviewed by four radiologists who documented the following: (1) splenic injury grade (AAST system), (2) arterial extravasation and (3) pseudoaneurysm. Inter-rater agreement for AAST injury grade was assessed using the multi-rater Fleiss kappa and Kendall coefficient of concordance. Inter-rater agreement was assessed using weighted (AAST injury grade) or prevalence-adjusted bias-adjusted (binary measures) kappa statistics; 95% confidence intervals were calculated. We evaluated the hypothetical effect of radiologist disagreement on an established APSA clinical practice guideline. Inter-rater agreement was good for absolute AAST injury grade (kappa: 0.64 [0.59–0.69]) and excellent for relative AAST injury grade (Kendall w: 0.90). All radiologists agreed on the AAST grade in 52% of cases. Based on an established clinical practice guideline, radiologist disagreement could have changed the decision for intensive care management in 11% (10/90) of children, changed the length of hospital stay in 44% (40/90), and changed the time to return to normal activity in 44% (40/90). Radiologist agreement when assigning splenic AAST injury grades is less than perfect, and disagreements have the potential to change management in a substantial number of pediatric patients.
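
    The multi-rater Fleiss kappa named in the abstract above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' analysis: the function name `fleiss_kappa` and the toy count table are hypothetical, and the weighted and prevalence-adjusted bias-adjusted variants mentioned in the abstract are not covered.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    counts[i][j] is the number of raters who assigned subject i to
    category j. Every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least 2 raters per subject")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number of ratings")
    n_categories = len(counts[0])

    # Overall proportion of ratings falling into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per subject: share of agreeing rater pairs.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_obs = sum(p_subj) / n_subjects
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_obs - p_exp) / (1.0 - p_exp)


if __name__ == "__main__":
    # Hypothetical table: 3 cases, 4 raters, 2 grades.
    print(fleiss_kappa([[4, 0], [2, 2], [0, 4]]))
```

    Kappa corrects raw agreement for the agreement expected by chance, which is why the abstract can report a kappa of 0.64 alongside complete agreement among all four radiologists in only 52% of cases.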

  13. Inter-radiologist agreement for CT scoring of pediatric splenic injuries and effect on an established clinical practice guideline

    International Nuclear Information System (INIS)

    Leschied, Jessica R.; Smith, Ethan A.; Ladino-Torres, Maria F.; Dillman, Jonathan R.; Mazza, Michael B.; Chong, Suzanne T.; Hoff, Carrie N.; Davenport, Matthew S.; Khalatbari, Shokoufeh; Ehrlich, Peter F.

    2016-01-01

    The American Pediatric Surgical Association (APSA) advocates for the use of a clinical practice guideline to direct management of hemodynamically stable pediatric spleen injuries. The clinical practice guideline is based on the CT score of the spleen injury according to the American Association for the Surgery of Trauma (AAST) CT scoring system. To determine the potential effect of radiologist agreement for CT scoring of pediatric spleen injuries on an established APSA clinical practice guideline. We retrospectively analyzed blunt splenic injuries occurring in children from January 2007 to January 2012 at a single level 1 trauma center (n = 90). Abdominal CT exams performed at clinical presentation were reviewed by four radiologists who documented the following: (1) splenic injury grade (AAST system), (2) arterial extravasation and (3) pseudoaneurysm. Inter-rater agreement for AAST injury grade was assessed using the multi-rater Fleiss kappa and Kendall coefficient of concordance. Inter-rater agreement was assessed using weighted (AAST injury grade) or prevalence-adjusted bias-adjusted (binary measures) kappa statistics; 95% confidence intervals were calculated. We evaluated the hypothetical effect of radiologist disagreement on an established APSA clinical practice guideline. Inter-rater agreement was good for absolute AAST injury grade (kappa: 0.64 [0.59-0.69]) and excellent for relative AAST injury grade (Kendall w: 0.90). All radiologists agreed on the AAST grade in 52% of cases. Based on an established clinical practice guideline, radiologist disagreement could have changed the decision for intensive care management in 11% (10/90) of children, changed the length of hospital stay in 44% (40/90), and changed the time to return to normal activity in 44% (40/90). Radiologist agreement when assigning splenic AAST injury grades is less than perfect, and disagreements have the potential to change management in a substantial number of pediatric patients. (orig.)

  14. Inter-radiologist agreement for CT scoring of pediatric splenic injuries and effect on an established clinical practice guideline

    Energy Technology Data Exchange (ETDEWEB)

    Leschied, Jessica R.; Smith, Ethan A.; Ladino-Torres, Maria F.; Dillman, Jonathan R. [University of Michigan Health System, Department of Radiology, Section of Pediatric Radiology, C.S. Mott Children's Hospital, Ann Arbor, MI (United States); Mazza, Michael B.; Chong, Suzanne T.; Hoff, Carrie N. [University of Michigan Health System, Department of Radiology, Division of Emergency Radiology, C.S. Mott Children's Hospital, Ann Arbor, MI (United States); Davenport, Matthew S. [University of Michigan Health System, Department of Radiology, Division of Abdominal Imaging, C.S. Mott Children's Hospital, Ann Arbor, MI (United States); Khalatbari, Shokoufeh [University of Michigan, Michigan Institute for Clinical and Health Research, Ann Arbor, MI (United States); Ehrlich, Peter F. [University of Michigan Health System, Department of Surgery, Section of Pediatric Surgery, C.S. Mott Children's Hospital, Ann Arbor, MI (United States)

    2016-02-15

    The American Pediatric Surgical Association (APSA) advocates for the use of a clinical practice guideline to direct management of hemodynamically stable pediatric spleen injuries. The clinical practice guideline is based on the CT score of the spleen injury according to the American Association for the Surgery of Trauma (AAST) CT scoring system. To determine the potential effect of radiologist agreement for CT scoring of pediatric spleen injuries on an established APSA clinical practice guideline. We retrospectively analyzed blunt splenic injuries occurring in children from January 2007 to January 2012 at a single level 1 trauma center (n = 90). Abdominal CT exams performed at clinical presentation were reviewed by four radiologists who documented the following: (1) splenic injury grade (AAST system), (2) arterial extravasation and (3) pseudoaneurysm. Inter-rater agreement for AAST injury grade was assessed using the multi-rater Fleiss kappa and Kendall coefficient of concordance. Inter-rater agreement was assessed using weighted (AAST injury grade) or prevalence-adjusted bias-adjusted (binary measures) kappa statistics; 95% confidence intervals were calculated. We evaluated the hypothetical effect of radiologist disagreement on an established APSA clinical practice guideline. Inter-rater agreement was good for absolute AAST injury grade (kappa: 0.64 [0.59-0.69]) and excellent for relative AAST injury grade (Kendall w: 0.90). All radiologists agreed on the AAST grade in 52% of cases. Based on an established clinical practice guideline, radiologist disagreement could have changed the decision for intensive care management in 11% (10/90) of children, changed the length of hospital stay in 44% (40/90), and changed the time to return to normal activity in 44% (40/90). Radiologist agreement when assigning splenic AAST injury grades is less than perfect, and disagreements have the potential to change management in a substantial number of pediatric patients. (orig.)

  15. Examining the interrater reliability of the Hare Psychopathy Checklist-Revised across a large sample of trained raters.

    Science.gov (United States)

    Blais, Julie; Forth, Adelle E; Hare, Robert D

    2017-06-01

    The goal of the current study was to assess the interrater reliability of the Psychopathy Checklist-Revised (PCL-R) among a large sample of trained raters (N = 280). All raters completed PCL-R training at some point between 1989 and 2012 and subsequently provided complete coding for the same 6 practice cases. Overall, 3 major conclusions can be drawn from the results: (a) reliability of individual PCL-R items largely fell below any appropriate standards while the estimates for Total PCL-R scores and factor scores were good (but not excellent); (b) the cases representing individuals with high psychopathy scores showed better reliability than did the cases of individuals in the moderate to low PCL-R score range; and (c) there was a high degree of variability among raters; however, rater specific differences had no consistent effect on scoring the PCL-R. Therefore, despite low reliability estimates for individual items, Total scores and factor scores can be reliably scored among trained raters. We temper these conclusions by noting that scoring standardized videotaped case studies does not allow the rater to interact directly with the offender. Real-world PCL-R assessments typically involve a face-to-face interview and much more extensive collateral information. We offer recommendations for new web-based training procedures. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  16. Emotions and assessment: considerations for rater-based judgements of entrustment.

    Science.gov (United States)

    Gomez-Garibello, Carlos; Young, Meredith

    2018-03-01

    Assessment is subject to increasing scrutiny as medical education transitions towards a competency-based medical education (CBME) model. Traditional perspectives on the roles of assessment emphasise high-stakes, summative assessment, whereas CBME argues for formative assessment. Revisiting conceptualisations about the roles and formats of assessment in medical education provides opportunities to examine understandings and expectations of the assessment of learners. The act of the rater generating scores might be considered as an exclusively cognitive exercise; however, current literature has drawn attention to the notion of raters as measurement instruments, thereby attributing additional factors to their decision-making processes, such as social considerations and intuition. However, the literature has not comprehensively examined the influence of raters' emotions during assessment. In this narrative review, we explore the influence of raters' emotions in the assessment of learners. We summarise existing literature that describes the role of emotions in assessment broadly, and rater-based assessment specifically, across a variety of fields. The literature related to emotions and assessment is examined from different perspectives, including those of educational context, decision making and rater cognition. We use the concept of entrustable professional activities (EPAs) to contextualise a discussion of the ways in which raters' emotions may have meaningful impacts on the decisions they make in clinical settings. This review summarises findings from different perspectives and identifies areas for consideration for the role of emotion in rater-based assessment, and areas for future research. 
    We identify and discuss three different interpretations of the influence of raters' emotions during assessments: (i) emotions lead to biased decision making; (ii) emotions contribute random noise to assessment, and (iii) emotions constitute legitimate sources of information that

  17. Rater Accuracy and Training Group Effects in Expert- and Supervisor-Based Monitoring Systems

    Science.gov (United States)

    Baird, Jo-Anne; Meadows, Michelle; Leckie, George; Caro, Daniel

    2017-01-01

    This study evaluated rater accuracy with rater-monitoring data from high stakes examinations in England. Rater accuracy was estimated with cross-classified multilevel modelling. The data included face-to-face training and monitoring of 567 raters in 110 teams, across 22 examinations, giving a total of 5500 data points. Two rater-monitoring systems…

  18. 15th International Conference on Global Research and Education Inter-Academia 2016

    CERN Document Server

    Szewczyk, Roman

    2017-01-01

    Developments in the connected fields of solid state physics, bioengineering, mechatronics and nanometrology have had a profound effect on the emergence of modern technologies and their influence on our lives. In all of these fields, understanding and improving the basic underlying materials is of crucial importance for the development of systems and applications. The International Conference Inter-Academia 2016 has successfully married these fields and become a regular feature in the conference calendar. It consisted of seven thematic areas in the field of material science, nanotechnology, biotechnology, plasma physics, metrology, robotics, sensors and devices. The book Recent Global Research and Education: Technological Challenges is intended for use in academic, government and industry R&D departments, as an indispensable reference tool for the years to come. Also, we hope that the volume can serve the world community as the definitive reference source in Advances in Intelligent Systems and Computing. T...

  19. Motivational Interviewing Skills in Health Care Encounters (MISHCE): Development and psychometric testing of an assessment tool.

    Science.gov (United States)

    Petrova, Tatjana; Kavookjian, Jan; Madson, Michael B; Dagley, John; Shannon, David; McDonough, Sharon K

    2015-01-01

    Motivational interviewing (MI) has demonstrated a significant impact as an intervention strategy for addiction management, change in lifestyle behaviors, and adherence to prescribed medication and other treatments. Key elements to studying MI include training in MI of professionals who will use it, assessment of skills acquisition in trainees, and the use of a validated skills assessment tool. The purpose of this research project was to develop a psychometrically valid and reliable tool designed to assess MI skills competence in health care provider trainees. The goal was to develop an assessment tool that would evaluate the acquisition and use of specific MI skills and principles, as well as the quality of the patient-provider therapeutic alliance in brief health care encounters. To address this purpose, specific steps were followed, beginning with a literature review. This review contributed to the development of relevant conceptual and operational definitions, selecting a scaling technique and response format, and methods for analyzing validity and reliability. Internal consistency reliability was established on 88 video recorded interactions. The inter-rater and test-retest reliability were established using 18 interactions randomly selected from the 88. The assessment tool Motivational Interviewing Skills for Health Care Encounters (MISHCE) and a manual for use of the tool were developed. Validity and reliability of MISHCE were examined. Face and content validity were supported with well-defined conceptual and operational definitions and feedback from an expert panel. Reliability was established through internal consistency, inter-rater reliability, and test-retest reliability. The overall internal consistency reliability (Cronbach's alpha) for all fifteen items was 0.75. MISHCE demonstrated good inter-rater reliability and good to excellent test-retest reliability. MISHCE assesses the health provider's level of knowledge and skills in brief

  20. The Americleft Project: A Modification of Asher-McDade Method for Rating Nasolabial Esthetics in Patients With Unilateral Cleft Lip and Palate Using Q-sort.

    Science.gov (United States)

    Stoutland, Alicia; Long, Ross E; Mercado, Ana; Daskalogiannakis, John; Hathaway, Ronald R; Russell, Kathleen A; Singer, Emily; Semb, Gunvor; Shaw, William C

    2017-11-01

    The purpose of this study was to investigate ways to improve rater reliability and satisfaction in nasolabial esthetic evaluations of patients with complete unilateral cleft lip and palate (UCLP), by modifying the Asher-McDade method with use of Q-sort methodology. Blinded ratings of cropped photographs of one hundred forty-nine 5- to 7-year-old consecutively treated patients with complete UCLP from 4 different centers were used in a rating of frontal and profile nasolabial esthetic outcomes by 6 judges involved in the Americleft Project's intercenter outcome comparisons. Four judges rated in previous studies using the original Asher-McDade approach. For the Q-sort modification, rather than projection of images, each judge had cards with frontal and profile photographs of each patient and rated them on a scale of 1 to 5 for vermillion border, nasolabial frontal, and profile, using the Q-sort method with placement of cards into categories 1 to 5. Inter- and intrarater reliabilities were calculated using the Weighted Kappa (95% confidence interval). For 4 raters, the reliabilities were compared with those in previous studies. There was no significant improvement in inter-rater reliabilities using the new method. Intrarater reliability consistently improved. All raters preferred the Q-sort method with rating cards rather than a PowerPoint of photos, which improved internal consistency in rating compared to previous studies using the original Asher-McDade method. All raters preferred this method because of the ability to continuously compare photos and adjust relative ratings between patients.

  1. Development of the Music Therapy Assessment Tool for Advanced Huntington's Disease: A Pilot Validation Study.

    Science.gov (United States)

    O'Kelly, Julian; Bodak, Rebeka

    2016-01-01

    Case studies of people with Huntington's disease (HD) report that music therapy provides a range of benefits that may improve quality of life; however, no robust music therapy assessment tools exist for this population. Develop and conduct preliminary psychometric testing of a music therapy assessment tool for patients with advanced HD. First, we established content and face validity of the Music Therapy Assessment Tool for Advanced HD (MATA-HD) through focus groups and field testing. Second, we examined psychometric properties of the resulting MATA-HD in terms of its construct validity, internal consistency, and inter-rater and intra-rater reliability over 10 group music therapy sessions with 19 patients. The resulting MATA-HD included a total of 15 items across six subscales (Arousal/Attention, Physical Presentation, Communication, Musical, Cognition, and Psychological/Behavioral). We found good construct validity (r ≥ 0.7) for Mood, Communication Level, Communication Effectiveness, Choice, Social Behavior, Arousal, and Attention items. Cronbach's α of 0.825 indicated good internal consistency across 11 items with a common focus of engagement in therapy. The inter-rater reliability (IRR) Intra-Class Coefficient (ICC) scores averaged 0.65, and a mean intra-rater ICC reliability of 0.68 was obtained. Further training and retesting provided a mean IRR ICC of 0.7. Preliminary data indicate that the MATA-HD is a promising tool for measuring patient responses to music therapy interventions across psychological, physical, social, and communication domains of functioning in patients with advanced HD. © the American Music Therapy Association 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  2. Evaluation of Vocal Fold Motion Abnormalities: Are We All Seeing the Same Thing?

    Science.gov (United States)

    Madden, Lyndsay L; Rosen, Clark A

    2017-01-01

    Flexible laryngoscopy is the principal tool for the evaluation of vocal fold motion. As yet, no consistent, unified outcome metric has been developed for vocal fold paralysis/immobility research. The goal of this study was to evaluate vocal fold motion assessment (inter- and intra-rater reliability) among general otolaryngologists and fellowship-trained laryngologists. Prospective video perceptual analysis study. Flexible laryngoscopic examinations, with sound, of 15 unique patient cases (20 seconds each) were sent to 10 general otolaryngologists and 10 fellowship-trained laryngologists blinded to clinical history. Reviewers were given written definitions of vocal fold mobility and immobility and two video examples. The cases included bilateral vocal fold mobility (six), unilateral vocal fold immobility (five), and unilateral vocal fold hypomobility (four). Five examinations were repeated to determine intra-rater reliability. Participants were asked to judge whether or not there was purposeful motion, as described by written definitions, for each vocal fold (800 tokens in total). Twenty reviewers (100%) replied. Both general otolaryngologists and fellowship-trained laryngologists had an overall inter-rater reliability of 95%. Difference in inter-rater reliability between the two groups of raters was negligible: 95% for general otolaryngologists and 97.5% for fellowship-trained laryngologists. There was no variability in intra-rater reliability within either rater group (99%). Intra- and inter-rater agreement in determining whether the patient had purposeful vocal fold motion on flexible laryngoscopic examination was excellent in both groups. This study demonstrates that otolaryngologists can consistently and accurately judge the presence and the absence of vocal fold motion. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  3. How faculty members experience workplace-based assessment rater training: a qualitative study.

    Science.gov (United States)

    Kogan, Jennifer R; Conforti, Lisa N; Bernabeo, Elizabeth; Iobst, William; Holmboe, Eric

    2015-07-01

    Direct observation of clinical skills is a common approach in workplace-based assessment (WBA). Despite widespread use of the mini-clinical evaluation exercise (mini-CEX), faculty development efforts are typically required to improve assessment quality. Little consensus exists regarding the most effective training methods, and few studies explore faculty members' reactions to rater training. This study was conducted to qualitatively explore the experiences of faculty staff with two rater training approaches - performance dimension training (PDT) and a modified approach to frame of reference training (FoRT) - to elucidate how such faculty development can be optimally designed. In a qualitative study of a multifaceted intervention using complex intervention principles, 45 out-patient resident faculty preceptors from 26 US internal medicine residency programmes participated in a rater training faculty development programme. All participants were interviewed individually and in focus groups during and after the programme to elicit how the training influenced their approach to assessment. A constructivist grounded theory approach was used to analyse the data. Many participants perceived that rater training positively influenced their approach to direct observation and feedback, their ability to use entrustment as the standard for assessment, and their own clinical skills. However, barriers to implementation and change included: (i) a preference for holistic assessment over frameworks; (ii) challenges in defining competence; (iii) difficulty in changing one's approach to assessment, and (iv) concerns about institutional culture and buy-in. Rater training using PDT and a modified approach to FoRT can provide faculty staff with assessment skills that are congruent with principles of criterion-referenced assessment and entrustment, and foundational principles of competency-based education, while providing them with opportunities to reflect on their own clinical skills

  4. A comparison of Google Glass and traditional video vantage points for bedside procedural skill assessment.

    Science.gov (United States)

    Evans, Heather L; O'Shea, Dylan J; Morris, Amy E; Keys, Kari A; Wright, Andrew S; Schaad, Douglas C; Ilgen, Jonathan S

    2016-02-01

    This pilot study assessed the feasibility of using first person (1P) video recording with Google Glass (GG) to assess procedural skills, as compared with traditional third person (3P) video. We hypothesized that raters reviewing 1P videos would visualize more procedural steps with greater inter-rater reliability than 3P rating vantages. Seven subjects performed simulated internal jugular catheter insertions. Procedures were recorded by both Google Glass and an observer's head-mounted camera. Videos were assessed by 3 expert raters using a task-specific checklist (CL) and both an additive- and summative-global rating scale (GRS). Mean scores were compared by t-tests. Inter-rater reliabilities were calculated using intraclass correlation coefficients. The 1P vantage was associated with a significantly higher mean CL score than the 3P vantage (7.9 vs 6.9, P = .02). Mean GRS scores were not significantly different. Mean inter-rater reliabilities for the CL, additive-GRS, and summative-GRS were similar between vantages. 1P vantage recordings may improve visualization of tasks for behaviorally anchored instruments (eg, CLs), while maintaining similar global ratings and inter-rater reliability when compared with conventional 3P vantage recordings. Copyright © 2016 Elsevier Inc. All rights reserved.

  5. [Intra-rater Reliability for the Questionnaire on Activity Limitations and Participation Restrictions of Children With ADHD].

    Science.gov (United States)

    Salamanca Duque, Luisa Matilde; Naranjo Aristizábal, María Mercedes; Gutiérrez Ríos, Gladys Helena; Prieto, Jaime Bayona

    2014-03-01

    Questionnaires for evaluating activity limitations and participation restrictions in children with ADHD (CLARP-TDAH) have recently been developed in Colombia, based on the suggestions made by the WHO in the International Classification of Functioning, Disability and Health (ICF), allowing clinical evaluation beyond an evaluation of the functionality and functioning of children in their family and school environments. Previous research showed the questionnaire to be useful in the multidisciplinary approach to Colombian children with ADHD. This study determines the level of intra-rater reliability for the CLARP-TDAH Parents and Teachers questionnaires. The study included a non-random sample of 203 Colombian children attending school and diagnosed with ADHD. Intra-rater reliability and the reproducibility of the results were determined using the Kappa index. The informants were parents and teachers. Kappa values >0.7 were obtained for the intra-rater reliability of the CLARP-TDAH Parents questionnaire domains, while for the CLARP-TDAH Teachers domains these values were >0.8. The CLARP-TDAH questionnaires are a tool with a good level of intra-rater reliability, which allows a reliable assessment of activity limitations and participation restrictions in order to determine the level of functioning at home and school. Copyright © 2014 Asociación Colombiana de Psiquiatría. Publicado por Elsevier España. All rights reserved.

  6. Effect of rater training on reliability and accuracy of mini-CEX scores: a randomized, controlled trial.

    Science.gov (United States)

    Cook, David A; Dupras, Denise M; Beckman, Thomas J; Thomas, Kris G; Pankratz, V Shane

    2009-01-01

    Mini-CEX scores assess resident competence. Rater training might improve mini-CEX score interrater reliability, but evidence is lacking. Evaluate a rater training workshop using interrater reliability and accuracy. Randomized trial (immediate versus delayed workshop) and single-group pre/post study (randomized groups combined). Academic medical center. Fifty-two internal medicine clinic preceptors (31 randomized and 21 additional workshop attendees). The workshop included rater error training, performance dimension training, behavioral observation training, and frame of reference training using lecture, video, and facilitated discussion. Delayed group received no intervention until after posttest. Mini-CEX ratings at baseline (just before workshop for workshop group), and four weeks later using videotaped resident-patient encounters; mini-CEX ratings of live resident-patient encounters one year preceding and one year following the workshop; rater confidence using mini-CEX. Among 31 randomized participants, interrater reliabilities in the delayed group (baseline intraclass correlation coefficient [ICC] 0.43, follow-up 0.53) and workshop group (baseline 0.40, follow-up 0.43) were not significantly different (p = 0.19). Mean ratings were similar at baseline (delayed 4.9 [95% confidence interval 4.6-5.2], workshop 4.8 [4.5-5.1]) and follow-up (delayed 5.4 [5.0-5.7], workshop 5.3 [5.0-5.6]; p = 0.88 for interaction). For the entire cohort, rater confidence (1 = not confident, 6 = very confident) improved from mean (SD) 3.8 (1.4) to 4.4 (1.0), p = 0.018. Interrater reliability for ratings of live encounters (entire cohort) was higher after the workshop (ICC 0.34) than before (ICC 0.18) but the standard error of measurement was similar for both periods. Rater training did not improve interrater reliability or accuracy of mini-CEX scores. clinicaltrials.gov identifier NCT00667940
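
    The trial above reports intraclass correlation coefficients together with the standard error of measurement, but the abstract does not say which ICC form was used. The sketch below therefore assumes the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)) and the common relation SEM = SD × sqrt(1 − reliability); the function names and the ratings matrix are illustrative only:

```python
from math import sqrt


def icc_2_1(ratings):
    """ICC(2,1); ratings has one row per subject and one column per rater."""
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (subjects, raters, residual).
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def standard_error_of_measurement(sd, reliability):
    """SEM from the score standard deviation and a reliability coefficient."""
    return sd * sqrt(1 - reliability)


# Illustrative ratings: 6 subjects (rows) scored by 4 raters (columns).
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example_ratings), 2))  # 0.29
```

    Because the SEM depends on both the ICC and the spread of scores, a higher ICC after training need not mean a smaller SEM, which is consistent with the pattern the abstract describes for live encounters.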

  7. Cross-cultural validation of the Persian version of the Functional Independence Measure for patients with stroke.

    Science.gov (United States)

    Naghdi, Soofia; Ansari, Noureddin Nakhostin; Raji, Parvin; Shamili, Aryan; Amini, Malek; Hasson, Scott

    2016-01-01

    To translate and cross-culturally adapt the Functional Independence Measure (FIM) into the Persian language and to test the reliability and validity of the Persian FIM (PFIM) in patients with stroke. In this cross-sectional study carried out in an outpatient stroke rehabilitation center, 40 patients with stroke (mean age 60 years) participated. A standard forward-backward translation method and expert panel validation was followed to develop the PFIM. Two experienced occupational therapists (OTs) assessed the patients independently in all items of the PFIM in a single session for inter-rater reliability. One of the OTs reassessed the patients after 1 week for intra-rater reliability. There were no floor or ceiling effects for the PFIM. Excellent inter-rater and intra-rater reliability was noted for the PFIM total score, motor and cognitive subscales (ICC [agreement] 0.88-0.98). According to the Bland-Altman agreement analysis, there was no systematic bias between raters and within raters. Cronbach's alpha for the internal consistency of the PFIM ranged from 0.70 to 0.96. The principal component analysis with varimax rotation indicated a three-factor structure: (1) self-care and mobility; (2) sphincter control and (3) cognitive that jointly accounted for 74.8% of the total variance. Construct validity was supported by a significant Pearson correlation between the PFIM and the Persian Barthel Index (r = 0.95; p Persian patients with stroke. The Functional Independence Measure (FIM) is an outcome measure for disability based on the International Classification of Functioning, Disability and Health (ICF). The FIM was cross-culturally adapted and validated into Persian language. The Persian version of the FIM (PFIM) is reliable and valid for assessing functional status of patients with stroke. The PFIM can be used in Persian speaking countries to assess the limitations in activities of daily living of patients with stroke.
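
    The Bland-Altman analysis mentioned above checks for systematic bias by examining the differences between paired measurements. A minimal sketch, assuming the usual 95% limits of agreement (mean difference ± 1.96 × SD of the differences) and using hypothetical total scores rather than data from the study:

```python
from statistics import mean, stdev


def bland_altman(rater_a, rater_b, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    differences = [a - b for a, b in zip(rater_a, rater_b)]
    bias = mean(differences)              # systematic difference between raters
    spread = z * stdev(differences)       # half-width of the limits of agreement
    return bias, bias - spread, bias + spread


# Hypothetical total scores given to four patients by two raters.
scores_rater_a = [100, 112, 90, 75]
scores_rater_b = [99, 113, 89, 76]
bias, lower, upper = bland_altman(scores_rater_a, scores_rater_b)
print(bias, round(lower, 2), round(upper, 2))  # 0 -2.26 2.26
```

    A bias close to zero with limits of agreement that straddle zero is what "no systematic bias between raters" would look like in this kind of analysis.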

  8. Development of a valid and reliable test to assess trauma radiograph interpretation performance

    International Nuclear Information System (INIS)

    Neep, M.J.; Steffens, T.; Riley, V.; Eastgate, P.; McPhail, S.M.

    2017-01-01

    Objectives: The purpose of this investigation was to develop and examine the preliminary validity and reliability among radiographers of a test to assess trauma radiograph interpretation performance suitable for use among health professionals. Methods: Stage 1 examined 14,159 consecutive appendicular and axial examinations from a hospital emergency department over a 12 month period to quantify a typical anatomical region case-mix of trauma radiographs. A sample of radiographic cases representative of affected anatomical regions was then developed into the Image Interpretation Test (IIT). Stage 2 involved prospective investigations of the IIT's reliability (inter-rater, intra-rater, internal consistency) and validity (concurrent) among 41 radiographers. Results: The IIT included 60 cases. The median (interquartile range) clinical experience of participants was 5 (2–10) years. Case scores were internally consistent (Cronbach's alpha = 0.90). Favourable inter-rater reliability (kappa > 0.70 for 58/60 cases, Intra-class correlation coefficient (ICC) > 0.99 for total score) and intra-rater reliability (kappa > 0.90 for 60/60 cases, ICC > 0.99 for total score) was observed. There was a positive association between radiographers' confidence in image interpretation and IIT score (coefficient = 1.52, r-squared = 0.60, p < 0.001). Conclusions: The IIT developed during this investigation included a selection of radiographic cases consistent with anatomical regions represented in an adult trauma case-mix. This study has also provided foundational preliminary evidence to support the reliability and validity of the IIT among radiographers. The findings suggest that it is possible to assess image interpretation performance of adult trauma radiographs with this test. - Highlights: • Development of an Image Interpretation Test (IIT). • Cases consistent with anatomical regions represented in a typical adult trauma case-mix. • Development of a

  9. Internal Branding and Employee Brand Consistent Behaviours

    DEFF Research Database (Denmark)

    Mazzei, Alessandra; Ravazzani, Silvia

    2017-01-01

    Employee behaviours conveying brand values, named brand consistent behaviours, affect the overall brand evaluation. Internal branding literature highlights a knowledge gap in terms of communication practices intended to sustain such behaviours. This study contributes to the development of a non... constitutive processes. In particular, the paper places emphasis on the role and kinds of communication practices as a central part of the nonnormative and constitutive internal branding process. The paper also discusses an empirical study based on interviews with 32 Italian and American communication managers... and 2 focus groups with Italian communication managers. Findings show that, in order to enhance employee brand consistent behaviours, the most effective communication practices are those characterised as enablement-oriented. Such a communication creates the organizational conditions adequate to sustain...

  10. Variability in multi-rater competency assessments

    Directory of Open Access Journals (Sweden)

    D. Theron

    1999-06-01

    The purpose of this study was to determine if significant differences exist between the multi-rater competency evaluations of employees operating within a flat organisational structure. Sixty-eight marketing employees were each evaluated by a number of raters including themselves, their managers, customers and peers. A competency questionnaire was developed by using the input of the employees who took part in the appraisal. Using paired t-tests, significant differences between the various groups of raters were found. These findings and the implications thereof are discussed.

  11. Explaining sexual harassment judgments: looking beyond gender of the rater.

    Science.gov (United States)

    O'Connor, Maureen; Gutek, Barbara A; Stockdale, Margaret; Geer, Tracey M; Melançon, Renée

    2004-02-01

    In two decades of research on sexual harassment, one finding that appears repeatedly is that gender of the rater influences judgments about sexual harassment such that women are more likely than men to label behavior as sexual harassment. Yet, sexual harassment judgments are complex, particularly in situations that culminate in legal proceedings. And, this one variable, gender, may have been overemphasized to the exclusion of other situational and rater characteristic variables. Moreover, why do gender differences appear? As work by Wiener and his colleagues has done (R. L. Wiener et al., 2002; R. L. Wiener & L. Hurt, 2000; R. L. Wiener, L. Hurt, B. Russell, K. Mannen, & C. Gasper, 1997), this study attempts to look beyond gender to answer this question. In the studies reported here, raters (undergraduates and community adults) either read a written scenario or viewed a videotaped reenactment of a sexual harassment trial. The nature of the work environment was manipulated to see what, if any, effect the context would have on gender effects. Additionally, a number of rater characteristics beyond gender were measured, including ambivalent sexism attitudes of the raters, their judgments of complainant credibility, and self-referencing that might help explain rater judgments. Respondent gender, work environment, and community vs. student sample differences produced reliable differences in sexual harassment ratings in both the written and video trial versions of the study. The gender and sample differences in the sexual harassment ratings, however, are explained by a model which incorporates hostile sexism, perceptions of the complainant's credibility, and raters' own ability to put themselves in the complainant's position (self-referencing).

  12. Cross-cultural validation of the Brazilian Portuguese version of the Social Phobia Inventory (SPIN): study of the items and internal consistency.

    Science.gov (United States)

    Osório, Flávia de Lima; Crippa, José Alexandre S; Loureiro, Sonia Regina

    2009-03-01

    The objective of the present study was to carry out the cross-cultural validation for Brazilian Portuguese of the Social Phobia Inventory, an instrument for the evaluation of fear, avoidance and physiological symptoms associated with social anxiety disorder. The process of translation and adaptation involved four bilingual professionals, appreciation and approval of the back-translation by the authors of the original scale, a pilot study with 30 Brazilian university students, and appreciation by raters who confirmed the face validity of the Portuguese version, which was named 'Inventário de Fobia Social'. As part of the psychometric study of the Social Phobia Inventory, analysis of the items and evaluation of the internal consistency of the instrument were performed in a study conducted on 2314 university students. The results demonstrated that item 11, related to the fear of public speaking, was the most frequently scored item. The correlation of the items with the total score was quite adequate, ranging from 0.44 to 0.71, as was the internal consistency, which ranged from 0.71 to 0.90. The authors conclude that the Brazilian Portuguese version of the Social Phobia Inventory proved to be adequate regarding the psychometric properties initially studied, with qualities quite close to those of the original study. Studies that will evaluate the remaining indicators of validity of the Social Phobia Inventory in clinical and non-clinical samples are considered to be opportune and necessary.

  13. Validity and reliability of a Malay version of the Lawton instrumental activities of daily living scale among the Malay speaking elderly in Malaysia.

    Science.gov (United States)

    Kadar, Masne; Ibrahim, Suhaili; Razaob, Nor Afifi; Chai, Siaw Chui; Harun, Dzalani

    2018-02-01

    The Lawton Instrumental Activities of Daily Living Scale is a tool often used to assess independence among elderly at home. Its suitability to be used with the elderly population in Malaysia has not been validated. This current study aimed to assess the validity and reliability of the Lawton Instrumental Activities of Daily Living Scale - Malay Version to Malay speaking elderly in Malaysia. This study was divided into three phases: (1) translation and linguistic validity involving both forward and backward translations; (2) establishment of face validity and content validity; and (3) establishment of reliability involving inter-rater, test-retest and internal consistency analyses. Data used for these analyses were obtained by interviewing 65 elderly respondents. Percentages of Content Validity Index for 4 criteria were from 88.89 to 100.0. The Cronbach α coefficient for internal consistency was 0.838. Intra-class Correlation Coefficient of inter-rater reliability and test-retest reliability was 0.957 and 0.950 respectively. The result shows that the Lawton Instrumental Activities of Daily Living Scale - Malay Version has excellent reliability and validity for use with the Malay speaking elderly people in Malaysia. This scale could be used by professionals to assess functional ability of elderly who live independently in community. © 2018 Occupational Therapy Australia.

  14. Assessing the quality of decision support technologies using the International Patient Decision Aid Standards instrument (IPDASi).

    Directory of Open Access Journals (Sweden)

    Glyn Elwyn

    To describe the development, validation and inter-rater reliability of an instrument to measure the quality of patient decision support technologies (decision aids). Scale development study, involving construct, item and scale development, validation and reliability testing. There has been increasing use of decision support technologies, adjuncts to the discussions clinicians have with patients about difficult decisions. A global interest in developing these interventions exists among both for-profit and not-for-profit organisations. It is therefore essential to have internationally accepted standards to assess the quality of their development, process, content, potential bias and method of field testing and evaluation. Twenty-five researcher-members of the International Patient Decision Aid Standards Collaboration worked together to develop the instrument (IPDASi). In the fourth stage (reliability study), eight raters assessed thirty randomly selected decision support technologies. IPDASi measures quality in 10 dimensions, using 47 items, and provides an overall quality score (scaled from 0 to 100) for each intervention. Overall IPDASi scores ranged from 33 to 82 across the decision support technologies sampled (n = 30), enabling discrimination. The inter-rater intraclass correlation for the overall quality score was 0.80. Correlations of dimension scores with the overall score were all positive (0.31 to 0.68). Cronbach's alpha values for the 8 raters ranged from 0.72 to 0.93. Cronbach's alphas based on the dimension means ranged from 0.50 to 0.81, indicating that the dimensions, although well correlated, measure different aspects of decision support technology quality. A short version (19 items) was also developed that had very similar mean scores to IPDASi and high correlation between short score and overall score, 0.87 (CI 0.79 to 0.92). This work

  15. Training Raters to Assess Adult ADHD: Reliability of Ratings

    Science.gov (United States)

    Adler, Lenard A.; Spencer, Thomas; Faraone, Stephen V.; Reimherr, Fred W.; Kelsey, Douglas; Michelson, David; Biederman, Joseph

    2005-01-01

    The standardization of ADHD ratings in adults is important given their differing symptom presentation. The authors investigated the agreement and reliability of rater standardization in a large-scale trial of atomoxetine in adults with ADHD. Training of 91 raters for the investigator-administered ADHD Rating Scale (ADHDRS-IV-Inv) occurred prior to…

  16. Reliability of the standard goniometry and diagrammatic recording of finger joint angles: a comparative study with healthy subjects and non-professional raters.

    Science.gov (United States)

    Macionis, Valdas

    2013-01-09

    Diagrammatic recording of finger joint angles by using two criss-crossed paper strips can be a quick substitute for the standard goniometry. As a preliminary step toward clinical validation of the diagrammatic technique, the current study employed healthy subjects and non-professional raters to explore whether reliability estimates of the diagrammatic goniometry are comparable with those of the standard procedure. The study included two procedurally different parts, which were replicated by assigning 24 medical students to act interchangeably as 12 subjects and 12 raters. A larger component of the study was designed to compare goniometers side-by-side in measurement of finger joint angles varying from subject to subject. In the rest of the study, the instruments were compared by parallel evaluations of joint angles similar for all subjects in a situation of simulated change of joint range of motion over time. The subjects used special guides to position the joints of their left ring finger at varying angles of flexion and extension. The obtained diagrams of joint angles were converted to numerical values by computerized measurements. The statistical approaches included calculation of appropriate intraclass correlation coefficients, standard errors of measurements, proportions of measurement differences of 5 degrees or less, and significant differences between paired observations. Reliability estimates were similar for both goniometers. Intra-rater and inter-rater intraclass correlation coefficients ranged from 0.69 to 0.93. The corresponding standard errors of measurements ranged from 2.4 to 4.9 degrees. Repeated measurements of a considerable number of raters fell within clinically non-meaningful 5 degrees of each other in proportions comparable with a criterion value of 0.95. Data collected with both instruments could be similarly interpreted in a simulated situation of change of joint range of motion over time. The paper goniometer and the standard goniometer can

  17. Detecting rater bias using a person-fit statistic: a Monte Carlo simulation study.

    Science.gov (United States)

    Aubin, André-Sébastien; St-Onge, Christina; Renaud, Jean-Sébastien

    2018-04-01

    With the Standards voicing concern for the appropriateness of response processes, we need to explore strategies that would allow us to identify inappropriate rater response processes. Although certain statistics can be used to help detect rater bias, their use is complicated by either a lack of data about their actual power to detect rater bias or the difficulty related to their application in the context of health professions education. This exploratory study aimed to establish the worthiness of pursuing the use of l_z to detect rater bias. We conducted a Monte Carlo simulation study to investigate the power of a specific detection statistic, that is: the standardized likelihood l_z person-fit statistic (PFS). Our primary outcome was the detection rate of biased raters, namely: raters whom we manipulated into being either stringent (giving lower scores) or lenient (giving higher scores), using the l_z statistic while controlling for the number of biased raters in a sample (6 levels) and the rate of bias per rater (6 levels). Overall, stringent raters (M = 0.84, SD = 0.23) were easier to detect than lenient raters (M = 0.31, SD = 0.28). More biased raters were easier to detect than less biased raters (60% bias: 62, SD = 0.37; 10% bias: 43, SD = 0.36). The PFS l_z seems to offer an interesting potential to identify biased raters. We observed detection rates as high as 90% for stringent raters, for whom we manipulated more than half their checklist. Although we observed very interesting results, we cannot generalize these results to the use of PFS with estimated item/station parameters or real data. Such studies should be conducted to assess the feasibility of using PFS to identify rater bias.
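
    For readers unfamiliar with l_z, the statistic standardizes the log-likelihood of an observed response pattern against its expectation under the model. The sketch below assumes dichotomous (0/1) checklist items and known model-implied probabilities of a score of 1; whether the simulation study used exactly this dichotomous form is an assumption, and the function name and example values are hypothetical:

```python
from math import log, sqrt


def lz_statistic(responses, probabilities):
    """Standardized log-likelihood person-fit statistic for 0/1 responses.

    probabilities[i] is the model-implied probability that item i is scored 1.
    Large negative values indicate a response pattern the model finds unlikely.
    """
    log_likelihood = 0.0
    expected = 0.0
    variance = 0.0
    for u, p in zip(responses, probabilities):
        q = 1.0 - p
        log_likelihood += u * log(p) + (1 - u) * log(q)   # observed pattern
        expected += p * log(p) + q * log(q)               # expectation under the model
        variance += p * q * log(p / q) ** 2               # variance under the model
    return (log_likelihood - expected) / sqrt(variance)


# Hypothetical two-item checklist on which a score of 1 is expected (p = 0.8).
print(round(lz_statistic([1, 1], [0.8, 0.8]), 3))  # 0.707
print(round(lz_statistic([0, 0], [0.8, 0.8]), 3))  # -2.828
```

    In the second call the scores are lower than the model expects on every item, which is the kind of pattern a stringent rater would produce, and l_z becomes strongly negative.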

  18. Exploring rater agreement: configurations of agreement and disagreement

    Directory of Open Access Journals (Sweden)

    ALEXANDER VON EYE

    2006-03-01

    At the level of manifest categorical variables, a large number of coefficients and models for the examination of rater agreement have been proposed and used for descriptive and explanatory purposes. This article focuses on exploring rater agreement. Configural Frequency Analysis (CFA) is proposed as a method of exploration of cross-classifications of raters’ judgements. CFA allows researchers to (1) examine individual cells and sets of cells in agreement tables; (2) examine cells that indicate disagreement; and (3) explore agreement and disagreement among three or more raters. Four CFA base models are discussed. The first is the model of rater agreement that is also used for Cohen’s (1960) κ (kappa). This model proposes independence of raters’ judgements. Deviations from this model suggest agreement or disagreement beyond chance. The second CFA model is based on a log-linear null model. This model is also used for Brennan and Prediger’s (1981) κ_n. It proposes a uniform distribution of ratings. The third model is that of Tanner and Young (1985). This model proposes equal weights for agreement cases and independence otherwise. The fourth model is the quasi-independence model. This model allows one to blank out agreement cells and thus to focus solely on patterns of disagreement. Examples use data from applicant selection.
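
    The first two base models differ only in what counts as chance agreement: Cohen's kappa uses the product of the raters' marginal proportions (independence), whereas Brennan and Prediger's coefficient uses a uniform distribution over the q categories, so chance agreement is 1/q. A minimal sketch of both coefficients for a square agreement table follows; the function names and the 2 × 2 table are hypothetical and are not the applicant-selection data from the article:

```python
def cohen_kappa(table):
    """Cohen's kappa from a square table of counts (rater A rows, rater B columns)."""
    q = len(table)
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(q)) / n
    row = [sum(table[i]) / n for i in range(q)]
    col = [sum(table[i][j] for i in range(q)) / n for j in range(q)]
    # Chance agreement under independence of the two raters' judgements.
    p_chance = sum(row[i] * col[i] for i in range(q))
    return (p_observed - p_chance) / (1 - p_chance)


def brennan_prediger_kappa(table):
    """Brennan and Prediger's coefficient: chance agreement fixed at 1/q."""
    q = len(table)
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(q)) / n
    p_chance = 1 / q
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical table: two raters classify 50 applicants as accept/reject.
agreement_table = [
    [40, 5],
    [5, 0],
]
print(round(cohen_kappa(agreement_table), 3))             # -0.111
print(round(brennan_prediger_kappa(agreement_table), 3))  # 0.6
```

    With strongly skewed marginals, as in this table, the two coefficients can diverge sharply even though observed agreement is 80% in both cases, which is one reason the choice of base model matters.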

  19. Tradução, adaptação e confiabilidade interexaminadores do manual de administração da escala de Fugl-Meyer / Translation, adaptation and inter-rater reliability of the administration manual for the Fugl-Meyer assessment

    Directory of Open Access Journals (Sweden)

    Stella M Michaelsen

    2011-02-01

    a single evaluator who applied the test. When different raters apply the scale, the reliability may depend on the interpretation given to the assessment sheet. In such cases, a clear administration manual is essential for ensuring homogeneity of application. OBJECTIVES: To translate and adapt the French Canadian version of the FMA administration manual into Brazilian Portuguese and to evaluate the inter-rater reliability when different evaluators apply the FMA on the basis of the information contained in the manual. METHODS: Eighteen adults (59±10 years) with chronic hemiparesis (38±35 months after a stroke) took part in this study. Eight patients participated in the first part of the study and 10 in the second part. Based on analyzing the results from part 1, an adapted version was developed, in which information and photos were added to illustrate the positions of the patient and evaluator. The inter-rater reliability was assessed using the intraclass correlation coefficient (ICC). RESULTS: The reliability of the FMA based on the adapted version of the manual was excellent for the total motor scores for the upper limbs (ICC=0.98) and lower limbs (ICC=0.90), as well as for movement sense (ICC=0.98) and upper and lower-limb passive range of motion (ICC=0.84 and 0.90, respectively). The reliability was moderate for tactile sensitivity (0.75). The joint pain assessment presented low reliability. CONCLUSIONS: The results showed that, except for pain assessment, application of the FMA based on the adapted version of the application manual for Brazilian Portuguese presented adequate inter-rater reliability.

  20. Early Trauma Inventory (ETI): cross-cultural adaptation and internal consistency

    Directory of Open Access Journals (Sweden)

    Marcelo Feijó de Mello

    2010-04-01

    Full Text Available Early life stress is a strong predictor of future psychopathology during adulthood. The Early Trauma Inventory (ETI) was developed to detect the presence and impact of traumatic experiences that occurred up to 18 years of age. The ETI was translated and cross-culturally adapted, and its internal consistency was evaluated. Victims of violence who met the inclusion and exclusion criteria underwent the SCID-I diagnostic interview and the ETI. Ninety-one patients with post-traumatic stress disorder (PTSD) were included. Cronbach's alpha in the different domains varied from 0.595 to 0.793, and was 0.878 for the total score. Except for emotional abuse, most of the items in the various domains displayed inter-item correlation rates of 0.51 to 0.99. The adapted version was useful for clinical and research purposes and showed good internal consistency and inter-item correlation. The ETI is a valid instrument with good

  1. VANET '13: Proceeding of the Tenth ACM International Workshop on Vehicular Inter-networking, Systems, and Applications

    NARCIS (Netherlands)

    Gozalvez, J.; Kargl, Frank; Mittag, J.; Kravets, R.; Tsai, M.; Unknown, [Unknown

    This year marks a very important date for the ACM international workshop on Vehicular inter-networking, systems, and applications, as ACM VANET now celebrates its 10th edition. Starting in 2004 as "ACM international workshop on Vehicular ad hoc networks", already the change in title indicates that

  2. Content validation: clarity/relevance, reliability and internal consistency of enunciative signs of language acquisition.

    Science.gov (United States)

    Crestani, Anelise Henrich; Moraes, Anaelena Bragança de; Souza, Ana Paula Ramos de

    2017-08-10

    To analyze the results of the validation of building enunciative signs of language acquisition for children aged 3 to 12 months. The signs were built based on mechanisms of language acquisition in an enunciative perspective and on clinical experience with language disorders. The signs were submitted to judgment of clarity and relevance by a sample of six experts, holders of doctorates in linguistics with knowledge of psycholinguistics and language clinic. In the validation of reliability, two judges/evaluators applied the instruments to videos of 20% of the total sample of mother-infant dyads using the inter-evaluator method. The method known as internal consistency was applied to the total sample, which consisted of 94 mother-infant dyads for the contents of Phase 1 (3 to 6 months) and 61 mother-infant dyads for the contents of Phase 2 (7 to 12 months). The data were collected through the analysis of mother-infant interaction based on filming of dyads and application of the parameters to be validated according to the child's age. Data were organized in a spreadsheet and then converted to computer applications for statistical analysis. The judgments of clarity/relevance indicated no modifications to be made in the instruments. The reliability test showed an almost perfect agreement between judges (0.8 ≤ Kappa ≤ 1.0); only item 2 of Phase 1 showed substantial agreement (0.6 ≤ Kappa ≤ 0.79). The internal consistency for Phase 1 had alpha = 0.84, and Phase 2, alpha = 0.74. This demonstrates the reliability of the instruments. The results suggest adequacy as to content validity of the instruments created for both age groups, demonstrating the relevance of the content of enunciative signs of language acquisition.

  3. Exploring the Role of First Impressions in Rater-Based Assessments

    Science.gov (United States)

    Wood, Timothy J.

    2014-01-01

    Medical education relies heavily on assessment formats that require raters to assess the competence and skills of learners. Unfortunately, there are often inconsistencies and variability in the scores raters assign. To ensure the scores from these assessment tools have validity, it is important to understand the underlying cognitive processes that…

  4. Measuring the Impact of Rater Negotiation in Writing Performance Assessment

    Science.gov (United States)

    Trace, Jonathan; Janssen, Gerriet; Meier, Valerie

    2017-01-01

    Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…

  5. A Comparison of Assessment Methods and Raters in Product Creativity

    Science.gov (United States)

    Lu, Chia-Chen; Luh, Ding-Bang

    2012-01-01

    Although previous studies have attempted to use different experiences of raters to rate product creativity by adopting the Consensus Assessment Method (CAT) approach, the validity of replacing CAT with another measurement tool has not been adequately tested. This study aimed to compare raters with different levels of experience (expert vs.…

  6. Development and Validation of a Family Meeting Assessment Tool (FMAT).

    Science.gov (United States)

    Hagiwara, Yuya; Healy, Jennifer; Lee, Shuko; Ross, Jeanette; Fischer, Dixie; Sanchez-Reilly, Sandra

    2018-01-01

    A cornerstone procedure in Palliative Medicine is to perform family meetings. Learning how to lead a family meeting is an important skill for physicians and others who care for patients with serious illnesses and their families. There is limited evidence on how to assess best practice behaviors during end-of-life family meetings. Our aim was to develop and validate an observational tool to assess trainees' ability to lead a simulated end-of-life family meeting. Building on evidence from published studies and accrediting agency guidelines, an expert panel at our institution developed the Family Meeting Assessment Tool. All fourth-year medical students (MS4) and eight geriatric and palliative medicine fellows (GPFs) were invited to participate in a Family Meeting Objective Structured Clinical Examination, where each trainee assumed the physician role leading a complex family meeting. Two evaluators observed and rated randomly chosen students' performances using the Family Meeting Assessment Tool during the examination. Inter-rater reliability was measured using percent agreement. Internal consistency was measured using Cronbach α. A total of 141 trainees (MS4 = 133 and GPF = 8) and 26 interdisciplinary evaluators participated in the study. Internal reliability (Cronbach α) of the tool was 0.85. Number of trainees rated by two evaluators was 210 (MS4 = 202 and GPF = 8). Rater agreement was 84%. Composite scores, on average, were significantly higher for fellows than for medical students (P < 0.001). Expert-based content, high inter-rater reliability, good internal consistency, and ability to predict educational level provided initial evidence for construct validity for this novel assessment tool. Copyright © 2017 American Academy of Hospice and Palliative Medicine. All rights reserved.
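
    The two statistics named in this abstract, percent agreement and Cronbach's α, can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's analysis: the function names and all data are hypothetical, and α is computed with the usual item-variance formula.

```python
import numpy as np

def percent_agreement(ratings_a, ratings_b):
    """Share of items on which two raters gave the same rating."""
    a, b = np.asarray(ratings_a), np.asarray(ratings_b)
    return float(np.mean(a == b))

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]                               # number of items
    item_var_sum = x.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = x.sum(axis=1).var(ddof=1)        # variance of total scores
    return k / (k - 1) * (1.0 - item_var_sum / total_var)

# Hypothetical checklist ratings from two evaluators (1 = behaviour observed).
rater_1 = [1, 0, 1, 1, 0]
rater_2 = [1, 0, 0, 1, 0]
agreement = percent_agreement(rater_1, rater_2)

# Hypothetical item scores: 5 trainees (rows) on 3 items (columns).
item_scores = [[2, 3, 3],
               [4, 4, 5],
               [1, 2, 2],
               [5, 4, 5],
               [3, 3, 4]]
alpha = cronbach_alpha(item_scores)
print(agreement, alpha)
```

    Percent agreement does not correct for chance, so it is usually read alongside a chance-corrected coefficient when the rating categories are unevenly used.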

  7. Consistent Visual Analyses of Intrasubject Data

    Science.gov (United States)

    Kahng, SungWoo; Chung, Kyong-Mee; Gutshall, Katharine; Pitts, Steven C.; Kao, Joyce; Girolami, Kelli

    2010-01-01

    Visual inspection of single-case data is the primary method of interpretation of the effects of an independent variable on a dependent variable in applied behavior analysis. The purpose of the current study was to replicate and extend the results of DeProspero and Cohen (1979) by reexamining the consistency of visual analysis across raters. We…

  8. QNOTE: an instrument for measuring the quality of EHR clinical notes.

    Science.gov (United States)

    Burke, Harry B; Hoang, Albert; Becher, Dorothy; Fontelo, Paul; Liu, Fang; Stephens, Mark; Pangaro, Louis N; Sessums, Laura L; O'Malley, Patrick; Baxi, Nancy S; Bunt, Christopher W; Capaldi, Vincent F; Chen, Julie M; Cooper, Barbara A; Djuric, David A; Hodge, Joshua A; Kane, Shawn; Magee, Charles; Makary, Zizette R; Mallory, Renee M; Miller, Thomas; Saperstein, Adam; Servey, Jessica; Gimbel, Ronald W

    2014-01-01

    The outpatient clinical note documents the clinician's information collection, problem assessment, and patient management, yet there is currently no validated instrument to measure the quality of the electronic clinical note. This study evaluated the validity of the QNOTE instrument, which assesses 12 elements in the clinical note, for measuring the quality of clinical notes. It also compared its performance with a global instrument that assesses the clinical note as a whole. Retrospective multicenter blinded study of the clinical notes of 100 outpatients with type 2 diabetes mellitus who had been seen in clinic on at least three occasions. The 300 notes were rated by eight general internal medicine and eight family medicine practicing physicians. The QNOTE instrument scored the quality of the note as the sum of a set of 12 note element scores, and its inter-rater agreement was measured by the intraclass correlation coefficient. The Global instrument scored the note in its entirety, and its inter-rater agreement was measured by the Fleiss κ. The overall QNOTE inter-rater agreement was 0.82 (CI 0.80 to 0.84), and its note quality score was 65 (CI 64 to 66). The Global inter-rater agreement was 0.24 (CI 0.19 to 0.29), and its note quality score was 52 (CI 49 to 55). The QNOTE quality scores were consistent, and the overall QNOTE score was significantly higher than the overall Global score (p=0.04). We found the QNOTE to be a valid instrument for evaluating the quality of electronic clinical notes, and its performance was superior to that of the Global instrument. Published by the BMJ Publishing Group Limited.
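
    Fleiss' κ, used above for the Global instrument, extends chance-corrected agreement to more than two raters. The sketch below is a generic implementation of the standard formula on hypothetical counts; it assumes every subject is rated by the same number of raters and is not the study's code or data.

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa.

    counts[i, j] is the number of raters who assigned subject i to
    category j; each row must sum to the same number of raters.
    """
    c = np.asarray(counts, dtype=float)
    n_subjects = c.shape[0]
    m = c[0].sum()                                   # raters per subject
    if not np.allclose(c.sum(axis=1), m):
        raise ValueError("every subject needs the same number of ratings")
    p_cat = c.sum(axis=0) / (n_subjects * m)         # category proportions
    # Per-subject agreement: share of rater pairs that agree.
    p_subj = (np.sum(c**2, axis=1) - m) / (m * (m - 1))
    p_bar = p_subj.mean()                            # mean observed agreement
    p_exp = np.sum(p_cat**2)                         # expected by chance
    return (p_bar - p_exp) / (1.0 - p_exp)

# Hypothetical data: 4 notes, 3 raters, 2 quality categories.
counts = [[3, 0],
          [2, 1],
          [0, 3],
          [1, 2]]
kappa = fleiss_kappa(counts)
print(kappa)
```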

  9. Psychiatric comorbidity may not predict suicide during and after hospitalization. A nested case-control study with blinded raters.

    Science.gov (United States)

    Walby, Fredrik A; Odegaard, Erik; Mehlum, Lars

    2006-06-01

    To investigate the differential impact of DSM-IV axis-I and axis-II disorders on completed suicide and to study if psychiatric comorbidity increases the risk of suicide in currently and previously hospitalized psychiatric patients. A nested case-control design based on case notes from 136 suicides and 166 matched controls. All cases and controls were rediagnosed using the SCID-CV for axis-I and the DSM-IV criteria for axis-II disorders and the inter-rater reliability was satisfactory. Raters were blind to the case and control status and the original hospital diagnoses. Depressive disorders and bipolar disorders were associated with an increased risk of suicide. No such effect was found for comorbidity between axis-I disorders and for comorbidity between axis-I and axis-II disorders. Psychiatric diagnoses, although made using a structured and criteria-based approach, were based on information recorded in case notes. Axis-II comorbidity could only be investigated at an aggregated level. Psychiatric comorbidity did not predict suicide in this sample. Mood disorders did, however, increase the risk significantly independent of history of previous suicide attempts. Both findings can inform identification and treatment of patients at high risk for completed suicide.

  10. Therapist adherence in the strong without anorexia nervosa (SWAN) study: A randomized controlled trial of three treatments for adults with anorexia nervosa.

    Science.gov (United States)

    Andony, Louise J; Tay, Elaine; Allen, Karina L; Wade, Tracey D; Hay, Phillipa; Touyz, Stephen; McIntosh, Virginia V W; Treasure, Janet; Schmidt, Ulrike H; Fairburn, Christopher G; Erceg-Hurn, David M; Fursland, Anthea; Crosby, Ross D; Byrne, Susan M

    2015-12-01

    To develop a psychotherapy rating scale to measure therapist adherence in the Strong Without Anorexia Nervosa (SWAN) study, a multi-center randomized controlled trial comparing three different psychological treatments for adults with anorexia nervosa. The three treatments under investigation were Enhanced Cognitive Behavioural Therapy (CBT-E), the Maudsley Anorexia Nervosa Treatment for Adults (MANTRA), and Specialist Supportive Clinical Management (SSCM). The SWAN Psychotherapy Rating Scale (SWAN-PRS) was developed, after consultation with the developers of the treatments, and refined. Using the SWAN-PRS, two independent raters initially rated 48 audiotapes of treatment sessions to yield inter-rater reliability data. One rater proceeded to rate a total of 98 audiotapes from 64 trial participants. The SWAN-PRS demonstrated sound psychometric properties, and was considered a reliable measure of therapist adherence. The three treatments were highly distinguishable by independent raters, with therapists demonstrating significantly more behaviors consistent with the actual allocated treatment compared to the other two treatment modalities. There were no significant site differences in therapist adherence observed. The findings provide support for the internal validity of the SWAN study. The SWAN-PRS was deemed suitable for use in other trials involving CBT-E, MANTRA, or SSCM. The Authors. International Journal of Eating Disorders Published by Wiley Periodicals, Inc.

  11. Reliability of one-repetition maximum performance in people with chronic heart failure.

    Science.gov (United States)

    Ellis, Rachel; Holland, Anne E; Dodd, Karen; Shields, Nora

    2018-02-24

    Evaluate intra-rater and inter-rater reliability of the one-repetition maximum strength test in people with chronic heart failure. Intra-rater and inter-rater reliability study. A public tertiary hospital in northern metropolitan Melbourne. Twenty-four participants (nine female, mean age 71.8 ± 13.1 years) with mild to moderate heart failure of any aetiology. Lower limb strength was assessed by determining the maximum weight that could be lifted using a leg press. Intra-rater reliability was tested by one assessor on two separate occasions. Inter-rater reliability was tested by two assessors in random order. Intra-class correlation coefficients and 95% confidence intervals were calculated. Bland and Altman analyses were also conducted, including calculation of mean differences between measures and limits of agreement. Ten intra-rater and 21 inter-rater assessments were completed. Excellent intra-rater (intra-class correlation coefficient 2,1 0.96) and inter-rater (intra-class correlation coefficient 2,1 0.93) reliability was found. Intra-rater assessment showed less variability (mean difference 4.5 kg, limits of agreement -8.11 to 17.11 kg) than inter-rater agreement (mean difference -3.81 kg, limits of agreement -23.39 to 15.77 kg). One-repetition maximum determined using a leg press is a reliable measure in people with heart failure. Given its smaller limits of agreement, intra-rater testing is recommended. Implications for Rehabilitation: Using a leg press to determine a one-repetition maximum, we were able to demonstrate excellent inter-rater and intra-rater reliability using an intra-class correlation coefficient. The Bland and Altman levels of agreement were wide for inter-rater reliability and so we recommend using one assessor if measuring change in strength within an individual over time.
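
    The Bland and Altman quantities reported above can be sketched as follows. The abstract does not state the multiplier used for the limits of agreement; the code assumes the conventional mean difference ± 1.96 standard deviations. The function names and the paired measurements are hypothetical. Only the reported limits passed to `implied_sd` come from the abstract.

```python
import numpy as np

Z = 1.96  # assumed multiplier for 95% limits of agreement

def bland_altman(first, second):
    """Return (mean difference, lower limit, upper limit) for paired measures."""
    d = np.asarray(first, dtype=float) - np.asarray(second, dtype=float)
    mean_diff = d.mean()
    sd = d.std(ddof=1)          # sample SD of the paired differences
    return mean_diff, mean_diff - Z * sd, mean_diff + Z * sd

def implied_sd(lower, upper):
    """SD of the differences implied by a pair of limits of agreement."""
    return (upper - lower) / (2 * Z)

# Hypothetical paired leg-press results in kg (occasion 1 vs occasion 2).
mean_diff, lower, upper = bland_altman([10, 12, 14], [9, 10, 15])

# Back-calculation from the limits reported in the abstract,
# valid only under the 1.96 assumption above.
sd_intra = implied_sd(-8.11, 17.11)    # roughly 6.4 kg
sd_inter = implied_sd(-23.39, 15.77)   # roughly 10.0 kg
print(mean_diff, lower, upper, sd_intra, sd_inter)
```

    Under that assumption, the spread of inter-rater differences is about one and a half times the intra-rater spread, which is consistent with the authors' recommendation to use a single assessor when tracking change over time.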

  12. Inter rater reliability of Pressure Ulcer Scale for Healing (PUSH) in patients with chronic leg ulcers

    Directory of Open Access Journals (Sweden)

    Vera Lúcia Conceição de Gouveia Santos

    2007-06-01

    Full Text Available This study aimed to evaluate the inter rater reliability of the Pressure Ulcer Scale for Healing (PUSH), in its version adapted to the Portuguese language, in patients with chronic leg ulcers. The Kappa index was used for the analysis. After approval by the Ethics Committee, 41 patients with ulcers were examined. A total of 49% of the ulcers were located in the right leg and 36% of them were venous ulcers. The Kappa indices (0.97 to 1.00) obtained in the comparison between the observations of the clinical nurses and the stomal therapists for all sub-scales and for the total score confirmed the tool's inter rater reliability, with statistical significance (p<0.001).

  13. Towards criterion validity in classroom language analysis: methodological constraints of metadiscourse and inter-rater agreement

    Directory of Open Access Journals (Sweden)

    Douglas Altamiro Consolo

    2001-02-01

    Full Text Available

    This paper reports on a process to validate a revised version of a system for coding classroom discourse in foreign language lessons, a context in which the dual role of language (as content and means of communication) and the speakers' specific pedagogical aims lead to a certain degree of ambiguity in language analysis. The language used by teachers and students has been extensively studied, and a framework of concepts concerning classroom discourse well-established. Models for coding classroom language need, however, to be revised when they are applied to specific research contexts. The application and revision of an initial framework can lead to the development of earlier models, and to the re-definition of previously established categories of analysis that have to be validated. The procedures followed to validate a coding system are related here as guidelines for conducting research under similar circumstances. The advantages of using instruments that incorporate two types of data, that is, quantitative measures and qualitative information from raters' metadiscourse, are discussed, and it is suggested that such a procedure can contribute to the process of validation itself, towards attaining reliability of research results, as well as indicate some constraints of the adopted research methodology.

  14. Comment on the internal consistency of thermodynamic databases supporting repository safety assessments

    International Nuclear Information System (INIS)

    Arthur, R.C.

    2001-11-01

    This report addresses the concept of internal consistency and its relevance to the reliability of thermodynamic databases used in repository safety assessments. In addition to being internally consistent, a reliable database should be accurate over a range of relevant temperatures and pressures, complete in the sense that all important aqueous species, gases and solid phases are represented, and traceable to original experimental results. No single definition of internal consistency needs to be universally accepted as the most appropriate under all conditions, however. As a result, two databases that are each internally consistent may be inconsistent with respect to each other, and a database derived from two or more such databases must itself be internally inconsistent. The consequences of alternative definitions that are reasonably attributable to the concept of internal consistency can be illustrated with reference to the thermodynamic database supporting SKB's recent SR 97 safety assessment. This database is internally inconsistent because it includes equilibrium constants calculated over a range of temperatures: using conflicting reference values for some solids, gases and aqueous species that are common to two internally consistent databases (the OECD/NEA database for radioelements and SUPCRT databases for non-radioactive elements) that serve as source databases for the SR 97 TDB, using different definitions in these source databases of standard states for condensed phases and aqueous species, based on different mathematical expressions used in these source databases representing the temperature dependence of the heat capacity, and based on different chemical models adopted in these source databases for the aqueous phase. The importance of such inconsistencies must be considered in relation to the other database reliability criteria noted above, however. Thus, accepting a certain level of internal inconsistency in a database it is probably preferable to use a

  15. Comment on the internal consistency of thermodynamic databases supporting repository safety assessments

    Energy Technology Data Exchange (ETDEWEB)

    Arthur, R.C. [Monitor Scientific, LLC, Denver, CO (United States)

    2001-11-01

    This report addresses the concept of internal consistency and its relevance to the reliability of thermodynamic databases used in repository safety assessments. In addition to being internally consistent, a reliable database should be accurate over a range of relevant temperatures and pressures, complete in the sense that all important aqueous species, gases and solid phases are represented, and traceable to original experimental results. No single definition of internal consistency needs to be universally accepted as the most appropriate under all conditions, however. As a result, two databases that are each internally consistent may be inconsistent with respect to each other, and a database derived from two or more such databases must itself be internally inconsistent. The consequences of alternative definitions that are reasonably attributable to the concept of internal consistency can be illustrated with reference to the thermodynamic database supporting SKB's recent SR 97 safety assessment. This database is internally inconsistent because it includes equilibrium constants calculated over a range of temperatures: using conflicting reference values for some solids, gases and aqueous species that are common to two internally consistent databases (the OECD/NEA database for radioelements and SUPCRT databases for non-radioactive elements) that serve as source databases for the SR 97 TDB, using different definitions in these source databases of standard states for condensed phases and aqueous species, based on different mathematical expressions used in these source databases representing the temperature dependence of the heat capacity, and based on different chemical models adopted in these source databases for the aqueous phase. The importance of such inconsistencies must be considered in relation to the other database reliability criteria noted above, however. Thus, accepting a certain level of internal inconsistency in a database it is probably preferable to

  16. Graphic Creativity Assessment: Psychometric Properties in College Students From Buenos Aires

    Directory of Open Access Journals (Sweden)

    Agustín Freiberg Hoffmann

    2017-04-01

    Full Text Available Research on creativity has acquired major development due to its relevance concerning teaching in college. Its assessment is generally conducted by means of verbal and graphic measures. A short scale to measure verbal creativity (CREA) in college students from Buenos Aires is currently available. However, right now there are no similar scales designed to assess graphic creativity. In view of that, this study will analyse psychometric features of the ECG scale locally known as Evaluación de la Creatividad Gráfica – Graphic Creativity Assessment, to be employed in the academic milieu in order to provide a complementary measure of verbal creativity. Face and construct validity evidences (converging validity analysis and confirmatory factor analysis) were examined as well as reliability, taking into account internal consistency aspects, inter-rater and test-retest stability. The resulting scale showed adequate technical features. The original version, supported by De la Torre’s model, was composed by 12 indicators. This study’s findings maintained 9 of them but, considering new analyses, only 4 of the original ones were retained. This 4-indicator model obtained a better fit to empirical data and good indexes of correlation with a verbal creativity measure, as well as good reliability indicators (internal consistency, inter-rater and test-retest). Findings are discussed taking into account theoretical basis.

  17. The Stanmore Nursing Assessment of Psychological Status: Understanding the emotions of patients with spinal cord injury.

    Science.gov (United States)

    Smyth, Carol; Spada, Marcantonio M; Coultry-Keane, Katherine; Ikkos, George

    2016-09-01

    Research has shown that individuals who have sustained a spinal cord injury can experience strong and abrupt variations in their emotional state; however no instrument for nurses has been developed to assess these patients' psychological status. To develop a brief, reliable instrument to enable nurses to accurately assess, record and respond to spinal cord injury patients' psychological status. In Phase 1, semi-structured interviews were conducted with spinal cord injury patients (n = 10) and nurses (n = 10) which were audio recorded, transcribed and thematically analysed to develop the instrument. The instrument's content validity was then ensured via independent expert review. In Phase 2, the instrument was trialled on 80 spinal cord injury patients to determine inter-rater reliability, internal consistency and test-retest reliability. In Phase 1, four core themes (emotional impact, coping, relationships and assessment) were identified together with a number of related sub-themes. In Phase 2, the instrument was shown to have excellent inter-rater reliability, acceptable internal consistency and satisfactory test re-test reliability. Subsequently a rating sheet, user manual and prompt card were produced. The new instrument, the Stanmore Nursing Assessment of Psychological Status, was shown to be valid and reliable. It is anticipated that training nurses to use this instrument may help to enhance good emotional care of patients.

  18. Reliability and Validity of a Survey of Cat Caregivers on Their Cats’ Socialization Level in the Cat’s Normal Environment

    Directory of Open Access Journals (Sweden)

    Margaret Slater

    2013-12-01

    Full Text Available Stray cats routinely enter animal welfare organizations each year and shelters are challenged with determining the level of human socialization these cats may possess as quickly as possible. However, there is currently no standard process to guide this determination. This study describes the development and validation of a caregiver survey designed to be filled out by a cat’s caregiver so it accurately describes a cat’s personality, background, and full range of behavior with people when in its normal environment. The results from this survey provided the basis for a socialization score that ranged from unsocialized to well socialized with people. The quality of the survey was evaluated based on inter-rater and test-retest reliability and internal consistency and estimates of construct and criterion validity. In general, our results showed moderate to high levels of inter-rater (median of 0.803, range 0.211–0.957) and test-retest agreement (median 0.92, range 0.211–0.999). Cronbach’s alpha showed high internal consistency (0.962). Estimates of validity did not highlight any major shortcomings. This survey will be used to develop and validate an effective assessment process that accurately differentiates cats by their socialization levels towards humans based on direct observation of cats’ behavior in an animal shelter.

  19. Evaluation of the Swedish version of the Child Drawing: Hospital Manual.

    Science.gov (United States)

    Wennström, Berith; Nasic, Salmir; Hedelin, Hans; Bergh, Ingrid

    2011-05-01

    This paper is a report of psychometric testing of the Swedish version of the Child Drawing: Hospital Manual. Drawings have been shown to be useful in assessing emotional status and anxiety in children because they generally speak to us more clearly and openly through their drawings than they are willing or able to verbally. The Child Drawing: Hospital Manual was translated into Swedish according to World Health Organization guidelines (a routine procedure for translation of English instruments) in order to assess anxiety by analysing the drawings of 59 children (5-11 years), of whom nine were girls and 50 boys undergoing day surgery during 2007-2009. Inter-rater reliability (five independent scorers) was high and internal consistency reliability was good (coefficient alpha = 0·77). Parts A and C, as well as the total scale score of the Child Drawing: Hospital Manual, discriminated anxiety significantly between the group of children undergoing day surgery and a comparison group of school children, indicating adequate construct validity. For the Swedish version of the Child Drawing: Hospital Manual, our study demonstrates evidence for adequate construct validity in Parts A and C (and total scale score), high inter-rater reliability and acceptable internal consistency reliability. However, some improvements are needed before the instrument will be a clinically useful assessment of anxiety in children undergoing day surgery. © 2011 Blackwell Publishing Ltd.

  20. Reliability of the Alzheimer's disease assessment scale (ADAS-Cog) in longitudinal studies.

    Science.gov (United States)

    Khan, Anzalee; Yavorsky, Christian; DiClemente, Guillermo; Opler, Mark; Liechti, Stacy; Rothman, Brian; Jovic, Sofija

    2013-11-01

    Considering the scarcity of longitudinal assessments of reliability, there is need for a more precise understanding of cognitive decline in Alzheimer's Disease (AD). The primary goal was to assess longitudinal changes in inter-rater reliability, test-retest reliability and internal consistency of scores of the ADAS-Cog. A total of 2,618 AD subjects were enrolled in seven randomized, double-blind, placebo-controlled, multicenter trials from 1986 to 2009. Reliability, internal consistency and cross-sectional analysis of ADAS-Cog and MMSE across seven visits were examined. Intra-class correlations (ICC) for ADAS-Cog were moderate to high, supporting its reliability. Absolute agreement ICCs, ranging from 0.392 (Visit 7) to 0.806 (Visit 2), showed a progressive decrease in correlations across time. Item analysis revealed a decrease in item correlations, with the lowest correlations at Visit 7 for Commands (ICC=0.148), Comprehension (ICC=0.092) and Spoken Language (ICC=0.044). Suitable assessment of AD treatments is maintained through accurate measurement of clinically significant outcomes. Targeted rater education on ADAS-Cog items over time can improve the ability to administer and score the scale.
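
    The abstract above reports absolute-agreement ICCs without saying which ICC model was used. As a hedged sketch only, assuming the common two-way random-effects, single-rater, absolute-agreement form (often written ICC(2,1) or ICC(A,1)), the coefficient can be computed from the mean squares of a subjects-by-raters table. The function name and the ratings matrix are illustrative and are not taken from the study:

```python
def icc_a1(ratings):
    """Two-way random-effects, absolute-agreement, single-rater ICC.

    ratings: list of n subjects, each a list of k rater scores.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares for subjects (rows), raters (columns) and total.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Illustrative table: 6 subjects scored by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_a1(example)
print(round(icc, 2))
```

    Because the absolute-agreement form penalises systematic differences between raters (the rater mean square appears in the denominator), it can be much lower than a consistency-type ICC computed on the same data.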

  1. Investigating Differences between American and Indian Raters in Assessing TOEFL iBT Speaking Tasks

    Science.gov (United States)

    Wei, Jing; Llosa, Lorena

    2015-01-01

    This article reports on an investigation of the role raters' language background plays in raters' assessment of test takers' speaking ability. Specifically, this article examines differences between American and Indian raters in their scores and scoring processes when rating Indian test takers' responses to the Test of English as a Foreign…

  2. Adaptation and validation of the Alzheimer's Disease Assessment Scale - Cognitive (ADAS-Cog) in a low-literacy setting in sub-Saharan Africa.

    Science.gov (United States)

    Paddick, Stella-Maria; Kisoli, Aloyce; Mkenda, Sarah; Mbowe, Godfrey; Gray, William Keith; Dotchin, Catherine; Ogunniyi, Adesola; Kisima, John; Olakehinde, Olaide; Mushi, Declare; Walker, Richard William

    2017-08-01

    This study aimed to assess the feasibility of a low-literacy adaptation of the Alzheimer's Disease Assessment Scale - Cognitive (ADAS-Cog) for use in rural sub-Saharan Africa (SSA) for interventional studies in dementia. No such adaptations currently exist. Tanzanian and Nigerian health professionals adapted the ADAS-Cog by consensus. Validation took place in a cross-sectional sample of 34 rural-dwelling older adults with mild/moderate dementia alongside 32 non-demented controls in Tanzania. Participants were oversampled for lower educational level. Inter-rater reliability was conducted by two trained raters in 22 older adults (13 with dementia) from the same population. Assessors were blind to diagnostic group. Median ADAS-Cog scores were 28.75 (interquartile range (IQR), 22.96-35.54) in mild/moderate dementia and 12.75 (IQR 9.08-16.16) in controls. The area under the receiver operating characteristic curve (AUC) was 0.973 (95% confidence interval (CI) 0.936-1.00) for dementia. Internal consistency was high (Cronbach's α 0.884) and inter-rater reliability was excellent (intra-class correlation coefficient 0.905, 95% CI 0.804-0.964). The low-literacy adaptation of the ADAS-Cog had good psychometric properties in this setting. Further evaluation in similar settings is required.

  3. The validity and internal structure of the Bipolar Depression Rating Scale: data from a clinical trial of N-acetylcysteine as adjunctive therapy in bipolar disorder.

    Science.gov (United States)

    Berk, Michael; Dodd, Seetal; Dean, Olivia M; Kohlmann, Kristy; Berk, Lesley; Malhi, Gin S

    2010-10-01

    The phenomenology of unipolar and bipolar disorders differs in a number of ways, such as the presence of mixed states and atypical features. Conventional depression rating instruments are designed to capture the characteristics of unipolar depression and have limitations in capturing the breadth of bipolar disorder. The Bipolar Depression Rating Scale (BDRS) was administered together with the Montgomery-Åsberg Depression Rating Scale (MADRS) and Young Mania Rating Scale (YMRS) in a double-blind randomised placebo-controlled clinical trial of N-acetyl cysteine for bipolar disorder (N = 75). A factor analysis showed a two-factor solution: depression and mixed symptom clusters. The BDRS has strong internal consistency (Cronbach's alpha = 0.917); the depression cluster showed robust correlation with the MADRS (r = 0.865), and the mixed subscale correlated with the YMRS (r = 0.750). The BDRS has good internal validity and inter-rater reliability and is sensitive to change in the context of a clinical trial.

  4. The Effects of Primacy on Rater Cognition: An Eye-Tracking Study

    Science.gov (United States)

    Ballard, Laura

    2017-01-01

    Rater scoring has an impact on writing test reliability and validity. Thus, there has been a continued call for researchers to investigate issues related to rating (Crusan, 2015). Investigating the scoring process and understanding how raters arrive at particular scores are critical "because the score is ultimately what will be used in making…

  5. Use of three-dimensional speckle tracking to assess left ventricular myocardial mechanics: inter-vendor consistency and reproducibility of strain measurements.

    Science.gov (United States)

    Badano, Luigi P; Cucchini, Umberto; Muraru, Denisa; Al Nono, Osama; Sarais, Cristiano; Iliceto, Sabino

    2013-03-01

    Since there is insufficient data available about the inter-vendor consistency of three-dimensional (3D) speckle-tracking (STE) measurements, we undertook this study to (i) assess the inter-vendor consistency of 3D LV global strain values obtained using two different scanners; (ii) identify the sources of inter-vendor inconsistencies, if any; and (iii) compare their respective intrinsic variability. Sixty patients (38 ± 12 years, 64% males) with a wide range of LV end-diastolic volumes (from 74 to 205 ml) and ejection fractions (from 17 to 70%) underwent two 3D LV data set acquisitions using VividE9 and Artida ultrasound systems. Global longitudinal (Lε), radial (Rε), circumferential (Cε) and area (Aε) strain values were obtained offline using the corresponding 3D STE software. Despite being significantly different, Lε showed the closest values between the two platforms (bias = 1.5%, limits of agreement (LOA) from -2.9 to -5.9%, P < 0.05). Artida produced significantly higher values of both Cε and Aε than VividE9 (bias = 6.6, LOA: -14.1 to 0.9%, and bias = 6.0, LOA = -28.2-8.6%, respectively, P < 0.001). Conversely, Rε values obtained with Artida were significantly lower than those measured using the VividE9 platform (bias = -24.2, LOA: 1.5-49.9, P < 0.001). All strain components showed good reproducibility (intra-class correlation coefficients: 0.82-0.98), except for Rε by Artida, which showed only moderate reproducibility. Apart from Lε, the inter-vendor agreement of Rε, Cε and Aε measured with Artida and VividE9 was poor. Reference values should be specific for each system, and baseline and follow-up data in longitudinal studies should be obtained using the same 3D STE platform.
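
    The bias and limits of agreement quoted above are the usual outputs of a Bland-Altman comparison. A minimal sketch, assuming the conventional definition (mean of the paired differences plus or minus 1.96 standard deviations) and using made-up paired measurements rather than the study's strain values:

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower LOA, upper LOA) for paired measurements."""
    if len(method_a) != len(method_b):
        raise ValueError("measurements must be paired")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)          # systematic difference between methods
    spread = 1.96 * stdev(diffs)  # half-width of the 95% limits
    return bias, bias - spread, bias + spread


# Hypothetical paired readings from two measurement systems.
system_a = [10.0, 12.0, 14.0, 16.0]
system_b = [9.0, 12.0, 12.0, 15.0]
bias, lower, upper = bland_altman(system_a, system_b)
print(f"bias={bias:.2f}, LOA={lower:.2f} to {upper:.2f}")
```

    By construction the bias lies midway between the two limits, which gives a quick plausibility check when reading reported values.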

  6. Delimiting coefficient alpha from internal consistency and unidimensionality

    NARCIS (Netherlands)

    Sijtsma, K.

    2015-01-01

    I discuss the contribution by Davenport, Davison, Liou, & Love (2015) in which they relate reliability represented by coefficient α to formal definitions of internal consistency and unidimensionality, both proposed by Cronbach (1951). I argue that coefficient α is a lower bound to reliability and

  7. International exchange of emergency phase information and assessments: an aid to inter/national decision makers

    International Nuclear Information System (INIS)

    Sullivan, T.J.

    2003-01-01

    plots via Internet web sites and interactively dialogue via web-based two-way televideo conferencing technology [LLNL/JAERI report]. While, in principle, the results were functional, the linkups proved to be rudimentary and somewhat unstable for combined video, voice and whiteboard interaction. JAERI (WSPEEDI) and LLNL (NARAC) did successfully exploit this project during two separate radiological accidents in Tokai, Japan. In 1999 the EU/RODOS project expressed an interest to join in this effort. In 2002 the USA renewed interest in this project and subsequently Russia/FEERC joined. Since 1999 there has been substantial improvement in Internet bandwidth, efficient and versatile data exchange protocols (e.g., XML) and televideo conferencing technology. Implementation of data exchange protocols (user ID and password protected) at those four major centers/projects in combination with a multi-party televideo conferencing capability provides the mechanism for the exchange of key information in near realtime, and examination and comparison of calculated assessments in a quasi-peer review mode. This capability provides the opportunity to detect missed input data as well as deficiencies in meteorology, resolution, topography, etc., thus leading to refinement, consensus and 'harmonization' in real time prior to the release of assessments to decision makers. Such a system should be a benefit to all the inter/national agencies involved in advising and protecting impacted citizens by reducing some of the information challenges they (the decision makers) face and, hopefully, resulting in consistent and presumably the best advice. We expect that with successful demonstration and experience with this system, in the future it could provide a tool to non-nuclear countries and international agencies such as the IAEA and WHO. (author)

  8. Psychometric properties of the Portuguese version of the Jebsen-Taylor test for adults with mild hemiparesis Avaliação das propriedades psicométricas da versão em português do teste de Jebsen Taylor para adultos com hemiparesia leve

    Directory of Open Access Journals (Sweden)

    Karina N. Ferreiro

    2010-10-01

    Full Text Available OBJECTIVES: To evaluate the psychometric properties of the Portuguese version of the Jebsen-Taylor Test (JTT) in patients with stroke. METHODS: Forty participants who suffered a stroke in the cerebral hemisphere were videotaped while performing the JTT. Scores were defined by the time taken to perform the tasks, and two physical therapists evaluated the performance of the participants. Intra- and inter-rater reliability was defined by intraclass correlation coefficients (ICC) through videotape analysis. Cronbach's alpha and Pearson's correlation coefficient (r) were used to measure the internal consistency of the scale. Confidence intervals (CI) were calculated, and the influence of handedness and educational level on the JTT scores was evaluated. RESULTS: Inter-rater (ICC = 1.0; CI, 1.0-1.0) and intra-rater reliabilities (ICC = 0.997; CI, 0.995-0.998) were excellent. Regarding internal consistency, Cronbach's α was 0.924. The item "writing a sentence" was less consistent than the other items (Cronbach's alpha = 0.884). Pearson's r (item score - total score) was lower for the item "small objects" (r = 0.657). There was no significant influence of handedness or educational level on the JTT scores. CONCLUSIONS: Videotaping test performances can be a useful tool in multicenter studies if inter-rater reliability is appropriate. The inter- and intra-rater reliabilities of the Portuguese version of the JTT were excellent in patients with stroke. The JTT can be a valuable tool for evaluating dexterity in research protocols aiming at efficacy of rehabilitation interventions.

  9. The Shoulder Objective Practical Assessment Tool: Evaluation of a New Tool Assessing Residents Learning in Diagnostic Shoulder Arthroscopy.

    Science.gov (United States)

    Talbot, Christopher L; Holt, Edward M; Gooding, Benjamin W T; Tennent, Thomas D; Foden, Philip

    2015-08-01

    To design and validate an objective practical assessment tool for diagnostic shoulder arthroscopy that would provide residents with a method to evaluate their progression in this field of surgery and to identify specific learning needs. We designed and evaluated the shoulder Objective Practical Assessment Tool (OPAT). The shoulder OPAT was designed by us, and scoring domains were created using a Delphi process. The shoulder OPAT was trialed by members of the British Elbow & Shoulder Society Education Committee for internal consistency and ease of use before being offered to other trainers and residents. Inter-rater reliability and intrarater reliability were calculated. One hundred forty orthopaedic residents, of varying seniority, within 5 training regions in the United Kingdom, were questioned regarding the tool. A pilot study of 6 residents was undertaken. Internal consistency was 0.77 (standardized Cronbach α). Inter-rater reliability was 0.60, and intrarater reliability was 0.82. The Spearman correlation coefficient (r) between the global summary score for the shoulder OPAT and the current assessment tool used in postgraduate training for orthopaedic residents undertaking diagnostic shoulder arthroscopy equaled 0.74. Of the residents, 82% agreed or strongly agreed when asked if the shoulder OPAT would be a useful tool in monitoring progression and 72% agreed or strongly agreed with the introduction of the shoulder OPAT within the orthopaedic domain. This study shows that the shoulder OPAT fulfills several aspects of reliability and validity when tested. Despite the inter-rater reliability being 0.60, we believe that the shoulder OPAT has the potential to play a role alongside the current assessment tool in the training of orthopaedic residents. The shoulder OPAT can be used to assess residents during shoulder arthroscopy and has the potential for use in medical education, as well as arthroscopic skills training in the operating theater. Copyright © 2015

  10. Rater Agreement Indexes for Performance Assessment.

    Science.gov (United States)

    Burry-Stock, Judith A.; And Others

    1996-01-01

    It is argued that interrater agreement is a psychometric property which is theoretically different from classic reliability. Formulas are presented to illustrate a set of algebraically equivalent rater agreement indices that are intended to provide educational and psychological researchers with a practical way to establish a measure of rater…

  11. Reliability and main findings of the FEES-Tensilon Test in patients with myasthenia gravis and dysphagia.

    Science.gov (United States)

    Im, Sun; Suntrup-Krueger, Sonja; Colbow, Sigrid; Sauer, Sonja; Claus, Inga; Meuth, Sven G; Dziewas, Rainer; Warnecke, Tobias

    2018-05-26

    Diagnosis of pharyngeal dysphagia caused by myasthenia gravis (MG) based on clinical examination alone is often challenging. Flexible endoscopic evaluation of swallowing (FEES) combined with Tensilon (edrophonium) application, referred to as the FEES-Tensilon Test, was developed to improve diagnostic accuracy and to detect the main symptoms of pharyngeal dysphagia in MG. Here we investigated inter- and intra-rater reliability of the FEES-Tensilon Test and analyzed the main endoscopic findings. Four experienced raters reviewed a total of 20 FEES-Tensilon Test videos in randomized order. Residue severity was graded at 4 different pharyngeal spaces before and after Tensilon administration. All interpretations were performed twice per rater, 4 weeks apart (a total of 160 scorings). Intra-rater test-retest reliability and inter-rater reliability levels were calculated. The most frequent FEES findings in MG patients before Tensilon application were prominent residues of semisolids spread all over the hypopharynx in varying locations. The reliability level in the interpretation of the FEES-Tensilon Test was excellent regardless of the raters' profession or years of experience with FEES. All 4 raters showed high inter- and intra-rater reliability levels in interpreting the FEES-Tensilon Test based on residue clearance (kappa=0.922, 0.981). Degree of residue normalization in the vallecular space after Tensilon application showed the highest inter- and intra-rater reliability level (kappa=0.863, 0.957), followed by the epiglottis (kappa=0.813, 0.946) and pyriform sinuses (kappa=0.836, 0.929). Interpretation of the FEES-Tensilon Test based on residue severity and degree of Tensilon clearance, especially in the vallecular space, is consistent and reliable. This article is protected by copyright. All rights reserved.
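
    The abstract above reports kappa values but does not state which kappa variant was used for its four raters. As an illustration only, the sketch below assumes the simplest case, Cohen's kappa for two raters assigning categorical grades, with hypothetical severity grades that are not study data:

```python
from collections import Counter


def cohen_kappa(rater1, rater2):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater1) != len(rater2):
        raise ValueError("both raters must grade the same cases")
    n = len(rater1)
    # Observed proportion of cases on which the raters agree.
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_exp = sum(c1[c] * c2[c] for c in c1.keys() | c2.keys()) / (n * n)
    if p_exp == 1.0:
        return 1.0  # both raters used a single identical category
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical residue grades (0-3) given by two raters to 8 videos.
grades_a = [0, 1, 2, 3, 1, 2, 0, 3]
grades_b = [0, 1, 2, 3, 1, 2, 1, 3]
kappa = cohen_kappa(grades_a, grades_b)
print(round(kappa, 3))
```

    With more than two raters, as in the study, a multi-rater statistic or an average of pairwise kappas would be needed; which of these the authors used is not stated in the abstract.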

  12. Interoperability studies between the GTRS and EUMEDGRID e-Infrastructures

    International Nuclear Information System (INIS)

    Abbes, H.; Jemni, M.; Barbera, R.

    2007-01-01

    Grid computing enables sharing, selection, and aggregation of a wide variety of geographically distributed computational resources such as supercomputers, clusters, storage systems and data sources. A grid presents them as one unified resource for solving large-scale and data-intensive computing applications. Middleware supports applications in distributed computing environments by providing services that enable the interconnectivity and interoperability of applications, systems and machines. Considering the evolution of this type of middleware, it is important to regroup several national and international grids, by creating a gateway between middlewares, to gather more power and resources. In this setting, our work consists of making the Tunisian national grid GTRS interoperable with the grid infrastructure of the EU-funded project EUMEDGRID. A new concept of Super Worker Node is proposed in this work to reach interoperability between the two grids of GTRS and EUMEDGRID. (Author)

  13. Qualitative analysis of MMI raters' scorings of medical school candidates: A matter of taste?

    Science.gov (United States)

    Christensen, Mette K; Lykkegaard, Eva; Lund, Ole; O'Neill, Lotte D

    2018-05-01

    Recent years have seen leading medical educationalists repeatedly call for a paradigm shift in the way we view, value and use subjectivity in assessment. The argument is that subjective expert raters generally bring desired quality, not just noise, to performance evaluations. While several reviews document the psychometric qualities of the Multiple Mini-Interview (MMI), we currently lack qualitative studies examining what we can learn from MMI raters' subjectivity. The present qualitative study therefore investigates rater subjectivity or taste in MMI selection interviews. Taste (Bourdieu 1984) is a practical sense, which makes it possible at a pre-reflective level to apply 'invisible' or 'tacit' categories of perception for distinguishing between good and bad. The study draws on data from explorative in-depth interviews with 12 purposefully selected MMI raters. We find that MMI raters spontaneously applied subjective criteria (their taste), enabling them to assess the candidates' interpersonal attributes and to predict the candidates' potential. In addition, MMI raters seemed to share a taste for certain qualities in the candidates (e.g. reflectivity, resilience, empathy, contact, alikeness, 'the good colleague'); hence, taste may be the result of an ongoing enculturation in medical education and healthcare systems. This study suggests that taste is an inevitable condition in the assessment of students' performance. The MMI set-up should therefore make room for MMI raters' taste and their connoisseurship, i.e. their ability to taste, to improve the quality of their assessment of medical school candidates.

  14. Validation of the Brazilian version of the Clinical Gait and Balance Scale and comparison with the Berg Balance Scale

    Directory of Open Access Journals (Sweden)

    Jussara Almeida Oliveira Baggio

    2013-09-01

    Full Text Available Objective: To validate the Clinical Gait and Balance Scale (GABS) for a Brazilian population of patients with Parkinson's disease (PD) and to compare it to the Berg Balance Scale (BBS). Methods: One hundred and seven PD patients were evaluated by the shortened UPDRS motor scale (sUPDRSm), Hoehn and Yahr (HY), Schwab and England scale (SE), Falls Efficacy Scale International (FES-I), Freezing of Gait Questionnaire (FOG-Q), BBS and GABS. Results: The internal consistency of the GABS was 0.94, and the intra-rater and inter-rater reliability were 0.94 and 0.98, respectively. The area under the receiver operating characteristic (ROC) curve was 0.72, with a sensitivity of 0.75 and specificity of 0.6, to discriminate patients with a history of falls in the last twelve months, for a cut-off score of 13 points. Conclusions: Our study shows that the Brazilian version of the GABS is a reliable and valid instrument to assess gait and balance in PD.

  15. A Surgery Oral Examination: Interrater Agreement and the Influence of Rater Characteristics.

    Science.gov (United States)

    Burchard, Kenneth W.; And Others

    1995-01-01

    A study measured interrater reliability among 140 United States and Canadian surgery exam raters and the influences of age, years in practice, and experience as an examiner on individual scores. Results indicate three aspects of examinee performance influenced scores: verbal style, dress, and content of answers. No rater characteristic…

  16. Commitment to Change and Challenges to Implementing Changes After Workplace-Based Assessment Rater Training.

    Science.gov (United States)

    Kogan, Jennifer R; Conforti, Lisa N; Yamazaki, Kenji; Iobst, William; Holmboe, Eric S

    2017-03-01

    Faculty development for clinical faculty who assess trainees is necessary to improve assessment quality and important for competency-based education. Little is known about what faculty plan to do differently after training. This study explored the changes faculty intended to make after workplace-based assessment rater training, their ability to implement change, predictors of change, and barriers encountered. In 2012, 45 outpatient internal medicine faculty preceptors (who supervised residents) from 26 institutions participated in rater training. They completed a commitment to change form listing up to five commitments and ranked (on a 1-5 scale) their motivation for and anticipated difficulty implementing each change. Three months later, participants were interviewed about their ability to implement change and barriers encountered. The authors used logistic regression to examine predictors of change. Of 191 total commitments, the most common commitments focused on what faculty would change about their own teaching (57%) and increasing direct observation (31%). Of the 183 commitments for which follow-up data were available, 39% were fully implemented, 40% were partially implemented, and 20% were not implemented. Lack of time/competing priorities was the most commonly cited barrier. Higher initial motivation (odds ratio [OR] 2.02; 95% confidence interval [CI] 1.14, 3.57) predicted change. As anticipated difficulty increased, implementation became less likely (OR 0.67; 95% CI 0.49, 0.93). While higher baseline motivation predicted change, multiple system-level barriers undermined ability to implement change. Rater-training faculty development programs should address how faculty motivation and organizational barriers interact and influence ability to change.

  17. Older People’s External Residential Assessment Tool (OPERAT): a complementary participatory and metric approach to the development of an observational environmental measure

    Directory of Open Access Journals (Sweden)

    Vanessa Burholt

    2016-09-01

    Full Text Available Abstract Background: The potential for environmental interventions to improve health and wellbeing has assumed particular importance in the face of unprecedented population ageing. However, presently observational environmental assessment tools are unsuitable for ‘all ages’. This article describes the development of the Older People’s External Residential Assessment Tool (OPERAT). Methods: Potential items were identified through review and consultation with an Expert Advisory Group. Items were ranked according to the importance ascribed to them by older people who responded to a survey distributed by 50+ forum in Wales (N = 545). 40 highly ranked items were selected for the OPERAT pilot. An observational assessment was conducted in 405 postcodes in Wales. Items validated with data from a survey of older residents (N = 500) in the postcode areas were selected for statistical modelling (Kendall’s Tau-b, p < .05). Data reduction techniques (exploratory factor analysis with Geomin rotation) identified the underlying factor structure of OPERAT. Items were weighted (Thurstone scaling approach) and scores calculated for each domain. Internal consistency: all items were tested for scale-domain total correlation (Spearman’s rank). Construct validity: correlation analysis examined the associations between domains and the extent to which participants enjoyed living in the area, felt that it was a desirable place to live, or felt safe at night or during the day (Spearman’s rank). Usability: analysis of variance compared mean OPERAT domain scores between neighbourhoods that were homogenous in terms of (a) deprivation (quintiles of the Townsend Index) and (b) geographic settlement type. Inter-rater reliability: Krippendorff’s alpha was used to evaluate inter-rater consistency in ten postcode areas. Results: A four factor model was selected as the best interpretable fit to the data. The domains were named Natural Elements, Incivilities and Nuisance

  18. Psychometric validation of the behavioral indicators of pain scale for the assessment of pain in mechanically ventilated and unable to self-report critical care patients.

    Science.gov (United States)

    Latorre-Marco, I; Acevedo-Nuevo, M; Solís-Muñoz, M; Hernández-Sánchez, L; López-López, C; Sánchez-Sánchez, M M; Wojtysiak-Wojcicka, M; de Las Pozas-Abril, J; Robleda-Font, G; Frade-Mera, M J; De Blas-García, R; Górgolas-Ortiz, C; De la Figuera-Bayón, J; Cavia-García, C

    2016-11-01

    To assess the psychometric properties of the behavioral indicators of pain scale (ESCID) when applied to a wide range of medical and surgical critical patients. A multicentre, prospective observational study was designed to validate a scale measuring instrument. Twenty Intensive Care Units of 14 hospitals belonging to the Spanish National Health System. A total of 286 mechanically ventilated, critically ill medical and surgical adult patients who were unable to self-report. Pain levels were measured by two independent evaluators simultaneously, using two scales: ESCID and the behavioral pain scale (BPS). Pain was observed before, during, and after two painful procedures (turning, tracheal suctioning) and one non-painful procedure. ESCID reliability was measured on the basis of internal consistency using the Cronbach-α coefficient. Inter-rater and intra-rater agreement were measured. The Spearman correlation coefficient was used to assess the correlation between ESCID and BPS. A total of 4386 observations were made in 286 patients (62% medical and 38% surgical). High correlation was found between ESCID and BPS (r=0.94-0.99; p<0.001), together with high intra-rater and inter-rater concordance. ESCID was internally reliable, with a Cronbach-α value of 0.85 (95%CI 0.81-0.88). Cronbach-α coefficients for ESCID domains were high: facial expression 0.87 (95%CI 0.84-0.89), calmness 0.84 (95%CI 0.81-0.87), muscle tone 0.80 (95%CI 0.75-0.84), compliance with mechanical ventilation 0.70 (95%CI 0.63-0.75) and consolability 0.85 (95%CI 0.81-0.88). ESCID is valid and reliable for measuring pain in mechanically ventilated medical and surgical critical care patients who are unable to self-report. CLINICALTRIALS.GOV: NCT01744717. Copyright © 2016 The Authors. Published by Elsevier España, S.L.U. All rights reserved.

  19. Reliable and fast volumetry of the lumbar spinal cord using cord image analyser (Cordial).

    Science.gov (United States)

    Tsagkas, Charidimos; Altermatt, Anna; Bonati, Ulrike; Pezold, Simon; Reinhard, Julia; Amann, Michael; Cattin, Philippe; Wuerfel, Jens; Fischer, Dirk; Parmar, Katrin; Fischmann, Arne

    2018-04-30

    To validate the precision and accuracy of the semi-automated cord image analyser (Cordial) for lumbar spinal cord (SC) volumetry in 3D T1w MRI data of healthy controls (HC). 40 3D T1w images of 10 HC (w/m: 6/4; age range: 18-41 years) were acquired at one 3T-scanner in two MRI sessions (time interval 14.9±6.1 days). Each subject was scanned twice per session, allowing determination of test-retest reliability both in back-to-back (intra-session) and scan-rescan images (inter-session). Cordial was applied for lumbar cord segmentation twice per image by two raters, allowing for assessment of intra- and inter-rater reliability, and compared to a manual gold standard. While manually segmented volumes were larger (mean: 2028±245 mm³ vs. Cordial: 1636±300 mm³, p<0.001), accuracy assessments between manually and semi-automatically segmented images showed a mean Dice-coefficient of 0.88±0.05. Calculation of within-subject coefficients of variation (COV) demonstrated high intra-session (1.22-1.86%), inter-session (1.26-1.84%), as well as intra-rater (1.73-1.83%) reproducibility. No significant difference was shown between intra- and inter-session reproducibility or between intra-rater reliabilities. Although inter-rater reproducibility (COV: 2.87%) was slightly lower compared to all other reproducibility measures, between-rater consistency was very strong (intraclass correlation coefficient: 0.974). While under-estimating the lumbar SCV, Cordial still provides excellent inter- and intra-session reproducibility showing high potential for application in longitudinal trials. • Lumbar spinal cord segmentation using the semi-automated cord image analyser (Cordial) is feasible. • Lumbar spinal cord is 40-mm cord segment 60 mm above conus medullaris. • Cordial provides excellent inter- and intra-session reproducibility in lumbar spinal cord region. • Cordial shows high potential for application in longitudinal trials.
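
    The Dice coefficient quoted above measures the overlap between two segmentations. A minimal sketch, assuming each segmentation is represented as a set of voxel coordinates (a simplification chosen for illustration; the voxel sets below are hypothetical and the study's own implementation is not described in the abstract):

```python
def dice_coefficient(seg_a, seg_b):
    """Dice overlap between two segmentations given as voxel collections."""
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        return 1.0  # two empty segmentations are treated as identical
    # Twice the shared voxels divided by the combined size of both masks.
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical voxel coordinates from a manual and a semi-automated mask.
manual = {(0, 0), (0, 1), (1, 1)}
semi_automated = {(0, 1), (1, 1), (2, 2)}
overlap = dice_coefficient(manual, semi_automated)
print(round(overlap, 3))
```

    A value of 1 means the two masks coincide exactly and 0 means they share no voxels, so a high Dice score can coexist with a systematic volume difference such as the under-estimation reported above.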

  20. [Reliability of nursing outcomes classification label "Knowledge: cardiac disease management (1830)" in outpatients with heart failure].

    Science.gov (United States)

    Cañón-Montañez, Wilson; Oróstegui-Arenas, Myriam

    2015-01-01

    To determine the reliability (internal consistency, inter-rater reproducibility and level of agreement) of the nursing outcome "Knowledge: cardiac disease management (1830)", in the version published in Spanish, in outpatients with heart failure. A reliability study was conducted on 116 outpatients with heart failure. Six indicators of the nursing outcome were operationalized. All participants were assessed simultaneously by two evaluators. Three evaluation periods were defined: initial (at baseline), final (a month later), and follow-up (two months later). Internal consistency was assessed with the Cronbach alpha coefficient, inter-rater reproducibility with the intraclass correlation coefficient of reproducibility or agreement, and level of agreement with the 95% limits of Bland and Altman. Cronbach's alpha was 0.83 (95% CI: 0.77 - 0.89) in the final evaluation, and follow-up values of 0.85 (95% CI: 0.82-0.89) and 0.83 (95% CI: 0.78 - 0.88) were found for the first and second evaluator, respectively. The intraclass correlation coefficient showed values greater than 0.9 in the three evaluation periods in both the random and mixed model. The Bland-Altman 95% limits of agreement were close to zero in the three evaluations performed. The questionnaire operationalized to assess the nursing outcome "Knowledge: cardiac disease management (1830)" in its Spanish version is a reliable method to measure skills and knowledge in outpatients with heart failure in the Colombian context. Copyright © 2015 Elsevier España, S.L.U. All rights reserved.

  1. A new music therapy engagement scale for persons with dementia.

    Science.gov (United States)

    Tan, Jane; Wee, Shiou-Liang; Yeo, Pei Shi; Choo, Juliet; Ritholz, Michele; Yap, Philip

    2018-05-25

    Objectives: To develop and validate a new scale to assess music therapy engagement in persons with dementia (PWDs). A draft scale was derived from literature review and >2 years of qualitative recording of PWDs during music therapy. Content validity was attained through iterative consultations, trial sessions, and revisions. The final five-item Music Therapy Engagement scale for Dementia (MTED) assessed music and non-music related elements. Internal consistency and inter-rater reliability were assessed over 120 music therapy sessions. MTED was validated with the Greater Cincinnati Chapter Well-being Observation Tool, Holden Communication Scale, and Participant Engagement Observation Checklist - Music Sessions. A total of 62 PWDs (83.2 ± 7.7 years, modified version of the mini-mental state examination = 13.2/30 ± 4.1) in an acute hospital dementia unit were involved. The mean MTED score was 13.02/30 ± 4.27; internal consistency (Cronbach's α = 0.87) and inter-rater reliability (intra-class correlation = 0.96) were good. Principal component analysis revealed a one-factor structure with eigenvalue > 1 (3.27), which explained 65.4% of the variance. MTED demonstrated good construct validity. The MTED total score correlated strongly with the combined items comprising Pleasure, Interest, Sadness, and Sustained attention of the Greater Cincinnati Chapter Well-being Observation Tool (rs = 0.88, p < 0.001). MTED is a clinically appropriate and psychometrically valid scale to evaluate music therapy engagement in PWDs.
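
    The eigenvalue and the variance figure reported above are linked by simple arithmetic. Assuming the principal component analysis was run on the correlation matrix of the five MTED items (the abstract does not say so, but the numbers are consistent with it), the total variance equals the number of items p, so a component with eigenvalue λ₁ explains λ₁/p of it:

```latex
\[
\text{proportion of variance explained}
  = \frac{\lambda_1}{p}
  = \frac{3.27}{5}
  = 0.654
  = 65.4\%
\]
```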

  2. Psychometrics and the neuroscience of individual differences: Internal consistency limits between-subjects effects.

    Science.gov (United States)

    Hajcak, Greg; Meyer, Alexandria; Kotov, Roman

    2017-08-01

    In the clinical neuroscience literature, between-subjects differences in neural activity are presumed to reflect reliable measures, even though the psychometric properties of neural measures are almost never reported. The current article focuses on the critical importance of assessing and reporting internal consistency reliability, that is, the homogeneity of "items" that comprise a neural "score." We demonstrate how variability in the internal consistency of neural measures limits between-subjects (i.e., individual differences) effects. To this end, we utilize error-related brain activity (i.e., the error-related negativity or ERN) in both healthy and generalized anxiety disorder (GAD) participants to demonstrate options for psychometric analyses of neural measures; we examine between-groups differences in internal consistency, between-groups effect sizes, and between-groups discriminability (i.e., ROC analyses), all as a function of increasing items (i.e., number of trials). Overall, internal consistency should be used to inform experimental design and the choice of neural measures in individual differences research. The internal consistency of neural measures is necessary for interpreting results and guiding progress in clinical neuroscience, and should be routinely reported in all individual differences studies. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  3. Effects of Rating Purpose and Rater Self-Esteem on Performance Ratings.

    Science.gov (United States)

    1983-03-01

    examined in a laboratory study, using a 2x2 analysis of variance design. Results indicate that low self-esteem raters assign significantly higher performance ratings when performance appraisal information will be used...studies indicated that individuals low in self-esteem have less self-confidence, feel less competent, and rely more on others' opinions than do individuals

  4. A Simulation Study of Rater Agreement Measures with 2x2 Contingency Tables

    Science.gov (United States)

    Ato, Manuel; Lopez, Juan Jose; Benavente, Ana

    2011-01-01

    A comparison between six rater agreement measures obtained using three different approaches was carried out by means of a simulation study. Rater coefficients suggested by Bennett's σ (1954), Scott's π (1955), Cohen's κ (1960) and Gwet's γ (2008) were selected to represent the classical, descriptive approach, α agreement…
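
    The four descriptive coefficients named above share one form, (observed agreement − chance agreement) / (1 − chance agreement), and differ only in how chance agreement is defined. The sketch below applies the standard two-category definitions to a single 2x2 table. It assumes that the record's σ refers to the Bennett et al. S coefficient and its γ to Gwet's AC1; the function name and the table values are hypothetical.

```python
def agreement_coefficients(table):
    """Chance-corrected agreement for a 2x2 table [[a, b], [c, d]].

    Rows are rater 1's categories and columns are rater 2's categories.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    p_obs = (a + d) / n          # observed proportion of agreement
    p1 = (a + b) / n             # rater 1: share assigned to category 1
    q1 = (a + c) / n             # rater 2: share assigned to category 1
    m1 = (p1 + q1) / 2           # mean marginal for category 1

    chance = {
        "bennett_s": 0.5,                              # uniform chance, 2 categories
        "scott_pi": m1 ** 2 + (1 - m1) ** 2,           # pooled marginals
        "cohen_kappa": p1 * q1 + (1 - p1) * (1 - q1),  # rater-specific marginals
        "gwet_ac1": 2 * m1 * (1 - m1),                 # Gwet's chance term
    }
    return {name: (p_obs - p_e) / (1 - p_e) for name, p_e in chance.items()}


# Hypothetical table with a rare second category.
for name, value in agreement_coefficients([[80, 5], [5, 10]]).items():
    print(f"{name}: {value:.3f}")
```

    With balanced marginals the four coefficients coincide; they diverge as one category becomes rare, which is the kind of condition a simulation study can vary systematically.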

  5. The reliability of WorkWell Systems Functional Capacity Evaluation: a systematic review

    Science.gov (United States)

    2014-01-01

    Background: Functional capacity evaluation (FCE) determines a person's ability to perform work-related tasks and is a major component of the rehabilitation process. The WorkWell Systems (WWS) FCE (formerly known as Isernhagen Work Systems FCE) is currently the most commonly used FCE tool in German rehabilitation centres. Our systematic review investigated the inter-rater, intra-rater and test-retest reliability of the WWS FCE. Methods: We performed a systematic literature search of studies on the reliability of the WWS FCE and extracted item-specific measures of inter-rater, intra-rater and test-retest reliability from the identified studies. Intraclass correlation coefficients ≥ 0.75, percentages of agreement ≥ 80%, and kappa coefficients ≥ 0.60 were categorised as acceptable; otherwise they were considered non-acceptable. The extracted values were summarised for the five performance categories of the WWS FCE, and the results were classified as either consistent or inconsistent. Results: From 11 identified studies, 150 item-specific reliability measures were extracted. 89% of the extracted inter-rater reliability measures, all of the intra-rater reliability measures and 96% of the test-retest reliability measures of the weight handling and strength tests had an acceptable level of reliability, compared to only 67% of the test-retest reliability measures of the posture/mobility tests and 56% of the test-retest reliability measures of the locomotion tests. Both of the extracted test-retest reliability measures of the balance test were acceptable. Conclusions: Weight handling and strength tests were found to have consistently acceptable reliability. Further research is needed to explore the reliability of the other tests as inconsistent findings or a lack of data prevented definitive conclusions. PMID:24674029
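
    The acceptability rule stated in the Methods reduces to three cut-offs, which can be sketched as below. The thresholds come from the abstract; the dictionary keys, function names and sample values are hypothetical and serve only to show how a percentage of acceptable measures would be derived.

```python
# Cut-offs quoted in the abstract; a value at or above its cut-off is acceptable.
THRESHOLDS = {"icc": 0.75, "percent_agreement": 80.0, "kappa": 0.60}


def is_acceptable(measure, value):
    """Return True if a reliability value meets the cut-off for its type."""
    return value >= THRESHOLDS[measure]


def share_acceptable(results):
    """Percentage of (measure, value) pairs classified as acceptable."""
    flags = [is_acceptable(measure, value) for measure, value in results]
    return 100 * sum(flags) / len(flags)


# Hypothetical extracted values, not taken from the review.
extracted = [("icc", 0.81), ("kappa", 0.55), ("percent_agreement", 92.0), ("icc", 0.75)]
print(share_acceptable(extracted))
```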

  6. Reliability of histologic assessment in patients with eosinophilic oesophagitis.

    Science.gov (United States)

    Warners, M J; Ambarus, C A; Bredenoord, A J; Verheij, J; Lauwers, G Y; Walsh, J C; Katzka, D A; Nelson, S; van Viegen, T; Furuta, G T; Gupta, S K; Stitt, L; Zou, G; Parker, C E; Shackelton, L M; D'Haens, G R; Sandborn, W J; Dellon, E S; Feagan, B G; Collins, M H; Jairath, V; Pai, R K

    2018-04-01

    The validity of the eosinophilic oesophagitis (EoE) histologic scoring system (EoEHSS) has been demonstrated, but only preliminary reliability data exist. The aim was to formally assess the reliability of the EoEHSS and additional histologic features. Four expert gastrointestinal pathologists independently reviewed slides from adult patients with EoE (N = 45) twice, in random order, using standardised training materials and scoring conventions for the EoEHSS and additional histologic features agreed upon during a modified Delphi process. Intra- and inter-rater reliability for scoring the EoEHSS, a visual analogue scale (VAS) of overall histopathologic disease severity, and additional histologic features were assessed using intra-class correlation coefficients (ICCs). Almost perfect intra-rater reliability was observed for the composite EoEHSS scores and the VAS. Inter-rater reliability was also almost perfect for the composite EoEHSS scores and substantial for the VAS. Of the EoEHSS items, eosinophilic inflammation was associated with the highest ICC estimates, consistent with almost perfect intra- and inter-rater reliability. With the exception of dyskeratotic epithelial cells and surface epithelial alteration, ICC estimates for the remaining EoEHSS items were above the benchmarks for substantial intra-rater and moderate inter-rater reliability. Estimation of peak eosinophil count and number of lamina propria eosinophils were associated with the highest ICC estimates among the exploratory items. The composite EoEHSS and most component items are associated with substantial reliability when assessed by central pathologists. Future studies should assess responsiveness of the score to change after a therapeutic intervention to facilitate its use in clinical trials. © 2018 John Wiley & Sons Ltd.
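
    Several records in this listing report intraclass correlation coefficients without stating which form was used. As a hedged illustration only, the sketch below computes two common single-rater forms from a two-way ANOVA decomposition: ICC(2,1) for absolute agreement and ICC(3,1) for consistency. The function name and the ratings are hypothetical, and nothing here is taken from the study's analysis.

```python
def icc_two_way(ratings):
    """Return (ICC(2,1) absolute agreement, ICC(3,1) consistency).

    ratings: one list per subject, each holding the scores of k raters.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    agreement = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    consistency = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    return agreement, consistency


# Hypothetical scores: rater 2 is always one point above rater 1.
print(icc_two_way([[1, 2], [2, 3], [3, 4]]))
```

    In the example the second rater ranks subjects identically but scores one point higher, so consistency is perfect while absolute agreement is lower. This distinction is why the choice of ICC form matters when comparing the values reported across records.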

  7. On the internal consistency of the term structure of forecasts of housing starts

    DEFF Research Database (Denmark)

    Pierdzioch, C.; Rulke, J. C.; Stadtmann, G.

    2013-01-01

    We use the term structure of forecasts of housing starts to test for rationality of forecasts. Our test is based on the idea that short-term and long-term forecasts should be internally consistent. We test the internal consistency of forecasts using data for Australia, Canada, Japan and the United...

  8. Evaluating Rater Responses to an Online Training Program for L2 Writing Assessment

    Science.gov (United States)

    Elder, Catherine; Barkhuizen, Gary; Knoch, Ute; von Randow, Janet

    2007-01-01

    The use of online rater self-training is growing in popularity and has obvious practical benefits, facilitating access to training materials and rating samples and allowing raters to reorient themselves to the rating scale and self monitor their behaviour at their own convenience. However there has thus far been little research into rater…

  9. Comparability of Mayo-Portland Adaptability Inventory ratings by staff, significant others and people with acquired brain injury.

    Science.gov (United States)

    Malec, James F

    2004-06-01

    To determine the internal consistency, reliability and comparability of the Mayo-Portland Adaptability Inventory (MPAI-4) and sub-scales completed by people with acquired brain injury (ABI), family and significant others (SO) and rehabilitation staff. 134 people with ABI consecutively seen for outpatient rehabilitation evaluation. MPAI-4 protocols based on independent ratings by the people with ABI undergoing evaluation, SO and rehabilitation staff were submitted to Rasch Facets analysis to determine the internal consistency of the overall measure and sub-scales (Ability, Adjustment and Participation indices) for each rater group and for a composite measure based on all rater groups. Rater agreement for individual items was also examined. Rasch indicators of internal consistency were entirely within acceptable limits for 3-rater composite full scale and sub-scale measures; these indicators were generally within acceptable limits for measures based on a single rater group. Item agreement was generally acceptable; disagreements suggested various sources of bias for specific rater groups. The MPAI-4 possesses satisfactory internal consistency regardless of rating source. A composite measure based on ratings made independently by people with ABI, SO and staff may serve as a 'gold standard' for research purposes. In the clinical setting, assessment of varying perspectives and biases may not only best represent outcome as evaluated by all parties involved but be essential to developing effective rehabilitation plans.

  10. Putting Raters in Ratees' Shoes: Perspective Taking and Assessment of Creative Products

    Science.gov (United States)

    Han, Jiantao; Long, Haiying; Pang, Weiguo

    2017-01-01

    This study reported 2 experiments that studied the effect of perspective taking on assessment of creative products by using human raters. Forty responses of 2 alternative uses tasks (AUTs) and 15 alien stories generated by 6th-grade students were used as assessment materials. Undergraduate students as the novice raters assessed the products under…

  11. Brazilian version of the Nottingham Sensory Assessment: validity, agreement and reliability

    Directory of Open Access Journals (Sweden)

    Daniela H. F. Lima

    2010-04-01

    Full Text Available OBJECTIVES: To investigate the inter-rater and intra-rater reliability, construct validity and internal consistency of the Brazilian version of the Nottingham Sensory Assessment for Stroke Patients (NSA). METHODS: The instrument was translated into Portuguese from its original in English by a bilingual translator and was then back-translated into English. Twenty-one hemiparetics were evaluated by two examiners using the NSA and the Fugl-Meyer Assessment (FMA) of physical performance. RESULTS: A significant correlation was found between the FMA and the NSA (r=0.752). The NSA showed excellent internal consistency (0.86) and excellent inter-rater and intra-rater agreement coefficients for all items except temperature; a significant ceiling effect was found for both the NSA and the FMA. CONCLUSION: The Brazilian version of the Nottingham Sensory Assessment met the criteria for agreement, internal consistency and concurrent validity. It is quick and easy to apply and can be used in neurorehabilitation outpatient clinics to assess sensory function after stroke. The significant ceiling effect of the NSA does not limit its use, given that the Fugl-Meyer protocol also showed a ceiling effect in the same patients.

  12. Inter-Industry and Inter-Firm Wage and Hours Differentials in Switzerland

    OpenAIRE

    José V. Ramirez

    2000-01-01

    In the present paper, we analyse the role of demand factors on wages and hours in Switzerland. To accomplish this task, we used the 1996 Swiss Wage Structure Survey, a large employee-employer survey. Results indicate that capital intensity appears to have a certain impact on the relation between wages and hours: the "inter-industry wage-hours differentials line" we inferred is clearly positive. Further, an analysis of the determinants of inter-firm wage differentials shows that the internal o...

  13. Can Raters with Reduced Job Descriptive Information Provide Accurate Position Analysis Questionnaire (PAQ) Ratings?

    Science.gov (United States)

    Friedman, Lee; Harvey, Robert J.

    1986-01-01

    Job-naive raters provided with job descriptive information made Position Analysis Questionnaire (PAQ) ratings which were validated against ratings of job analysts who were also job content experts. None of the reduced job descriptive information conditions enabled job-naive raters to obtain either acceptable levels of convergent validity with…

  14. Translation, adaptation and validation of "Community Integration Questionnaire"

    Directory of Open Access Journals (Sweden)

    Helena Maria Silveira Fraga-Maia

    2015-05-01

    Full Text Available Objective: To translate, adapt, and validate the "Community Integration Questionnaire" (CIQ), a tool that evaluates community integration after traumatic brain injury (TBI). Methods: A study of 61 TBI survivors was carried out. The appraisal of the measurement equivalence was based on a reliability assessment by estimating inter-rater agreement, item-scale correlation and internal consistency of CIQ scales, concurrent validity, and construct validity. Results: Inter-rater agreement ranged from substantial to almost perfect. The item-scale correlations were generally higher between the items and their respective domains, whereas the intra-class correlation coefficients were high for both the overall scale and the CIQ domains. The correlation between the CIQ and the Disability Rating Scale (DRS), the Extended Glasgow Outcome Scale (GOSE), and the Rancho Los Amigos Level of Cognitive Functioning Scale (RLA) reached values considered satisfactory. However, the factor analysis generated four factors (dimensions) that did not correspond with the dimensional structure of the original tool. Conclusion: The resulting tool may be useful in globally assessing community integration after TBI in the Brazilian context, at least until new CIQ psychometric assessment studies are developed with larger samples.

  15. Assessment of competence for caesarean section with global rating scale

    International Nuclear Information System (INIS)

    Qureshi, R.N.; Ali, S.K.

    2013-01-01

    Objective: To establish the reliability and validity of the nine-point global rating scale for assessing residents' independent performance of Caesarean Section. Methods: The validation study was conducted at the Department of Obstetrics and Gynaecology, Aga Khan University Hospital, from April to December 2008, and comprised 15 residents observed during 40 Caesarean Sections over 9 months. Two evaluators independently rated each procedure and the difficulty of each case. Results: The observations per faculty ranged from 1-8 (mean 4.07 ± 2.56). The Year 4 residents were observed the most, i.e. 32 (40%), followed by Year 3, 30 (37.5%); Year 2, 14 (17.5%); and Year 1, 4 (5%). Mean time required for observation of the surgery was 43.81 ± 14.28 min (range: 20-90), with a mode of 45 min. Mean aggregate rating on all items showed gradual progression with the year of residency. The assessment tool had an internal consistency reliability (Cronbach's alpha) of 0.9097 with low inter-rater reliability. Conclusion: The evaluation tool was found to be reliable and valid for evaluating a resident's competence for performing Caesarean Section. Training of the assessors is required for a better inter-rater agreement. (author)

  16. Cultural values and performance appraisal: assessing the effects of rater self-construal on performance ratings.

    Science.gov (United States)

    Mishra, Vipanchi; Roch, Sylvia G

    2013-01-01

    Much of the prior research investigating the influence of cultural values on performance ratings has focused either on conducting cross-national comparisons among raters or on using cultural-level individualism/collectivism scales to measure the effects of cultural values on performance ratings. Recent research has shown that there is considerable within-country variation in cultural values, i.e. people in one country can be more individualistic or collectivistic in nature. Taking the latter perspective, the present study used Markus and Kitayama's (1991) conceptualization of independent and interdependent self-construals as measures of individual variations in cultural values to investigate within-culture variations in performance ratings. Results suggest that rater self-construal has a significant influence on overall performance evaluations; specifically, raters with a highly interdependent self-construal tend to show a preference for interdependent ratees, whereas raters high on independent self-construal do not show a preference for a specific type of ratee when making overall performance evaluations. Although rater self-construal significantly influenced overall performance evaluations, no such effects were observed for specific dimension ratings. Implications of these results for performance appraisal research and practice are discussed.

  17. Evaluation of Freehand B-Mode and Power-Mode 3D Ultrasound for Visualisation and Grading of Internal Carotid Artery Stenosis.

    Directory of Open Access Journals (Sweden)

    Johann Otto Pelz

    Full Text Available Currently, colour-coded duplex sonography (2D-CDS) is the clinical standard for detection and grading of internal carotid artery stenosis (ICAS). However, unlike angiographic imaging modalities, 2D-CDS assesses ICAS by its hemodynamic effects rather than luminal changes. The aim of this study was to evaluate freehand 3D ultrasound (3DUS) for direct visualisation and quantification of ICAS. Thirty-seven patients with 43 ICAS were examined with 2D-CDS as reference standard and with freehand B-mode and power-mode 3DUS. The stenotic value of 3D reconstructed ICAS was calculated as distal diameter or distal cross-sectional area (CSA) reduction percentage and compared with 2D-CDS. There was a trend but no significant difference in successful 3D reconstruction of ICAS between B-mode and power mode (examiner 1 {Ex1} 81% versus 93%, examiner 2 {Ex2} 84% versus 88%). Inter-rater agreement was best for power-mode 3DUS and assessment of stenotic value as distal CSA reduction percentage (intraclass correlation coefficient {ICC} 0.90), followed by power-mode 3DUS and distal diameter reduction percentage (ICC 0.81). Inter-rater agreement was poor for B-mode 3DUS (ICC, distal CSA reduction 0.36, distal diameter reduction 0.51). Intra-rater agreement for power-mode 3DUS was good for both measuring methods (ICC, distal CSA reduction 0.88 {Ex1} and 0.78 {Ex2}; ICC, distal diameter reduction 0.83 {Ex1} and 0.76 {Ex2}). In comparison to 2D-CDS, inter-method agreement was good and clearly better for power-mode 3DUS (ICC, distal diameter reduction percentage: Ex1 0.85, Ex2 0.78; distal CSA reduction percentage: Ex1 0.63, Ex2 0.57) than for B-mode 3DUS (ICC, distal diameter reduction percentage: Ex1 0.40, Ex2 0.52; distal CSA reduction percentage: Ex1 0.15, Ex2 0.51). Non-invasive power-mode 3DUS is superior to B-mode 3DUS for imaging and quantification of ICAS. Therefore, further studies are warranted which should now compare power-mode 3DUS with the angiographic gold standard

  18. Evaluation of Freehand B-Mode and Power-Mode 3D Ultrasound for Visualisation and Grading of Internal Carotid Artery Stenosis.

    Science.gov (United States)

    Pelz, Johann Otto; Weinreich, Anna; Karlas, Thomas; Saur, Dorothee

    2017-01-01

    Currently, colour-coded duplex sonography (2D-CDS) is the clinical standard for detection and grading of internal carotid artery stenosis (ICAS). However, unlike angiographic imaging modalities, 2D-CDS assesses ICAS by its hemodynamic effects rather than luminal changes. The aim of this study was to evaluate freehand 3D ultrasound (3DUS) for direct visualisation and quantification of ICAS. Thirty-seven patients with 43 ICAS were examined with 2D-CDS as reference standard and with freehand B-mode and power-mode 3DUS. The stenotic value of 3D reconstructed ICAS was calculated as distal diameter or distal cross-sectional area (CSA) reduction percentage and compared with 2D-CDS. There was a trend but no significant difference in successful 3D reconstruction of ICAS between B-mode and power mode (examiner 1 {Ex1} 81% versus 93%, examiner 2 {Ex2} 84% versus 88%). Inter-rater agreement was best for power-mode 3DUS and assessment of stenotic value as distal CSA reduction percentage (intraclass correlation coefficient {ICC} 0.90), followed by power-mode 3DUS and distal diameter reduction percentage (ICC 0.81). Inter-rater agreement was poor for B-mode 3DUS (ICC, distal CSA reduction 0.36, distal diameter reduction 0.51). Intra-rater agreement for power-mode 3DUS was good for both measuring methods (ICC, distal CSA reduction 0.88 {Ex1} and 0.78 {Ex2}; ICC, distal diameter reduction 0.83 {Ex1} and 0.76 {Ex2}). In comparison to 2D-CDS, inter-method agreement was good and clearly better for power-mode 3DUS (ICC, distal diameter reduction percentage: Ex1 0.85, Ex2 0.78; distal CSA reduction percentage: Ex1 0.63, Ex2 0.57) than for B-mode 3DUS (ICC, distal diameter reduction percentage: Ex1 0.40, Ex2 0.52; distal CSA reduction percentage: Ex1 0.15, Ex2 0.51). Non-invasive power-mode 3DUS is superior to B-mode 3DUS for imaging and quantification of ICAS. Therefore, further studies are warranted which should now compare power-mode 3DUS with the angiographic gold standard imaging

  19. Validation of a survey instrument to assess home environments for physical activity and healthy eating in overweight children

    Directory of Open Access Journals (Sweden)

    Crane Lori A

    2008-01-01

    Full Text Available Abstract Background: Few measures exist to measure the overall home environment for its ability to support physical activity (PA) and healthy eating in overweight children. The purpose of this study was to develop and test the reliability and validity of such a measure. Methods: The Home Environment Survey (HES) was developed to reflect availability, accessibility, parental role modelling, and parental policies related to PA resources, fruits and vegetables (F&V), and sugar sweetened drinks and snacks (SS). Parents of overweight children (n = 219) completed the HES and concurrent behavioural assessments. Children completed the Block Kids survey and wore an accelerometer for one week. A subset of parents (n = 156) completed the HES a second time to determine test-retest reliability. Finally, parent dyads living in the same home (n = 41) completed the survey to determine inter-rater reliability. Initial psychometric analyses were completed to trim items from the measure based on lack of variability in responses, moderate or higher item to scale correlation, or contribution to strong internal consistency. Inter-rater and test-retest reliability were completed using intraclass correlation coefficients. Validity was assessed using Pearson correlations between the HES scores and child and parent nutrition and PA. Results: Eight items were removed and acceptable internal consistency was documented for all scales (α = .66–.84) with the exception of the F&V accessibility. The F&V accessibility was reduced to a single item because the other two items did not meet reliability standards. Test-retest reliability was high (r > .75) for all scales. Inter-rater reliability varied across scales (r = .22–.89). PA accessibility, parent role modelling, and parental policies were all related significantly to child (r = .14–.21) and parent (r = .15–.31) PA. Similarly, availability of F&V and SS, parental role modelling, and parental policies were related to child (r

  20. The validity and reliability of the Turkish version of Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) in patients with mild and moderate Alzheimer's disease and normal subjects.

    Science.gov (United States)

    Mavioglu, H; Gedizlioglu, M; Akyel, S; Aslaner, T; Eser, E

    2006-03-01

    The cognitive subscale of the Alzheimer's Disease Assessment Scale (ADAS-Cog) is the most widely used test in clinical trials dealing with Alzheimer's disease (AD). The aim of this study was to investigate the validity and reliability of the Turkish version of ADAS-Cog. Twenty-nine patients with AD, fulfilling NINCDS-ADRDA criteria of probable AD, who were in stage 3-5 according to the Global Deterioration Scale (GDS), and 27 non-demented control subjects with similar age, gender and educational status were recruited for the study. The Turkish version of ADAS-Cog, Standardized Mini Mental Status Examination (MMSE) and Short Orientation-Memory-Concentration Test (SOMCT) were applied to both groups. Inter-rater reliability, internal consistency, test-retest reliability, face validity, differential validity and convergent validity were statistically analyzed. Both MMSE and ADAS-Cog significantly differentiated patients with AD and control subjects (p ADAS-Cog scores in AD group (r: -0.739). ADAS-Cog was also highly significantly correlated with GDS (r: 0.720) and SOMCT (r: 0.738). For the AD group, the control group and the whole cohort, internal consistency coefficients (Cronbach's alpha) of 0.800, 0.515 and 0.873 were found, respectively. Inter-rater reliability for total ADAS-Cog score was found as ICC: 0.99 and 0.98 and test-retest reliability was found as ICC: 0.91 and 0.95 for demented and nondemented subjects, respectively. The Turkish version of ADAS-Cog has been found to be highly reliable and valid in differentiating patients with mild and moderate AD from nondemented subjects.

  1. Basal View Reference Photographs for Nasolabial Appearance Rating in Unilateral Cleft Lip and Palate.

    Science.gov (United States)

    Rubin, Marcie S; Lowe, Kristen M; Clouston, Sean; Shetye, Pradip R; Warren, Stephen M; Grayson, Barry H

    2015-07-01

    The Asher-McDade system is a 5-point ordinal scale frequently used to rate the components of nasolabial appearance, including nasal form and nasal symmetry, in unilateral cleft lip and palate. Although reference photographs illustrating this scale have been identified for the frontal and right profile view, no reference photographs exist for the basal view. The aim of this study was to identify reference photographs for nasal form and nasal symmetry from the basal view to illustrate this scale and facilitate its use. Four raters assessed nasolabial appearance (form and symmetry) on basal view photographs of 50 children (average age 8 years) with a repaired cleft lip. Intraclass correlation coefficients show fair to moderate inter-rater reliability. Cronbach α indicated strong agreement between raters (0.77 nasal form; 0.78 nasal symmetry; 0.80 overall), along with low duplicate measurement error and strong internal consistency between the measures. The photographs with the highest agreement among raters were selected to illustrate each point on the 5-point scale for nasal form and for nasal symmetry, resulting in the selection of 10 reference photographs. The basal view reference photograph set developed from this study may complement existing reference photograph sets for other views and facilitate rating tasks.

  2. Validity and internal consistency of a whiplash-specific disability measure

    NARCIS (Netherlands)

    Pinfold, Melanie; Niere, Ken R.; O'Leary, Elizabeth F.; Hoving, Jan Lucas; Green, Sally; Buchbinder, Rachelle

    2004-01-01

    STUDY DESIGN: Cross-sectional study of patients with whiplash-associated disorders investigating the internal consistency, factor structure, response rates, and presence of floor and ceiling effects of the Whiplash Disability Questionnaire (WDQ). OBJECTIVES: The aim of this study was to confirm the

  3. Reliability of physical examination tests for the diagnosis of knee disorders: Evidence from a systematic review.

    Science.gov (United States)

    Décary, Simon; Ouellet, Philippe; Vendittoli, Pascal-André; Desmeules, François

    2016-12-01

    Clinicians often rely on physical examination tests to guide them in the diagnostic process of knee disorders. However, reliability of these tests is often overlooked and may influence the consistency of results and overall diagnostic validity. Therefore, the objective of this study was to systematically review evidence on the reliability of physical examination tests for the diagnosis of knee disorders. A structured literature search was conducted in databases up to January 2016. Included studies needed to report reliability measures of at least one physical test for any knee disorder. Methodological quality was evaluated using the QAREL checklist. A qualitative synthesis of the evidence was performed. Thirty-three studies were included with a mean QAREL score of 5.5 ± 0.5. Based on low to moderate quality evidence, the Thessaly test for meniscal injuries reached moderate inter-rater reliability (k = 0.54). Based on moderate to excellent quality evidence, the Lachman for anterior cruciate ligament injuries reached moderate to excellent inter-rater reliability (k = 0.42 to 0.81). Based on low to moderate quality evidence, the Tibiofemoral Crepitus, Joint Line and Patellofemoral Pain/Tenderness, Bony Enlargement and Joint Pain on Movement tests for knee osteoarthritis reached fair to excellent inter-rater reliability (k = 0.29 to 0.93). Based on low to moderate quality evidence, the Lateral Glide, Lateral Tilt, Lateral Pull and Quality of Movement tests for patellofemoral pain reached moderate to good inter-rater reliability (k = 0.49 to 0.73). Many physical tests appear to reach good inter-rater reliability, but this is based on low-quality and conflicting evidence. High-quality research is required to evaluate the reliability of knee physical examination tests. Copyright © 2016 Elsevier Ltd. All rights reserved.

  4. Consistent inter-individual differences in common marmosets (Callithrix jacchus) in Boldness-Shyness, Stress-Activity, and Exploration-Avoidance.

    Science.gov (United States)

    Šlipogor, Vedrana; Gunhold-de Oliveira, Tina; Tadić, Zoran; Massen, Jorg J M; Bugnyar, Thomas

    2016-09-01

    The study of animal personality, defined as consistent inter-individual differences in correlated behavioral traits stable throughout time and/or contexts, has recently become one of the fastest growing areas in animal biology, with study species ranging from insects to non-human primates. The latter have, however, only occasionally been tested with standardized experiments. Instead their personality has usually been assessed using questionnaires. Therefore, this study aimed to test 21 common marmosets (Callithrix jacchus) living in three family groups, in five different experiments, and their corresponding controls. We found that behavioral differences between our animals were not only consistent over time, but also across different contexts. Moreover, the consistent behaviors formed a construct of four major non-social personality components: Boldness-Shyness in Foraging, Boldness-Shyness in Predation, Stress-Activity, and Exploration-Avoidance. We found no sex or age differences in these components, but our results did reveal differences in Exploration-Avoidance between the three family groups. As social environment can have a large influence on behavior of individuals, our results may suggest group-level similarity in personality (i.e., "group personality") in common marmosets, a species living in highly cohesive social groups. Am. J. Primatol. 78:961-973, 2016. © 2016 Wiley Periodicals, Inc.

  5. [Reliability and Validity of the Behavioral Check List for Preschool Children to Measure Attention Deficit Hyperactivity Behaviors].

    Science.gov (United States)

    Tsuno, Kanami; Yoshimasu, Kouichi; Hayashi, Takashi; Tatsuta, Nozomi; Ito, Yuki; Kamijima, Michihiro; Nakai, Kunihiko

    2018-01-01

    Attention deficit hyperactivity (ADH) problems are now commonly observed among school-age children. However, very few questionnaires are specific to ADH behaviors among preschool children. The aim of this study was to investigate the reliability and validity of the 25-item Behavioral Check List (BCL), which was developed from interviews with parents of children diagnosed with attention-deficit/hyperactivity disorder (ADHD) and measures ADH behaviors at preschool age. We recruited 22 teachers from 10 nurseries/kindergartens in Miyagi Prefecture, Japan. A total of 138 preschool children were assessed using the BCL. To investigate inter-rater reliability, two teachers from each facility assessed seven to twenty children in their class, and intraclass correlation coefficients (ICCs) were calculated. The teachers additionally answered questions in the 1½-5 Caregiver-Teacher Report Form (C-TRF) to investigate the criterion validity of the BCL. To investigate structural validity, exploratory factor analysis with promax rotation and confirmatory factor analysis were performed. The internal consistency reliability of the BCL was good (α = 0.92) and correlation analyses also confirmed its excellent criterion validity. Although exploratory factor analysis yielded a five-factor model whose structure differed from the original one, the results were similar to the original six factors. The ICCs of the BCL ranged from 0.38 to 0.99, so inter-rater reliability was not high enough in some facilities. However, it may be improved by giving raters adequate explanations when using the BCL. The present study showed acceptable levels of reliability and validity of the BCL among Japanese preschool children.
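    The abstract above reports intraclass correlation coefficients for pairs of teachers but does not say which ICC form was used. As a rough illustration only, the sketch below computes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1); the function name `icc_2_1` and the choice of this form are assumptions, not the authors' method.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table of ratings: one row per subject, one column per rater.

    Assumed form: two-way random effects, absolute agreement, single rater.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

    Because this form measures absolute agreement, a constant offset between two raters lowers the coefficient even when their rankings of the children are identical.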

  6. Inter- and intra-rater reliability of patellofemoral kinematic and contact area quantification by fast spin echo MRI and correlation with cartilage health by quantitative T1ρ MRI.

    Science.gov (United States)

    Lau, Brian C; Thuillier, Daniel U; Pedoia, Valentina; Chen, Ellison Y; Zhang, Zhihong; Feeley, Brian T; Souza, Richard B

    2016-01-01

    Patellar maltracking is a leading cause of patellofemoral pain syndrome (PFPS). The aim of this study was to determine the inter- and intra-rater reliability of a semi-automated program for magnetic resonance imaging (MRI) based patellofemoral kinematics. Sixteen subjects (10 with PFPS [mean age 32.3; SD 5.2; eight females] and six controls without PFPS [mean age 28.6; SD 2.8; three females]) participated in the study. One set of T2-weighted, fat-saturated fast spin-echo (FSE) MRIs was acquired from each subject in full extension and 30° of knee flexion. MRI including axial T1ρ relaxation time mapping sequences was also performed on each knee. Following image acquisitions, regions of interest for kinematic MRI, and patellar and trochlear cartilage were segmented and quantified with in-house designed spline-based MATLAB semi-automated software. Intraclass correlation coefficients (ICC) of calculated kinematic parameters were good to excellent, ICC > 0.8 in patellar flexion, rotation, tilt, and translation (anterior-posterior, medial-lateral, and superior-inferior), and contact area translation. Only patellar tilt in the flexed position and motion from extended to flexed state were significantly different between PFPS and control patients (p = 0.002 and p = 0.006, respectively). No significant correlations were identified between patellofemoral kinematics and contact area with T1ρ relaxation times. A semi-automated, spline-based kinematic MRI technique for patellofemoral kinematic and contact area quantification is highly reproducible with the potential to help better understand the role of patellofemoral maltracking in PFPS and other knee disorders. Level IV. Published by Elsevier B.V.

  7. Inter- and intra-rater reliability of patellofemoral kinematic and contact area quantification by fast spin echo MRI and correlation with cartilage health by quantitative T1ρ MRI

    Science.gov (United States)

    Lau, Brian C.; Thuillier, Daniel U.; Pedoia, Valentina; Chen, Ellison Y.; Zhang, Zhihong; Feeley, Brian T.; Souza, Richard B.

    2016-01-01

    Background: Patellar maltracking is a leading cause of patellofemoral pain syndrome (PFPS). The aim of this study was to determine the inter- and intra-rater reliability of a semi-automated program for magnetic resonance imaging (MRI) based patellofemoral kinematics. Methods: Sixteen subjects (10 with PFPS [mean age 32.3; SD 5.2; eight females] and six controls without PFPS [mean age 28.6; SD 2.8; three females]) participated in the study. One set of T2-weighted, fat-saturated fast spin-echo (FSE) MRIs was acquired from each subject in full extension and 30° of knee flexion. MRI including axial T1ρ relaxation time mapping sequences was also performed on each knee. Following image acquisitions, regions of interest for kinematic MRI, and patellar and trochlear cartilage were segmented and quantified with in-house designed spline-based MATLAB semi-automated software. Results: Intraclass correlation coefficients (ICC) of calculated kinematic parameters were good to excellent, ICC > 0.8 in patellar flexion, rotation, tilt, and translation (anterior-posterior, medial-lateral, and superior-inferior), and contact area translation. Only patellar tilt in the flexed position and motion from extended to flexed state were significantly different between PFPS and control patients (p = 0.002 and p = 0.006, respectively). No significant correlations were identified between patellofemoral kinematics and contact area with T1ρ relaxation times. Conclusions: A semi-automated, spline-based kinematic MRI technique for patellofemoral kinematic and contact area quantification is highly reproducible with the potential to help better understand the role of patellofemoral maltracking in PFPS and other knee disorders. PMID:26746045

  8. Validity and internal consistency of a whiplash-specific disability measure.

    Science.gov (United States)

    Pinfold, Melanie; Niere, Ken R; O'Leary, Elizabeth F; Hoving, Jan Lucas; Green, Sally; Buchbinder, Rachelle

    2004-02-01

    Cross-sectional study of patients with whiplash-associated disorders investigating the internal consistency, factor structure, response rates, and presence of floor and ceiling effects of the Whiplash Disability Questionnaire (WDQ). The aim of this study was to confirm the appropriateness of the proposed WDQ items. Whiplash injuries are a common cause of pain and disability after motor vehicle accidents. Neck disability questionnaires are often used in whiplash studies to assess neck pain but lack content validity for patients with whiplash-associated disorders. The newly developed WDQ measures functional limitations associated with whiplash injury and was designed after interviews with 83 patients with whiplash in a previous study. Researchers sought expert opinion on items of the WDQ, and items were then tested on a clinical whiplash population. Data were inspected to determine floor and ceiling effects, response rates, factor structure, and internal consistency. Packages of questionnaires were distributed to 55 clinicians, whose patients with whiplash completed and returned 101 questionnaires to researchers. No substantial floor or ceiling effects were identified on inspection of data. The overall floor effect was 12%, and the overall ceiling effect was 4%. Principal component analysis identified one broad factor that accounted for 65% of the variance in responses. Internal consistency was high; Cronbach's alpha = 0.96. Results of the study supported the retention of the 13 proposed items in a whiplash-specific disability questionnaire. Dependent on the results of further psychometric testing, the WDQ is likely to be an appropriate outcome measure for patients with whiplash.
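    Several records in this listing report internal consistency as Cronbach's alpha. The following is a minimal sketch of the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores); the function name `cronbach_alpha` and the input layout (one list of item scores per respondent) are illustrative assumptions, and no data from the study above are used.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")

    # Sample variance of each item (column) across respondents
    item_variance_sum = sum(variance(column) for column in zip(*scores))

    # Sample variance of each respondent's total score
    total_variance = variance([sum(row) for row in scores])

    return k / (k - 1) * (1 - item_variance_sum / total_variance)
```

    The function raises `ZeroDivisionError` when every respondent has the same total score, since alpha is undefined in that case.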

  9. Inter-plant coordination and its relationships with supply chain integration and operational performance

    DEFF Research Database (Denmark)

    Yang, Cheng; Chaudhuri, Atanu; Farooq, Sami

    2016-01-01

    Based on the data obtained from the sixth version of the International Manufacturing Strategy Survey (IMSS VI), this paper explores the relationships at the plant level between (1) inter-plant coordination and operational performance, and (2) inter-plant coordination and internal/external ...

  10. The Karen instruments for measuring quality of nursing care: construct validity and internal consistency.

    Science.gov (United States)

    Lindgren, Margareta; Andersson, Inger S

    2011-06-01

    Valid and reliable instruments for measuring the quality of care are needed for evaluation and improvement of nursing care. Previously developed and evaluated instruments, the Karen-patient and the Karen-personnel, based on Donabedian's Structure-Process-Outcome triad (S-P-O triad), had promising content validity, discriminative power and internal consistency. The objective of this study was to further develop the instruments with regard to construct validity and internal consistency. This prospective study was carried out in medical and surgical wards at a hospital in Sweden. A total of 95 patients and 120 personnel were included. The instruments were tested for construct validity by performing factor analyses in two steps and for internal consistency using Cronbach's alpha coefficient. In the first, confirmatory factor analysis, with a pre-determined three-factor solution, the items did not load well according to the S-P-O triad, but the second, exploratory factor analysis, with a six-factor solution, appeared to be more coherent and the distribution of variables seemed to be logical. The reliability, i.e. internal consistency, was good in both factor analyses. The Karen-patient and the Karen-personnel instruments have achieved acceptable levels of construct validity. The internal consistency of the instruments is good. This indicates that the instruments may be suitable to use in clinical practice for measuring the quality of nursing care.

  11. Psychometric properties of a test in evidence based practice: the Spanish version of the Fresno test

    Directory of Open Access Journals (Sweden)

    Jiménez-Villa Josep

    2010-06-01

    Background: Validated instruments are needed to evaluate the programmatic impact of Evidence Based Practice (EBP) training and to document the competence of individual trainees. This study aimed to translate the Fresno test into Spanish and subsequently validate it, in order to ensure the equivalence of the Spanish version against the original English version. Methods: Before-and-after study performed between October 2007 and June 2008. Three groups of participants: (a) mentors of family medicine residents (expert group, n = 56); (b) family medicine physicians (intermediate experience group, n = 17); (c) family medicine residents (novice group, n = 202). Medical residents attended an EBP course, and two sets of the test were administered before and after the course. The Fresno test is a performance-based measure for use in medical education that assesses EBP skills. The outcome measures were: inter-rater and intra-rater reliability, internal consistency, item analyses, construct validity, feasibility of administration, and responsiveness. Results: Inter-rater correlations were 0.95 and 0.85 in the pre-test and the post-test, respectively. The overall intra-rater reliability was 0.71 and 0.81 in the pre-test and post-test questionnaire, respectively. Cronbach's alpha was 0.88 and 0.77, respectively. 152 residents (75.2%) returned both sets of the questionnaire. The observed effect size for the residents was 1.77 (95% CI: 1.57-1.95), and the standardised response mean was 1.65 (95% CI: 1.47-1.82). Conclusions: The Spanish version of the Fresno test is a useful tool in assessing the knowledge and skills of EBP in Spanish-speaking residents of Family Medicine.
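    The responsiveness figures above (effect size and standardised response mean) are not defined in the abstract. A minimal sketch under the commonly used definitions, which are an assumption here, is: effect size = mean change / SD of baseline scores, and standardised response mean = mean change / SD of the change scores. The function name `responsiveness` and the example scores are hypothetical.

```python
from statistics import mean, stdev


def responsiveness(pre, post):
    """Return (effect_size, standardised_response_mean) for paired scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("pre and post must be paired and have at least two scores")

    changes = [after - before for before, after in zip(pre, post)]
    mean_change = mean(changes)

    effect_size = mean_change / stdev(pre)   # change scaled by baseline spread
    srm = mean_change / stdev(changes)       # change scaled by spread of change
    return effect_size, srm


# Hypothetical pre- and post-course test scores for three trainees
es, srm = responsiveness([10, 12, 14], [13, 14, 18])
```

    The two indices differ only in the denominator, which is why a study can report both with different values for the same mean change.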

  12. Internal consistency of a Spanish translation of the Francis Scale of Attitude Toward Christianity Short Form.

    Science.gov (United States)

    Campo-Arias, Adalberto; Oviedo, Heidi Celina; Díaz, Carmen Elena; Cogollo, Zuleima

    2006-12-01

    This study evaluated the internal consistency of a Spanish version of the short form of the Francis Scale of Attitude Toward Christianity based on responses of 405 Colombian adolescent students ages 13 to 17 years. This translated short-form version of the scale had an internal consistency of .80. This estimate indicates suitable internal consistency reliability for research use in this population.

  13. Inter-observer and intra-observer reliability in the radiographic diagnosis of avascular necrosis of the femoral head following reconstructive hip surgery in children with cerebral palsy.

    Science.gov (United States)

    Hesketh, Kim; Sankar, Wudbhav; Joseph, Benjamin; Narayanan, Unni; Mulpuri, Kishore

    2016-04-01

    The incidence of avascular necrosis (AVN) following reconstructive hip surgery in cerebral palsy (CP) ranges from 0 to 69% in the current literature. The purpose of this study was to determine the inter- and intra-observer reliability of radiographically diagnosing AVN in children with CP after hip surgery. A retrospective review of 65 children with CP who had reconstructive hip surgery between 2009 and 2012 at BC Children's Hospital was completed. Anterior-posterior and lateral radiographs were presented to four pediatric orthopaedic surgeons over two rounds. Surgeons were asked to review the set of unidentified radiographs and comment 'yes' or 'no' for the presence of AVN. Two weeks later the same set of radiographs was sent in a different order and the surgeons were again asked to comment on AVN. Inter- and intra-observer reliability was determined using kappa statistics. The intra-observer reliability ranged from 0.65 to 0.88 with an average score of 0.76. Inter-observer reliability showed greater variability, ranging from 0.41 to 0.77 with an average score of 0.56 across all surgeons. Although the intra-rater reliability indicated "good" agreement and the inter-rater reliability "moderate" agreement, the variability within these scores is clinically important as it demonstrates the difficulty in identifying AVN. This may explain the variability in AVN incidence that is reported in the literature. Further education and research in the diagnosis of AVN in children with CP who have undergone reconstructive hip surgery are needed.
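    The abstract above says reliability was determined "using kappa statistics" without naming the variant. As an illustration only, the sketch below computes Cohen's kappa for one pair of raters giving 'yes'/'no' answers on the same radiographs; treating the four-surgeon design as a set of pairwise comparisons is an assumption, and the function name `cohen_kappa` and the sample ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(rater_a)

    # Proportion of items on which the two raters gave the same label
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance from each rater's marginal label frequencies
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n ** 2

    return (observed - expected) / (1 - expected)


# Hypothetical 'AVN present?' answers from two surgeons on four radiographs
kappa = cohen_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"])
```

    Kappa is undefined (division by zero) when chance agreement equals 1, for example when both raters give the same single label to every item.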

  14. [Factor analysis and internal consistency of pedagogical practices questionnaire among health care teachers].

    Science.gov (United States)

    Pérez V, Cristhian; Vaccarezza G, Giulietta; Aguilar A, César; Coloma N, Katherine; Salgado F, Horacio; Baquedano R, Marjorie; Chavarría R, Carla; Bastías V, Nancy

    2016-06-01

    Teaching practice is one of the most complex topics of the training process in medicine and other health care careers. The Teaching Practices Questionnaire (TPQ) evaluates teaching skills. The aim was to assess the factor structure and internal consistency of the Spanish version of the TPQ among health care teachers. The TPQ was answered by 315 university teachers from 13 of the 15 administrative Chilean regions, who were selected through non-probabilistic volunteer sampling. The internal consistency of the TPQ factors was calculated and the correlation between them was analyzed. Six factors were identified: Student-centered teaching, Teaching planning, Assessment process, Dialogue relationship, Teacher-centered teaching and Use of technological resources. They had Cronbach alphas ranging from 0.60 to 0.85. The factorial structure of the TPQ differentiates the most important functions of teaching. It also shows theoretical consistency and practical relevance for diagnosis and continuous evaluation of teaching practices. Additionally, it has adequate internal consistency. Thus, the TPQ is valid and reliable to evaluate pedagogical practices in health care careers.

  15. [Validity and internal consistency of the Maslach Burnout Inventory in Dental Students from Cartagena, Colombia].

    Science.gov (United States)

    Simancas-Pallares, Miguel Angel; Fortich Mesa, Natalia; González Martínez, Farith Damián

    To determine the internal consistency and content validity of the Maslach Burnout Inventory-Student Survey (MBI-SS) in dental students from Cartagena, Colombia. Scale validation study in 886 dental students from Cartagena, Colombia. Factor structure was determined through exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). Internal consistency was measured using Cronbach's alpha coefficient. Analyses were performed using Stata v.13.2 for Windows (StataCorp, USA) and Mplus v.7.31 for Windows (Muthén & Muthén, USA) software. Internal consistency was α = .806. The factor structure showed three factors that accounted for 56.6% of the variance. CFA revealed: χ² = 926.036; df = 85; RMSEA = .106 (90% CI, .100-.112); CFI = .947; TLI = .934. The MBI showed adequate internal consistency and a factor structure consistent with the originally proposed structure but with a poor fit, which does not reflect adequate content validity in this sample. Copyright © 2016 Asociación Colombiana de Psiquiatría. Publicado por Elsevier España. All rights reserved.

  16. Psychometric study of the Required Care Levels for People with Severe Mental Disorder Assessment Scale (ENAR-TMG).

    Science.gov (United States)

    Lascorz, David; López, Victoria; Pinedo, Carmen; Trujols, Joan; Vegué, Joan; Pérez, Víctor

    2016-03-08

    People with severe mental disorder have significant difficulties in everyday life that involve the need for continued support. These needs are not easily measurable with the currently available tools. Therefore, a multidimensional scale that assesses the different levels of need for care is proposed, including a study of its psychometric properties. One-hundred and thirty-nine patients (58% men) with a severe mental disorder were assessed using the Required Care Levels for People with Severe Mental Disorder Assessment Scale (ENAR-TMG), the Camberwell Assessment of Need scale, and the Health of the Nation Outcome Scales. ENAR-TMG's psychometric features were examined by: a) evaluating 2 sources of validity evidence (evidence based on internal structure and evidence based on relations to other variables), and b) estimating the internal consistency, temporal stability, inter-rater reliability, and sensitivity to change of scores of the ENAR-TMG's subscales. Exploratory factor analyses revealed a one-factor structure for each of the theoretical dimensions of the scale, in which all but one showed a significant and positive correlation with the Camberwell Assessment of Need (range of r: 0.143-0.557) and Health of the Nation Outcome Scales (range of r: 0.241-0.474) scales. ENAR-TMG subscale scores showed acceptable internal consistency (range of ordinal α coefficients: 0.682-0.804), excellent test-retest (range of intraclass correlation coefficients: 0.889-0.999) and inter-rater reliabilities (range of intraclass correlation coefficients: 0.926-0.972), and satisfactory sensitivity to treatment-related changes (range of η²: 0.003-0.103). The satisfactory psychometric behaviour of the ENAR-TMG makes the scale a promising tool to assess global functioning in people with a severe mental disorder. Copyright © 2016 SEP y SEPB. Published by Elsevier España. All rights reserved.

  17. Development of the Italian version of the trunk impairment scale in subjects with acute and chronic stroke. Cross-cultural adaptation, reliability, validity and responsiveness.

    Science.gov (United States)

    Monticone, Marco; Ambrosini, Emilia; Verheyden, Geert; Brivio, Flavia; Brunati, Roberto; Longoni, Luca; Mauri, Gaia; Molteni, Alessandro; Nava, Claudia; Rocca, Barbara; Ferrante, Simona

    2017-09-10

    To cross-culturally adapt and psychometrically analyse the Italian version of the Trunk Impairment Scale on acute (cohort 1) and chronic stroke patients (cohort 2). The Trunk Impairment Scale was culturally adapted in accordance with international standards. The psychometric testing included: internal consistency (Cronbach's alpha), inter- and intra-rater reliability (intraclass correlation coefficient; standard error of measurement and minimal detectable change), construct validity by comparing Trunk Impairment Scale score with Barthel Index, motor subscale of Functional Independence Measure, and Trunk Control Test (Pearson's correlation), and responsiveness (Effect Size, Effect Size with Guyatt approach, standardized response mean, and Receiver Operating Characteristics curves). The Trunk Impairment Scale was administered to 125 and 116 acute and chronic stroke patients, respectively. Internal consistency was acceptable (α > 0.7), inter- and intra-rater reliability (ICC > 0.9, Minimal Detectable Change for total score  0.4) with all scales but the motor Functional Independence Measure in cohort 2. Distribution-based methods showed large effects in cohort 1 and moderate to large effects in cohort 2. The Minimal Important Difference was 3.5 both from patient's and therapist's perspective in cohort 1 and 2.5 and 1.5 from patient's and therapist's perspective, respectively, in cohort 2. The Trunk Impairment Scale was successfully translated into Italian and proved to be reliable, valid, and responsive. Its use is recommended for clinical and research purposes. Implications for Rehabilitation Trunk control is an essential part of balance and postural control, constituting an important prerequisite for daily activities and function. The TIS administered in subjects with subacute and chronic stroke was reliable, valid and responsive. The TIS is expected to help clinicians and researchers by identifying key functional processes related to disability in people

  18. Construct validity and internal consistency in the Leisure Practices Scale (EPL) for adults.

    Science.gov (United States)

    Andrade, Rubian Diego; Schwartz, Gisele Maria; Tavares, Giselle Helena; Pelegrini, Andreia; Teixeira, Clarissa Stefani; Felden, Érico Pereira Gomes

    2018-02-01

    This study proposes and analyzes the construct validity and internal consistency of the Leisure Practices Scale (EPL). This survey seeks to identify adults' preferences for and involvement in different leisure practices. The instrument was formed based on the cultural leisure content (artistic, manual, physical, sports, intellectual, social, tourist, virtual and contemplation/leisure). The validation process was conducted with: a) content analysis by leisure experts, who evaluated the instrument for clarity of language and practical relevance, which allowed the calculation of the content validity coefficient (CVC); b) test-retest reproducibility with 51 subjects to calculate the temporal variation coefficient; c) internal consistency analysis with 885 participants. The evaluation presented appropriate coefficients, both with respect to language clarity (CVCt = 0.883) and practical relevance (CVCt = 0.879). The reproducibility coefficients were moderate to excellent. The scale showed adequate internal consistency (0.72). The EPL has psychometric quality and acceptable values in its structure, and can be used to investigate adult involvement in leisure activities.

  19. Evaluation of the female pelvic floor in pelvic organ prolapse using 3.0-Tesla diffusion tensor imaging and fibre tractography

    Energy Technology Data Exchange (ETDEWEB)

    Zijta, F.M. [University of Amsterdam, Department of Radiology, Academic Medical Centre, Amsterdam (Netherlands); Onze Lieve Vrouwe Gasthuis, Amsterdam and Department of Radiology, Amsterdam (Netherlands); Academic Medical Center, Department of Radiology, Amsterdam, AZ (Netherlands); Lakeman, M.M.E.; Roovers, J.P. [University of Amsterdam the Netherlands and Biomedical NMR, Amsterdam and Department of Gynaecology, Academic Medical Centre, Amsterdam (Netherlands); Froeling, M. [University of Amsterdam, Department of Radiology, Academic Medical Centre, Amsterdam (Netherlands); Eindhoven University of Technology, Department of Biomedical Engineering, Eindhoven (Netherlands); Paardt, M.P. van der; Borstlap, C.S.V.; Bipat, S.; Nederveen, A.J.; Stoker, J. [University of Amsterdam, Department of Radiology, Academic Medical Centre, Amsterdam (Netherlands); Montauban van Swijndregt, A.D. [Onze Lieve Vrouwe Gasthuis, Amsterdam and Department of Radiology, Amsterdam (Netherlands); Strijkers, G.J. [Eindhoven University of Technology, Department of Biomedical Engineering, Eindhoven (Netherlands)

    2012-12-15

    To prospectively explore the clinical application of diffusion tensor imaging (DTI) and fibre tractography in evaluating the pelvic floor. Ten patients with pelvic organ prolapse, ten with pelvic floor symptoms and ten asymptomatic women were included. A two-dimensional (2D) spin-echo (SE) echo-planar imaging (EPI) sequence of the pelvic floor was acquired. Offline fibre tractography and morphological analysis of pelvic magnetic resonance imaging (MRI) were performed. Inter-rater agreement for quality assessment of fibre tracking results was evaluated using weighted kappa (κ). From agreed tracking results, eigenvalues (λ1, λ2, λ3), mean diffusivity (MD) and fractional anisotropy (FA) were calculated. MD and FA values were compared using ANOVA. Inter-rater reliability of DTI parameters was interpreted using the intra-class correlation coefficient (ICC). Substantial inter-rater agreement was found (κ = 0.71 [95% CI 0.63-0.78]). Four anatomical structures were reliably identified. Substantial inter-rater agreement was found for MD and FA (ICC 0.60-0.91). No significant differences between groups were observed for anal sphincter, perineal body and puboperineal muscle. A significant difference in FA was found for internal obturator muscle between the prolapse group and the asymptomatic group (0.27 ± 0.05 vs 0.22 ± 0.03; P = 0.015). DTI with fibre tractography permits identification of part of the clinically relevant pelvic structures. Overall, no significant differences in DTI parameters were found between groups. Diffusion tensor MRI offers new insights into female pelvic floor problems.

  20. En Face Optical Coherence Tomography Angiography Imaging Versus Fundus Photography in the Measurement of Choroidal Nevi.

    Science.gov (United States)

    Lee, Michele D; Kaidonis, Georgia; Kim, Alice Y; Shields, Ryan A; Leng, Theodore

    2017-09-01

    Choroidal nevi are common benign intraocular tumors with a small risk of malignant transformation. This retrospective study investigates the use of en face spectral-domain optical coherence tomography angiography (SD-OCTA) in determining the clinical features and measurement of choroidal nevi. Patients with choroidal nevi were imaged with both OCTA and a fundus photography device. Greatest longitudinal dimension (GLD), perpendicular dimension (PD), and the GLD/PD ratio were assessed on each device. Inter-device variation and intra- and inter-rater reliability analyses were performed. Fourteen patients with choroidal nevi were included. No significant difference between the GLD/PD ratio as measured by all three devices was found (Chi-square = 2.8, 2 df, P = .247). Intraclass correlation coefficients were greater than 0.7 for repeated measures on all devices, suggesting good repeatability and reproducibility. This study demonstrated inter-device consistency and high intra- and inter-rater reliability when measuring choroidal nevi. [Ophthalmic Surg Lasers Imaging Retina. 2017;48:741-747.]. Copyright 2017, SLACK Incorporated.

  1. Introducing a true internal standard for the Comet assay to minimize intra- and inter-experiment variability in measures of DNA damage and repair

    Science.gov (United States)

    Zainol, Murizal; Stoute, Julia; Almeida, Gabriela M.; Rapp, Alexander; Bowman, Karen J.; Jones, George D. D.

    2009-01-01

    The Comet assay (CA) is a sensitive/simple measure of genotoxicity. However, many features of CA contribute variability. To minimize these, we have introduced internal standard materials consisting of ‘reference’ cells which have their DNA substituted with BrdU. Using a fluorescent anti-BrdU antibody, plus an additional barrier filter, comets derived from these cells could be readily distinguished from the ‘test’-cell comets, present in the same gel. In experiments to evaluate the reference cell comets as external and internal standards, the reference and test cells were present in separate gels on the same slide or mixed together in the same gel, respectively, before their co-exposure to X-irradiation. Using the reference cell comets as internal standards led to substantial reductions in the coefficient of variation (CoV) for intra- and inter-experimental measures of comet formation and DNA damage repair; only minor reductions in CoV were noted when the reference and test cell comets were in separate gels. These studies indicate that differences between individual gels appreciably contribute to CA variation. Further studies using the reference cells as internal standards allowed greater significance to be obtained between groups of replicate samples. Ultimately, we anticipate that development will deliver robust quality assurance materials for CA. PMID:19828597
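    The abstract above compares conditions by their coefficient of variation (CoV) but does not spell out the calculation. A minimal sketch, assuming the usual definition (sample standard deviation divided by the mean, expressed as a percentage), is shown below; the function name `coefficient_of_variation` and the replicate values are hypothetical, not data from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    if len(values) < 2:
        raise ValueError("need at least two replicate measurements")
    average = mean(values)
    if average == 0:
        raise ValueError("CoV is undefined when the mean is zero")
    return 100 * stdev(values) / average


# Hypothetical replicate measurements from repeated experiments
cov_percent = coefficient_of_variation([8, 10, 12])
```

    Because the CoV is scaled by the mean, it allows the spread of replicate measurements to be compared between conditions whose absolute values differ.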

  2. Standardized voluntary force measurement in a lower extremity rehabilitation robot

    Directory of Open Access Journals (Sweden)

    Bolliger Marc

    2008-10-01

    Full Text Available Background: Isometric force measurements in the lower extremity are widely used in rehabilitation of subjects with neurological movement disorders (NMD) because walking ability has been shown to be related to muscle strength. Therefore, muscle strength measurements can be used to monitor and control the effects of training programs. A new method to assess isometric muscle force was implemented in the driven gait orthosis (DGO) Lokomat. To evaluate the capabilities of this new measurement method, inter- and intra-rater reliability were assessed. Methods: Reliability was assessed in subjects with and without NMD. Subjects were tested twice on the same day by two different therapists to test inter-rater reliability and on two separate days by the same therapist to test intra-rater reliability. Results: Results showed fair to good reliability for the new measurement method to assess isometric muscle force of lower extremities. In subjects without NMD, intraclass correlation coefficients (ICC) for inter-rater reliability ranged from 0.72 to 0.97 and for intra-rater reliability from 0.71 to 0.90. In subjects with NMD, ICC ranged from 0.66 to 0.97 for inter-rater and from 0.50 to 0.96 for intra-rater reliability. Conclusion: Inter- and intra-rater reliability of an assessment method for measuring maximal voluntary isometric muscle force of lower extremities was demonstrated. We suggest that this method is a valuable tool for documentation and controlling of the rehabilitation process in patients using a DGO.

  3. The Reliability of Quality of Upper Extremity Skills Test in Children with Cerebral Palsy

    Directory of Open Access Journals (Sweden)

    Nazila Akbar-Fahimi

    2012-01-01

    Full Text Available Objective: The aim of this study was to survey intra-rater and inter-rater reliability, with and without video camera assessment, in children with spastic cerebral palsy. Materials & Methods: In this cross-sectional study, we validated the Quality of Upper Extremity Skills Test questionnaire. Fifty children with hemiplegia aged 19 to 95 months (mean age 61.31 ± 25.7 months) were enrolled using non-random convenience sampling. After obtaining parents' consent, intra-rater assessment was performed in one session and inter-rater assessment with camera after 10 days. Then, a third examiner reassessed 46 of the 50 children by observing the films. Spearman correlation was used to survey intra-rater and inter-rater reliability with and without video recording, and the Gross Motor Function Classification System 66 was used to determine each child's functional level. Results: Intra-rater correlation was 0.774-0.996, inter-rater correlation was 0.663-0.998, and correlation for video camera assessment was 0.710-0.974 between the first and third evaluations and 0.652-0.938 between the second and third evaluations. Correlations for the subscales and the total score were significant (P<0.01). Conclusion: Intra-rater and inter-rater assessments, with and without video recording, are highly correlated for the Quality of Upper Extremity Skills Test in children with cerebral palsy, so it can be used as a reliable test to evaluate the quality of upper extremity skills in these children.

  4. Consistency and reliability of judgements by assessors of case based discussions in general practice specialty training programmes in the United Kingdom.

    Science.gov (United States)

    Bodgener, Susan; Denney, Meiling; Howard, John

    2017-01-01

    Case based discussions (CbDs) are a mandatory workplace assessment used throughout general practitioner (GP) specialty training; they contribute to the annual review of competence progression (ARCP) for each trainee. This study examined the judgements arising from CbDs made by different groups of assessors and whether or not these assessments supported ARCP decisions. The trainees selected were at the end of their first year of GP training and had been identified during their ARCPs to need extra training time. CbDs were specifically chosen as they are completed by both hospital and GP supervisors, enabling comparison between these two groups. The results raise concern with regard to the consistency of judgements made by different groups of assessors, with significant variance between assessors of different status and seniority. Further work needs to be done on whether the CbD in its current format is fit for purpose as one of the mandatory WPBAs for GP trainees, particularly during their hospital placements. There is a need to increase the inter-rater reliability of CbDs to ensure a consistent contribution to subsequent decisions about a trainee's overall progress.

  5. Factor validity and reliability of the aberrant behavior checklist-community (ABC-C) in an Indian population with intellectual disability.

    Science.gov (United States)

    Lehotkay, R; Saraswathi Devi, T; Raju, M V R; Bada, P K; Nuti, S; Kempf, N; Carminati, G Galli

    2015-03-01

    In this study realised in collaboration with the department of psychology and parapsychology of Andhra University, validation of the Aberrant Behavior Checklist-Community (ABC-C) in Telugu, the official language of Andhra Pradesh, one of India's 28 states, was carried out. To assess the factor validity and reliability of this Telugu version, 120 participants with moderate to profound intellectual disability (94 men and 26 women, mean age 25.2, SD 7.1) were rated by the staff of the Lebenshilfe Institution for Mentally Handicapped in Visakhapatnam, Andhra Pradesh, India. Rating data were analysed with a confirmatory factor analysis. The internal consistency was estimated by Cronbach's alpha. To confirm the test-retest reliability, 50 participants were rated twice with an interval of 4 weeks, and 50 were rated by pairs of raters to assess inter-rater reliability. Confirmatory factor analysis revealed that the root mean square error of approximation (RMSEA) was equal to 0.06, the comparative fit index (CFI) was equal to 0.77, and the Tucker Lewis index (TLI) was equal to 0.77, which indicated that the model with five correlated factors had a good fit. Coefficient alpha ranged from 0.85 to 0.92 across the five subscales. Spearman's rank correlation coefficients for inter-rater reliability tests ranged from 0.65 to 0.75, and the correlations for test-retest reliability ranged from 0.58 to 0.76. All reliability coefficients were statistically significant (P < …). The Telugu version of the ABC-C evidenced factor validity and reliability comparable to the original English version and appears to be useful for assessing behaviour disorders in Indian people with intellectual disabilities. © 2014 MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.

  6. The Hong Kong version of the Oxford Cognitive Screen (HK-OCS): validation study for Cantonese-speaking chronic stroke survivors.

    Science.gov (United States)

    Kong, Anthony Pak-Hin; Lam, Pinky Hiu-Ping; Ho, Diana Wai-Lam; Lau, Johnny King; Humphreys, Glyn W; Riddoch, Jane; Weekes, Brendan

    2016-09-01

    This study reports the validation of the Hong Kong version of Oxford Cognitive Screen (HK-OCS). Seventy Cantonese-speaking healthy individuals participated to establish normative data and 46 chronic stroke survivors were assessed using the HK-OCS, Albert's Test of Visual Neglect, short test of gestural production, and Hong Kong version of the following assessments: Western Aphasia Battery, MMSE, MoCA, Modified Barthel Index, and Lawton Instrumental Activities of Daily Living scale. The validity of the HK-OCS was appraised by the difference between the two participant groups. Neurologically unimpaired individuals performed significantly better than stroke survivors on the HK-OCS. Positive and significant correlations found between cognitive subtests in the HK-OCS and related assessments indicated good concurrent validity. Excellent intra-rater and inter-rater reliabilities, fair test-retest reliability, and acceptable internal consistency suggested that the HK-OCS had good reliability. Specific HK-OCS subtests including semantics, episodic memory, number writing, and orientation were the best predictors of functional outcomes.

  7. Internal Consistency and Convergent Validity of the Klontz Money Behavior Inventory (KMBI)

    Directory of Open Access Journals (Sweden)

    Colby D. Taylor

    2015-12-01

    Full Text Available The Klontz Money Behavior Inventory (KMBI) is a standalone, multi-scale measure that can screen for the presence of eight distinct money disorders. Given the well-established relationship between mental health and financial behaviors, results from the KMBI can be used to inform both mental health care professionals and financial planners. The present study examined the internal consistency and convergent validity of the KMBI, through comparison with similar measures, among a sample of college students (n = 232). Results indicate that the KMBI demonstrates acceptable internal consistency reliability and some convergence for most subscales when compared to other analogous measures. These findings highlight a need for literature and assessments to identify and describe disordered money behaviors.

  8. Harmonization process and reliability assessment of anthropometric measurements in the elderly EXERNET multi-centre study.

    Directory of Open Access Journals (Sweden)

    Alba Gómez-Cabello

    Full Text Available BACKGROUND: The elderly EXERNET multi-centre study aims to collect normative anthropometric data for old functionally independent adults living in Spain. PURPOSE: To describe the standardization process and reliability of the anthropometric measurements carried out in the pilot study and during the final workshop, examining both intra- and inter-rater errors for measurements. MATERIALS AND METHODS: A total of 98 elderly from five different regions participated in the intra-rater error assessment, and 10 different seniors living in the city of Toledo (Spain) participated in the inter-rater assessment. We examined both intra- and inter-rater errors for heights and circumferences. RESULTS: For height, intra-rater technical errors of measurement (TEMs) were smaller than 0.25 cm. For circumferences and knee height, TEMs were smaller than 1 cm, except for waist circumference in the city of Cáceres. Reliability for heights and circumferences was greater than 98% in all cases. Inter-rater TEMs were 0.61 cm for height, 0.75 cm for knee-height and ranged between 2.70 and 3.09 cm for the circumferences measured. Inter-rater reliabilities for anthropometric measurements were always higher than 90%. CONCLUSION: The harmonization process, including the workshop and pilot study, guarantees the quality of the anthropometric measurements in the elderly EXERNET multi-centre study. High reliability and low TEM may be expected when assessing anthropometry in elderly population.
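
As a minimal sketch of how a technical error of measurement and the associated reliability percentage are commonly computed from two repeated measurements per subject: the formulas TEM = sqrt(Σd² / 2n) and R = 1 − TEM² / SD² are the usual textbook definitions and are assumed here, since the abstract does not state which formulas were used. The heights are hypothetical, not data from the study.

```python
from math import sqrt
from statistics import variance


def technical_error_of_measurement(first, second):
    """TEM for two measurements per subject: sqrt(sum(d^2) / (2 * n))."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty measurement series")
    squared_diffs = [(a - b) ** 2 for a, b in zip(first, second)]
    return sqrt(sum(squared_diffs) / (2 * len(first)))


def reliability_percent(first, second):
    """Reliability R = 1 - TEM^2 / SD^2, with SD^2 the variance of all measurements."""
    tem = technical_error_of_measurement(first, second)
    total_variance = variance(first + second)
    return 100.0 * (1.0 - tem ** 2 / total_variance)


# Hypothetical heights in cm, measured twice on the same four people.
height_first = [160.0, 170.0, 155.0, 165.0]
height_second = [160.2, 169.8, 155.2, 164.8]

tem = technical_error_of_measurement(height_first, height_second)
reliability = reliability_percent(height_first, height_second)
print(f"TEM = {tem:.3f} cm, R = {reliability:.2f}%")
```

The same function applies to intra-rater error (one rater, two sessions) and inter-rater error (two raters, one session); only the meaning of the two series changes.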

  9. Improving the Accuracy of Laplacian Estimation with Novel Variable Inter-Ring Distances Concentric Ring Electrodes

    Directory of Open Access Journals (Sweden)

    Oleksandr Makeyev

    2016-06-01

    Full Text Available Noninvasive concentric ring electrodes are a promising alternative to conventional disc electrodes. Currently, the superiority of tripolar concentric ring electrodes over disc electrodes, in particular, in accuracy of Laplacian estimation, has been demonstrated in a range of applications. In our recent work, we have shown that accuracy of Laplacian estimation can be improved with multipolar concentric ring electrodes using a general approach to estimation of the Laplacian for an (n + 1)-polar electrode with n rings using the (4n + 1)-point method for n ≥ 2. This paper takes the next step toward further improving the Laplacian estimate by proposing novel variable inter-ring distances concentric ring electrodes. Derived using a modified (4n + 1)-point method, linearly increasing and decreasing inter-ring distances tripolar (n = 2) and quadripolar (n = 3) electrode configurations are compared to their constant inter-ring distances counterparts. Finite element method modeling and analytic results are consistent and suggest that increasing inter-ring distances electrode configurations may decrease the truncation error resulting in more accurate Laplacian estimates compared to respective constant inter-ring distances configurations. For currently used tripolar electrode configuration, the truncation error may be decreased more than two-fold, while for the quadripolar configuration more than a six-fold decrease is expected.

  10. Improving the Accuracy of Laplacian Estimation with Novel Variable Inter-Ring Distances Concentric Ring Electrodes

    Science.gov (United States)

    Makeyev, Oleksandr; Besio, Walter G.

    2016-01-01

    Noninvasive concentric ring electrodes are a promising alternative to conventional disc electrodes. Currently, the superiority of tripolar concentric ring electrodes over disc electrodes, in particular, in accuracy of Laplacian estimation, has been demonstrated in a range of applications. In our recent work, we have shown that accuracy of Laplacian estimation can be improved with multipolar concentric ring electrodes using a general approach to estimation of the Laplacian for an (n + 1)-polar electrode with n rings using the (4n + 1)-point method for n ≥ 2. This paper takes the next step toward further improving the Laplacian estimate by proposing novel variable inter-ring distances concentric ring electrodes. Derived using a modified (4n + 1)-point method, linearly increasing and decreasing inter-ring distances tripolar (n = 2) and quadripolar (n = 3) electrode configurations are compared to their constant inter-ring distances counterparts. Finite element method modeling and analytic results are consistent and suggest that increasing inter-ring distances electrode configurations may decrease the truncation error resulting in more accurate Laplacian estimates compared to respective constant inter-ring distances configurations. For currently used tripolar electrode configuration, the truncation error may be decreased more than two-fold, while for the quadripolar configuration more than a six-fold decrease is expected. PMID:27294933
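
For intuition about truncation error in the constant-spacing case, the idea can be sketched with finite differences on a known test function. This is an illustrative sketch, not the authors' derivation. It uses the standard five-point stencil and a nine-point combination of points at distances r and 2r, which corresponds to n = 2 with constant inter-ring distances. The test potential v(x, y) = exp(x + y) and the spacing r = 0.1 are arbitrary assumptions, and the variable inter-ring distance configurations proposed in the paper are not reproduced here.

```python
from math import exp


def five_point_laplacian(v, r):
    """Five-point estimate at the origin: four points at distance r plus the centre."""
    ring = v(r, 0) + v(-r, 0) + v(0, r) + v(0, -r)
    return (ring - 4 * v(0, 0)) / r ** 2


def nine_point_laplacian(v, r):
    """Nine-point estimate from points at distances r and 2r (constant spacing).

    Weighting the two ring differences by 16 and -1 cancels the fourth-order
    term of the Taylor expansion, leaving a smaller truncation error.
    """
    centre = v(0, 0)
    inner = (v(r, 0) + v(-r, 0) + v(0, r) + v(0, -r)) / 4 - centre
    outer = (v(2 * r, 0) + v(-2 * r, 0) + v(0, 2 * r) + v(0, -2 * r)) / 4 - centre
    return (16 * inner - outer) / (3 * r ** 2)


def potential(x, y):
    """Arbitrary smooth test potential; its exact Laplacian at the origin is 2."""
    return exp(x + y)


exact = 2.0
error_five = abs(five_point_laplacian(potential, 0.1) - exact)
error_nine = abs(nine_point_laplacian(potential, 0.1) - exact)
print(f"five-point error: {error_five:.2e}, nine-point error: {error_nine:.2e}")
```

Using more points lets higher-order Taylor terms cancel, which is the general mechanism behind the accuracy gains the abstract attributes to multipolar configurations.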

  11. Online self-report questionnaire on computer work-related exposure (OSCWE): validity and internal consistency.

    Science.gov (United States)

    Mekhora, Keerin; Jalayondeja, Wattana; Jalayondeja, Chutima; Bhuanantanondh, Petcharatana; Dusadiisariyavong, Asadang; Upiriyasakul, Rujiret; Anuraktam, Khajornyod

    2014-07-01

    To develop an online, self-report questionnaire on computer work-related exposure (OSCWE) and to determine the internal consistency, face and content validity of the questionnaire. The online, self-report questionnaire was developed to determine the risk factors related to musculoskeletal disorders in computer users. It comprised five domains: personal, work-related, work environment, physical health and psychosocial factors. The questionnaire's content was validated by an occupational medical doctor and three physical therapy lecturers involved in ergonomic teaching. Twenty-five lay people examined the feasibility of computer administration and the user-friendliness of the language. The item correlation in each domain was analyzed by the internal consistency (Cronbach's alpha; alpha). The content of the questionnaire was considered congruent with the testing purposes. Eight hundred and thirty-five computer users at the PTT Exploration and Production Public Company Limited registered for the online self-report questionnaire. The internal consistency of the five domains was: personal (alpha = 0.58), work-related (alpha = 0.348), work environment (alpha = 0.72), physical health (alpha = 0.68) and psychosocial factor (alpha = 0.93). The findings suggested that the OSCWE had acceptable internal consistency for work environment and psychosocial factors. The OSCWE is available to use in population-based survey research among computer office workers.
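
Cronbach's alpha, the internal-consistency statistic reported per domain above, can be computed from a respondents-by-items score table. The following is a minimal sketch using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The responses are hypothetical, not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 4 respondents x 3 items on a 5-point scale.
responses = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
    [5, 4, 5],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Alpha rises when items covary strongly relative to their individual variances, so a low value such as the 0.348 reported for the work-related domain suggests that the items in that domain do not move together.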

  12. Clinical observed performance evaluation: a prospective study in final year students of surgery.

    LENUS (Irish Health Repository)

    Markey, G C

    2010-06-24

    We report a prospective study of clinical observed performance evaluation (COPE) for 197 medical students in the pre-qualification year of clinical education. Psychometric quality was the main endpoint. Students were assessed in groups of 5 in 40-min patient encounters, with each student the focus of evaluation for 8 min. Each student had a series of assessments in a 25-week teaching programme. Over time, several clinicians from a pool of 16 surgical consultants and registrars evaluated each student by direct observation. A structured rating form was used for assessment data. Variance component analysis (VCA), internal consistency and inter-rater agreement were used to estimate reliability. The predictive and convergent validity of COPE in relation to summative OSCE, long case, and overall final examination was estimated. Median number of COPE assessments per student was 7. Generalisability of a mean score over 7 COPE assessments was 0.66, equal to that of an 8 x 7.5 min station final OSCE. Internal consistency was 0.88-0.97 and inter-rater agreement 0.82. Significant correlations were observed with OSCE performance (R = 0.55 disattenuated) and long case (R = 0.47 disattenuated). Convergent validity was 0.81 by VCA. Overall final examination performance was linearly related to mean COPE score with standard error 3.7%. COPE permitted efficient serial assessment of a large cohort of final year students in a real world setting. Its psychometric quality compared well with conventional assessments and with other direct observation instruments as reported in the literature. Effect on learning, and translation to clinical care, are directions for future research.

  13. Designing and Determining Psychometric Properties of the Elder Neglect Checklist

    Directory of Open Access Journals (Sweden)

    Majideh Heravi-Karimooi

    2013-10-01

    Full Text Available Objectives: The purpose of this study was to design and determine the psychometric properties of a checklist for assessing domestic elder neglect. Methods & Materials: This study was conducted in four phases. In the first phase, the meaning of domestic elder neglect was explored using the qualitative method of phenomenology. In the second phase, a checklist was created, based on the results obtained in the first phase, in conjunction with the inductions from the expert panel. In the third and fourth phases, the psychometric properties including face validity, content validity, construct validity, convergent validity, internal consistency, and inter-rater reliability were measured. 110 elderly people participated in this study. Results: The initial 26-item checklist, designed using the results of the first and second phases of the study, was reduced in the process of determining face and content validity to 11 items and 2 factors: neglect of health and care needs, and neglect in providing a healthy environment. Acceptable convergent validity was identified between the elder neglect checklist and the care neglect scale of the domestic elder abuse questionnaire (r=0.862). The results of known groups' comparisons showed that this checklist could successfully discriminate between subgroups of elderly people in the index of re-hospitalization. The internal consistency (Kuder-Richardson Formula 20) was 0.824. Inter-rater reliability of the checklist was 0.850. Conclusion: The elder neglect checklist with 11 items appears to be a promising tool, providing reliable and valid data helping to detect neglect among elders in different settings such as clinical settings, homes and research environments by health care providers and researchers.

  14. Validity and reliability of the European Portuguese version of neuropsychiatric inventory in an institutionalized sample.

    Science.gov (United States)

    Ferreira, Ana Rita; Martins, Sonia; Ribeiro, Orquidea; Fernandes, Lia

    2015-01-01

    Neuropsychiatric symptoms are very common in dementia and have been associated with patient and caregiver distress, increased risk of institutionalization and higher costs of care. In this context, the neuropsychiatric inventory (NPI) is the most widely used comprehensive tool designed to measure neuropsychiatric symptoms in geriatric patients with dementia. The aim of this study was to present the validity and reliability of the European Portuguese version of NPI. A cross-sectional study was carried out with a convenience sample of institutionalized patients (≥ 50 years old) in three nursing homes in Portugal. All patients were also assessed with mini-mental state examination (MMSE) (cognition), geriatric depression scale (GDS) (depression) and adults and older adults functional assessment inventory (IAFAI) (functionality). NPI was administered to a formal caregiver, usually from the clinical staff. Inter-rater and test-retest reliability were assessed in a subsample of 25 randomly selected subjects. The sample included 166 elderly, with a mean age of 80.9 (standard deviation: 10.2) years. Three of the NPI behavioral items had negative correlations with MMSE: delusions (rs = -0.177, P = 0.024), disinhibition (rs = -0.174, P = 0.026) and aberrant motor activity (rs = -0.182, P = 0.020). The NPI subsection of depression/dysphoria correlated positively with GDS total score (rs = 0.166, P = 0.038). NPI showed good internal consistency (overall α = 0.766; frequency α = 0.737; severity α = 0.734). The inter-rater reliability was excellent (intraclass correlation coefficient (ICC): 1.00, 95% confidence interval (CI) 1.00 - 1.00), as was test-retest reliability (ICC: 0.91, 95% CI 0.80 - 0.96). The results for convergent validity, inter-rater and test-retest reliability showed that this version appears to be a valid and reliable instrument for evaluation of neuropsychiatric symptoms in institutionalized elderly.

  15. Reliable and valid assessment of Lichtenstein hernia repair skills.

    Science.gov (United States)

    Carlsen, C G; Lindorff-Larsen, K; Funch-Jensen, P; Lund, L; Charles, P; Konge, L

    2014-08-01

    Lichtenstein hernia repair is a common surgical procedure and one of the first procedures performed by a surgical trainee. However, formal assessment tools developed for this procedure are few and sparsely validated. The aim of this study was to determine the reliability and validity of an assessment tool designed to measure surgical skills in Lichtenstein hernia repair. Key issues were identified through a focus group interview. On this basis, an assessment tool with eight items was designed. Ten surgeons and surgical trainees (four experts, three intermediates, and three novices) were video recorded while performing Lichtenstein hernia repair. The videos were blindly and individually assessed by three raters (surgical consultants) using the assessment tool. Based on these assessments, validity and reliability were explored. The internal consistency of the items was high (Cronbach's alpha = 0.97). The inter-rater reliability was very good with an intra-class correlation coefficient (ICC) = 0.93. Generalizability analysis showed a coefficient above 0.8 even with one rater. The coefficient improved to 0.92 if three raters were used. One-way analysis of variance found a significant difference between the three groups, which indicates construct validity (p < …). Lichtenstein hernia repair skills can be assessed in a reliable and valid fashion with the new procedure-specific assessment tool. We recommend this tool for future assessment of trainees performing Lichtenstein hernia repair to ensure that the objectives of competency-based surgical training are met.

  16. Reliability, Dimensionality, and Internal Consistency as Defined by Cronbach: Distinct Albeit Related Concepts

    Science.gov (United States)

    Davenport, Ernest C.; Davison, Mark L.; Liou, Pey-Yan; Love, Quintin U.

    2015-01-01

    This article uses definitions provided by Cronbach in his seminal paper for coefficient α to show the concepts of reliability, dimensionality, and internal consistency are distinct but interrelated. The article begins with a critique of the definition of reliability and then explores mathematical properties of Cronbach's α. Internal consistency…

  17. The Use of Assessment Criteria to Ensure Consistency of Marking: Some Implications for Good Practice.

    Science.gov (United States)

    Saunders, Mark N. K.; Davis, Susan M.

    1998-01-01

    Lecturers at a British university participated in two workshops to examine the consistency of assessments of undergraduates' work. Use of both analytical and global quality measures, when clearly understood by the raters, improved assessment practices. Ongoing discussion of evaluation criteria was recommended. (SK)

  18. Consistency between Overall and Account-level Materiality Measures: An Inter-Industry Comparison and an Analysis of the Correlation with the Financial Ratios System

    OpenAIRE

    N. Pecchiari; G. Pogliani

    2006-01-01

    This study analyses the consistency between overall and account-level materiality measures. The study starts by emphasizing the need for further research on planning materiality, considering that prior studies have shown large differences in materiality methods. A review of literature on materiality [Messier et al. (2005)] suggests continuing inter-industry investigations on planning materiality [Wheeler and Pany (1989)]. We have also noticed the absence of research in the area of connection bet...

  19. Evaluation of the FOCUS (Feedback on Counseling Using Simulation) instrument for assessment of client-centered nutrition counseling behaviors.

    Science.gov (United States)

    Henry, Beverly W; Smith, Thomas J

    2010-01-01

    To develop an instrument to assess client-centered counseling behaviors (skills) of student-counselors in a standardized patient (SP) exercise. Descriptive study of the accuracy and utility of a newly developed counseling evaluation instrument. Study participants included 11 female student-counselors at a Midwestern university (10 Caucasian, 1 African-American) for the simulated counseling sessions, in which the Feedback on Counseling Using Simulation (FOCUS) instrument was applied in 2 SP scenarios (cardiovascular disease and diabetes). FOCUS ratings of student-counselors by 4 SPs during 22 sessions were compared with ratings from a 3-member panel of experts who independently viewed the 22 videotaped sessions. Quantitative analysis of instrument validity included inter-rater reliability by computing generalizability coefficients, Pearson correlations, and Spearman rank-order correlations. FOCUS criteria encompassed relevant dimensions of nutrition counseling based in a client-centered perspective. The critical points of information gathering and counseling behaviors showed internal consistency overall and good inter-rater reliability with the cardiovascular disease scenario. For both scenarios, pooled ratings of 3 experts agreed with ratings carried out by SPs. Initial findings suggest that the FOCUS instrument with client-centered criteria may enhance evaluation of counseling skills in SP exercises, meriting further study with larger groups.

  20. An International, Multi-Specialty Validation Study of the IgG4-Related Disease Responder Index.

    Science.gov (United States)

    Wallace, Zachary S; Khosroshahi, Arezou; Carruthers, Mollie D; Perugino, Cory A; Choi, Hyon; Campochiaro, Corrado; Culver, Emma L; Cortazar, Frank; Della-Torre, Emanuel; Ebbo, Mikael; Fernandes, Ana; Frulloni, Luca; Hart, Philip; Karadag, Omer; Kawa, Shigeyuki; Kawano, Mitsuhiro; Kim, Myung-Hwan; Lanzillotta, Marco; Matsui, Shoko; Okazaki, Kazuichi; Ryu, Jay H; Saeki, Takako; Schleinitz, Nicolas; Tanasa, Paula; Umehara, Hisanori; Webster, George; Zhang, Wen; Stone, John H

    2018-02-18

    IgG4-related disease (IgG4-RD) can cause fibro-inflammatory lesions in nearly any organ, leading to organ dysfunction and failure. The IgG4-RD Responder Index (RI) was developed to help investigators assess the efficacy of treatment in a structured manner. We sought to validate the RI in a multi-national investigation. The RI guides investigators through assessments of disease activity and damage in 25 domains, incorporating higher weights for disease manifestations that require treatment urgently or that worsen despite treatment. After a training exercise, investigators reviewed 12 written IgG4-RD vignettes (mean length: 279 words, range: 76-511 words) based upon real patients. Investigators calculated both an RI score as well as a physician global assessment (PGA) for each vignette. Three investigators used the RI on fifteen patients followed over serial visits after treatment. We assessed inter- and intra-rater reliability, precision, validity, and responsiveness. Twenty-six physician-investigators included representatives from 6 specialties and 9 countries. The inter-rater and intra-rater reliabilities of the RI were strong (0.88 and 0.69, respectively) and superior to those of the PGA. Correlations (construct validity) between the RI and PGA were high (Spearman's r=0.9, P < …). The RI is a reliable disease activity assessment tool that can be used to measure response to therapy. This article is protected by copyright. All rights reserved.

  1. Implementing standardized, inter-unit communication in an international setting: handoff of patients from emergency medicine to internal medicine.

    Science.gov (United States)

    Balhara, Kamna S; Peterson, Susan M; Elabd, Mohamed Moheb; Regan, Linda; Anton, Xavier; Al-Natour, Basil Ali; Hsieh, Yu-Hsiang; Scheulen, James; Stewart de Ramirez, Sarah A

    2018-04-01

    Standardized handoffs may reduce communication errors, but research on handoff in community and international settings is lacking. Our study at a community hospital in the United Arab Emirates characterizes existing handoff practices for admitted patients from emergency medicine (EM) to internal medicine (IM), develops a standardized handoff tool, and assesses its impact on communication and physician perceptions. EM physicians completed a survey regarding handoff practices and expectations. Trained observers utilized a checklist based on the Systems Engineering Initiative for Patient Safety model to observe 40 handoffs. EM and IM physicians collaboratively developed a written tool encouraging bedside handoff of admitted patients. After the intervention, the surveys of EM physicians and the 40 observations were repeated. 77.5% of initial observed handoffs occurred face-to-face, with 42.5% at bedside, and in four different languages. Most survey respondents considered face-to-face handoff ideal. Respondents noted 9-13 patients suffering harm due to handoff in the prior month. After handoff tool implementation, 97.5% of observed handoffs occurred face-to-face (versus 77.5%, p = 0.014), with 82.5% at bedside (versus 42.5%, p < …). The handoff tool increased face-to-face and bedside handoff, positively impacted workflow, and increased perceptions of safety by EM physicians in an international, non-academic setting. Our three-step approach can be applied towards developing standardized, context-specific inter-specialty handoff in a variety of settings.

  2. The situation analysis of the international relations management and inter-university collaboration in Tabriz University of Medical Sciences, Iran, during the years 2005-2010

    Directory of Open Access Journals (Sweden)

    Alireza Farajollahi

    2013-08-01

    Full Text Available BACKGROUND: Nowadays, with the development of science and communication, collaboration with other countries and universities seems inevitable to universities. The aim of this study was to analyze the situation of international relations management and inter-university collaboration (IRM-IUC) in Tabriz University of Medical Sciences (TUMS), Iran, during the years 2005-2010. METHODS: In this descriptive study, one checklist was used for analysis of the inter-university collaboration management and another one for the situation analysis of international relations management which included 4 sections itself. There were a total of 56 questions designed and developed through literature review and the expert panel. RESULTS: The results indicated the poor performance of Tabriz University of Medical Sciences in the international relations management and inter-university collaboration fields. Most of the reviewed items had not been adequately paid attention to in the management of international relations and only one out of 14 evaluated items was considered in the field of inter-university collaboration. CONCLUSIONS: In line with the overall globalization process, education and research have also become globalized processes, and as a result, it is necessary for universities to develop effective ties and relationships with other organizations. However, Tabriz University of Medical Sciences has not been doing quite optimally in this regard. Thus, it is suggested that, based on the shortcomings pointed out in this study, new appropriate plans and policies be set to develop fruitful and effective relations and correspondences with other universities and countries.

  3. Reliability of Alberta Infant Motor Scale Using Recorded Video Observations Among the Preterm Infants in India: A Reliability Study

    Directory of Open Access Journals (Sweden)

    Veena Kirthika S

    2017-10-01

    Full Text Available Background: Assessment of motor function is a vital characteristic of infant development. The Alberta Infant Motor Scale (AIMS) is considered to be one of the tools available for screening developmental delays, but this scale was formulated using western samples. Every country has its own ethnic and cultural background, and various differences are observed in culture and ethnicity. Therefore, there is a need to establish the reliability of AIMS in the south Indian population. Purpose: To find the intra-rater and inter-rater reliability of the Alberta Infant Motor Scale (AIMS) on pre-term infants using recorded video observations in the Indian population. Method: 30 preterm infants in three age groups, 0-3 months (10 infants), 4-7 months (10 infants), and 8-18 months (10 infants), were recruited for this reliability study. The AIMS was administered to the preterm infants and the performance was videotaped. The performance was then rescored by the same therapist, immediately from the video and on another two consecutive months, to estimate intra-rater reliability using ICC (3,1), two-way mixed effects model. For reporting inter-rater reliability, AIMS was scored by three different raters, using ICC (2,k), two-way random effects model, and by two other therapists to examine the inter- and intra-rater reliability. Results: The two-way mixed effects model for intra-rater reliability of AIMS gave ICC (3,1) = 0.99, and the two-way random effects model for inter-rater reliability of AIMS gave ICC (2,k) = 0.96. Conclusion: AIMS has excellent intra- and inter-rater reliability using recorded video observations among preterm infants in India.
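
    The two coefficients named in this abstract, ICC (3,1) and ICC (2,k), can both be derived from the mean squares of a two-way ANOVA on a subjects-by-raters table. The following is a minimal sketch under that standard formulation. The ratings are a small invented table for illustration only, not the AIMS data, and the function name is hypothetical.

```python
def icc_two_way(ratings):
    """Return ICC(3,1) and ICC(2,k) for a complete subjects x raters table."""
    n = len(ratings)        # number of subjects (rows)
    k = len(ratings[0])     # number of raters (columns)
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                 # between-subjects mean square
    jms = ss_cols / (k - 1)                 # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square

    return {
        # Two-way mixed, consistency, single measure.
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
        # Two-way random, absolute agreement, average of k raters.
        "ICC(2,k)": (bms - ems) / (bms + (jms - ems) / n),
    }


# Illustrative data: 6 subjects scored by 4 raters (not from the study).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(icc_two_way(example))
```

    Because ICC (2,k) penalizes systematic differences between raters while ICC (3,1) does not, the two values can diverge noticeably on the same table.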

  4. Histologic processing artifacts and inter-pathologist variation in measurement of inked margins of canine mast cell tumors.

    Science.gov (United States)

    Kiser, Patti K; Löhr, Christiane V; Meritet, Danielle; Spagnoli, Sean T; Milovancev, Milan; Russell, Duncan S

    2018-05-01

    Although quantitative assessment of margins is recommended for describing excision of cutaneous malignancies, there is poor understanding of limitations associated with this technique. We described and quantified histologic artifacts in inked margins and determined the association between artifacts and variance in histologic tumor-free margin (HTFM) measurements based on a novel grading scheme applied to 50 sections of normal canine skin and 56 radial margins taken from 15 different canine mast cell tumors (MCTs). Three broad categories of artifact were 1) tissue deformation at inked edges, 2) ink-associated artifacts, and 3) sectioning-associated artifacts. The most common artifacts in MCT margins were ink-associated artifacts, specifically ink absent from an edge (mean prevalence: 50%) and inappropriate ink coloring (mean: 45%). The prevalence of other artifacts in MCT skin was 4-50%. In MCT margins, frequency-adjusted kappa statistics found fair or better inter-rater reliability for 9 of 10 artifacts; intra-rater reliability was moderate or better in 9 of 10 artifacts. Digital HTFM measurements by 5 blinded pathologists had a median standard deviation (SD) of 1.9 mm (interquartile range: 0.8-3.6 mm; range: 0-6.2 mm). Intraclass correlation coefficients demonstrated good inter-pathologist reliability in HTFM measurement (κ = 0.81). Spearman rank correlation coefficients found negligible correlation between artifacts and HTFM SDs (r ≤ 0.3). These data confirm that although histologic artifacts commonly occur in inked margin specimens, artifacts are not meaningfully associated with variation in HTFM measurements. Investigators can use the grading scheme presented herein to identify artifacts associated with tissue processing.

  5. Rater Judgment and English Language Speaking Proficiency. Research Report

    Science.gov (United States)

    Chalhoub-Deville, Micheline; Wigglesworth, Gillian

    2005-01-01

    The paper investigates whether there is a shared perception of speaking proficiency among raters from different English speaking countries. More specifically, this study examines whether there is a significant difference among English language learning (ELL) teachers, residing in Australia, Canada, the UK, and the USA when rating speech samples of…

  6. Peer-review for selection of oral presentations for conferences: Are we reliable?

    Science.gov (United States)

    Deveugele, Myriam; Silverman, Jonathan

    2017-11-01

    Although peer-review for journal submission, grant-applications and conference submissions has been called 'a cornerstone of science', and even 'the gold standard for evaluating scientific merit', publications on this topic remain scarce. Research that has investigated peer-review reveals several issues and criticisms concerning bias, poor quality review, unreliability and inefficiency. The most important weakness of the peer review process is the inconsistency between reviewers leading to inadequate inter-rater reliability. To report the reliability of ratings for a large international conference and to suggest possible solutions to overcome the problem. In 2016 during the International Conference on Communication in Healthcare, organized by EACH: International Association for Communication in Healthcare, a calibration exercise was proposed and feedback was reported back to the participants of the exercise. Most abstracts, as well as most peer-reviewers, receive and give scores around the median. Contrary to the general assumption that there are high and low scorers, in this group only 3 peer-reviewers could be identified with a high mean, while 7 had a low mean score. Only 2 reviewers gave only high ratings (4 and 5). Of the eight abstracts included in this exercise, only one abstract received a high mean score and one a low mean score. Nevertheless, both these abstracts received both low and high scores; all other abstracts received all possible scores. Peer-review of submissions for conferences is, in accordance with the literature, unreliable. New and creative methods will be needed to give the participants of a conference what they really deserve: a more reliable selection of the best abstracts.
More raters per abstract improves the inter-rater reliability; training of reviewers could be helpful; providing feedback to reviewers can lead to less inter-rater disagreement; fostering negative peer-review (rejecting the inappropriate submissions) rather than a

  7. Internal Consistency, Retest Reliability, and their Implications For Personality Scale Validity

    Science.gov (United States)

    McCrae, Robert R.; Kurtz, John E.; Yamagata, Shinji; Terracciano, Antonio

    2010-01-01

    We examined data (N = 34,108) on the differential reliability and validity of facet scales from the NEO Inventories. We evaluated the extent to which (a) psychometric properties of facet scales are generalizable across ages, cultures, and methods of measurement; and (b) validity criteria are associated with different forms of reliability. Composite estimates of facet scale stability, heritability, and cross-observer validity were broadly generalizable. Two estimates of retest reliability were independent predictors of the three validity criteria; none of three estimates of internal consistency was. Available evidence suggests the same pattern of results for other personality inventories. Internal consistency of scales can be useful as a check on data quality, but appears to be of limited utility for evaluating the potential validity of developed scales, and it should not be used as a substitute for retest reliability. Further research on the nature and determinants of retest reliability is needed. PMID:20435807

  8. Rating Written Performance: What Do Raters Do and Why?

    Science.gov (United States)

    Kuiken, Folkert; Vedder, Ineke

    2014-01-01

    This study investigates the relationship in L2 writing between raters' judgments of communicative adequacy and linguistic complexity by means of six-point Likert scales, and general measures of linguistic performance. The participants were 39 learners of Italian and 32 of Dutch, who wrote two short argumentative essays. The same writing tasks…

  9. The Mayo-Portland Participation Index: A brief and psychometrically sound measure of brain injury outcome.

    Science.gov (United States)

    Malec, James F

    2004-12-01

    To evaluate the internal consistency, interrater agreement, concurrent validity, and floor and ceiling effects of the 8-item Participation Index (M2PI) of the Mayo-Portland Adaptability Inventory (MPAI). M2PI data derived from MPAIs completed independently by the people with acquired brain injury undergoing evaluation, their significant others, and rehabilitation staff were submitted to Rasch Facets analysis to determine the internal consistency of each independent rater group and of composite measures that combined rater groups. Correlations with the full-scale MPAI were examined to assess concurrent validity, as was interrater agreement. Outpatient rehabilitation in academic physical medicine and rehabilitation department. People with acquired brain injury (N=134) consecutively seen for evaluation, significant others, and evaluating staff. Not applicable. The MPAI and M2PI. The M2PI showed satisfactory internal consistency, concurrent validity, interrater agreement, and minimal floor and ceiling effects, although evidence of rater bias was also apparent. Composite indices showed more desirable psychometric properties than ratings by individual rater groups. The M2PI, particularly in composite indices and with attention to rater biases, provides an outcome measure with satisfactory psychometric qualities and the potential to represent the varying perspectives of people with acquired brain injury, significant others, and rehabilitation staff.

  10. Measuring participants' immersion in healthcare simulation: the development of an instrument.

    Science.gov (United States)

    Hagiwara, Magnus Andersson; Backlund, Per; Söderholm, Hanna Maurin; Lundberg, Lars; Lebram, Mikael; Engström, Henrik

    2016-01-01

    Immersion is important for simulation-based education; however, questionnaire-based instruments to measure immersion have some limitations. The aim of the present work is to develop a new instrument to measure immersion among participants in healthcare simulation scenarios. The instrument was developed in four phases: trigger identification, content validity scores, inter-rater reliability analysis and comparison with an existing immersion measure instrument. A modified Delphi process was used to develop the instrument and to establish validity and reliability. The expert panel consisted of 10 researchers. All the researchers in the team had previous experience of simulation in the health and/or fire and rescue services as researchers and/or educators and simulation designers. To identify triggers, the panel members independently screened video recordings from simulation scenarios. Here, a trigger is an event in a simulation that is considered a sign of reduced or enhanced immersion among simulation participants. The result consists of the Immersion Score Rating Instrument (ISRI). It contains 10 triggers, of which seven indicate reduced and three enhanced immersion. When using ISRI, a rater identifies trigger occurrences and assigns them strength between 1 and 3. The content validity analysis shows that all the 10 triggers meet an acceptable content validity index for items (I-CVI) standard. The inter-rater reliability (IRR) among raters was assessed using a two-way mixed, consistency, average-measures intra-class correlation (ICC). The ICC for the difference between weighted positive and negative triggers was 0.92, which indicates that the raters are in agreement. Comparison with results from an immersion questionnaire mirrors the ISRI results. In conclusion, we present a novel and non-intrusive instrument for identifying and rating the level of immersion among participants in healthcare simulation scenarios.

  11. A diagnostic test for apraxia in stroke patients: internal consistency and diagnostic value.

    NARCIS (Netherlands)

    Heugten, C.M. van; Dekker, J.; Deelman, B.G.; Stehmann-Saris, F.C.; Kinebanian, A.

    1999-01-01

    The internal consistency and the diagnostic value of a test for apraxia in patients having had a stroke are presented. Results indicate that the items of the test form a strong and consistent scale: Cronbach's alpha as well as the results of a Mokken scale analysis present good reliability and good

  12. Evaluation of the McMahon Competence Assessment Instrument for Use with Midwifery Students During a Simulated Shoulder Dystocia.

    Science.gov (United States)

    McMahon, Erin; Jevitt, Cecilia; Aronson, Barbara

    2018-03-01

    Intrapartum emergencies occur infrequently but require a prompt and competent response from the midwife to prevent morbidity and mortality of the woman, fetus, and newborn. Simulation provides the opportunity for student midwives to develop competence in a safe environment. The purpose of this study was to determine the inter-rater reliability of the McMahon Competence Assessment Instrument (MCAI) for use with student midwives during a simulated shoulder dystocia scenario. A pilot study using a nonprobability convenience sample was used to evaluate the MCAI. Content validity indices were calculated for the individual items and the overall instrument using data from a panel of expert reviewers. Fourteen student midwives consented to be video recorded while participating in a simulated shoulder dystocia scenario. Three faculty raters used the MCAI to evaluate the student performance. These quantitative data were used to determine the inter-rater reliability of the MCAI. The intraclass correlation coefficient (ICC) was used to assess the inter-rater reliability of MCAI scores between 2 or more raters. The ICC was 0.86 (95% confidence interval, 0.60-0.96). Fleiss's kappa was calculated to determine the inter-rater reliability for individual items. Twenty-three of the 42 items corresponded to excellent strength of agreement. This study demonstrates a method to determine the inter-rater reliability of a competence assessment instrument to be used with student midwives. Data produced by this study were used to revise and improve the instrument. Additional research will further document the inter-rater reliability and can be used to determine changes in student competence. Valid and reliable methods of assessment will encourage the use of simulation to efficiently develop the competence of student midwives. © 2018 by the American College of Nurse-Midwives.
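
    For the item-level agreement described here, Fleiss's kappa compares the observed proportion of agreeing rater pairs with the proportion expected by chance. Below is a minimal sketch of that calculation, assuming every subject is rated by the same number of raters. The counts are hypothetical (5 students, 3 raters, one item scored as done or not done) and are not the MCAI data.

```python
def fleiss_kappa(counts):
    """Fleiss's kappa from a subjects x categories table of rating counts.

    counts[i][j] is the number of raters who put subject i in category j.
    Every row must sum to the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    total = n_subjects * n_raters

    # Share of all ratings that fell in each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Per-subject agreement: fraction of rater pairs that agree.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_subj) / n_subjects
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical item: columns are [done, not done], 3 raters per student.
item_counts = [[3, 0], [0, 3], [2, 1], [3, 0], [1, 2]]
print(round(fleiss_kappa(item_counts), 3))
```

    With only three raters, a single dissenting rating on a few students lowers kappa considerably, which is one reason item-level values can vary widely within one instrument.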

  13. Norming a VALUE rubric to assess graduate information literacy skills.

    Science.gov (United States)

    Turbow, David J; Evener, Julie

    2016-07-01

    The study evaluated whether a modified version of the information literacy Valid Assessment of Learning in Undergraduate Education (VALUE) rubric would be useful for assessing the information literacy skills of graduate health sciences students. Through facilitated calibration workshops, an interdepartmental six-person team of librarians and faculty engaged in guided discussion about the meaning of the rubric criteria. They applied the rubric to score student work for a peer-review essay assignment in the "Information Literacy for Evidence-Based Practice" course. To determine inter-rater reliability, the raters participated in a follow-up exercise in which they independently applied the rubric to ten samples of work from a research project in the doctor of physical therapy program: the patient case report assignment. For the peer-review essay, a high level of consistency in scoring was achieved for the second workshop, with statistically significant intra-class correlation coefficients above 0.8 for 3 criteria: "Determine the extent of evidence needed," "Use evidence effectively to accomplish a specific purpose," and "Access the needed evidence." Participants concurred that the essay prompt and rubric criteria adequately discriminated the quality of student work for the peer-review essay assignment. When raters independently scored the patient case report assignment, inter-rater agreement was low and statistically insignificant for all rubric criteria (kappa = -0.16, p > 0.05 to kappa = 0.12, p > 0.05). While the peer-review essay assignment lent itself well to rubric calibration, scorers had a difficult time with the patient case report. Lack of familiarity among some raters with the specifics of the patient case report assignment and subject matter might have accounted for low inter-rater reliability. When norming, it is important to hold conversations about search strategies and expectations of performance. Overall, the authors found the rubric to be appropriate for

  14. Contested Norms in Inter-National Encounters: The ‘Turbot War’ as a Prelude to Fairer Fisheries Governance

    Directory of Open Access Journals (Sweden)

    Antje Wiener

    2016-08-01

    Full Text Available This article is about contested norms in inter-national encounters in global fisheries governance. It illustrates how norms work by reconstructing the trajectory of the 1995 ‘Turbot War’ as a series of inter-national encounters among diverse sets of Canadian and European stakeholders. By unpacking the contestations and identifying the norms at stake, it is suggested that what began as action at cross-purposes (i.e. each party referring to a different fundamental norm), ultimately holds the potential for fairer fisheries governance. This finding is shown by linking source and settlement of the dispute and identifying the shared concern for the balance between the right to fish and the responsibility for sustainable fisheries. The article develops a framework to elaborate on procedural details including especially the right for stakeholder access to regular contestation. It is organised in four sections: section 1 summarises the argument, section 2 presents the framework of critical norms research, section 3 reconstructs contestations of fisheries norms over the duration of the dispute, and section 4 elaborates on the dispute as a prelude to fairer fisheries governance. The latter is based on a novel conceptual focus on stakeholder access to contestation at the meso-layer of fisheries governance where organising principles are negotiated close to policy and political processes, respectively. The conclusion suggests for future research to pay more attention to the link between the ‘is’ and the ‘ought’ of norms in critical norms research in International Relations theories (IR).

  15. Qualitative and quantitative assessment of degeneration of cervical intervertebral discs and facet joints.

    Science.gov (United States)

    Walraevens, Joris; Liu, Baoge; Meersschaert, Joke; Demaerel, Philippe; Delye, Hans; Depreitere, Bart; Vander Sloten, Jos; Goffin, Jan

    2009-03-01

    Degeneration of intervertebral discs and facet joints is one of the most frequently encountered spinal disorders. In order to describe and quantify degeneration and evaluate a possible relationship between degeneration and biomechanical parameters, e.g., the intervertebral range of motion and intradiscal pressure, a scoring system for degeneration is mandatory. However, few scoring systems for the assessment of degeneration of the cervical spine exist. Therefore, two separate objective scoring systems to qualitatively and quantitatively assess the degree of cervical intervertebral disc and facet joint degeneration were developed and validated. The scoring system for cervical disc degeneration consists of three variables which are individually scored on neutral lateral radiographs: "height loss" (0-4 points), "anterior osteophytes" (0-3 points) and "endplate sclerosis" (0-2 points). The scoring system for facet joint degeneration consists of four variables which are individually scored on neutral computed tomography scans: "hypertrophy" (0-2 points), "osteophytes" (0-1 point), "irregularity" on the articular surface (0-1 point) and "joint space narrowing" (0-1 point). Each variable contributes with varying importance to the overall degeneration score (max 9 points for the scoring system of cervical disc degeneration and max 5 points for facet joint degeneration). Degeneration of 20 discs and facet joints of 20 patients was blindly assessed by four raters: two neurosurgeons (one senior and one junior) and two radiologists (one senior and one junior), firstly based on first subjective impression and secondly using the scoring systems. Measurement errors and inter- and intra-rater agreement were determined. The measurement error of the scoring system for cervical disc degeneration was 11.1 versus 17.9% of the subjective impression results. This scoring system showed excellent intra-rater agreement (ICC = 0.86, 0.75-0.93) and excellent inter-rater agreement (ICC = 0

  16. Establishment of the reliability and validity of the Stress Index for Children or Adolescents with Tourette Syndrome (SICATS).

    Science.gov (United States)

    Chao, Kuo-Yu; Wang, Huei-Shyong; Chang, Hsueh-Ling; Wang, Yi-Wen; See, Lai-Chu

    2010-02-01

    The aim of this study was to evaluate the validity and reliability of the stress index for 10-18-year-old children or adolescents with Tourette syndrome. Tourette syndrome is a chronic tic disorder, which occurs in childhood. Children with Tourette syndrome exhibit sudden and unexpected vocalizations or movements that may influence their daily activities and create barriers to social interaction. Therefore, a self-report stress index is necessary for children with Tourette syndrome to quickly measure the stress they have. Eight experts rated appropriateness, comprehensiveness and relevance of the questionnaire to establish content validity. A total of 116 paediatric patients filled out the stress index for 10-18-year-old children or adolescents with Tourette syndrome to evaluate its construct validity using exploratory factor analysis and internal consistency. Data from 90 pairs of paediatric patients and their caregivers were used to evaluate the inter-rater reliability. The criterion validity index ranged from 80-98%. One item was deleted because of a small item-to-total correlation. Therefore, 26 items made up the final stress index for 10-18-year-old children or adolescents with Tourette syndrome. In exploratory factor analysis, four factors (unfairly treated, psychological, symptom control and future concern) were achieved and accounted for 52.3% of the total variance. Cronbach's alpha of the stress index for 10-18-year-old children or adolescents with Tourette syndrome was 0.89. The inter-rater reliability of the stress index for 10-18-year-old children or adolescents with Tourette syndrome (Pearson correlation coefficient between patients and their caregivers) was 0.56. The stress index for 10-18-year-old children or adolescents with Tourette syndrome is a self-administered tool to assess the stress of children or adolescents with Tourette syndrome.
Validity (content and construct) and reliability (internal consistency and inter-rater

  17. The development and validation of a custom built device for assessing frontal knee joint laxity.

    Science.gov (United States)

    Ismail, Shiek Abdullah; Simic, Milena; Clarke, Jillian L; Lopes, Thiago Jambo Alves; Pappas, Evangelos

    2017-12-01

    This study reports the development and validation of a quantitative technique of assessing frontal knee joint laxity through a custom built device named KLICP. The objectives of this study were to determine: (i) the intra- and inter-rater reliability and (ii) the validity of the device when compared to real time ultrasound. Twenty-five participants had their frontal knee joint laxity assessed by the KLICP, by manual varus/valgus tests and by ultrasound. Two raters independently assessed laxity manually by three repeated measurements, repeated at least 48h later. Results were validated by comparing them to the medial and lateral joint space opening measured by the ultrasound. Intraclass correlation coefficients and standard error of measurement reliability were calculated. Pearson's correlation coefficients were calculated to determine the correlation between the KLICP and the joint space. Intra-rater reliability (intra-session) for each rater was good on both sessions (0.91-0.98), intra-rater reliability (inter-session) was moderate to good (0.62-0.87), and inter-rater reliability (intra-session) was good (0.75-0.80). There is low agreement for intra-rater (inter-session) and for inter-rater (intra-session) reliability. The KLICP measurement has a significant positive fair to moderate correlation to the ultrasound measurement at the left (r: 0.61, p: 0.01) and right (r: 0.48, p: 0.02) knee in the valgus direction and at the left (r: 0.51, p: 0.01) and right (r: 0.39, p: 0.05) knee in the varus direction. There is low agreement between the KLICP and the RTU. Reliability and agreement were good only when measured for intra-rater, within session. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Self- and rater-assessed effectiveness of "thinking-aloud" and "regular" morning report to intensify young physicians' clinical skills.

    Science.gov (United States)

    Hsu, Hui-Chi; Lee, Fa-Yauh; Yang, Ying-Ying; Tsao, Yen-Po; Lee, Wen-Shin; Chuang, Chiao-Lin; Chang, Ching-Chih; Huang, Chia-Chang; Huang, Chin-Chou; Ho, Shung-Tai

    2015-09-01

    This study compared the effects of the "thinking aloud" (TA) morning report (MR), which is characterized by sequential and interactive case discussion by all participants, with "regular" MR for clinical skill training of young physicians. Between February 2011 and February 2014, young physicians [including postgraduate year-1 (PGY1) residents, interns, and clerks] from our hospital were sequentially enrolled and followed for 3 months. The self- and rater-assessed educational values of two MR models for building up clinical skills of young physicians were compared. The junior (intern and clerk) attendees had higher self-assessed educational values scores and reported post-training application frequency of skills trained by TA MR compared with the senior (PGY1 resident) attendees. Higher average and percentage of increased overall rater-assessed OSCE scores were noted among the regular MR senior attendees and TA MR junior attendees than in their corresponding control groups (regular MR junior attendees and TA MR senior attendees). Interestingly, regular MRs provided additional beneficial effects for establishing the "professionalism, consulting skills and organization efficiency" aspects of clinical skills of senior/junior attendees. Moreover, senior and junior attendees benefited the most by participating in seven sessions of regular MR and TA MR each month, respectively. TA MR effectively trains junior attendees in basic clinical skills, whereas regular MR enhances senior attendees' "work reports, professionalism, organizational efficiency, skills in dealing with controversial and professional issues." Undoubtedly, all elements of the two MR models should be integrated together to ensure patient safety and good discipline among young physicians. Copyright © 2015. Published by Elsevier Taiwan.

  19. A method for reducing misclassification in the extended Glasgow Outcome Score.

    Science.gov (United States)

    Lu, Juan; Marmarou, Anthony; Lapane, Kate; Turf, Elizabeth; Wilson, Lindsay

    2010-05-01

    The eight-point extended Glasgow Outcome Scale (GOSE) is commonly used as the primary outcome measure in traumatic brain injury (TBI) clinical trials. The outcome is conventionally collected through a structured interview with the patient alone or together with a caretaker. Despite the fact that using the structured interview questionnaires helps reach agreement in GOSE assessment between raters, significant variation remains among different raters. We introduce an alternate GOSE rating system as an aid in determining GOSE scores, with the objective of reducing inter-rater variation in the primary outcome assessment in TBI trials. Forty-five trauma centers were randomly assigned to three groups to assess GOSE scores on sample cases, using the alternative GOSE rating system coupled with central quality control (Group 1), the alternative system alone (Group 2), or conventional structured interviews (Group 3). The inter-rater variation between an expert and untrained raters was assessed for each group and reported through raw agreement and with weighted kappa (kappa) statistics. Groups 2 and 3 without central review yielded inter-rater agreements of 83% (weighted kappa = 0.81; 95% CI 0.69, 0.92) and 83% (weighted kappa = 0.76, 95% CI 0.63, 0.89), respectively, in GOS scores. In GOSE, the groups had an agreement of 76% (weighted kappa = 0.79; 95% CI 0.69, 0.89), and 63% (weighted kappa = 0.70; 95% CI 0.60, 0.81), respectively. The group using the alternative rating system coupled with central monitoring yielded the highest inter-rater agreement among the three groups in rating GOS (97%; weighted kappa = 0.95; 95% CI 0.89, 1.00), and GOSE (97%; weighted kappa = 0.97; 95% CI 0.91, 1.00). The alternate system is an improved GOSE rating method that reduces inter-rater variations and provides for the first time, source documentation and structured narratives that allow a thorough central review of information. The data suggest that a collective effort can be made to
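
    The two agreement measures reported in this abstract, raw agreement and weighted kappa, can be sketched as follows for an ordinal scale such as the GOSE. The abstract does not state which weighting scheme was used, so the linear and quadratic options below are assumptions, and the ratings are invented for illustration on a three-category scale.

```python
def raw_agreement(rater_a, rater_b):
    """Proportion of cases on which the two raters give the same category."""
    return sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)


def weighted_kappa(rater_a, rater_b, n_categories, weighting="linear"):
    """Cohen's weighted kappa for categories coded 0 .. n_categories - 1."""
    n = len(rater_a)
    observed = [[0] * n_categories for _ in range(n_categories)]
    for i, j in zip(rater_a, rater_b):
        observed[i][j] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_categories))
                  for j in range(n_categories)]

    power = 1 if weighting == "linear" else 2   # any other value: quadratic
    weighted_observed = weighted_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            # Disagreement weight grows with the distance between categories.
            w = (abs(i - j) / (n_categories - 1)) ** power
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row_totals[i] * col_totals[j] / n
    return 1 - weighted_observed / weighted_expected


# Invented ratings from an expert and a second rater (not trial data).
expert = [0, 1, 2, 0, 1, 2]
rater = [0, 1, 2, 1, 1, 2]
print(raw_agreement(expert, rater), weighted_kappa(expert, rater, 3))
```

    Weighted kappa treats a one-category disagreement as less serious than a disagreement across several categories, which is why it is often preferred to unweighted kappa for ordinal outcome scales.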

  20. The internal consistency of the standard gamble: tests after adjusting for prospect theory.

    Science.gov (United States)

    Oliver, Adam

    2003-07-01

    This article reports a study that tests whether the internal consistency of the standard gamble can be improved upon by incorporating loss weighting and probability transformation parameters in the standard gamble valuation procedure. Five alternatives to the standard EU formulation are considered: (1) probability transformation within an EU framework; and, within a prospect theory framework, (2) loss weighting and full probability transformation, (3) no loss weighting and full probability transformation, (4) loss weighting and no probability transformation, and (5) loss weighting and partial probability transformation. Of the five alternatives, only the prospect theory formulation with loss weighting and no probability transformation offers an improvement in internal consistency over the standard EU valuation procedure.

  1. Use of e-rater[R] in Scoring of the TOEFL iBT[R] Writing Test. Research Report. ETS RR-11-25

    Science.gov (United States)

    Haberman, Shelby J.

    2011-01-01

    Alternative approaches are discussed for use of e-rater[R] to score the TOEFL iBT[R] Writing test. These approaches involve alternate criteria. In the 1st approach, the predicted variable is the expected rater score of the examinee's 2 essays. In the 2nd approach, the predicted variable is the expected rater score of 2 essay responses by the…

  2. Validity and reliability of using photography for measuring knee range of motion: a methodological study

    Directory of Open Access Journals (Sweden)

    Adie Sam

    2011-04-01

    for flexion compared to extension, and with the Marker compared to the Line of Femur Method. For intra- and inter-rater reliability, the mean differences (within 2 degrees) and 95% limits of agreement (within 5 degrees) were generally clinically acceptable for both methods. Conclusion: Photography potentially offers a superior method of measurement over standard goniometry as visualising the centre of the knee is simplified in a two-dimensional plane and the permanent record provides greater assessor transparency as well as opportunity to confer. The Marker and Line of Femur Methods have moderate to substantial validity, but the inter- and intra-rater repeatability for trained observers are excellent with both methods yielding small mean differences with narrow limits of agreement. The Line of Femur Method offers the added advantage that it does not rely on inter-clinician consistency in identifying the greater trochanter.
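
    The mean difference and 95% limits of agreement quoted above are commonly obtained as the mean of the paired differences plus or minus 1.96 standard deviations. The sketch below assumes that conventional definition. The angles are hypothetical knee flexion readings in degrees, not data from the study.

```python
from statistics import mean, stdev


def limits_of_agreement(method_a, method_b):
    """Return (mean difference, lower limit, upper limit) for paired readings."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)   # sample standard deviation of the differences
    return bias, bias - spread, bias + spread


# Hypothetical paired measurements of the same four knees.
photo_angles = [120, 95, 130, 110]
goniometer_angles = [119, 96, 129, 111]
bias, lower, upper = limits_of_agreement(photo_angles, goniometer_angles)
print(bias, round(lower, 2), round(upper, 2))
```

    Narrow limits around a mean difference near zero indicate that the two sets of readings could be used interchangeably for the clinical purpose at hand.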

  3. Temporal and Geographic variation in the validity and internal consistency of the Nursing Home Resident Assessment Minimum Data Set 2.0.

    Science.gov (United States)

    Mor, Vincent; Intrator, Orna; Unruh, Mark Aaron; Cai, Shubing

    2011-04-15

    Medicare hospital diagnoses and MDS based diagnoses were between .6 and .7 for major diagnoses like CHF, hypertension, diabetes. Internal consistency, as measured by PPV, of the MDS ADL items with other MDS items measuring impairments and symptoms exceeded .9. The Activities of Daily Living (ADL) long form summary scale achieved an alpha inter-consistency level exceeding .85 and multi-item scale alpha levels of .65 were achieved for well being and mood, and .55 for behavior, levels that were sustained even after stratification by ADL and cognition. The Changes in Health, End-stage disease and Symptoms and Signs (CHESS) index, a summary measure of frailty was highly predictive of one year survival. The MDS demonstrates a reasonable level of consistency both in terms of how well MDS diagnoses correspond to hospital discharge diagnoses and in terms of the internal consistency of functioning and behavioral items. The level of alpha reliability and validity demonstrated by the scales suggest that the data can be useful for research and policy analysis. However, while improving, the MDS discharge tracking record should still not be used to indicate Medicare hospitalizations or mortality. It will be important to monitor the performance of the MDS 3.0 with respect to consistency, reliability and validity now that it has replaced version 2.0, using these results as a baseline that should be exceeded.

  4. INTER-EXAMINER VARIABILITY

    African Journals Online (AJOL)

    Objective: To establish whether inter-examiner variability is still a significant factor for the undergraduate orthopaedic clinical ... D. The scores for each student were tabulated and the range, mean, and pass rate determined for each of the examiners. ... has not the heart to reject the man”, consistently gave higher scores (1).

  5. Assessing motivation for work environment improvements: internal consistency, reliability and factorial structure.

    Science.gov (United States)

    Hedlund, Ann; Ateg, Mattias; Andersson, Ing-Marie; Rosén, Gunnar

    2010-04-01

    Workers' motivation to actively take part in improvements to the work environment is assumed to be important for the efficiency of investments for that purpose. That gives rise to the need for a tool to measure this motivation. A questionnaire to measure motivation for improvements to the work environment has been designed. Internal consistency and test-retest reliability of the domains of the questionnaire have been measured, and the factorial structure has been explored, from the answers of 113 employees. The internal consistency is high (0.94), as well as the correlation for the total score (0.84). Three factors are identified accounting for 61.6% of the total variance. The questionnaire can be a useful tool in improving intervention methods. The expectation is that the tool can be useful, particularly with the aim of improving efficiency of companies' investments for work environment improvements. Copyright 2010 Elsevier Ltd. All rights reserved.

  6. Studies on the consistency of internally taken contrast medium for pancreas CT

    Energy Technology Data Exchange (ETDEWEB)

    Matsushima, Kishio; Mimura, Seiichi; Tahara, Seiji; Kitayama, Takuichi; Inamura, Keiji; Mikami, Yasutaka; Hashimoto, Keiji; Hiraki, Yoshio; Aono, Kaname

    1985-02-01

    A problem in pancreatic CT scanning is discriminating between the pancreas and the adjacent gastrointestinal tract. Generally we administer a dilution of gastrografin internally to make the discrimination. The degree of dilution has been decided by experience at each hospital. When the consistency of the contrast medium is low in density, an enhancement effect cannot be expected, but when the consistency is high, artifacts appear. We experimented with the degree of dilution and the CT number to decide the optimum consistency of gastrografin for the diagnosis of pancreatic disease. Statistical analysis of the results shows the optimum dilution of gastrografin to be 1.5%.

  7. A twin study of schizoaffective-mania, schizoaffective-depression, and other psychotic syndromes.

    Science.gov (United States)

    Cardno, Alastair G; Rijsdijk, Frühling V; West, Robert M; Gottesman, Irving I; Craddock, Nick; Murray, Robin M; McGuffin, Peter

    2012-03-01

    The nosological status of schizoaffective disorders remains controversial. Twin studies are potentially valuable for investigating relationships between schizoaffective-mania, schizoaffective-depression, and other psychotic syndromes, but no such study has yet been reported. We ascertained 224 probandwise twin pairs [106 monozygotic (MZ), 118 same-sex dizygotic (DZ)], where probands had psychotic or manic symptoms, from the Maudsley Twin Register in London (1948-1993). We investigated Research Diagnostic Criteria schizoaffective-mania, schizoaffective-depression, schizophrenia, mania and depressive psychosis primarily using a non-hierarchical classification, and additionally using hierarchical and data-derived classifications, and a classification featuring broad schizophrenic and manic syndromes without separate schizoaffective syndromes. We investigated inter-rater reliability and co-occurrence of syndromes within twin probands and twin pairs. The schizoaffective syndromes showed only moderate inter-rater reliability. There was general significant co-occurrence between syndromes within twin probands and MZ pairs, and a trend for schizoaffective-mania and mania to have the greatest co-occurrence. Schizoaffective syndromes in MZ probands were associated with relatively high risk of a psychotic syndrome occurring in their co-twins. The classification of broad schizophrenic and manic syndromes without separate schizoaffective syndromes showed improved inter-rater reliability, but high genetic and environmental correlations between the two broad syndromes. The results are consistent with regarding schizoaffective-mania as due to co-occurring elevated liability to schizophrenia, mania, and depression; and schizoaffective-depression as due to co-occurring elevated liability to schizophrenia and depression, but with less elevation of liability to mania. If in due course schizoaffective syndromes show satisfactory inter-rater reliability and some specific etiological

  8. Internal consistency of the CHAMPS physical activity questionnaire for Spanish speaking older adults.

    Science.gov (United States)

    Rosario, Martín G; Vázquez, Jenniffer M; Cruz, Wanda I; Ortiz, Alexis

    2008-09-01

    The Community Healthy Activities Model Program for Seniors (CHAMPS) is a physical activity monitoring questionnaire for people between 65 and 90 years old. This questionnaire has previously been translated into Spanish for use in the Latin American population. The aim was to adapt the Spanish version of the CHAMPS questionnaire to Puerto Rico and assess its internal consistency. An external review committee adapted the existing Spanish version of the CHAMPS for use in the Puerto Rican population. Three older adults participated in a second phase with the purpose of training the research team. After the second phase, 35 older adults participated in a third content adaptation phase. During the third phase, the preliminary Spanish version for Puerto Rico of the CHAMPS was given to the 35 participants to assess clarity, vocabulary and understandability. Interviews with each participant in the third phase were carried out to obtain feedback and create a final Spanish version of the CHAMPS for Puerto Rico. After analyses of this phase, the external review committee prepared a final Spanish version of the CHAMPS for Puerto Rico. The final version was administered to 15 older adults (76 +/- 6.5 years) to assess internal consistency using Cronbach's alpha. The questionnaire showed a strong internal consistency of 0.76. The total time to answer the questionnaire was 17.4 minutes. The Spanish version of the CHAMPS questionnaire for Puerto Rico appears to be an easy-to-administer and consistent measurement tool to assess physical activity in older adults.
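    As an illustration of the Cronbach's alpha analysis mentioned above, the following is a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the response data are hypothetical and are not taken from the CHAMPS study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    k = len(scores[0])
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses: 5 respondents x 3 items (illustrative only).
responses = [
    [3, 4, 3],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [4, 4, 4],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

    Values closer to 1 indicate that the items vary together; which threshold counts as acceptable is a convention that differs between the studies listed here.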

  9. Expert and Naive Raters Using the PAQ: Does it Matter?

    Science.gov (United States)

    Cornelius, Edwin T.; And Others

    1984-01-01

    Questions the observed correlation between job experts and naive raters using the Position Analysis Questionnaire (PAQ); and conducts a replication of the Smith and Hakel study (1979) with college students (N=39). Concluded that PAQ ratings from job experts and college students are not equivalent and therefore are not interchangeable. (LLL)

  10. Psychometrics and utility of Psycho-Educational Profile-Revised as a developmental quotient measure among children with the dual disability of intellectual disability and autism.

    Science.gov (United States)

    Alwinesh, Merlin Thanka Jemi; Joseph, Rachel Beulah Jansirani; Daniel, Anna; Abel, Julie Sandra; Shankar, Satya Raj; Mammen, Priya; Russell, Sushila; Russell, Paul Swamidhas Sudhakar

    2012-09-01

    There is no agreement about the measure to quantify the intellectual/developmental level in children with the dual disability of intellectual disability and autism. Therefore, we studied the psychometric properties and utility of Psycho-Educational Profile-Revised (PEP-R) as a developmental test in this population. We identified 116 children with dual disability from the day care and inpatient database of a specialised Autism Clinic. Scale and domain level scores of PEP-R were collected and analyzed. We examined the internal consistency, domain-total correlation of PEP-R and concurrent validity of PEP-R against Gesell's Developmental Schedule, inter-rater and test-retest reliability and utility of PEP-R among children with dual disability in different ages, functional level and severity of autism. Besides the adequate face and content validity, PEP-R demonstrates a good internal consistency (Cronbach's α ranging from 0.91 to 0.93) and domain-total correlation (ranging from 0.75 to 0.90). The inter-rater reliability (intraclass correlation coefficient, ICC = 0.96) and test-retest reliability (ICC = 0.87) for PEP-R is good. There is moderate-to-high concurrent validity with GDS (r ranging from 0.61 to 0.82; all Ps = 0.001). The utility of PEP-R as a developmental measure was good with infants, toddlers, pre-school and primary school children. The ability of PEP-R to measure the developmental age was good, irrespective of the severity of autism but was better with high-functioning children. The PEP-R as an intellectual/developmental test has strong psychometric properties in children with dual disability. It could be used in children with different age groups and severity of autism. PEP-R should be used with caution as a developmental test in children with dual disability who are low functioning.

  11. Assessing the competences associated with a nursing Bachelor thesis by means of rubrics.

    Science.gov (United States)

    Llaurado-Serra, M; Rodríguez, E; Gallart, A; Fuster, P; Monforte-Royo, C; De Juan, M Á

    2018-07-01

    Writing a Bachelor thesis is the last step in obtaining a university degree. The thesis may be job- or research-orientated, but it must demonstrate certain degree-level competences. Rubrics are a useful way of unifying the assessment criteria. To design a system of rubrics for assessing the competences associated with the Bachelor thesis of a nursing degree, to examine the system's reliability and validity and to analyse results in relation to the final thesis mark. Cross-sectional and psychometric study conducted between 2012 and 2014. Nursing degree at a Spanish university. Twelve tutors who designed the system of rubrics. Students (n = 76) who wrote their Bachelor thesis during the 2013-2014 academic year. After deciding which aspects would be assessed, who would assess them and when, the tutors developed seven rubrics (drafting process, assessment of the written thesis by the supervisor and by a panel, student self-assessment, peer assessment, tutor evaluation of the peer assessment and panel assessment of the viva). We analysed the reliability (inter-rater and internal consistency) and validity (convergent and discriminant) of the rubrics, and also the relationship between the competences assessed and the final thesis mark. All the rubrics had internal consistency coefficients >0.80. The rubric for oral communication skills (viva) yielded inter-rater reliability of 0.95. Factor analysis indicated a unidimensional structure for all but one of the rubrics, the exception being the rubric for peer assessment, which had a two-factor structure. The main competences associated with a good quality Bachelor thesis were written communication skills and the ability to work independently. The assessment system based on seven rubrics is shown to be valid and reliable. Writing a Bachelor thesis requires a range of degree-level competences and it offers nursing students the opportunity to develop their evidence-based practice skills. Copyright © 2018 Elsevier Ltd. All

  12. Reliability and validity of the Wolfram Unified Rating Scale (WURS)

    Directory of Open Access Journals (Sweden)

    Nguyen Chau

    2012-11-01

    Full Text Available Background: Wolfram syndrome (WFS) is a rare, neurodegenerative disease that typically presents with childhood onset insulin dependent diabetes mellitus, followed by optic atrophy, diabetes insipidus, deafness, and neurological and psychiatric dysfunction. There is no cure for the disease, but recent advances in research have improved understanding of the disease course. Measuring disease severity and progression with reliable and validated tools is a prerequisite for clinical trials of any new intervention for neurodegenerative conditions. To this end, we developed the Wolfram Unified Rating Scale (WURS) to measure the severity and individual variability of WFS symptoms. The aim of this study is to develop and test the reliability and validity of the WURS. Methods: A rating scale of disease severity in WFS was developed by modifying a standardized assessment for another neurodegenerative condition (Batten disease). WFS experts scored the representativeness of WURS items for the disease. The WURS was administered to 13 individuals with WFS (6-25 years of age). Motor, balance, mood and quality of life were also evaluated with standard instruments. Inter-rater reliability, internal consistency reliability, concurrent, predictive and content validity of the WURS were calculated. Results: The WURS had high inter-rater reliability (ICCs>.93), moderate to high internal consistency reliability (Cronbach’s α = 0.78-0.91) and demonstrated good concurrent and predictive validity. There were significant correlations between the WURS Physical Assessment and motor and balance tests (rs>.67, ps>.76, ps=-.86, p=.001. The WURS demonstrated acceptable content validity (Scale-Content Validity Index=0.83). Conclusions: These preliminary findings demonstrate that the WURS has acceptable reliability and validity and captures individual differences in disease severity in children and young adults with WFS.

  13. How consistent are lordosis, range of movement and lumbo-pelvic rhythm in people with and without back pain?

    DEFF Research Database (Denmark)

    Laird, Robert A; Kent, Peter; Keating, Jennifer L

    2016-01-01

    with and without chronic LBP (>12 weeks' duration). METHODS: Wireless, wearable, inertial measurement units measured lumbar lordosis angle, range of movement (ROM) and lumbo-pelvic rhythm in adults (n = 63). Measurements were taken on three separate occasions: two tests on the same day with different raters...... participants with and without LBP for lordosis angle. There were significant differences for pelvic flexion ROM (LBP 60.8°, NoLBP 54.8°, F(1,63) = 4.31, p = 0.04), lumbar right lateral flexion ROM (LBP 22.2°, NoLBP 24.6°, F(1,63) = 4.48, p = .04), trunk right lateral flexion ROM (LBP 28.4°, NoLBP 31.7°, F(1......,63) = 5.9, p = .02) and lumbar contribution to lumbo-pelvic rhythm in the LBP group (LBP 45.8 %, NoLBP 51.3 %, F(1,63) = 4.20, p = .044). MDC90 estimates for intra- and inter-rater comparisons were 10°-15° for lumbar lordosis, and 5°-15° for most ROM. For lumbo-pelvic rhythm, we found 8-15 % variation...

  14. Disability profile/clinician-rated: validity for Brazilian university students with social anxiety disorder.

    Science.gov (United States)

    Vaccaro de Morais Abumusse, Luciene; Osório, Flávia L; Crippa, José Alexandre S; Loureiro, Sonia Regina

    2013-01-01

    Functional impairment scales are important for assessing patients with Social Anxiety Disorder (SAD). The present study aims to evaluate the reliability, internal consistency, validity and factorial structure of the Disability Profile/Clinician-Rated (DP) scale, as well as to present an interview guide to support its application by clinicians. University students (n = 173) of both genders participated in the study (SAD = 84 and Non-SAD = 89), with ages ranging between 17 and 35 years, systematically diagnosed. The SAD group presented more difficulties than the Non-SAD group. For the SAD group, the DP presented internal consistency of 0.68 (lifetime) and 0.67 (last two weeks). Inter-rater reliability varied from 0.75 to 0.93. Two factors were extracted, and the correlations between these factors and the Social Phobia Inventory subscales showed an association between fear and avoidance symptoms and functional impairments. The scale presents good psychometric properties and can contribute to the assessment of functional impairments.

  15. The reliability, validity, and applicability of an English language version of the Mini-ICF-APP.

    Science.gov (United States)

    Molodynski, Andrew; Linden, Michael; Juckel, George; Yeeles, Ksenija; Anderson, Catriona; Vazquez-Montes, Maria; Burns, Tom

    2013-08-01

    This study aimed at establishing the validity and reliability of an English language version of the Mini-ICF-APP. One hundred and five patients under the care of secondary mental health care services were assessed using the Mini-ICF-APP and several well-established measures of functioning and symptom severity. 47 (45 %) patients were interviewed on two occasions to ascertain test-retest reliability and 50 (48 %) were interviewed by two researchers simultaneously to determine the instrument's inter-rater reliability. Occupational and sick leave status were also recorded to assess construct validity. The Mini-ICF-APP was found to have substantial internal consistency (Cronbach's α 0.869-0.912) and all 13 items correlated highly with the total score. Analysis also showed that the Mini-ICF-APP had good test-retest (ICC 0.832) and inter-rater (ICC 0.886) reliability. No statistically significant association with length of sick leave was found, but the unemployed scored higher on the Mini-ICF-APP than those in employment (mean 18.4, SD 9.1 vs. 9.4, SD 6.4, p Mini-ICF-APP correlated highly with the other measures of illness severity and functioning considered in the study. The English version of the Mini-ICF-APP is a reliable and valid measure of disorders of capacity as defined by the International Classification of Functioning. Further work is necessary to establish whether the scale could be divided into subscales, which would allow the instrument to measure an individual's specific impairments more sensitively.

  16. The TiltMeter app is a novel and accurate measurement tool for the weight bearing lunge test.

    Science.gov (United States)

    Williams, Cylie M; Caserta, Antoni J; Haines, Terry P

    2013-09-01

    The weight bearing lunge test is increasingly being used by health care clinicians who treat lower limb and foot pathology. This measure is commonly established accurately and reliably with the use of expensive equipment. This study aims to compare the digital inclinometer with a free app, TiltMeter, on an Apple iPhone. This was an intra-rater and inter-rater reliability study. Two raters (novice and experienced) conducted the measurements in both a bent knee and straight leg position to determine the intra-rater and inter-rater reliability. Concurrent validity was also established. Allied health practitioners were recruited as participants from the workplace. A preconditioning stretch was conducted and the ankle range of motion was established with the weight bearing lunge test position, firstly with the leg straight and secondly with the knee bent. The measurement device and each participant were randomised during measurement. The intra-rater reliability and inter-rater reliability for the devices and in both positions were all over ICC 0.8 except for one intra-rater measure (digital inclinometer, novice, ICC 0.65). The inter-rater reliability between the digital inclinometer and the TiltMeter was near perfect, ICC 0.96 (CI: 0.898-0.983); concurrent validity ICC between the two devices was 0.83 (CI: -0.740 to 0.445). The use of the TiltMeter app on the iPhone is a reliable and inexpensive tool to measure the available ankle range of motion. Health practitioners should use caution in applying these findings to other smart phone equipment if surface areas are not comparable. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.

  17. An inter-laboratory comparison of urinary 3-hydroxypropylmercapturic acid measurement demonstrates good reproducibility between laboratories

    Directory of Open Access Journals (Sweden)

    Bailey Brian

    2011-10-01

    Full Text Available Background: Biomarkers have been used extensively in clinical studies to assess toxicant exposure in smokers and non-smokers and have recently been used in the evaluation of novel tobacco products. The urinary metabolite 3-HPMA, a metabolite of the major tobacco smoke toxicity contributor acrolein, is one example of a biomarker used to measure exposure to tobacco smoke. A number of laboratories have developed liquid chromatography with tandem mass spectrometry (LC-MS/MS) based methods to measure urinary 3-HPMA; however, it is unclear to what extent the data obtained by these different laboratories are comparable. Findings: This report describes an inter-laboratory comparison carried out to evaluate the comparability of 3-HPMA measurement between four laboratories. A common set of spiked and authentic smoker and non-smoker urine samples were used. Each laboratory used their in-house LC-MS/MS method and a common internal standard. A comparison of the repeatability ('r'), reproducibility ('R'), and coefficient of variation for 3-HPMA demonstrated that within-laboratory variation was consistently lower than between-laboratory variation. The average inter-laboratory coefficient of variation was 7% for fortified urine samples and 16.2% for authentic urine samples. Together, this represents an inter-laboratory variation of 12.2%. Conclusion: The results from this first inter-laboratory comparison for the measurement of 3-HPMA in urine demonstrate a reasonably good consensus between laboratories. However, some consistent measurement biases were still observed between laboratories, suggesting that additional work may be required to further reduce the inter-laboratory coefficient of variation.

  18. Validation of the prosthetic esthetic index

    DEFF Research Database (Denmark)

    Özhayat, Esben B; Dannemand, Katrine

    2014-01-01

    OBJECTIVES: In order to diagnose impaired esthetics and evaluate treatments for these, it is crucial to evaluate all aspects of oral and prosthetic esthetics. No professionally administered index currently exists that sufficiently encompasses comprehensive prosthetic esthetics. This study aimed...... to validate a new comprehensive index, the Prosthetic Esthetic Index (PEI), for professional evaluation of esthetics in prosthodontic patients. MATERIAL AND METHODS: The content, criterion, and construct validity; the test-retest, inter-rater, and internal consistency reliability; and the sensitivity...... furthermore distinguish between participants and controls, indicating sufficient sensitivity. CONCLUSION: The PEI is considered a valid and reliable instrument involving sufficient aspects for assessment of the professionally evaluated esthetics in prosthodontic patients. CLINICAL RELEVANCE...

  19. Efficacy of the ADEC in Identifying Autism Spectrum Disorder in Clinically Referred Toddlers in the US.

    Science.gov (United States)

    Hedley, Darren; Nevill, Rose E; Monroy-Moreno, Yessica; Fields, Natalie; Wilkins, Jonathan; Butter, Eric; Mulick, James A

    2015-08-01

    The Autism Detection in Early Childhood (ADEC) is a brief, play-based screening tool for the assessment of autism spectrum disorder (ASD) in children aged 12-36 months. We examined the psychometric properties of the ADEC in a clinical sample of toddlers (n = 114) referred to a US pediatric hospital for assessment due to concerns of developmental delay or ASD. The ADEC (cutoff = 11) returned good sensitivity (.93-.94) but poorer specificity (.62-.64) for best estimate clinical diagnosis of ASD, and compared favorably with the ADOS-2. Internal consistency was acceptable, α = .80, and inter-rater reliability was high, ICC = .95. Results support the use of the ADEC as a clinical screen for ASD.
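    The sensitivity and specificity figures above depend on how a screening score is compared with a cutoff. The sketch below shows that calculation under the assumption that a score at or above the cutoff counts as a positive screen; the abstract gives only "cutoff = 11", so the direction of the comparison, the function name, and the data are all hypothetical.

```python
def screen_performance(scores, has_condition, cutoff):
    """Return (sensitivity, specificity) for a screen where score >= cutoff is positive."""
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, has_condition):
        positive = score >= cutoff  # assumed direction of the cutoff
        if truth and positive:
            tp += 1
        elif truth:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case with and one without the condition")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical screening scores and reference diagnoses (illustrative only).
scores = [14, 9, 12, 5, 11, 13, 3, 10]
diagnosed = [True, True, True, False, False, True, False, False]
sens, spec = screen_performance(scores, diagnosed, cutoff=11)
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}")
```

    Lowering the cutoff in such a scheme generally trades specificity for sensitivity, which is one way to read a screen with high sensitivity and poorer specificity.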

  20. Clinical global impression of cognition in schizophrenia (CGI-CogS): reliability and validity of a co-primary measure of cognition.

    Science.gov (United States)

    Ventura, Joseph; Cienfuegos, Angel; Boxer, Oren; Bilder, Robert

    2008-11-01

    Cognitive deficits are core features of schizophrenia that have been associated reliably with functional outcomes and now are a focus of treatment research. New rating scales are needed to complement current psychometric testing procedures, both to enable wider clinical use, and to serve as endpoints in clinical trials. Subjects were 35 schizophrenia patient-and-caregiver pairs recruited from the UCLA and West Los Angeles VA Outpatient Psychiatry Departments. Participants were assessed with the Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS), an interview-based rating scale of cognitive functioning, on 3 occasions (baseline, 1 month, and 3 months). A computerized neurocognitive battery (Cogtest), an assessment of functioning, and symptom measures were administered at two occasions (baseline and one month). The CGI-CogS ratings generally showed a high level of internal consistency (Cronbach's alpha=.69 to .96), adequate levels of inter-rater reliability (ICC's=.71 to .80), and high test-retest stability (ICC's=.92 to .95). Correlations of caregiver and rater global (but not "patient only rating") CGI-CogS ratings with neurocognitive performance were in the moderate range (r's=-.27 to -.48), while most of the correlations with functional outcome were moderate to high (r's=-.41 to -.72). In fact, the CGI-CogS ratings were significantly more correlated with Social Functioning than were objective neurocognitive test scores (p=.02) and showed a trend in the same direction for predicting Instrumental Functioning (p=.06). We found moderate correlations between CGI-CogS global ratings and PANSS positive (r's=.36 to .49) and SANS negative symptoms (r=.41 to .61), but not with BPRS depression (r's=.11 to .13). An interview-based measure of cognition demonstrated high internal consistency, good inter-rater reliability, and high test-retest reliability. Caregiver ratings appear to add important clinical information over patient-only ratings. The CGI

  1. Reliability of capturing foot parameters using digital scanning and the neutral suspension casting technique

    Science.gov (United States)

    2011-01-01

    Background: A clinical study was conducted to determine the intra- and inter-rater reliability of digital scanning and the neutral suspension casting technique to measure six foot parameters. The neutral suspension casting technique is a commonly utilised method for obtaining a negative impression of the foot prior to orthotic fabrication. Digital scanning offers an alternative to the traditional plaster of Paris techniques. Methods: Twenty-one healthy participants volunteered to take part in the study. Six casts and six digital scans were obtained from each participant by two raters of differing clinical experience. The foot parameters chosen for investigation were cast length (mm), forefoot width (mm), rearfoot width (mm), medial arch height (mm), lateral arch height (mm) and forefoot to rearfoot alignment (degrees). Intraclass correlation coefficients (ICC) with 95% confidence intervals (CI) were calculated to determine the intra- and inter-rater reliability. Measurement error was assessed through the calculation of the standard error of the measurement (SEM) and smallest real difference (SRD). Results: ICC values for all foot parameters using digital scanning ranged between 0.81-0.99 for both intra- and inter-rater reliability. For the neutral suspension casting technique, inter-rater reliability values ranged from 0.57-0.99 and intra-rater reliability values ranged from 0.36-0.99 for rater 1 and 0.49-0.99 for rater 2. Conclusions: The findings of this study indicate that digital scanning is a reliable technique, irrespective of clinical experience, with reduced measurement variability in all foot parameters investigated when compared to neutral suspension casting. PMID:21375757
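    The SEM and SRD named above can be sketched with commonly used definitions, SEM = SD * sqrt(1 - ICC) and SRD = 1.96 * sqrt(2) * SEM. The abstract does not give the formulas the authors applied, so these definitions are an assumption, and the function name and input values are hypothetical.

```python
import math


def sem_and_srd(sd, icc):
    """Return (SEM, SRD) from a between-subject SD and an ICC.

    Assumes SEM = SD * sqrt(1 - ICC) and SRD = 1.96 * sqrt(2) * SEM.
    """
    if sd < 0 or not 0.0 <= icc <= 1.0:
        raise ValueError("sd must be non-negative and icc must lie in [0, 1]")
    sem = sd * math.sqrt(1.0 - icc)
    srd = 1.96 * math.sqrt(2.0) * sem
    return sem, srd


# Hypothetical values: SD of 2.0 mm for a foot parameter, ICC of 0.75.
sem, srd = sem_and_srd(sd=2.0, icc=0.75)
print(f"SEM = {sem:.2f} mm, SRD = {srd:.2f} mm")
```

    Under these definitions a higher ICC shrinks both quantities, so a change between two measurements smaller than the SRD would not be distinguishable from measurement error.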

  2. Inter-dot coupling effects on transport through correlated parallel

    Indian Academy of Sciences (India)

    Transport through a symmetric parallel-coupled quantum dot system has been studied using the non-equilibrium Green function formalism. Inter-dot tunnelling with on-dot and inter-dot Coulomb repulsion is included. The transmission coefficient and a Landauer–Büttiker-like current formula are shown in terms of internal states ...

  3. Deriving Oral Assessment Scales across Different Tests and Rater Groups.

    Science.gov (United States)

    Chalhoub-Deville, Micheline

    1995-01-01

    The purpose of this study was to derive the criteria/dimensions underlying learners' second-language oral ability scores across three tests: an oral interview, a narration, and a read-aloud. A stimulus tape of 18 speech samples was presented to 3 native speaker rater groups for evaluation. Results indicate that researchers might need to reconsider…

  4. Feasibility and reliability of a newly developed antenatal risk score card in routine care

    NARCIS (Netherlands)

    E. Birnie; E.A.P. Steegers; Drs. H.W. Torij; M.J. Veen; J. Poeran; G.J. Bonsel

    2015-01-01

    A population-based cross-sectional study (feasibility) and a cohort study (inter-rater reliability) to study in routine care the feasibility and inter-rater reliability of the Rotterdam Reproductive Risk Reduction risk score card (R4U), a new semi-quantitative score card for use during the antenatal

  5. Development and validation of the ASPIRE-VA coaching fidelity checklist (ACFC): a tool to help ensure delivery of high-quality weight management interventions.

    Science.gov (United States)

    Damschroder, Laura J; Goodrich, David E; Kim, Hyungjin Myra; Holleman, Robert; Gillon, Leah; Kirsh, Susan; Richardson, Caroline R; Lutes, Lesley D

    2016-09-01

    Practical and valid instruments are needed to assess fidelity of coaching for weight loss. The purpose of this study was to develop and validate the ASPIRE Coaching Fidelity Checklist (ACFC). Classical test theory guided ACFC development. Principal component analyses were used to determine item groupings. Psychometric properties, internal consistency, and inter-rater reliability were evaluated for each subscale. Criterion validity was tested by predicting weight loss as a function of coaching fidelity. The final 19-item ACFC consists of two domains (session process and session structure) and five subscales (sets goals and monitors progress, assesses and personalizes self-regulatory content, manages the session, creates a supportive and empathetic climate, and stays on track). Four of five subscales showed high internal consistency (Cronbach alphas > 0.70) for group-based coaching; only two of five subscales had high internal reliability for phone-based coaching. All five subscales were positively and significantly associated with weight loss for group- but not for phone-based coaching. The ACFC is a reliable and valid instrument that can be used to assess fidelity and guide skill-building for weight management interventionists.

  6. Development of the “Highly Sensitive Dog” questionnaire to evaluate the personality dimension “Sensory Processing Sensitivity” in dogs

    Science.gov (United States)

    Asher, Lucy; Furrer, Sibylle; Lechner, Isabel; Würbel, Hanno; Melotti, Luca

    2017-01-01

    In humans, the personality dimension ‘sensory processing sensitivity (SPS)’, also referred to as “high sensitivity”, involves deeper processing of sensory information, which can be associated with physiological and behavioral overarousal. However, it has not been studied up to now whether this dimension also exists in other species. SPS can influence how people perceive the environment and how this affects them, thus a similar dimension in animals would be highly relevant with respect to animal welfare. We therefore explored whether SPS translates to dogs, one of the primary model species in personality research. A 32-item questionnaire to assess the “highly sensitive dog score” (HSD-s) was developed based on the “highly sensitive person” (HSP) questionnaire. A large-scale, international online survey was conducted, including the HSD questionnaire, as well as questions on fearfulness, neuroticism, “demographic” (e.g. dog sex, age, weight; age at adoption, etc.) and “human” factors (e.g. owner age, sex, profession, communication style, etc.), and the HSP questionnaire. Data were analyzed using linear mixed effect models with forward stepwise selection to test prediction of HSD-s by the above-mentioned factors, with country of residence and dog breed treated as random effects. A total of 3647 questionnaires were fully completed. HSD-, fearfulness, neuroticism and HSP-scores showed good internal consistencies, and HSD-s only moderately correlated with fearfulness and neuroticism scores, paralleling previous findings in humans. Intra- (N = 447) and inter-rater (N = 120) reliabilities were good. Demographic and human factors, including HSP score, explained only a small amount of the variance of HSD-s. A PCA analysis identified three subtraits of SPS, comparable to human findings. Overall, the measured personality dimension in dogs showed good internal consistency, partial independence from fearfulness and neuroticism, and good intra- and inter-rater

  7. Development of the "Highly Sensitive Dog" questionnaire to evaluate the personality dimension "Sensory Processing Sensitivity" in dogs.

    Directory of Open Access Journals (Sweden)

    Maya Braem

    Full Text Available In humans, the personality dimension 'sensory processing sensitivity (SPS)', also referred to as "high sensitivity", involves deeper processing of sensory information, which can be associated with physiological and behavioral overarousal. However, it has not been studied up to now whether this dimension also exists in other species. SPS can influence how people perceive the environment and how this affects them, thus a similar dimension in animals would be highly relevant with respect to animal welfare. We therefore explored whether SPS translates to dogs, one of the primary model species in personality research. A 32-item questionnaire to assess the "highly sensitive dog score" (HSD-s) was developed based on the "highly sensitive person" (HSP) questionnaire. A large-scale, international online survey was conducted, including the HSD questionnaire, as well as questions on fearfulness, neuroticism, "demographic" (e.g. dog sex, age, weight; age at adoption, etc.) and "human" factors (e.g. owner age, sex, profession, communication style, etc.), and the HSP questionnaire. Data were analyzed using linear mixed effect models with forward stepwise selection to test prediction of HSD-s by the above-mentioned factors, with country of residence and dog breed treated as random effects. A total of 3647 questionnaires were fully completed. HSD-, fearfulness, neuroticism and HSP-scores showed good internal consistencies, and HSD-s only moderately correlated with fearfulness and neuroticism scores, paralleling previous findings in humans. Intra- (N = 447) and inter-rater (N = 120) reliabilities were good. Demographic and human factors, including HSP score, explained only a small amount of the variance of HSD-s. A PCA analysis identified three subtraits of SPS, comparable to human findings. Overall, the measured personality dimension in dogs showed good internal consistency, partial independence from fearfulness and neuroticism, and good intra- and inter-rater

  8. The law of international organisations

    CERN Document Server

    White, Nigel D

    2017-01-01

    This book provides a concise account of the principles and norms of international law applicable to the main-type of international organisation - the inter-governmental organisation (IGO). That law consists of principles and rules found in the founding documents of IGOs along with applicable principles and rules of international law. The book also identifies and analyses the law produced by IGOs, applied by them and, occasionally, enforced by them. There is a concentration upon the United Nations, as the paradigmatic IGO, not only upon the UN organisation headquartered in New York, but on other IGOs in the UN system (the specialised agencies such as the World Health Organisation).

  9. Expanding the Reach of Participatory Risk Management: Testing an Online Decision-Aiding Framework for Informing Internally Consistent Choices.

    Science.gov (United States)

    Bessette, Douglas L; Campbell-Arvai, Victoria; Arvai, Joseph

    2016-05-01

    This article presents research aimed at developing and testing an online, multistakeholder decision-aiding framework for informing multiattribute risk management choices associated with energy development and climate change. The framework was designed to provide necessary background information and facilitate internally consistent choices, or choices that are in line with users' prioritized objectives. In order to test different components of the decision-aiding framework, a six-part, 2 × 2 × 2 factorial experiment was conducted, yielding eight treatment scenarios. The three factors included: (1) whether or not users could construct their own alternatives; (2) the level of detail regarding the composition of alternatives users would evaluate; and (3) the way in which a final choice between users' own constructed (or highest-ranked) portfolio and an internally consistent portfolio was presented. Participants' self-reports revealed the framework was easy to use and providing an opportunity to develop one's own risk-management alternatives (Factor 1) led to the highest knowledge gains. Empirical measures showed the internal consistency of users' decisions across all treatments to be lower than expected and confirmed that providing information about alternatives' composition (Factor 2) resulted in the least internally consistent choices. At the same time, those users who did not develop their own alternatives and were not shown detailed information about the composition of alternatives believed their choices to be the most internally consistent. These results raise concerns about how the amount of information provided and the ability to construct alternatives may inversely affect users' real and perceived internal consistency. © 2015 Society for Risk Analysis.

  10. Inter-Parental Conflict and Children's Academic Attainment: A Longitudinal Analysis

    Science.gov (United States)

    Harold, Gordon T.; Aitken, Jessica J.; Shelton, Katherine H.

    2007-01-01

    Background: Previous research suggests a link between inter-parental conflict and children's psychological development. Most studies, however, have tended to focus on two broad indices of children's psychological adaptation (internalizing symptoms and externalizing problems) in considering the effects of inter-parental conflict on children's…

  11. A study of the reliability of the Nociception Coma Scale.

    Science.gov (United States)

    Riganello, F; Cortese, M D; Arcuri, F; Candelieri, A; Guglielmino, F; Dolce, G; Sannita, W G; Schnakers, C

    2015-04-01

    In this study, we investigated the reliability of the Nociception Coma Scale which has recently been developed to assess nociception in non-communicative, severely brain-injured patients. Prospective cross-sequential study. Semi-intensive care unit and long-term brain injury care. Forty-four patients diagnosed as being in a vegetative state (n=26) or in a minimally conscious state (n=18). Patients were assessed by two experts (rater A and rater B) on two consecutive weeks to measure inter-rater agreement and test-retest reliability. Total scores and subscores of the Nociception Coma Scale. We performed a total of 176 assessments. The inter-rater agreement was moderate for the total scores (k = 0.57) and fair to substantial for the subscores (0.33 ≤ k ≤ 0.62) on week 2. The test-retest reliability was substantial for the total scores (k = 0.66) and moderate to almost perfect for the subscores (0.53 ≤ k ≤ 0.96) for rater A. The inter-rater agreement was weaker on week 1, whereas the test-retest reliability was lower for the least experienced rater (rater B). This study provides further evidence of the psychometric qualities of the Nociception Coma Scale. Future studies should assess the impact of practical experience and background on administration and scoring of the scale. © The Author(s) 2014.
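
    The agreement values reported above are kappa statistics. As a rough illustration, the unweighted Cohen's kappa for two raters can be sketched as follows; the abstract does not state which kappa variant was used, so this is an assumption, and the function name `cohen_kappa` and the ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same subjects."""
    n = len(rater_a)
    # Proportion of subjects on which the two raters gave the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)


# Made-up categorical scores from two raters (illustration only).
rater_a = [0, 0, 1, 1, 2, 2, 1, 0]
rater_b = [0, 1, 1, 1, 2, 2, 0, 0]
print(round(cohen_kappa(rater_a, rater_b), 3))
```

    The sketch divides by zero when chance agreement equals 1 (both raters always use the same single category); a production version would need to handle that case.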

  12. Strength and Pain Threshold Handheld Dynamometry Test Reliability in Patellofemoral Pain.

    Science.gov (United States)

    van der Heijden, R A; Vollebregt, T; Bierma-Zeinstra, S M A; van Middelkoop, M

    2015-12-01

    Patellofemoral pain syndrome (PFPS), characterized by peri- and retropatellar pain, is a common disorder in young, active people. The etiology is unclear; however, quadriceps strength seems to be a contributing factor, and sensitization might play a role. The study purpose is determining the inter-rater reliability of handheld dynamometry to test both quadriceps strength and pressure pain threshold (PPT), a measure for sensitization, in patients with PFPS. This cross-sectional case-control study comprises 3 quadriceps strength and one PPT measurements performed by 2 independent investigators in 22 PFPS patients and 16 matched controls. Inter-rater reliability was analyzed using intraclass correlation coefficients (ICC) and Bland-Altman plots. Inter-rater reliability of quadriceps strength testing was fair to good in PFPS patients (ICC=0.72) and controls (ICC=0.63). Bland-Altman plots showed an increased difference between assessors when average quadriceps strength values exceeded 250 N. Inter-rater reliability of PPT was excellent in patients (ICC=0.79) and fair to good in controls (ICC=0.52). Handheld dynamometry seems to be a reliable method to test both quadriceps strength and PPT in PFPS patients. Inter-rater reliability was higher in PFPS patients compared to control subjects. With regard to quadriceps testing, a higher variance between assessors occurs when quadriceps strength increases. © Georg Thieme Verlag KG Stuttgart · New York.
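
    The Bland-Altman analysis mentioned above summarises the differences between two assessors as a bias and 95% limits of agreement. A minimal sketch, assuming the common bias ± 1.96 SD definition (the abstract does not give the exact computation); the function name `bland_altman_limits` and the force values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman_limits(rater_1, rater_2):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(rater_1, rater_2)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # 95% limits under approximate normality
    return bias, bias - spread, bias + spread


# Made-up quadriceps strength readings in newtons (illustration only).
assessor_1 = [210.0, 255.0, 180.0, 300.0, 240.0]
assessor_2 = [205.0, 262.0, 184.0, 280.0, 236.0]
bias, lower, upper = bland_altman_limits(assessor_1, assessor_2)
print(f"bias = {bias:.1f} N, limits of agreement = [{lower:.1f}, {upper:.1f}] N")
```

    Plotting each difference against the pair's mean would show whether disagreement grows with the measured value, as the abstract reports for strength values above 250 N.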

  13. Modified Ashworth scale and spasm frequency score in spinal cord injury

    DEFF Research Database (Denmark)

    Baunsgaard, C. B.; Nissen, U. V.; Christensen, K. B.

    2016-01-01

    STUDY DESIGN: Intra- and inter-rater reliability study. OBJECTIVES: To assess intra- and inter-rater reliability of the Modified Ashworth Scale (MAS) and Spasm Frequency Score (SFS) in lower extremities in a population of spinal cord-injured persons, as well as correlations between the two scales. SETTING: Clinic for Spinal Cord Injuries, Rigshospitalet, Hornbaek, Denmark. METHODS: Thirty-one persons participated in the study and were tested four times in total with MAS and SFS by three experienced raters. Cohen's kappa (κ), simple and quadratic weighted (nominal and ordinal scale level... [...] .94 and inter-rater κweighted=0.93. Correlation between MAS and SFS showed non-significant correlation coefficients from -0.11 to 0.90. CONCLUSION: Reliability of MAS is highly affected by the weighting scheme. With a weighted κ it was overall reliable, and with a simple κ overall unreliable. Repeated tests should...
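
    The dependence of kappa on the weighting scheme can be illustrated with a small sketch that computes simple, linear and quadratic weighted kappa from the same ratings. It assumes ordinal categories coded as integers 0..k-1; the function name `weighted_kappa` and the ratings are hypothetical and are not the study's data.

```python
def weighted_kappa(rater_a, rater_b, n_categories, weighting="quadratic"):
    """Cohen's kappa with 'simple', 'linear' or 'quadratic' disagreement weights."""
    n, k = len(rater_a), n_categories
    # Joint proportions: obs[i][j] = share of subjects rated i by A and j by B.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        obs[a][b] += 1 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        if weighting == "simple":
            return 0.0 if i == j else 1.0
        distance = abs(i - j) / (k - 1)
        return distance if weighting == "linear" else distance ** 2

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * obs[i][j] for i, j in cells)
    expected = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    return 1 - observed / expected


# Made-up ordinal scores where every disagreement is by one category.
scores_a = [0, 0, 1, 1, 2, 2]
scores_b = [0, 1, 1, 2, 2, 1]
for scheme in ("simple", "linear", "quadratic"):
    print(scheme, round(weighted_kappa(scores_a, scores_b, 3, scheme), 3))
```

    Because quadratic weights penalise near-misses only lightly, the same ratings yield a higher kappa than under simple (all-or-nothing) weights, which is the pattern the conclusion describes.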

  14. P1 interneurons promote a persistent internal state that enhances inter-male aggression in Drosophila

    Science.gov (United States)

    Hoopfer, Eric D; Jung, Yonil; Inagaki, Hidehiko K; Rubin, Gerald M; Anderson, David J

    2015-01-01

    How brains are hardwired to produce aggressive behavior, and how aggression circuits are related to those that mediate courtship, is not well understood. A large-scale screen for aggression-promoting neurons in Drosophila identified several independent hits that enhanced both inter-male aggression and courtship. Genetic intersections revealed that 8-10 P1 interneurons, previously thought to exclusively control male courtship, were sufficient to promote fighting. Optogenetic experiments indicated that P1 activation could promote aggression at a threshold below that required for wing extension. P1 activation in the absence of wing extension triggered persistent aggression via an internal state that could endure for minutes. High-frequency P1 activation promoted wing extension and suppressed aggression during photostimulation, whereas aggression resumed and wing extension was inhibited following photostimulation offset. Thus, P1 neuron activation promotes a latent, internal state that facilitates aggression and courtship, and controls the overt expression of these social behaviors in a threshold-dependent, inverse manner. DOI: http://dx.doi.org/10.7554/eLife.11346.001 PMID:26714106

  15. Markup of temporal information in electronic health records.

    Science.gov (United States)

    Hyun, Sookyung; Bakken, Suzanne; Johnson, Stephen B

    2006-01-01

    Temporal information plays a critical role in the understanding of clinical narrative (i.e., free text). We developed a representation for marking up temporal information in a narrative, consisting of five elements: 1) reference point, 2) direction, 3) number, 4) time unit, and 5) pattern. We identified 254 temporal expressions from 50 discharge summaries and represented them using our scheme. The overall inter-rater reliability among raters applying the representation model was 75 percent agreement. The model can contribute to temporal reasoning in computer systems for decision support, data mining, and process and outcomes analyses by providing structured temporal information.

  16. Inter-American Institute Data and Information System

    Directory of Open Access Journals (Sweden)

    Luís Marcelo Achite

    2014-06-01

    Full Text Available The Inter-American Institute for Global Change Research (IAI) is an international institution supported by 19 countries in the Americas dedicated to fostering scientific research, international collaboration, the creation of networks, and the full and open exchange of scientific information. In general terms, the institute was conceived because of the need for an international non-governmental and non-profit institution whose main objective would be to support scientific development in the Americas, f...

  17. Large-Scale Processes Associated with Inter-Decadal and Inter-Annual Early Spring Rainfall Variability in Taiwan

    Directory of Open Access Journals (Sweden)

    Jau-Ming Chen

    2016-02-01

    Full Text Available Early spring (March - April) rainfall in Taiwan exhibits evident and distinct inter-annual and inter-decadal variability. The inter-annual variability has a positive correlation with the El Niño/Southern Oscillation while the inter-decadal variability features a phase change beginning in the late 1970s, coherent with the major phase change in the Pacific decadal oscillation. Rainfall variability in both timescales is regulated by large-scale processes showing consistent dynamic features. Rainfall increases are associated with positive sea surface temperature (SST) anomalies in the tropical eastern Pacific and negative SST anomalies in the tropical central Pacific. An anomalous lower-level divergent center appears in the tropical central Pacific. Via a Rossby-wave-like response, an anomalous lower-level anticyclone appears to the southeast of Taiwan over the Philippine Sea-tropical western Pacific region, which is accompanied by an anomalous cyclone to the north-northeast of Taiwan. Both circulation anomalies induce anomalous southwesterly flows to enhance moisture flux from the South China Sea onto Taiwan, resulting in significant moisture convergence near Taiwan. With enhanced moisture supplied by anomalous southwesterly flows, significant rainfall increases occur in both inter-annual and inter-decadal timescales in early spring rainfall on Taiwan.

  18. Converting three general-cognitive function scales into Persian and assessment of their validity and reliability

    Directory of Open Access Journals (Sweden)

    Payam Moin

    2011-01-01

    Full Text Available Objectives: Glasgow Outcome Scale Extended (GOSE), Galveston Amnesia and orientation Test (GOAT) and Disability Rating Scale (DRS) are three popular outcome measure tools used principally in traumatic brain injury (TBI) patients. We conducted this study to provide a Farsi version of these outcome scales for use in Iran. Methods: Following a comprehensive literature review, Farsi transcripts were prepared by "forward-backward" translation and reviewed by subject experts. After a pretest on a few patients, the final versions were obtained. 38 patients with closed head injury were interviewed simultaneously by two interviewers. Main statistics used to assess validity and reliability included factor analysis for construct validity, Cronbach's alpha for internal consistency, and Pearson correlation and kappa coefficient for inter-rater agreement. Results: Factor analysis for Farsi-GOAT (FGOAT) revealed 5 independent factors with a total distribution variance of 80.2%. For Farsi-DRS (FDRS), 3 independent factors were found with a 92.3% variance. The Cronbach's alpha (95% confidence interval) was 0.84 (0.763-0.919) and 0.91 (0.901-0.919) for FGOAT and FDRS, respectively. Pearson correlation between total scores of two raters was 0.98 and 0.97 for FGOAT and FDRS, respectively. Kappa coefficient (95% CI) between outcome rankings of raters was 0.73 (0.618-0.852) and 0.68 (0.594-0.770) for FGOAT and FDRS, respectively. As for the Farsi-GOSE scale, the kappa value was 0.4 (0.285-0.507) for 8-level outcome ranking and improved to 0.7 (0.585-0.817) for the 5-level scale. We found a good correlation between FDRS and FGOSE predicted prognoses (Spearman's rho = 0.74, 95% CI: 0.676-0.802). Conclusions: FDRS and FGOAT had appropriate validity and reliability. The 8-level outcome FGOSE scale disclosed a low inter-rater agreement, but a suitable observer agreement was achieved when the 5-level outcome was applied.

  19. Developing a digital photography-based method for dietary analysis in self-serve dining settings.

    Science.gov (United States)

    Christoph, Mary J; Loman, Brett R; Ellison, Brenna

    2017-07-01

    Current population-based methods for assessing dietary intake, including food frequency questionnaires, food diaries, and 24-h dietary recall, are limited in their ability to objectively measure food intake. Digital photography has been identified as a promising addition to these techniques but has rarely been assessed in self-serve settings. We utilized digital photography to examine university students' food choices and consumption in a self-serve dining hall setting. Research assistants took pre- and post-photos of students' plates during lunch and dinner to assess selection (presence), servings, and consumption of MyPlate food groups. Four coders rated the same set of approximately 180 meals for inter-rater reliability analyses; approximately 50 additional meals were coded twice by each coder to assess intra-rater agreement. Inter-rater agreement on the selection, servings, and consumption of food groups was high at 93.5%; intra-rater agreement was similarly high with an average of 95.6% agreement. Coders achieved the highest rates of agreement in assessing if a food group was present on the plate (95-99% inter-rater agreement, depending on food group) and estimating the servings of food selected (81-98% inter-rater agreement). Estimating consumption, particularly for items such as beans and cheese that were often in mixed dishes, was more challenging (77-94% inter-rater agreement). Results suggest that the digital photography method presented is feasible for large studies in real-world environments and can provide an objective measure of food selection, servings, and consumption with a high degree of agreement between coders; however, to make accurate claims about the state of dietary intake in all-you-can-eat, self-serve settings, researchers will need to account for the possibility of diners taking multiple trips through the serving line. Copyright © 2017 Elsevier Ltd. All rights reserved.
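
    Percent agreement among several coders, as reported above, can be sketched as the mean of pairwise agreement rates. Whether the study averaged over coder pairs in exactly this way is not stated in the abstract, so this is an assumption; the function name `mean_pairwise_agreement` and the codes are hypothetical.

```python
from itertools import combinations


def mean_pairwise_agreement(ratings):
    """ratings: one list of categorical codes per coder, all for the same items."""
    pair_rates = []
    for coder_a, coder_b in combinations(ratings, 2):
        matches = sum(a == b for a, b in zip(coder_a, coder_b))
        pair_rates.append(matches / len(coder_a))
    return sum(pair_rates) / len(pair_rates)


# Made-up presence/absence codes for one food group from three coders.
codes = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]
print(f"{100 * mean_pairwise_agreement(codes):.1f}% agreement")
```

    Unlike kappa, raw percent agreement is not corrected for chance, so it tends to look high when one category dominates.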

  20. Consistent adoption of the International System of Units (SI) in nuclear science and technology

    Energy Technology Data Exchange (ETDEWEB)

    Klumpar, J; Kovar, Z [Ceskoslovenska Akademie Ved, Prague. Laborator Radiologicke Dozimetrie; Sacha, J [Slovenska Akademia Vied, Bratislava (Czechoslovakia). Fyzikalny Ustav

    1975-11-01

    The principles are stressed behind a consistent introduction of the International System of Units (SI) in Czechoslovakia complying with the latest edition of the Czechoslovak Standard CSN 01 1300 on the prescribed system of national and international units. The use of special and auxiliary units in nuclear physics and technology is discussed, particular attention being devoted to the units of activity and to the time units applied in radiology. Conversion graph and tables are annexed.

  1. Delirium diagnosis methodology used in research: a survey-based study.

    Science.gov (United States)

    Neufeld, Karin J; Nelliot, Archana; Inouye, Sharon K; Ely, E Wesley; Bienvenu, O Joseph; Lee, Hochang Benjamin; Needham, Dale M

    2014-12-01

    To describe methodology used to diagnose delirium in research studies evaluating delirium detection tools. The authors used a survey to address reference rater methodology for delirium diagnosis, including rater characteristics, sources of patient information, and diagnostic process, completed via web or telephone interview according to respondent preference. Participants were authors of 39 studies included in three recent systematic reviews of delirium detection instruments in hospitalized patients. Authors from 85% (N = 33) of the 39 eligible studies responded to the survey. The median number of raters per study was 2.5 (interquartile range: 2-3); 79% were physicians. The raters' median duration of clinical experience with delirium diagnosis was 7 years (interquartile range: 4-10), with 5% having no prior clinical experience. Inter-rater reliability was evaluated in 70% of studies. Cognitive tests and delirium detection tools were used in the delirium reference rating process in 61% (N = 21) and 45% (N = 15) of studies, respectively, with 33% (N = 11) using both and 27% (N = 9) using neither. When patients were too drowsy or declined to participate in delirium evaluation, 70% of studies (N = 23) used all available information for delirium diagnosis, whereas 15% excluded such patients. Significant variability exists in reference standard methods for delirium diagnosis in published research. Increasing standardization by documenting inter-rater reliability, using standardized cognitive and delirium detection tools, incorporating diagnostic expert consensus panels, and using all available information in patients declining or unable to participate with formal testing may help advance delirium research by increasing consistency of case detection and improving generalizability of research results. Copyright © 2014 American Association for Geriatric Psychiatry. Published by Elsevier Inc. All rights reserved.

  2. Factorial validity and internal consistency of the motivational climate in physical education scale.

    Science.gov (United States)

    Soini, Markus; Liukkonen, Jarmo; Watt, Anthony; Yli-Piipari, Sami; Jaakkola, Timo

    2014-01-01

    The aim of the study was to examine the construct validity and internal consistency of the Motivational Climate in Physical Education Scale (MCPES). A key element of the development process of the scale was establishing a theoretical framework that integrated the dimensions of task- and ego-involving climates in conjunction with autonomy and social relatedness supporting climates. These constructs were adopted from the self-determination and achievement goal theories. A sample of Finnish Grade 9 students, comprising 2,594 girls and 1,803 boys, completed the 18-item MCPES during one physical education class. The results of the study demonstrated that participants had the highest mean in task-involving climate and the lowest in autonomy climate and ego-involving climate. Additionally, autonomy, social relatedness, and task-involving climates were significantly and strongly correlated with each other, whereas the ego-involving climate had low or negligible correlations with the other climate dimensions. The construct validity of the MCPES was analyzed using confirmatory factor analysis. The statistical fit of the four-factor model consisting of motivational climate factors supporting perceived autonomy, social relatedness, task-involvement, and ego-involvement was satisfactory. The results of the reliability analysis showed acceptable internal consistencies for all four dimensions. The Motivational Climate in Physical Education Scale can be considered a psychometrically valid tool to measure motivational climate in Finnish Grade 9 students. Key Points: This study developed the Motivational Climate in School Physical Education Scale (MCPES). During the development process of the scale, the theoretical framework using dimensions of task- and ego-involving as well as autonomy and social relatedness supporting climates was constructed. These constructs were adopted from the self-determination and achievement goal theories. The statistical fit of the four-factor model of the

  3. "War" in the Jurisprudence of the Inter American Court of Human Rights

    Directory of Open Access Journals (Sweden)

    Laurence Burgorgue-Larsen

    2010-12-01

    Full Text Available How have Inter-American Human Rights bodies dealt with the notion of “war”, which has been transformed over time into the notion of internal and international “armed conflicts”? This question provides the analytical foundation of the first part of this study, which sets out the various types of conflicts that have occurred in the American continent. These situations (armed conflicts, internal strife, State terrorism have produced a wide range of legal categorizations, utilized by both the Commission and Inter-American Court of Human Rights in their case-law. This conceptual delimitation carried out by these two bodies is all the more important as it affects the law that applies to armed conflicts. Indeed, by analysing this question, the never-ending debate on the relationship between International Human Rights Law and International Humanitarian Law reappears. The second part of this study therefore focuses on the issue of discovering whether and in which way jus in bello has found its place into the Inter-American Human Rights bodies’ case-law. As the active political life of Latin American societies has shown, the study of the different applicable legal regimes also requires looking into “state of emergency” Law, an issue which has been shaped by the Inter-American Court and Commission’s work.

  4. A Spanish validation of the Coma Recovery Scale-Revised (CRS-R).

    Science.gov (United States)

    Tamashiro, Mercedes; Rivas, Maria Elisa; Ron, Melania; Salierno, Fernando; Dalera, Marisol; Olmos, Lisandro

    2014-01-01

    Analysis of inter-rater reliability and concurrent validity. To determine measurement properties of a Spanish version of The Coma Recovery Scale-Revised (CRS-R). A sample of 35 in-patients with severe acquired brain injury. To test concurrent validity of the translated scale, the Glasgow Coma Scale (GCS) and Disability Rating Scale (DRS) were also administered. Two experts in the field were recruited to assess inter-rater agreement. Inter-rater reliability was good for total CRS-R scores (Cronbach α = 0.973, p = 0.001). Sub-scale analysis showed moderate-to-high inter-rater agreement. Total CRS-R scores correlated significantly (p < 0.05) with total GCS (r = 0.74) and DRS (r = 0.54) scores, indicating acceptable concurrent validity. The Spanish version of CRS-R can be administered reliably by trained and experienced examiners. CRS-R appears capable of differentiating patients in Emergence from Minimally Conscious State (EMCS) or in Minimally Conscious State (MCS) from those in a Vegetative State (VS).

  5. Improving inter-observer variability in the evaluation of ultrasonographic features of polycystic ovaries

    Directory of Open Access Journals (Sweden)

    Leswick David A

    2008-07-01

    Full Text Available Background: We recently reported poor inter-observer agreement in identifying and quantifying individual ultrasonographic features of polycystic ovaries. Our objective was to determine the effect of a training workshop on reducing inter-observer variation in the ultrasonographic evaluation of polycystic ovaries. Methods: Transvaginal ultrasound recordings from thirty women with polycystic ovary syndrome (PCOS) were evaluated by three radiologists and three reproductive endocrinologists both before and after an ultrasound workshop. The following endpoints were assessed: 1) follicle number per ovary (FNPO), 2) follicle number per single cross-section (FNPS), 3) largest follicle diameter, 4) ovarian volume, 5) follicle distribution pattern and 6) presence of a corpus luteum (CL). Lin's concordance correlation coefficients (rho) and kappa statistics for multiple raters (kappa) were used to assess level of inter-observer agreement (>0.80 good, 0.60 – 0.80 moderate/fair, ...). Results: Following the workshop, inter-observer agreement improved for the evaluation of FNPS (rho = 0.70, delta rho = +0.11), largest follicle diameter (rho = 0.77, delta rho = +0.10), ovarian volume (rho = 0.84, delta rho = +0.12), follicle distribution pattern (kappa = 0.80, delta kappa = +0.21) and presence of a CL (kappa = 0.87, delta kappa = +0.05). No improvement was evident for FNPO (rho = 0.54, delta rho = -0.01). Both radiologists and reproductive endocrinologists demonstrated improvement in scores (p ...). Conclusion: Reliability in evaluating ultrasonographic features of polycystic ovaries can be significantly improved following participation in a training workshop. If ultrasonographic evidence of polycystic ovaries is to be used as an objective measure in the diagnosis of PCOS, then standardized training modules should be implemented to unify the approach to evaluating polycystic ovarian morphology.
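
    Lin's concordance correlation coefficient, used above for the continuous endpoints, can be sketched for two observers as follows. The study applied it across six observers, so the two-observer form, the function name `lins_ccc` and the follicle counts below are simplifying assumptions for illustration.

```python
from statistics import fmean


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements."""
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    # Population (1/n) moments, as in Lin's original definition.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Made-up follicle counts from two observers (illustration only).
observer_1 = [12, 18, 25, 30, 22]
observer_2 = [14, 17, 27, 33, 20]
print(round(lins_ccc(observer_1, observer_2), 3))
```

    Unlike the Pearson correlation, the coefficient drops when one observer is systematically higher than the other, because the squared mean difference appears in the denominator.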

  6. Test of Gross Motor Development : Expert Validity, confirmatory validity and internal consistence

    Directory of Open Access Journals (Sweden)

    Nadia Cristina Valentini

    2008-12-01

    Full Text Available The Test of Gross Motor Development (TGMD-2) is an instrument used to evaluate children’s level of motor development. The objective of this study was to translate and verify the clarity and pertinence of the TGMD-2 items by experts and the confirmatory factorial validity and the internal consistency by means of test-retest of the Portuguese TGMD-2. A cross-cultural translation was used to construct the Portuguese version. The participants of this study were 7 professionals and 587 children, from 27 schools (kindergarten and elementary), from 3 to 10 years old (51.1% boys and 48.9% girls). Each child was videotaped performing the test twice. The videotaped tests were then scored. The results indicated that the Portuguese version of the TGMD-2 contains clear and pertinent motor items; demonstrated satisfactory indices of confirmatory factorial validity (χ2/df = 3.38; Goodness-of-fit Index = 0.95; Adjusted Goodness-of-fit index = 0.92 and Tucker and Lewis’s Index of Fit = 0.83) and test-retest internal consistency (locomotion r = 0.82; control of object: r = 0.88). The Portuguese TGMD-2 demonstrated validity and reliability for the sample investigated.

  7. Test of Gross Motor Development: expert validity, confirmatory validity and internal consistence

    Directory of Open Access Journals (Sweden)

    Nadia Cristina Valentini

    2008-01-01

    The Test of Gross Motor Development (TGMD-2) is an instrument used to evaluate children’s level of motor development. The objective of this study was to translate and verify the clarity and pertinence of the TGMD-2 items by experts and the confirmatory factorial validity and the internal consistency by means of test-retest of the Portuguese TGMD-2. A cross-cultural translation was used to construct the Portuguese version. The participants of this study were 7 professionals and 587 children, from 27 schools (kindergarten and elementary), from 3 to 10 years old (51.1% boys and 48.9% girls). Each child was videotaped performing the test twice. The videotaped tests were then scored. The results indicated that the Portuguese version of the TGMD-2 contains clear and pertinent motor items; demonstrated satisfactory indices of confirmatory factorial validity (χ2/df = 3.38; Goodness-of-fit Index = 0.95; Adjusted Goodness-of-fit Index = 0.92; and Tucker and Lewis’s Index of Fit = 0.83) and test-retest internal consistency (locomotion: r = 0.82; control of object: r = 0.88). The Portuguese TGMD-2 demonstrated validity and reliability for the sample investigated.

  8. Reproducibility of thoracic kyphosis measurements in patients with adolescent idiopathic scoliosis.

    Science.gov (United States)

    Ohrt-Nissen, Søren; Cheung, Jason Pui Yin; Hallager, Dennis Winge; Gehrchen, Martin; Kwan, Kenny; Dahl, Benny; Cheung, Kenneth M C; Samartzis, Dino

    2017-01-01

    Current surgical treatment for adolescent idiopathic scoliosis (AIS) involves correction in both the coronal and sagittal plane, and thorough assessment of these parameters is essential for evaluation of surgical results. However, various definitions of thoracic kyphosis (TK) have been proposed, and the intra- and inter-rater reproducibility of these measures has not been determined. As such, the purpose of the current study was to determine the intra- and inter-rater reproducibility of several TK measurements used in the assessment of AIS. Twenty patients (90% females) surgically treated for AIS with alternate-level pedicle screw fixation were included in the study. Three raters independently evaluated pre- and postoperative standing lateral plain radiographs. For each radiograph, several definitions of TK were measured as well as L1-S1 and nonfixed lumbar lordosis. All variables were measured twice 14 days apart, and a mixed effects model was used to determine the repeatability coefficient (RC), which is a measure of the agreement between repeated measurements. Also, the intra- and inter-rater intra-class correlation coefficient (ICC) was determined as a measure of reliability. Preoperative median Cobb angle was 58° (range 41°-86°), and median surgical curve correction was 68% (range 49-87%). Overall intra-rater RC was highest for T2-T12 and nonfixed TK (11°) and lowest for T4-T12 and T5-T12 (8°). Inter-rater RC was highest for T1-T12, T1-nonfixed, and nonfixed TK (13°) and lowest for T5-T12 (9°). Agreement varied substantially between pre- and postoperative radiographs. Inter-rater ICC was highest for T4-T12 (0.92; 95% CI 0.88-0.95) and T5-T12 (0.92; 95% CI 0.88-0.95) and lowest for T1-nonfixed (0.80; 95% CI 0.72-0.88). Considerable variation for all TK measurements was noted. Intra- and inter-rater reproducibility was best for T4-T12 and T5-T12. Future studies should consider adopting a relevant minimum difference as a limit for true change in TK.

  9. WOrk-Related Questionnaire for UPper extremity disorders (WORQ-UP): Factor Analysis and Internal Consistency.

    Science.gov (United States)

    Aerts, Bas R; Kuijer, P Paul; Beumer, Annechien; Eygendaal, Denise; Frings-Dresen, Monique H

    2018-04-17

    To test a 17-item questionnaire, the WOrk-Related Questionnaire for UPper extremity disorders (WORQ-UP), for dimensionality of the items (factor analysis) and internal consistency. Cross-sectional study. Outpatient clinic. A consecutive sample of patients (N=150) consisting of all new referral patients (either from a general physician or other hospital) who visited the orthopedic outpatient clinic because of an upper extremity musculoskeletal disorder. Not applicable. Number and dimensionality of the factors in the WORQ-UP. Four factors with eigenvalues (EVs) >1.0 were found. The factors were named exertion, dexterity, tools & equipment, and mobility. The EVs of the factors were, respectively, 5.78, 2.38, 1.81, and 1.24. The factors together explained 65.9% of the variance. The Cronbach alpha values for these factors were, respectively, .88, .74, .87, and .66. The 17 items of the WORQ-UP load on 4 factors (exertion, dexterity, tools & equipment, and mobility) with good internal consistency. Copyright © 2018 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
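
    The relation between the reported eigenvalues and the explained variance can be checked with a short calculation. The sketch below assumes that the factors were extracted from the correlation matrix of the 17 items, so that the total variance equals the number of items; the abstract does not state this, and the function name is hypothetical.

```python
def explained_variance_pct(eigenvalues, n_items):
    """Percentage of total variance carried by the retained factors.

    Assumes standardised items, so each item contributes one unit of variance
    and the total variance equals n_items.
    """
    return 100.0 * sum(eigenvalues) / n_items


# Eigenvalues and item count as reported in the abstract.
worq_up_eigenvalues = [5.78, 2.38, 1.81, 1.24]
print(round(explained_variance_pct(worq_up_eigenvalues, 17), 1))
```

    Under that assumption the four eigenvalues sum to 11.21 of 17 units, which agrees with the 65.9% given in the abstract.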

  10. Internal Consistency and Concurrent Validity of the Questionnaire for Limitations and Restrictions Assessment in Children with ADHD

    Directory of Open Access Journals (Sweden)

    Luisa Matilde Salamanca-Duque

    2014-09-01

    Full Text Available Introduction: ADHD is one of the most common diagnoses in child psychiatry; its early diagnosis is of great importance for intervention in the family, school and social environment. Based on the International Classification of Functioning, Disability and Health (ICF), a questionnaire was designed to assess activity limitations and participation restrictions in children with ADHD. The questionnaire was called “CLARP-ADHD Parent and Teacher Version”. Objective: To determine the degree of internal consistency of the CLARP-ADHD questionnaire, and its concurrent validity with the “Strengths and Difficulties Questionnaire SDQ parent and teacher version”. Material and Methods: A sample of 203 children aged 6 to 12 with ADHD, currently attending school in five Colombian cities. The questionnaires were applied to parents and teachers. The internal consistency analysis was performed using the Cronbach coefficient, and concurrent validity was assessed using the Spearman correlation coefficient, with multiple and single predictors examined through multiple and simple linear regression models. Results: A high internal consistency was found for the global questionnaires and for each of their domains. The CLARP-ADHD for parents yielded an internal consistency of 0.83, and the CLARP-ADHD for teachers one of 0.93. Concurrent validity was found between the CLARP-ADHD and the SDQ Parent and Teacher version; concurrence was also found between the CLARP-ADHD for Teachers and the SDQ Teachers, as well as between the CLARP-ADHD for Parents and the CLARP-ADHD for Teachers, with p values of p < 0.001.
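
    Several records in this listing report internal consistency as Cronbach's alpha. As an illustration only, the coefficient can be computed from a respondents-by-items score table as sketched below; the function name and the sample scores are hypothetical and are not taken from any of the studies.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: list of respondents, each a list with one score per item.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    n_respondents = len(scores)
    n_items = len(scores[0])
    if n_respondents < 2 or n_items < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    item_variances = [
        variance([row[j] for row in scores]) for j in range(n_items)
    ]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to a three-item scale.
answers = [
    [3, 4, 3],
    [2, 2, 3],
    [5, 4, 4],
    [1, 2, 1],
    [4, 5, 5],
]
print(round(cronbach_alpha(answers), 2))
```

    Alpha approaches 1 when the items rise and fall together across respondents, and falls toward 0 when the items vary independently.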

  11. Minimal detectable change of the Personal and Social Performance scale in individuals with schizophrenia.

    Science.gov (United States)

    Lee, Shu-Chun; Tang, Shih-Fen; Lu, Wen-Shian; Huang, Sheau-Ling; Deng, Nai-Yu; Lue, Wen-Chyn; Hsieh, Ching-Lin

    2016-12-30

    The minimal detectable change (MDC) of the Personal and Social Performance scale (PSP) has not yet been investigated, limiting its utility in data interpretation. The purpose of this study was to determine the MDCs of the PSP administered by the same rater or different raters in individuals with schizophrenia. Participants with schizophrenia were recruited from two psychiatric community rehabilitation centers to complete the PSP assessments twice, 2 weeks apart, by the same rater or 2 different raters. MDC values were calculated from the coefficients of intra- and inter-rater reliability (i.e., intraclass correlation coefficients). Forty patients (mean age 36.9 years, SD 9.7) from one center participated in the intra-rater reliability study. Another 40 patients (mean age 44.3 years, SD 11.1) from the other center participated in the inter-rater study. The MDCs (MDC%) of the PSP were 10.7 (17.1%) for the same rater and 16.2 (24.1%) for different raters. The MDCs of the PSP appeared appropriate for clinical trials aiming to determine whether a real change in social functioning has occurred in people with schizophrenia. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
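
    The abstract states that the MDC values were derived from intraclass correlation coefficients but does not give the formula. A commonly used approach, shown here as a sketch under that assumption, first converts the ICC into a standard error of measurement (SEM) and then scales it to a 95% confidence level. The function names and the input values are hypothetical and do not reproduce the study data.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement: SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc is expected to lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sd, icc):
    """Minimal detectable change at the 95% level: 1.96 * sqrt(2) * SEM.

    The sqrt(2) term reflects that a change score involves two measurements,
    each carrying its own measurement error.
    """
    return 1.96 * math.sqrt(2.0) * sem_from_icc(sd, icc)


def mdc_percent(mdc, mean_score):
    """MDC expressed as a percentage of the mean score of the sample."""
    return 100.0 * mdc / mean_score


# Hypothetical inputs: score SD of 10 points, ICC of 0.75, mean score of 60.
change = mdc95(10.0, 0.75)
print(round(change, 1), round(mdc_percent(change, 60.0), 1))
```

    A lower ICC inflates the SEM and therefore the MDC, which is consistent with the larger MDC the abstract reports for different raters than for the same rater.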

  12. The Internal, External, and Diagnostic Validity of Sluggish Cognitive Tempo: A Meta-Analysis and Critical Review

    Science.gov (United States)

    Becker, Stephen P.; Leopold, Daniel R.; Burns, G. Leonard; Jarrett, Matthew A.; Langberg, Joshua M.; Marshall, Stephen A.; McBurnett, Keith; Waschbusch, Daniel A.; Willcutt, Erik G.

    2015-01-01

    Objective To conduct the first meta-analysis evaluating the internal and external validity of the sluggish cognitive tempo (SCT) construct as related to or distinct from attention-deficit/hyperactivity disorder (ADHD) and as associated with functional impairment and neuropsychological functioning. Method Electronic databases were searched through September 2015 for studies examining the factor structure and/or correlates of SCT in children or adults. The search procedures identified 73 papers. The core SCT behaviors included across studies, as well as factor loadings and reliability estimates, were reviewed to evaluate internal validity. Pooled correlation effect sizes using random effects models were used to evaluate SCT in relation to external validity domains (i.e., demographics, other psychopathologies, functional impairment, and neuropsychological functioning). Results Strong support was found for the internal validity of the SCT construct. Specifically, across factor analytic studies including over 19,000 individuals, 13 SCT items loaded consistently on an SCT factor as opposed to an ADHD factor. Findings also support the reliability (i.e., internal consistency, test-retest reliability, inter-rater reliability) of SCT. In terms of external validity, there is some indication that SCT may increase with age (r = 0.11) and be associated with lower socioeconomic status (r = 0.10). Modest (potentially negligible) support was found for SCT symptoms being higher in males than females in children (r = 0.05) but not adults. SCT is more strongly associated with ADHD inattention (r = 0.63 in children, r = 0.72 in adults) than with ADHD hyperactivity-impulsivity (r = 0.32 in children, r = 0.46 in adults), and it likewise appears that SCT is more strongly associated with internalizing symptoms than with externalizing symptoms. SCT is associated with significant global, social, and academic impairment (rs = 0.38–0.44). Effects for neuropsychological functioning are mixed

  13. Recommendations for translation and reliability testing of International Spinal Cord Injury Data Sets.

    Science.gov (United States)

    Biering-Sørensen, F; Alexander, M S; Burns, S; Charlifue, S; DeVivo, M; Dietz, V; Krassioukov, A; Marino, R; Noonan, V; Post, M W M; Stripling, T; Vogel, L; Wing, P

    2011-03-01

    To provide recommendations regarding translation and reliability testing of International Spinal Cord Injury (SCI) Data Sets. The Executive Committee for the International SCI Standards and Data Sets. Translations of any specific International SCI Data Set can be accomplished by translation from the English version into the target language, and be followed by a back-translation into English, to confirm that the original meaning has been preserved. Another approach is to have the initial translation performed by translators who have knowledge of SCI, and afterwards controlled by other person(s) with the same kind of knowledge. The translation process includes both language translation and cultural adaptation, and therefore shall not be made word for word, but will strive to include conceptual equivalence. At a minimum, the inter-rater reliability should be tested by no less than two independent observers, and preferably in multiple countries. Translations must include information on the name, role and background of everyone involved in the translation process, and shall be dated and noted with a version number. By following the proposed guidelines, translated data sets should assure comparability of data acquisition across countries and cultures. If the translation process identifies irregularities or misrepresentation in either the original English version or the target language, the working group for the particular International SCI Data Set shall revise the data set accordingly, which may include re-wording of the original English version in order to accomplish a compromise in the content of the data set.

  14. The reliability of three psoriasis assessment tools: Psoriasis area and severity index, body surface area and physician global assessment.

    Science.gov (United States)

    Bożek, Agnieszka; Reich, Adam

    2017-08-01

    A wide variety of psoriasis assessment tools have been proposed to evaluate the severity of psoriasis in clinical trials and daily practice. The most frequently used clinical instrument is the psoriasis area and severity index (PASI); however, none of the currently published severity scores used for psoriasis meets all the validation criteria required for an ideal score. The aim of this study was to compare and assess the reliability of 3 commonly used assessment instruments for psoriasis severity: the psoriasis area and severity index (PASI), body surface area (BSA) and physician global assessment (PGA). On the scoring day, 10 trained dermatologists evaluated 9 adult patients with plaque-type psoriasis using the PASI, BSA and PGA. All the subjects were assessed twice by each physician. Correlations between the assessments were analyzed using the Pearson correlation coefficient. Intra-class correlation coefficient (ICC) was calculated to analyze intra-rater reliability, and the coefficient of variation (CV) was used to assess inter-rater variability. Significant correlations were observed among the 3 scales in both assessments. In all 3 scales the ICCs were > 0.75, indicating high intra-rater reliability. The highest ICC was for the BSA (0.96) and the lowest one for the PGA (0.87). The CV for the PGA and PASI were 29.3 and 36.9, respectively, indicating moderate inter-rater variability. The CV for the BSA was 57.1, indicating high inter-rater variability. Comparing the PASI, PGA and BSA, it was shown that the PGA had the highest inter-rater reliability, whereas the BSA had the highest intra-rater reliability. The PASI showed intermediate values in terms of inter- and intra-rater reliability. None of the 3 assessment instruments showed a significant advantage over the others. A reliable assessment of psoriasis severity requires the use of several independent evaluations simultaneously.
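
    The coefficient of variation used here for inter-rater variability is the standard deviation of the raters' scores expressed as a percentage of their mean. The sketch below assumes that a CV is computed per patient across raters and then averaged over patients; the abstract does not describe the aggregation, and the function names and scores are hypothetical.

```python
from statistics import fmean, stdev


def coefficient_of_variation(ratings):
    """CV in percent for one patient's scores across raters (sample SD / mean)."""
    mean_rating = fmean(ratings)
    if mean_rating == 0:
        raise ValueError("mean rating is zero; CV is undefined")
    return 100.0 * stdev(ratings) / mean_rating


def mean_cv(ratings_by_patient):
    """Average of the per-patient CVs (an assumed aggregation, see lead-in)."""
    return fmean(coefficient_of_variation(r) for r in ratings_by_patient)


# Hypothetical severity scores: one inner list per patient, one value per rater.
scores = [
    [8.0, 10.0, 12.0],
    [20.0, 22.0, 18.0],
]
print(round(mean_cv(scores), 1))
```

    Because the CV is scaled by the mean, it allows instruments with different score ranges, such as the three compared in this record, to be set side by side.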

  15. The Internal Consistency and Validity of the Vaccination Attitudes Examination Scale: A Replication Study.

    Science.gov (United States)

    Wood, Louise; Smith, Michael; Miller, Christopher B; O'Carroll, Ronan E

    2018-06-19

    Vaccinations are important preventative health behaviors. The recently developed Vaccination Attitudes Examination (VAX) Scale aims to measure the reasons behind refusal/hesitancy regarding vaccinations. The aim of this replication study is to conduct an independent test of the newly developed VAX Scale in the UK. We tested (a) internal consistency (Cronbach's α); (b) convergent validity by assessing its relationships with beliefs about medication, medical mistrust, and perceived sensitivity to medicines; and (c) construct validity by testing how well the VAX Scale discriminated between vaccinators and nonvaccinators. A sample of 243 UK adults completed the VAX Scale, the Beliefs About Medicines Questionnaire, the Perceived Sensitivity to Medicines Scale, and the Medical Mistrust Index, in addition to demographics of age, gender, education levels, and social deprivation. Participants were asked (a) whether they received an influenza vaccination in the past year and (b) if they had a young child, whether they had vaccinated the young child against influenza in the past year. The VAX (a) demonstrated high internal consistency (α = .92); (b) was positively correlated with medical mistrust and beliefs about medicines, and less strongly correlated with perceived sensitivity to medicines; and (c) successfully differentiated parental influenza vaccinators from nonvaccinators. The VAX demonstrated good internal consistency, convergent validity, and construct validity in an independent UK sample. It appears to be a useful measure to help us understand the health beliefs that promote or deter vaccination behavior.

  16. The development of a semi-structured home interview (CHIF) to directly assess function in cognitively impaired elderly people in two cultures

    Science.gov (United States)

    Hendrie, H. C.; Lane, K. A.; Ogunniyi, A.; Baiyewu, O.; Gureje, O.; Evans, R.; Smith-Gamble, V.; Pettaway, M.; Unverzagt, F. W.; Gao, S.; Hall, K. S.

    2010-01-01

    Background Assessing function is a crucial element in the diagnosis of dementia. This information is usually obtained from key informants. However, reliable informants are not always available. Methods A 10-item semi-structured home interview (the CHIF, or Clinician Home-based Interview to assess Function) to assess function primarily by measuring instrumental activities of daily living directly was developed and tested for inter-rater reliability and validity as part of the Indianapolis–Ibadan dementia project. The primary validity measurements were correlations between scores on the CHIF and independently gathered scores on the Blessed Dementia Scale (from informants) and the Mini-mental State Examination (MMSE). Sensitivities and specificities of scores on the CHIF and receiver operator characteristic (ROC) curves were constructed with dementia as the dependent variable. Results Inter-rater reliability for the CHIF was high (Pearson’s correlation coefficient 0.99 in Indianapolis and 0.87 in Ibadan). Internal consistency, in both samples, was good (Cronbach’s α 0.95 in Indianapolis and 0.83 in Ibadan). Scores on the CHIF correlated well with the Blessed Dementia scores at both sites (−0.71, p < 0.0001 for Indianapolis and −0.56, p < 0.0001 for Ibadan) and with the MMSE (0.75, p < 0.0001 for Indianapolis and 0.44, p < 0.0001 for Ibadan). For all items at both sites, the subjects without dementia performed significantly better than those with dementia. The area under the ROC curve for dementia diagnosis was 0.965 for Indianapolis and 0.925 for Ibadan. Conclusion The CHIF is a useful instrument to assess function directly in elderly participants in international studies, particularly in the absence of reliable informants. PMID:16640794

  17. Analysis of underlying causes of inter-expert disagreement in retinopathy of prematurity diagnosis. Application of machine learning principles.

    Science.gov (United States)

    Ataer-Cansizoglu, E; Kalpathy-Cramer, J; You, S; Keck, K; Erdogmus, D; Chiang, M F

    2015-01-01

    Inter-expert variability in image-based clinical diagnosis has been demonstrated in many diseases including retinopathy of prematurity (ROP), which is a disease affecting low birth weight infants and is a major cause of childhood blindness. In order to better understand the underlying causes of variability among experts, we propose a method to quantify the variability of expert decisions and analyze the relationship between expert diagnoses and features computed from the images. Identification of these features is relevant for development of computer-based decision support systems and educational systems in ROP, and these methods may be applicable to other diseases where inter-expert variability is observed. The experiments were carried out on a dataset of 34 retinal images, each with diagnoses provided independently by 22 experts. Analysis was performed using concepts of Mutual Information (MI) and Kernel Density Estimation. A large set of structural features (a total of 66) were extracted from retinal images. Feature selection was utilized to identify the most important features that correlated to actual clinical decisions by the 22 study experts. The best three features for each observer were selected by an exhaustive search on all possible feature subsets and considering joint MI as a relevance criterion. We also compared our results with the results of Cohen's Kappa [36] as an inter-rater reliability measure. The results demonstrate that a group of observers (17 among 22) decide consistently with each other. Mean and second central moment of arteriolar tortuosity is among the reasons of disagreement between this group and the rest of the observers, meaning that the group of experts consider amount of tortuosity as well as the variation of tortuosity in the image. Given a set of image-based features, the proposed analysis method can identify critical image-based features that lead to expert agreement and disagreement in diagnosis of ROP. Although tree
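
    The abstract uses Cohen's kappa as its reference measure of inter-rater reliability. As a minimal sketch, kappa for two raters assigning categorical labels to the same items can be computed as follows; the function name and the labels are hypothetical, and extending it to 22 experts would require pairwise comparison or a multi-rater statistic.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both raters pick the same category
    # if each labelled independently according to their own category rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("both raters used a single identical category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical diagnoses by two experts for six images.
expert_1 = ["A", "A", "B", "B", "C", "A"]
expert_2 = ["A", "B", "B", "B", "C", "A"]
print(round(cohens_kappa(expert_1, expert_2), 2))
```

    A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.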

  18. An inter-religious humanitarian response in the Central African Republic

    Directory of Open Access Journals (Sweden)

    Catherine Mahony

    2014-11-01

    Full Text Available Inter-religious action has played a key role in ensuring that social cohesion and inter-religious mediation remain on the international agenda in relation to response in the Central African Republic, where people’s faith is an integral part of their identity but where it has been manipulated in a horrific way.

  19. Hippocampal MR volumetry

    Science.gov (United States)

    Haller, John W.; Botteron, K.; Brunsden, Barry S.; Sheline, Yvette I.; Walkup, Ronald K.; Black, Kevin J.; Gado, Mokhtar; Vannier, Michael W.

    1994-09-01

    Goal: To estimate hippocampal volumes from in vivo 3D magnetic resonance (MR) brain images and determine inter-rater and intra-rater repeatability. Objective: The precision and repeatability of hippocampal volume estimates using stereologic measurement methods is sought. Design: Five normal control and five schizophrenic subjects were MR scanned using a MPRAGE protocol. Fixed grid stereologic methods were used to estimate hippocampal volumes on a graphics workstation. The images were preprocessed using histogram analysis to standardize 3D MR image scaling from 16 to 8 bits and image volumes were interpolated to 0.5 mm³ isotropic voxels. The following variables were constant for the repeated stereologic measures: grid size, inter-slice distance (1.5 mm), voxel dimensions (0.5 mm³), number of hippocampi measured (10), total number of measurements per rater (40), and number of raters (5). Two grid sizes were tested to determine the coefficient of error associated with the number of sampled 'hits' (approximately 140 and 280) on the hippocampus. Starting slice and grid position were randomly varied to assure unbiased volume estimates. Raters were blind to subject identity, diagnosis, and side of the brain from which the image volumes were extracted and the order of subject presentation was randomized for each of the raters. Inter- and intra-rater intraclass correlation coefficients (ICC) were determined. Results: The data indicate excellent repeatability of fixed grid stereologic hippocampal volume measures when using an inter-slice distance of 1.5 mm and a 6.25 mm² grid (inter-rater ICCs = 0.86-0.97, intra-rater ICCs = 0.85-0.97). One major advantage of the current study was the use of 3D MR data which significantly improved visualization of hippocampal boundaries by providing the ability to access simultaneous orthogonal views while counting stereological marks within the hippocampus. Conclusion: Stereological estimates of 3D volumes from 2D MR
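
    Intraclass correlation coefficients appear throughout this listing, but the abstracts rarely say which form was used. The sketch below computes one common variant, the two-way random-effects, absolute-agreement, single-measurement ICC, often written ICC(2,1); choosing this variant is an assumption, and the function name and the volume values are hypothetical.

```python
from statistics import fmean


def icc_2_1(table):
    """ICC(2,1) from a subjects-by-raters table (list of rows, one per subject)."""
    n = len(table)          # number of subjects
    k = len(table[0])       # number of raters
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 raters")
    grand = fmean(v for row in table for v in row)
    row_means = [fmean(row) for row in table]
    col_means = [fmean(row[j] for row in table) for j in range(k)]
    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical volume estimates (arbitrary units): 4 subjects, 3 raters.
volumes = [
    [2.9, 3.0, 3.1],
    [3.6, 3.5, 3.7],
    [2.4, 2.6, 2.5],
    [3.2, 3.1, 3.3],
]
print(round(icc_2_1(volumes), 2))
```

    This variant treats raters as a random sample and penalises systematic offsets between them; a consistency-type ICC would ignore such offsets and usually yields a higher value for the same data.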

  20. Training less-experienced faculty improves reliability of skills assessment in cardiac surgery.

    Science.gov (United States)

    Lou, Xiaoying; Lee, Richard; Feins, Richard H; Enter, Daniel; Hicks, George L; Verrier, Edward D; Fann, James I

    2014-12-01

    Previous work has demonstrated high inter-rater reliability in the objective assessment of simulated anastomoses among experienced educators. We evaluated the inter-rater reliability of less-experienced educators and the impact of focused training with a video-embedded coronary anastomosis assessment tool. Nine less-experienced cardiothoracic surgery faculty members from different institutions evaluated 2 videos of simulated coronary anastomoses (1 by a medical student and 1 by a resident) at the Thoracic Surgery Directors Association Boot Camp. They then underwent a 30-minute training session using an assessment tool with embedded videos to anchor rating scores for 10 components of coronary artery anastomosis. Afterward, they evaluated 2 videos of a different student and resident performing the task. Components were scored on a 1 to 5 Likert scale, yielding an average composite score. Inter-rater reliabilities of component and composite scores were assessed using intraclass correlation coefficients (ICCs) and overall pass/fail ratings with kappa. All components of the assessment tool exhibited improvement in reliability, with 4 (bite, needle holder use, needle angles, and hand mechanics) improving the most from poor (ICC range, 0.09-0.48) to strong (ICC range, 0.80-0.90) agreement. After training, inter-rater reliabilities for composite scores improved from moderate (ICC, 0.76) to strong (ICC, 0.90) agreement, and for overall pass/fail ratings, from poor (kappa = 0.20) to moderate (kappa = 0.78) agreement. Focused, video-based anchor training facilitates greater inter-rater reliability in the objective assessment of simulated coronary anastomoses. Among raters with less teaching experience, such training may be needed before objective evaluation of technical skills. Published by Elsevier Inc.

  1. Participation, Democracy and Citizenship of Indigenous Peoples in the International Context Through the European systems and Inter-American Human Rights

    Directory of Open Access Journals (Sweden)

    Roberto Luiz Silva

    2016-06-01

    Full Text Available This article presents the global democratization of the protection of minorities through the promotion of the European and Inter-American human rights systems. At the level of international human rights law, these ideas are the basis of the so-called “specification process of the rights of individuals”, according to which, in addition to general universal rights extended to all, there is a need to recognize specific rights for certain vulnerable groups in society, with the goal of achieving real equality, or at least reducing existing factual inequalities. Thus, the consolidation of the protection of minorities reflects the need for access to a fair legal system to ensure the effectiveness of fundamental rights, and for full access to justice through the international courts of justice, aimed at the protection of human rights in the international context.

  2. [Upper limb functional assessment scale for children with Duchenne muscular dystrophy and Spinal muscular atrophy].

    Science.gov (United States)

    Escobar, Raúl G; Lucero, Nayadet; Solares, Carmen; Espinoza, Victoria; Moscoso, Odalie; Olguín, Polín; Muñoz, Karin T; Rosas, Ricardo

    2016-08-16

    Duchenne muscular dystrophy (DMD) and Spinal muscular atrophy (SMA) cause significant disability and progressive functional impairment. Readily available instruments that assess functionality, especially in advanced stages of the disease, are required to monitor the progress of the disease and the impact of therapeutic interventions. To describe the development of a scale to evaluate upper limb function (UL) in patients with DMD and SMA, and describe its validation process, which includes self-training for evaluators. The development of the scale included a review of published scales, an exploratory application of a pilot scale in healthy children and those with DMD, self-training of evaluators in applying the scale using a handbook and video tutorial, and assessment of a group of children with DMD and SMA using the final scale. Reliability was assessed using Cronbach and Kendall concordance and with intra- and inter-rater test-retest, and validity with concordance and factorial analysis. A high level of reliability was observed, with high internal consistency (Cronbach α=0.97), and inter-rater (Kendall W=0.96) and intra-rater concordance (r=0.97 to 0.99). The validity was demonstrated by the absence of significant differences between results by different evaluators with an expert evaluator (F=0.023, P>.5), and by the factor analysis that showed that four factors account for 85.44% of total variance. This scale is a reliable and valid tool for assessing UL functionality in children with DMD and SMA. It is also easily implementable due to the possibility of self-training and the use of simple and inexpensive materials. Copyright © 2016 Sociedad Chilena de Pediatría. Publicado por Elsevier España, S.L.U. All rights reserved.

  3. The Koukopoulos Mixed Depression Rating Scale (KMDRS): An International Mood Network (IMN) validation study of a new mixed mood rating scale.

    Science.gov (United States)

    Sani, Gabriele; Vöhringer, Paul A; Barroilhet, Sergio A; Koukopoulos, Alexia E; Ghaemi, S Nassir

    2018-05-01

    It has been proposed that the broad major depressive disorder (MDD) construct is heterogeneous. Koukopoulos has provided diagnostic criteria for an important subtype within that construct, "mixed depression" (MxD), which encompasses clinical pictures characterized by marked psychomotor or inner excitation and rage/anger, along with severe depression. This study provides psychometric validation for the first rating scale specifically designed to assess MxD symptoms cross-sectionally, the Koukopoulos Mixed Depression Rating Scale (KMDRS). 350 patients from the international mood network (IMN) completed three rating scales: the KMDRS, Montgomery-Asberg Depression Rating Scale (MADRS) and Young Mania Rating Scale (YMRS). The psychometric properties of the KMDRS that were assessed included Cronbach's alpha, inter-rater reliability, factor analysis, predictive validity, and Receiver Operator Curve analysis. Internal consistency (Cronbach's alpha = 0.76; 95% CI 0.57, 0.94) and inter-rater reliability (kappa = 0.73) were adequate. Confirmatory factor analysis identified 2 components: anger and psychomotor excitation (80% of total variance). Good predictive validity was seen (C-statistic = 0.82; 95% CI 0.68, 0.93). Severity cut-off scores identified were as follows: none (0-4), possible (5-9), mild (10-15), moderate (16-20) and severe (> 21) MxD. Non DSM-based diagnosis of MxD may pose some difficulties in the initial use and interpretation of the scoring of the scale. Moreover, the cross-sectional nature of the evaluation does not verify the long-term stability of the scale. KMDRS was a reliable and valid instrument to assess MxD symptoms. Copyright © 2018 Elsevier B.V. All rights reserved.

  4. Phenomena of Inter-age Manipulations in Interaction "Teacher-Student"

    Directory of Open Access Journals (Sweden)

    Miklyaeva A.V.

    2017-01-01

    Full Text Available The article presents the results of an empirical study of the phenomenon of inter-age manipulation in pedagogical interaction. Inter-age manipulation is considered a form of manipulation carried out by appealing to the age roles of the participants in the interaction. Based on a survey of 109 teenagers aged 13-15 years, using a questionnaire, the color test of relations and a projective drawing, it is shown that inter-age manipulation is a common way of influencing students that teachers choose. Teachers act as the subjects (initiators) of inter-age manipulation more often than students. It was revealed that the effectiveness of inter-age manipulation in pedagogical interaction increases if its content is consistent with the normative content of age roles, and with the "inter-age distance" between the teacher and the students. The greatest effectiveness was found for inter-age manipulation undertaken by older teachers and for manipulation "from below" by young teachers

  5. Comparing the Effectiveness of Self-Paced and Collaborative Frame-of-Reference Training on Rater Accuracy in a Large-Scale Writing Assessment

    Science.gov (United States)

    Raczynski, Kevin R.; Cohen, Allan S.; Engelhard, George, Jr.; Lu, Zhenqiu

    2015-01-01

    There is a large body of research on the effectiveness of rater training methods in the industrial and organizational psychology literature. Less has been reported in the measurement literature on large-scale writing assessments. This study compared the effectiveness of two widely used rater training methods--self-paced and collaborative…

  6. Using Raters from India to Score a Large-Scale Speaking Test

    Science.gov (United States)

    Xi, Xiaoming; Mollaun, Pam

    2011-01-01

    We investigated the scoring of the Speaking section of the Test of English as a Foreign Language[TM] Internet-based (TOEFL iBT[R]) test by speakers of English and one or more Indian languages. We explored the extent to which raters from India, after being trained and certified, were able to score the TOEFL examinees with mixed first languages…

  7. Factorial Validity and Internal Consistency of the Motivational Climate in Physical Education Scale

    Directory of Open Access Journals (Sweden)

    Markus Soini

    2014-03-01

    Full Text Available The aim of the study was to examine the construct validity and internal consistency of the Motivational Climate in Physical Education Scale (MCPES). A key element of the development process of the scale was establishing a theoretical framework that integrated the dimensions of task- and ego-involving climates in conjunction with autonomy and social relatedness supporting climates. These constructs were adopted from the self-determination and achievement goal theories. A sample of Finnish Grade 9 students, comprising 2,594 girls and 1,803 boys, completed the 18-item MCPES during one physical education class. The results of the study demonstrated that participants had the highest mean in task-involving climate and the lowest in autonomy climate and ego-involving climate. Additionally, autonomy, social relatedness, and task-involving climates were significantly and strongly correlated with each other, whereas the ego-involving climate had low or negligible correlations with the other climate dimensions. The construct validity of the MCPES was analyzed using confirmatory factor analysis. The statistical fit of the four-factor model consisting of motivational climate factors supporting perceived autonomy, social relatedness, task-involvement, and ego-involvement was satisfactory. The results of the reliability analysis showed acceptable internal consistencies for all four dimensions. The Motivational Climate in Physical Education Scale can be considered a psychometrically valid tool to measure motivational climate in Finnish Grade 9 students.

  8. Is the Bayley Scales of Infant and Toddler Developmental Screening Test, Valid and Reliable for Persian Speaking Children?

    Science.gov (United States)

    Soleimani, Farin; Azari, Nadia; Vameghi, Roshanak; Sajedi, Firoozeh; Shahshahani, Soheila; Karimi, Hossein; Kraskian, Adis; Shahrokhi, Amin; Teymouri, Robab; Gharib, Masoud

    2016-10-01

    Advances in perinatal and neonatal care have substantially improved the survival of at-risk infants over the past two decades. The purpose of this study was to assess the reliability and validity of the Bayley Scales of infant and toddler developmental Screening test in Persian-speaking children. This was a cross-sectional prospective study of 403 children aged 1-42 months. The Bayley scales screening instrument, which consists of five domains (cognitive, receptive, and expressive communication and fine and gross motor items), was used to measure infants' and toddlers' development. The psychometric properties examined included the face and content validity of the scale, in addition to cultural and linguistic modifications to the scale and its test-retest and inter-rater reliability. An expert team changed some of the test items relating to cultural and linguistic issues. In almost all the age groups, cultural or linguistic changes were made to items in the communication domains. According to Cronbach's alpha for internal consistency, the reliability of the cognitive scale was r = 0.79, and the reliability of the receptive scale was r = 0.76. The reliability for the expressive communication, fine motor, and gross motor scales was r = 0.81, r = 0.80, and r = 0.81, respectively. The construct validity of the tests was confirmed using a factor analysis and comparison of the mean scores of the age groups. The intra- and inter-rater reliabilities of the Bayley Scales were good-to-excellent. The results indicated that the Bayley Scales had a high level of reliability in the present study. Thus, the scale can be used in a Persian population.

  9. Internal consistency of a five-item form of the Francis Scale of Attitude Toward Christianity among adolescent students.

    Science.gov (United States)

    Campo-Arias, Adalberto; Oviedo, Heidi Celina; Cogollo, Zuleima

    2009-04-01

    The short form of the Francis Scale of Attitude Toward Christianity (L. J. Francis, 1992) is a 7-item Likert-type scale that shows high homogeneity among adolescents. The psychometric performance of a shorter version of this scale has not been explored. The authors aimed to determine the internal consistency of a 5-item form of the Francis Scale of Attitude Toward Christianity among 405 students from a school in Cartagena, Colombia. The authors computed the Cronbach's alpha coefficient for the 5 items with the greatest corrected item-total score correlation. The version without Items 2 and 7 showed internal consistency of .87. The 5-item version of the Francis Scale of Attitude Toward Christianity exhibited higher internal consistency than did the 7-item version. Future researchers should corroborate this finding.
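
    The two statistics named in this abstract, Cronbach's alpha and the corrected item-total correlation, can be sketched as follows. This is a minimal illustration using the standard textbook formulas, not the authors' analysis; the function names and the tiny data set are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows is a list of respondents, each a list of item scores."""
    k = len(rows[0])
    items = list(zip(*rows))                         # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])     # variance of the summed scale
    return k / (k - 1) * (1 - item_var_sum / total_var)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows):
    """Correlate each item with the total of the remaining items."""
    items = list(zip(*rows))
    totals = [sum(r) for r in rows]
    return [
        pearson(col, [t - v for t, v in zip(totals, col)])
        for col in items
    ]


if __name__ == "__main__":
    # Hypothetical responses: 5 respondents x 3 Likert items.
    data = [[4, 5, 4], [2, 3, 3], [5, 5, 4], [1, 2, 2], [3, 3, 4]]
    print(round(cronbach_alpha(data), 3))
    print([round(r, 3) for r in corrected_item_total(data)])
```

    Shortening a scale as described would then amount to keeping the items with the largest corrected item-total correlations and recomputing alpha on that subset.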

  10. The reliability of a segmentation methodology for assessing intramuscular adipose tissue and other soft-tissue compartments of lower leg MRI images.

    Science.gov (United States)

    Karampatos, Sarah; Papaioannou, Alexandra; Beattie, Karen A; Maly, Monica R; Chan, Adrian; Adachi, Jonathan D; Pritchard, Janet M

    2016-04-01

    Determine the reliability of a magnetic resonance (MR) image segmentation protocol for quantifying intramuscular adipose tissue (IntraMAT), subcutaneous adipose tissue, total muscle and intermuscular adipose tissue (InterMAT) of the lower leg. Ten axial lower leg MRI slices were obtained from 21 postmenopausal women using a 1 Tesla peripheral MRI system. Images were analyzed using sliceOmatic™ software. The average cross-sectional areas of the tissues were computed for the ten slices. Intra-rater and inter-rater reliability were determined and expressed as the standard error of measurement (SEM) (absolute reliability) and intraclass correlation coefficient (ICC) (relative reliability). Intra-rater and inter-rater reliability for IntraMAT were 0.991 (95% confidence interval [CI] 0.978-0.996, p [...] soft tissue compartments, the ICCs were all >0.90 (p [...] soft-tissue compartments of the lower leg. A standard operating procedure manual is provided to assist users, and SEM values can be used to estimate sample size and determine confidence in repeated measurements in future research.
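
    The abstract reports SEM alongside ICC without stating how SEM was obtained. One common convention, assumed here and not confirmed by the source, derives it from the between-subject standard deviation as SEM = SD * sqrt(1 - ICC). The function name and the SD value below are hypothetical; only the ICC of 0.991 comes from the abstract.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a sample SD and an ICC.

    Assumes the common convention SEM = SD * sqrt(1 - ICC); the abstract
    does not say which formula the authors used.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    if sd < 0:
        raise ValueError("sd must be non-negative")
    return sd * math.sqrt(1.0 - icc)


if __name__ == "__main__":
    # ICC 0.991 is quoted in the abstract; the SD of 2.0 (arbitrary units)
    # is made up for illustration.
    print(sem_from_icc(2.0, 0.991))
```

    Under this convention a higher ICC shrinks the SEM toward zero, which is why the two are reported as complementary views of reliability.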

  11. The effect of rater training on scoring performance and scale-specific expertise amongst occupational therapists participating in a multicentre study

    DEFF Research Database (Denmark)

    Hansen, Tina; Elholm Madsen, Esben; Sørensen, Annette

    2016-01-01

    Gill Ingestive Skills Assessment (MISA) they observe, interpret and record occupational performance of dysphagic clients participating in a meal. This is a highly complex task, which might introduce unwanted variability in measurement scores. A 2-day rater training programme was developed and this builds...... of the training on scoring performance and scale-specific expertise amongst raters. METHOD: During 2 days of rater training, 81 occupational therapists (OTs) were qualified to observe and score dysphagic clients' mealtime performance according to the criteria of 36 MISA-items. The training effects were evaluated...... deficient mealtime performance appeared most difficult to score. The OTs' scale-specific expertise improved significantly (knowledge: Z = -7.857, p [...] performance when using the Danish MISA as well as their perceived...

  12. Comparison of Danish dichotomous and BI-RADS classifications of mammographic density.

    Science.gov (United States)

    Hodge, Rebecca; Hellmann, Sophie Sell; von Euler-Chelpin, My; Vejborg, Ilse; Andersen, Zorana Jovanovic

    2014-06-01

    In the Copenhagen mammography screening program from 1991 to 2001, mammographic density was classified either as fatty or mixed/dense. This dichotomous mammographic density classification system is unique internationally, and has not been validated before. To compare the Danish dichotomous mammographic density classification system from 1991 to 2001 with the density BI-RADS classifications, in an attempt to validate the Danish classification system. The study sample consisted of 120 mammograms taken in Copenhagen in 1991-2001, which tested false positive, and which were in 2012 re-assessed and classified according to the BI-RADS classification system. We calculated inter-rater agreement between the Danish dichotomous mammographic classification as fatty or mixed/dense and the four-level BI-RADS classification by the linear weighted Kappa statistic. Of the 120 women, 32 (26.7%) were classified as having fatty and 88 (73.3%) as mixed/dense mammographic density, according to Danish dichotomous classification. According to BI-RADS density classification, 12 (10.0%) women were classified as having predominantly fatty (BI-RADS code 1), 46 (38.3%) as having scattered fibroglandular (BI-RADS code 2), 57 (47.5%) as having heterogeneously dense (BI-RADS 3), and five (4.2%) as having extremely dense (BI-RADS code 4) mammographic density. The inter-rater variability assessed by weighted kappa statistic showed a substantial agreement (0.75). The dichotomous mammographic density classification system utilized in early years of Copenhagen's mammographic screening program (1991-2001) agreed well with the BI-RADS density classification system.
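
    For readers unfamiliar with the linear weighted kappa named above, a minimal sketch follows. It assumes both ratings are coded on the same ordinal scale 0..k-1; the abstract does not say how the two-level Danish scale was aligned with the four-level BI-RADS scale, so this is a generic illustration with a hypothetical function name, not a reproduction of the study's calculation.

```python
def linear_weighted_kappa(r1, r2, n_categories):
    """Linear weighted kappa for two raters using ordinal codes 0..n_categories-1."""
    n = len(r1)
    k = n_categories
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        observed[a][b] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)   # linear disagreement weight
            obs_disagreement += weight * observed[i][j] / n
            exp_disagreement += weight * row_totals[i] * col_totals[j] / n ** 2
    if exp_disagreement == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - obs_disagreement / exp_disagreement


if __name__ == "__main__":
    # Hypothetical ratings on a four-level ordinal scale.
    rater_a = [0, 1, 1, 2, 2, 3, 1, 2]
    rater_b = [0, 1, 2, 2, 2, 3, 1, 1]
    print(linear_weighted_kappa(rater_a, rater_b, 4))
```

    With linear weights, a disagreement of two categories counts twice as much as a disagreement of one, which suits ordered density grades better than unweighted kappa.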

  13. Reliability and validity of a Chinese version of the Diagnostic Interview for Borderlines-Revised.

    Science.gov (United States)

    Wang, Lanlan; Yuan, Chenmei; Qiu, Jianying; Gunderson, John; Zhang, Min; Jiang, Kaida; Leung, Freedom; Zhong, Jie; Xiao, Zeping

    2014-09-01

    Borderline personality disorder (BPD) is the most studied of the axis II disorders. One of the most widely used diagnostic instruments is the Diagnostic Interview for Borderline Patients-Revised (DIB-R). The aim of this study was to test the reliability and validity of DIB-R for use in the Chinese culture. The reliability and validity of the DIB-R Chinese version were assessed in a sample of 236 outpatients with a probable BPD diagnosis. The Structured Clinical Interview for DSM-IV Personality Disorders (SCID-II) was used as a standard. Test-retest reliability was tested six months later with 20 patients, and inter-rater reliability was tested on 32 patients. The Chinese version of the DIB-R showed good internal global consistency (Cronbach's α of 0.916), good test-retest reliability (Pearson correlation of 0.704), good inter-rater reliability (intra-class correlation coefficient of 0.892 and kappa of 0.861). When compared with the DSM-IV diagnosis as measured by the SCID-II, the DIB-R showed relatively good sensitivity (0.768) and specificity (0.891) at the cutoff of 7, moderate diagnostic convergence (kappa of 0.631), as well as good discriminating validity. The Chinese version of the DIB-R has good psychometric properties, which renders it a valuable method for examining the presence, the severity, and component phenotypes of BPD in Chinese samples. © 2013 Wiley Publishing Asia Pty Ltd.
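
    Sensitivity and specificity at a score cutoff, as reported above, can be computed as in the sketch below. It assumes that a score at or above the cutoff counts as a positive screen, which the abstract does not spell out; the function name and the example data are hypothetical.

```python
def sensitivity_specificity(scores, has_condition, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive.

    has_condition holds the reference-standard diagnosis (True/False) per case.
    The >= convention is an assumption, not taken from the abstract.
    """
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, has_condition):
        positive = score >= cutoff
        if truth and positive:
            tp += 1
        elif truth:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical scores and reference diagnoses.
    scores = [9, 8, 6, 7, 3, 5, 8, 2]
    truth = [True, True, True, True, False, False, False, False]
    print(sensitivity_specificity(scores, truth, cutoff=7))
```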

  14. Hybrid method for consistent model of the Pacific absolute plate motion and a test for inter-hotspot motion since 70Ma

    Science.gov (United States)

    Harada, Y.; Wessel, P.; Sterling, A.; Kroenke, L.

    2002-12-01

    Inter-hotspot motion within the Pacific plate is one of the most controversial issues in recent geophysical studies. However, it is a fact that many geophysical and geological data including ages and positions of seamount chains in the Pacific plate can largely be explained by a simple model of absolute motion derived from assumptions of rigid plates and fixed hotspots. Therefore we take the stand that if a model of plate motion can explain the ages and positions of Pacific hotspot tracks, inter-hotspot motion would not be justified. On the other hand, if any discrepancies between the model and observations are found, the inter-hotspot motion may then be estimated from these discrepancies. To make an accurate model of the absolute motion of the Pacific plate, we combined two different approaches: the polygonal finite rotation method (PFRM) by Harada and Hamano (2000) and the hot-spotting technique developed by Wessel and Kroenke (1997). The PFRM can determine accurate positions of finite rotation poles for the Pacific plate if the present positions of hotspots are known. On the other hand, the hot-spotting technique can predict present positions of hotspots if the absolute plate motion is given. Therefore we can undertake iterative calculations using the two methods. This hybrid method enables us to determine accurate finite rotation poles for the Pacific plate solely from geometry of Hawaii, Louisville and Easter(Crough)-Line hotspot tracks from around 70 Ma to present. Information of ages can be independently assigned to the model after the poles and rotation angles are determined. We did not detect any inter-hotspot motion from the geometry of these Pacific hotspot tracks using this method. The Ar-Ar ages of Pacific seamounts including new age data of ODP Leg 197 are used to test the newly determined model of the Pacific plate motion. 
    The ages of Hawaii, Louisville, Easter(Crough)-Line, and Cobb hotspot tracks are quite consistent with each other from 70 Ma to

  15. Atmospheric Correction Inter-Comparison Exercise

    Directory of Open Access Journals (Sweden)

    Georgia Doxani

    2018-02-01

    Full Text Available The Atmospheric Correction Inter-comparison eXercise (ACIX) is an international initiative with the aim to analyse the Surface Reflectance (SR) products of various state-of-the-art atmospheric correction (AC) processors. The Aerosol Optical Thickness (AOT) and Water Vapour (WV) are also examined in ACIX as additional outputs of AC processing. In this paper, the general ACIX framework is discussed; special mention is made of the motivation to initiate the experiment, the inter-comparison protocol, and the principal results. ACIX is free and open and every developer was welcome to participate. Eventually, 12 participants applied their approaches to various Landsat-8 and Sentinel-2 image datasets acquired over sites around the world. The current results diverge depending on the sensors, products, and sites, indicating their strengths and weaknesses. Indeed, this first implementation of processor inter-comparison was proven to be a good lesson for the developers to learn the advantages and limitations of their approaches. Various algorithm improvements are expected, if not already implemented, and the enhanced performances are yet to be assessed in future ACIX experiments.

  16. Surveying for "artifacts": the susceptibility of the OCB-performance evaluation relationship to common rater, item, and measurement context effects.

    Science.gov (United States)

    Podsakoff, Nathan P; Whiting, Steven W; Welsh, David T; Mai, Ke Michael

    2013-09-01

    Despite the increased attention paid to biases attributable to common method variance (CMV) over the past 50 years, researchers have only recently begun to systematically examine the effect of specific sources of CMV in previously published empirical studies. Our study contributes to this research by examining the extent to which common rater, item, and measurement context characteristics bias the relationships between organizational citizenship behaviors and performance evaluations using a mixed-effects analytic technique. Results from 173 correlations reported in 81 empirical studies (N = 31,146) indicate that even after controlling for study-level factors, common rater and anchor point number similarity substantially biased the focal correlations. Indeed, these sources of CMV (a) led to estimates that were between 60% and 96% larger when comparing measures obtained from a common rater, versus different raters; (b) led to 39% larger estimates when a common source rated the scales using the same number, versus a different number, of anchor points; and (c) when taken together with other study-level predictors, accounted for over half of the between-study variance in the focal correlations. We discuss the implications for researchers and practitioners and provide recommendations for future research. PsycINFO Database Record (c) 2013 APA, all rights reserved

  17. Validity and reliability of a low-cost digital dynamometer for measuring isometric strength of lower limb.

    Science.gov (United States)

    Romero-Franco, Natalia; Jiménez-Reyes, Pedro; Montaño-Munuera, Juan A

    2017-11-01

    Lower limb isometric strength is a key parameter to monitor the training process or recognise muscle weakness and injury risk. However, valid and reliable methods to evaluate it often require high-cost tools. The aim of this study was to analyse the concurrent validity and reliability of a low-cost digital dynamometer for measuring isometric strength in lower limb. Eleven physically active and healthy participants performed maximal isometric strength for: flexion and extension of ankle, flexion and extension of knee, flexion, extension, adduction, abduction, internal and external rotation of hip. Data obtained by the digital dynamometer were compared with the isokinetic dynamometer to examine its concurrent validity. Data obtained by the digital dynamometer from 2 different evaluators and 2 different sessions were compared to examine its inter-rater and intra-rater reliability. Intra-class correlation (ICC) for validity was excellent in every movement (ICC > 0.9). Intra and inter-tester reliability was excellent for all the movements assessed (ICC > 0.75). The low-cost digital dynamometer demonstrated strong concurrent validity and excellent intra and inter-tester reliability for assessing isometric strength in the main lower limb movements.

  18. Inter-operability

    International Nuclear Information System (INIS)

    Plaziat, J.F.; Moulin, P.; Van Beurden, R.; Ballet, E.

    2005-01-01

    Building an internal gas market implies establishing harmonized rules for cross border trading between operators. To that effect, the European association EASEE-gas is carrying out standards and procedures, commonly called 'inter-operability'. Set up in 2002, the Association brings together all segments of the gas industry: producers, transporters, distributors, traders and shippers, suppliers, consumers and service providers. This workshop presents the latest status on issues such as barriers to gas trade in Europe, rules and procedures under preparation by EASEE-gas, and the implementation schedule of these rules by operators. This article gathers 5 presentations about this topic given at the gas conference

  19. Integration of Optical Coherence Tomography Scan Patterns to Augment Clinical Data Suite

    Science.gov (United States)

    Mason, S.; Patel, N.; Van Baalen, M.; Tarver, W.; Otto, C.; Samuels, B.; Koslovsky, M.; Schaefer, C.; Taiym, W.; Wear, M.

    2018-01-01

    Vision changes identified in long duration spaceflight astronauts have led Space Medicine at NASA to adopt a more comprehensive clinical monitoring protocol. Optical Coherence Tomography (OCT) was recently implemented at NASA, including on board the International Space Station in 2013. NASA is collaborating with Heidelberg Engineering to increase the fidelity of the current OCT data set by integrating the traditional circumpapillary OCT image with radial and horizontal block images at the optic nerve head. The retinal nerve fiber layer was segmented by two experienced individuals. Intra-rater (N=4 subjects and 70 images) and inter-rater (N=4 subjects and 221 images) agreement analyses were performed. The results of this analysis and the potential benefits will be presented.

  20. Reliability and concurrent validity of the iPhone® Compass application to measure thoracic rotation range of motion (ROM) in healthy participants

    Directory of Open Access Journals (Sweden)

    James Furness

    2018-03-01

    Full Text Available Background Several water-based sports (swimming, surfing and stand up paddle boarding) require adequate thoracic mobility (specifically rotation) in order to perform the appropriate activity requirements. The measurement of thoracic spine rotation is problematic for clinicians due to a lack of convenient and reliable measurement techniques. More recently, smartphones have been used to quantify movement in various joints in the body; however, there appears to be a paucity of research using smartphones to assess thoracic spine movement. Therefore, the aim of this study is to determine the reliability (intra- and inter-rater) and validity of the iPhone® app (Compass) when assessing thoracic spine rotation ROM in healthy individuals. Methods A total of thirty participants were recruited for this study. Thoracic spine rotation ROM was measured using both the current clinical gold standard, a universal goniometer (UG), and the Smart Phone Compass app. Intra-rater and inter-rater reliability was determined with an Intraclass Correlation Coefficient (ICC) and associated 95% confidence intervals (CI). Validation of the Compass app in comparison to the UG was measured using Pearson’s correlation coefficient and levels of agreement were identified with Bland–Altman plots and 95% limits of agreement. Results The UG and Compass app measurements both had excellent reproducibility for intra-rater (ICC 0.94–0.98) and inter-rater reliability (ICC 0.72–0.89). However, the Compass app measurements had higher intra-rater reliability (ICC = 0.96–0.98; 95% CI [0.93–0.99]; vs. ICC = 0.94–0.98; 95% CI [0.88–0.99]) and inter-rater reliability (ICC = 0.87–0.89; 95% CI [0.74–0.95] vs. ICC = 0.72–0.82; 95% CI [0.21–0.94]). A strong and significant correlation was found between the UG and the Compass app, demonstrating good concurrent validity (r = 0.835, p < 0.001). Levels of agreement between the two devices were 24.8° (LoA –9
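
    The Bland-Altman 95% limits of agreement mentioned in this abstract are conventionally the mean paired difference plus or minus 1.96 standard deviations of the differences. The sketch below applies that convention to hypothetical paired readings; the function name and the numbers are illustrative and are not the study's data.

```python
from statistics import mean, stdev


def bland_altman_limits(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Uses the conventional bias +/- 1.96 * SD of the paired differences.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)   # sample SD of the differences
    return bias, bias - spread, bias + spread


if __name__ == "__main__":
    # Hypothetical rotation readings in degrees from two devices.
    goniometer = [10.0, 12.0, 14.0, 16.0]
    phone_app = [9.0, 12.0, 15.0, 14.0]
    print(bland_altman_limits(goniometer, phone_app))
```

    A high Pearson correlation does not by itself imply narrow limits of agreement, which is why abstracts of this kind usually report both.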

  1. Towards an International Framework for Recommendations of Core Competencies in Nursing and Inter-Professional Informatics: The TIGER Competency Synthesis Project.

    Science.gov (United States)

    Hübner, Ursula; Shaw, Toria; Thye, Johannes; Egbert, Nicole; Marin, Heimar; Ball, Marion

    2016-01-01

    Informatics competencies of the health care workforce must meet the requirements of inter-professional process and outcome oriented provision of care. In order to help nursing education transform accordingly, the TIGER Initiative deployed an international survey, with participation from 21 countries, to evaluate and prioritise a broad list of core competencies for nurses in five domains: 1) nursing management, 2) information technology (IT) management in nursing, 3) interprofessional coordination of care, 4) quality management, and 5) clinical nursing. Informatics core competencies were found highly important for all domains. In addition, this project compiled eight national case studies from Austria, Finland, Germany, Ireland, New Zealand, the Philippines, Portugal, and Switzerland that reflected the country-specific perspective. These findings will lead us to an international framework of informatics recommendations.

  2. Utility of Angle Correction for Hemodynamic Measurements with Doppler Echocardiography.

    Science.gov (United States)

    Sigurdsson, Martin I; Eoh, Eun J; Chow, Vinca W; Waldron, Nathan H; Cleve, Jayne; Nicoara, Alina; Swaminathan, Madhav

    2018-04-06

    The routine application of angle correction (AnC) in hemodynamic measurements with transesophageal echocardiography currently is not recommended but potentially could be beneficial. The authors hypothesized that AnC can be applied reliably and may change grading of aortic stenosis (AS). Retrospective analysis. Single institution, university hospital. During phase I, use of AnC was assessed in 60 consecutive patients with intraoperative transesophageal echocardiography. During phase II, 129 images from a retrospective cohort of 117 cases were used to quantify AS by mean pressure gradient. A panel of observers used custom-written software in Java to measure intra-individual and inter-individual correlation in AnC application, correlation with preoperative transthoracic echocardiography gradients, and regrading of AS after AnC. For phase I, the median AnC was 21 (16-35) degrees, and 17% of patients required no AnC. For phase II, the median AnC was 7 (0-15) degrees, and 37% of assessed images required no AnC. The mean inter-individual and intra-individual correlation for AnC was 0.50 (95% confidence interval [CI] 0.49-0.52) and 0.87 (95% CI 0.82-0.92), respectively. AnC did not improve agreement with the transthoracic echocardiography mean pressure gradient. The mean inter-rater and intra-rater agreement for grading AS severity was 0.82 (95% CI 0.81-0.83) and 0.95 (95% CI 0.91-0.95), respectively. A total of 241 (7%) AS gradings were reclassified after AnC was applied, mostly when the uncorrected mean gradient was within 5 mmHg of the severity classification cutoff. AnC can be performed with a modest inter-rater and intra-rater correlation and high degree of inter-rater and intra-rater agreement for AS severity grading. Copyright © 2018 Elsevier Inc. All rights reserved.

  3. Optimizing ADAS-Cog Worksheets: A Survey of Clinical Trial Raters' Perceptions.

    Science.gov (United States)

    Meyer, Stephen M; Bertzos, Kristina A; Perez, Magdalena; Connor, Donald J; Schafer, Kimberly; Walter, Sarah

    2017-01-01

    The Alzheimer's Disease Assessment Scale-Cognitive subscale (ADAS-Cog) remains the most widely used test of longitudinal cognitive functioning in Alzheimer's disease (AD) clinical trials. Unlike most neuropsychological tests, the ADAS-Cog source documentation worksheets are not uniform across clinical trials, and vary by document layout, inclusion of administration and/or scoring instructions, and documentation of subtest scoring (e.g., recording correct versus incorrect scores), among other differences. Many ADAS-Cog test administrators (raters) participate in multiple AD trials and switching between different ADAS-Cog worksheets may increase the likelihood of administration and/or scoring mistakes that lessen the reliability of the instrument. An anonymous online survey sought raters' experiences with ADAS-Cog worksheets and their opinions on the design and content of the worksheets. Results of the survey indicated preference for structure and standardization of the ADAS-Cog worksheets, which has been considered in the development of a standard ADAS-Cog source document by the Alzheimer's Disease Cooperative Study (ADCS) Working Group. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  4. International cooperation for the development of consistent and stable transportation regulations to promote and enhance safety and security

    International Nuclear Information System (INIS)

    Strosnider, J.

    2004-01-01

    International commerce of radioactive materials crosses national boundaries, linking separate regulatory institutions with a common purpose and making it necessary for these institutions to work together in order to achieve common safety goals in a manner that does not place an undue burden on industry and commerce. Widespread and increasing use of radioactive materials across the world has led to increases in the transport of radioactive materials. The demand for consistency in the oversight of international transport has also increased to prevent unnecessary delays and costs associated with incongruent or redundant regulatory requirements by the various countries through which radioactive material is transported. The International Atomic Energy Agency (IAEA) is the authority for international regulation of transportation of radioactive materials responsible for promulgation of regulations and guidance for the establishment of acceptable methods of transportation for the international community. As such, the IAEA is seen as the focal point for consensus building between its Member States to develop consistency in transportation regulations and reviews and to ensure the safe and secure transport of radioactive material. International cooperation is also needed to ensure stability in our regulatory processes. Changes to transportation regulations should be based on an anticipated safety benefit supported by risk information and insights gained from continuing experience, evaluation, and research studies. If we keep safety as the principal basis for regulatory changes, regulatory stability will be enhanced. Finally, as we endeavour to maintain consistency and stability in our international regulations, we must be mindful of the new security challenges that lie before the international community as a result of a changing terrorist environment. Terrorism is a problem of global concern that also requires international cooperation and support, as we look for ways to

  5. Examining Rater Effects of the TGMD-2 on Children with Intellectual Disability

    Science.gov (United States)

    Kim, Youngdeok; Park, Ilhyeok; Kang, Minsoo

    2012-01-01

    The purpose of this study was to investigate rater effects on the TGMD-2 when it is applied to children with intellectual disability. A total of 22 children with intellectual disabilities participated in this study. Children's performances in each of 12 subtests of the TGMD-2 were recorded via video and scored by three adapted physical activity…

  6. The reliability of language performance measurement in language sample analysis of children aged 5-6 years

    Directory of Open Access Journals (Sweden)

    Zahra Soleymani

    2014-04-01

    Full Text Available Background and Aim: The language sample analysis (LSA) is more common in other languages than Persian to study language development and assess language pathology. We studied some psychometric properties of language sample analysis in this research such as content validity of written story and its pictures, test-retest reliability, and inter-rater reliability. Methods: We wrote a story based on Persian culture from Schneider’s study. The validity of written story and drawn pictures was approved by experts. To study test-retest reliability, 30 children looked at the pictures and told their own story twice with 7-10 days interval. Children generated the story themselves and the tester did not give any cue about the story. Their audio-taped story was transcribed and analyzed. Sentence and word structures were detected in the analysis. Results: Mean of experts' agreement with the validity of written story was 92.28 percent. Experts scored the quality of pictures high and excellent. There was correlation between variables in sentence and word structure (p<0.05) in test-retest, except for complex sentences (p=0.137). The agreement rate was 97.1 percent in inter-rater reliability assessment of transcription. The results of inter-rater reliability of language analysis showed that correlation coefficients were significant. Conclusion: The results confirmed that the tool was valid for eliciting language sample. The consistency of language performance in repeated measurement varied from mild to high in language sample analysis approach.

  7. The letter knowledge assessment tool.

    Science.gov (United States)

    Pedro, Cassandra; Lousada, Marisa; Pereira, Rita; Hall, Andreia; Jesus, Luis M T

    2017-10-10

    There is a need to develop letter knowledge assessment tools to characterise the letter knowledge in Portuguese pre-schoolers and to compare it with pre-schoolers from other countries, but there are no tools for this purpose in Portugal. The aim of this paper is to describe the development and validation procedures of the Prova de Avaliação de Competências de Pré-Literacia (PACPL), which assesses letter knowledge. This study includes data that has been gathered in two phases: pilot and main study. In the pilot study, an expert panel of six speech and language pathologists analysed the instrument. Children (n = 216) aged 5;0-7;11 participated in the main study that reports data related to the psychometric characteristics of the PACPL. Content validity, internal consistency, reliability and contributing factors to performance were examined statistically. A modified Bland-Altman method revealed good agreement amongst evaluators. The main study showed that the PACPL has a very good internal consistency and high inter-rater (96.2% of agreement and a Cohen's k value of 0.92) and intra-rater (95.6% of agreement and a Cohen's k value of 0.91) agreement. Construct validity of the PACPL was also assured (Cronbach's α of 0.982). Significant differences were found between age groups with children increasing their letter knowledge with age. In addition, they were better at identifying than at producing both letter names and letter sounds. The PACPL is a valid and reliable instrument to assess letter knowledge in Portuguese children.
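    The record above reports both raw percent agreement and Cohen's kappa for pairs of ratings. As a minimal sketch of how those two statistics relate, assuming two raters who each assign one categorical label per item (the function names and the toy data are illustrative and are not taken from the study):

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which the two raters gave the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty item set")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions,
    # summed over every label either rater used.
    labels = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in labels) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example (hypothetical labels): 1 = letter known, 0 = not known.
first = [1, 1, 0, 0]
second = [1, 1, 0, 1]
print(percent_agreement(first, second))
print(cohens_kappa(first, second))
```

    Kappa is lower than raw agreement whenever some agreement would be expected by chance alone, which is why abstracts such as this one usually quote both figures.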

  8. Internal Consistency of Performance Evaluations as a Function of Music Expertise and Excerpt Familiarity

    Science.gov (United States)

    Kinney, Daryl W.

    2009-01-01

    The purpose of this study was to examine the effects of music experience and excerpt familiarity on the internal consistency of performance evaluations. Participants included nonmusic majors who had not participated in high school music ensembles, nonmusic majors who had participated in high school music ensembles, music majors, and experts…

  9. Intra-organisational accounting during negotiation processes for inter-organisational control

    DEFF Research Database (Denmark)

    Jakobsen, Morten

    To date the literature on management and management accounting within inter-organisational relationships has mainly focussed on managing the interface between the supplier and the buyer. In contrast to most previous research, this study examines the internal practices of a company engaged in inter-organisational relationships. It addresses the question of how intra-organisational management accounting practices affect the ability to conduct inter-organisational relationships. A qualitative case study is used to gather information from an electronics company. The company enters its inter-organisational relationships... The study concludes that an important role of management accounting is to reveal the intra-organisational cost consequences of proposals made by suppliers during negotiation processes. Thereby cost information becomes an integrated part of the counter-proposals generated and actively used during...

  10. Consistência interna da versão em português do Mini-Inventário de Fobia Social (Mini-SPIN) / Internal consistency of the Portuguese version of the Mini-Social Phobia Inventory (Mini-SPIN)

    Directory of Open Access Journals (Sweden)

    Gustavo J. Fonseca D'El Rey

    2007-01-01

    Full Text Available BACKGROUND: Social phobia is a severe anxiety disorder that brings disability and distress. OBJECTIVES: To investigate the internal consistency of the Portuguese version of the Mini-Social Phobia Inventory (Mini-SPIN). METHODS: We conducted a study of the internal consistency of the Mini-SPIN in a sample of 206 college students of the city of São Paulo, SP. RESULTS: The internal consistency of the instrument, analyzed by Cronbach's alpha coefficient, was 0.81. CONCLUSIONS: These findings suggest that the Portuguese version of the Mini-SPIN has good internal consistency, similar to that obtained with the original English version.
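    Several records in this listing summarise internal consistency with Cronbach's alpha. The sketch below shows the standard formula, alpha = k/(k-1) × (1 − Σ item variances / variance of total scores), under the assumption that the data are complete (no missing responses) and arranged as one row of item scores per respondent. The function name and the toy scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: two items that rise together perfectly across three respondents.
toy = [[1, 2], [2, 3], [3, 4]]
print(cronbach_alpha(toy))
```

    With perfectly covarying items the statistic reaches its maximum of 1; values such as the 0.81 reported above indicate items that covary strongly but not perfectly.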

  11. Inter-Cultural Communication in Student Research

    DEFF Research Database (Denmark)

    Hjaltadóttir, Rannveig Edda

    This article describes a project undertaken at the University of Southern Denmark designed to support active group work and inter-cultural communication between international students. The project is based on using group work and cooperative learning principles to do student research, therefore challenging the students to solve problems as a group. The main aim of the research is to investigate the possible effects of using integrated student research and group work using cooperative learning methods to develop international communication skills of students in multi-cultural higher education courses.

  12. Report on inter-noise 99; Inter-noise 99 sanka hokok

    Energy Technology Data Exchange (ETDEWEB)

    Koike, H. [Japan Automobile Research Institute Inc., Tsukuba (Japan)

    2000-04-01

    Inter-Noise (International Congress on Noise Control Engineering) is a society on noise/vibration and the control technology. Inter-Noise 99 was held on December 6, 7 and 8, 1999, at Fort Lauderdale, Florida, the U.S. The theme was Noise Control in the New Millennium. The number of the participants registered was 555 (151 from the U.S., 89 from Japan, 248 from European countries, and 69 from Asian/other countries). Dr. Harold Marshall gave a keynote lecture titled Noise Control by Design in the 21st Century - An Architectural Acoustic Perspective. From a standpoint of architectural acoustics, he stated the perspective, subjects, and course of the technical development pertaining to technologies needed in the 21st century. The papers read are mostly from the following fields: measuring technology, military exercise noise, modeling, forecast and simulation, aerodynamic/underwater sound, etc. In the session on the tire noise where the author read a paper, 14 papers were read. The number of the papers read was more than that in 1998, probably influenced by the tire noise regulation in Europe and Japan. (translated by NEDO)

  13. Translation, reliability, and clinical utility of the Melbourne Assessment 2.

    Science.gov (United States)

    Gerber, Corinna N; Plebani, Anael; Labruyère, Rob

    2017-10-12

    The aims were to (i) provide a German translation of the Melbourne Assessment 2 (MA2), a quantitative test to measure unilateral upper limb function in children with neurological disabilities and (ii) to evaluate its reliability and aspects of clinical utility. After its translation into German and approval of the back translation by the original authors, the MA2 was performed and videotaped twice with 30 children with neuromotor disorders. For each participant, two raters scored the video of the first test for inter-rater reliability. To determine test-retest reliability, one rater additionally scored the video of the second test while the other rater repeated the scoring of the first video to evaluate intra-rater reliability. Time needed for rater training, test administration, and scoring was recorded. The four subscale scores showed excellent intra-, inter-rater, and test-retest reliability with intraclass correlation coefficients of 0.90-1.00 (95%-confidence intervals 0.78-1.00). Score items revealed substantial to almost perfect intra-rater reliability (weighted kappa κw = 0.66-1.00) for the more affected side. Score item inter-rater and test-retest reliability of the same extremity were, with one exception, moderate to almost perfect (κw = 0.42-0.97; κw = 0.40-0.89). Furthermore, the MA2 was feasible and acceptable for patients and clinicians. The MA2 showed excellent subscale and moderate to almost perfect score item reliability. Implications for Rehabilitation: There is a lack of high-quality studies about psychometric properties of upper limb measurement tools in the neuropediatric population. The Melbourne Assessment 2 is a promising tool for reliable measurement of unilateral upper limb movement quality in the neuropediatric population. The Melbourne Assessment 2 is acceptable and practicable to therapists and patients for routine use in clinical care.

  14. Measuring theory of mind in children. Psychometric properties of the ToM Storybooks.

    Science.gov (United States)

    Blijd-Hoogewys, E M A; van Geert, P L C; Serra, M; Minderaa, R B

    2008-11-01

    Although research on Theory-of-Mind (ToM) is often based on single task measurements, more comprehensive instruments result in a better understanding of ToM development. The ToM Storybooks is a new instrument measuring basic ToM-functioning and associated aspects. There are 34 tasks, tapping various emotions, beliefs, desires and mental-physical distinctions. Four studies on the validity and reliability of the test are presented, in typically developing children (n = 324, 3-12 years) and children with PDD-NOS (n = 30). The ToM Storybooks have good psychometric qualities. A component analysis reveals five components corresponding with the underlying theoretical constructs. The internal consistency, test-retest reliability, inter-rater reliability, construct validity and convergent validity are good. The ToM Storybooks can be used in research as well as in clinical settings.

  15. Evaluating Sensory Processing in Fragile X Syndrome: Psychometric Analysis of the Brain Body Center Sensory Scales (BBCSS).

    Science.gov (United States)

    Kolacz, Jacek; Raspa, Melissa; Heilman, Keri J; Porges, Stephen W

    2018-06-01

    Individuals with fragile X syndrome (FXS), especially those co-diagnosed with autism spectrum disorder (ASD), face many sensory processing challenges. However, sensory processing measures informed by neurophysiology are lacking. This paper describes the development and psychometric properties of a parent/caregiver report, the Brain-Body Center Sensory Scales (BBCSS), based on Polyvagal Theory. Parents/guardians reported on 333 individuals with FXS, 41% with ASD features. Factor structure using a split-sample exploratory-confirmatory design conformed to neurophysiological predictions. Internal consistency, test-retest, and inter-rater reliability were good to excellent. BBCSS subscales converged with the Sensory Profile and Sensory Experiences Questionnaire. However, data also suggest that BBCSS subscales reflect unique features related to sensory processing. Individuals with FXS and ASD features displayed more sensory challenges on most subscales.

  16. Comparison of Danish dichotomous and BI-RADS classifications of mammographic density

    DEFF Research Database (Denmark)

    Hodge, Rebecca; Hellmann, Sophie Sell; von Euler-Chelpin, My

    2014-01-01

    BACKGROUND: In the Copenhagen mammography screening program from 1991 to 2001, mammographic density was classified either as fatty or mixed/dense. This dichotomous mammographic density classification system is unique internationally, and has not been validated before. PURPOSE: To compare the Danish dichotomous mammographic density classification system from 1991 to 2001 with the density BI-RADS classifications, in an attempt to validate the Danish classification system. MATERIAL AND METHODS: The study sample consisted of 120 mammograms taken in Copenhagen in 1991-2001, which tested false positive, and which were in 2012 re-assessed and classified according to the BI-RADS classification system. We calculated inter-rater agreement between the Danish dichotomous mammographic classification as fatty or mixed/dense and the four-level BI-RADS classification by the linear weighted Kappa statistic. RESULTS...

  17. [Quality Assurance in Sociomedical Evaluation by Peer Review: A Pilot Project of the German Statutory Pension Insurance].

    Science.gov (United States)

    Strahl, A; Gerlich, C; Wolf, H-D; Gehrke, J; Müller-Garnn, A; Vogel, H

    2016-03-01

    The sociomedical evaluation by the German Pension Insurance serves the purpose of determining entitlement to disability pensions. A quality assurance concept for the sociomedical evaluation was developed, which is based on a peer review process. Peer review is an established process of external quality assurance in health care. The review is based on a hierarchically constructed manual that was evaluated in this pilot project. The database consists of 260 medical reports for disability pension of 12 pension insurance agencies. 771 reviews from 19 peers were included in the evaluation of the inter-rater reliability. Kendall's coefficient of concordance W for more than 2 raters is used as the primary measure of inter-rater reliability. Reliability appeared to be heterogeneous. Kendall's W varied across the individual criteria from 0.09 to 0.88 and reached a value of 0.37 for the primary criterion, reproducibility. The reliability of the manual seemed acceptable in the context of existing research data and is in line with existing peer review research outcomes. Nevertheless, the concordance is limited and requires optimisation. Starting points for improvement can be seen in systematic training and regular user meetings of the peers involved. © Georg Thieme Verlag KG Stuttgart · New York.
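    Kendall's coefficient of concordance W, used in the record above, measures how closely more than two raters agree in ordering the same items. The following is a minimal sketch of the textbook formula W = 12·S / (m²(n³ − n)) for m raters and n items, assuming every rater supplies a complete ranking 1..n with no ties. Real review data usually contain ties and would need the tie-corrected form, which is not shown here. The function name and example rankings are hypothetical.

```python
def kendalls_w(rankings):
    """Kendall's W for m raters who each rank the same n items (no ties)."""
    m = len(rankings)
    n = len(rankings[0])
    if m < 2 or n < 2:
        raise ValueError("need at least two raters and two items")
    for ranks in rankings:
        if sorted(ranks) != list(range(1, n + 1)):
            raise ValueError("each rater must supply a tie-free ranking 1..n")
    # Sum of the ranks each item received across all raters.
    rank_sums = [sum(ranks[i] for ranks in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    # S: squared deviations of the rank sums from their mean.
    s = sum((total - mean_rank_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Three raters in complete agreement about four items.
print(kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))
# Two raters with exactly opposite orderings of three items.
print(kendalls_w([[1, 2, 3], [3, 2, 1]]))
```

    W runs from 0 (no concordance) to 1 (identical rankings), so the reported range of 0.09 to 0.88 spans nearly the whole scale.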

  18. Reliability of the imaging software in the preoperative planning of the open-wedge high tibial osteotomy.

    Science.gov (United States)

    Lee, Yong Seuk; Kim, Min Kyu; Byun, Hae Won; Kim, Sang Bum; Kim, Jin Goo

    2015-03-01

    The purpose of this study was to verify a recently developed picture-archiving and communications system-photoshop method by comparing reliabilities between real-size paper template and the PACS-photoshop methods in preoperative planning of open-wedge high tibial osteotomy. A prospective case series was conducted, including patients with medial osteoarthritis undergoing open-wedge high tibial osteotomy. In the preoperative planning, the picture-archiving and communications system-photoshop method and real-size paper template method were used simultaneously in all patients. Preoperative hip-knee-ankle angle, height, and angle of the osteotomy were evaluated. The reliability of this newly devised method was evaluated, and the consistency between the two methods was also evaluated using intra-class correlation coefficient. Using the picture-archiving and communications system-photoshop method, the mean correction angle and height of osteotomy gap of rater-1 were 11.7° ± 3.6° and 10.7 ± 3.6 mm, respectively. The mean correction angle and height of osteotomy gap of rater-2 were 12.0 ± 2.6 and 10.8 ± 3.6, respectively. The inter- and intra-rater reliabilities of the correction angle were 0.956 ~ 0.979 and 0.980 ~ 0.992, respectively. The inter- and intra-rater reliabilities of the height of the osteotomy gap were 0.968 ~ 0.985 and 0.971 ~ 0.994, respectively (p photoshop method, mean values of the correction angle and height of the osteotomy gap were 11.9° ± 3.6° and 10.8 ± 3.6 mm, respectively. Consistency between the two methods by comparing the means of the correction angle and the height of the osteotomy gap were 0.985 and 0.985, respectively (p photoshop method enables direct measurement of the height of the osteotomy gap with high reliability.

  19. International Endometrial Tumor Analysis (IETA) terminology in women with postmenopausal bleeding and sonographic endometrial thickness ≥ 4.5 mm: agreement and reliability study.

    Science.gov (United States)

    Sladkevicius, P; Installé, A; Van Den Bosch, T; Timmerman, D; Benacerraf, B; Jokubkiene, L; Di Legge, A; Votino, A; Zannoni, L; De Moor, B; De Cock, B; Van Calster, B; Valentin, L

    2018-02-01

    To estimate intra- and interrater agreement and reliability with regard to describing ultrasound images of the endometrium using the International Endometrial Tumor Analysis (IETA) terminology. Four expert and four non-expert raters assessed videoclips of transvaginal ultrasound examinations of the endometrium obtained from 99 women with postmenopausal bleeding and sonographic endometrial thickness ≥ 4.5 mm but without fluid in the uterine cavity. The following features were rated: endometrial echogenicity, endometrial midline, bright edge, endometrial-myometrial junction, color score, vascular pattern, irregularly branching vessels and color splashes. The color content of the endometrial scan was estimated using a visual analog scale graded from 0 to 100. To estimate intrarater agreement and reliability, the same videoclips were assessed twice with a minimum of 2 months' interval. The raters were blinded to their own results and to those of the other raters. Interrater differences in the described prevalence of most IETA variables were substantial, and some variable categories were observed rarely. Specific agreement was poor for variables with many categories. For binary variables, specific agreement was better for absence than for presence of a category. For variables with more than two outcome categories, specific agreement for expert and non-expert raters was best for not-defined endometrial midline (93% and 96%), regular endometrial-myometrial junction (72% and 70%) and three-layer endometrial pattern (67% and 56%). The grayscale ultrasound variable with the best reliability was uniform vs non-uniform echogenicity (multirater kappa (κ), 0.55 for expert and 0.52 for non-expert raters), and the variables with the lowest reliability were appearance of the endometrial-myometrial junction (κ, 0.25 and 0.16) and the nine-category endometrial echogenicity variable (κ, 0.29 and 0.28). The most reliable color Doppler variable was color score (mean weighted

  20. Hierarchical fault diagnosis for discrete-event systems under local consistency

    NARCIS (Netherlands)

    Su, Rong; Wonham, W.M.

    2006-01-01

    In previous work the authors proposed a distributed diagnosis approach consisting of two phases—preliminary diagnosis in each local diagnoser and inter-diagnoser communication. The objective of communication is to achieve either global or local consistency among local diagnoses, where global

  1. Descapitalización de las tasas efectivas para calcular el interés simple

    OpenAIRE

    Avelino Sánchez, Esteban Marino; Cerna Maguiña, Héctor Félix

    2014-01-01

    The nominal interest rate is normally used to calculate simple interest and compound interest (under certain assumptions or conditions). Conversely, using the effective rate to calculate simple interest is unusual. In Peru, however, it is used to calculate legal labor interest; the problem therefore consists of decapitalizing the effective legal interest rate, as its accumulated factor, through calculation experiments in Excel, observing the properties of the potenc...

  2. Internal consistency, reliability, and temporal stability of the Oxford Happiness Questionnaire short-form: Test-retest data over two weeks

    OpenAIRE

    MCGUCKIN, CONOR

    2006-01-01

    The Oxford Happiness Questionnaire short-form is a recently developed eight-item measure of happiness. This study evaluated the internal consistency reliability and test-retest reliability of the Oxford Happiness Questionnaire short-form among 55 Northern Irish undergraduate university students who completed the measure on two occasions separated by two weeks. Internal consistency of the measure on both occasions was satisfactory at both Time 1 (alpha = .62) and Time 2 (alpha = ....

  3. SU-F-J-103: Assessment of Liver Tumor Contrast for Radiation Therapy: Inter-Patient and Inter-Sequence Variability

    Energy Technology Data Exchange (ETDEWEB)

    Moore, B [Duke University Medical Physics Graduate Program, Durham, NC (United States); Yin, F; Cai, J [Duke University Medical Physics Graduate Program, Durham, NC (United States); Duke University Medical Center, Radiation Oncology, Durham, NC (United States); Czito, B; Palta, M [Duke University Medical Center, Radiation Oncology, Durham, NC (United States)

    2016-06-15

    Purpose: To determine the variation in tumor contrast between different MRI sequences and between patients for the purpose of MRI-based treatment planning. Methods: Multiple MRI scans of 11 patients with cancer(s) in the liver were included in this IRB-approved study. Imaging sequences consisted of T1W MRI, Contrast-Enhanced T1W MRI, T2W MRI, and T2*/T1W MRI. MRI images were acquired on a 1.5T GE Signa scanner with a four-channel torso coil. We calculated the tumor-to-tissue contrast to noise ratio (CNR) for each MR sequence by contouring the tumor and a region of interest (ROI) in a homogeneous region of the liver using the Eclipse treatment planning software. CNR was calculated as (I_Tum − I_ROI)/SD_ROI, where I_Tum and I_ROI are the mean values of the tumor and the ROI respectively, and SD_ROI is the standard deviation of the ROI. The same tumor and ROI structures were used in all measurements for different MR sequences. Inter-patient coefficient of variation (CV) and inter-sequence CV were determined. In addition, mean and standard deviation of CNR were calculated and compared between different MR sequences. Results: Our preliminary results showed large inter-patient CV (range: 37.7% to 88%) and inter-sequence CV (range 5.3% to 104.9%) of liver tumor CNR, indicating great variations in tumor CNR between MR sequences and between patients. Tumor CNR was found to be largest in CE-T1W (8.5±7.5), followed by T2W (4.2±2.4), T1W (3.4±2.2), and T2*/T1W (1.7±0.6) MR scans. The inter-patient CV of tumor CNR was also the largest in CE-T1W (88%), followed by T1W (64.3%), T2W (56.2%), and T2*/T1W (37.7%) MR scans. Conclusion: Large inter-sequence and inter-patient variations were observed in liver tumor CNR. CE-T1W MR images on average provided the best tumor CNR. Efforts are needed to optimize tumor contrast and its consistency for MRI-based treatment planning of cancer in the liver. This project is supported by NIH grant: 1R21CA165384.
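    The contrast-to-noise ratio and coefficient of variation defined in the abstract above are simple enough to sketch directly. The example below assumes pixel intensities are available as plain lists and uses the sample standard deviation; the abstract does not say whether a sample or population estimate was used, so that choice, the function names, and the toy intensities are all assumptions.

```python
from statistics import mean, stdev


def contrast_to_noise_ratio(tumor_pixels, roi_pixels):
    """CNR = (mean tumor intensity - mean ROI intensity) / SD of the ROI."""
    roi_sd = stdev(roi_pixels)  # sample SD: an assumption, see lead-in
    if roi_sd == 0:
        raise ValueError("CNR is undefined for a perfectly uniform ROI")
    return (mean(tumor_pixels) - mean(roi_pixels)) / roi_sd


def coefficient_of_variation(values):
    """CV in percent: SD of the values divided by their mean."""
    return stdev(values) / mean(values) * 100


def cv_from_summary(mean_value, sd_value):
    """CV in percent when only a reported mean and SD are available."""
    return sd_value / mean_value * 100


# Toy intensities (hypothetical, not study data).
tumor = [10, 12, 14]
liver_roi = [4, 6, 8]
print(contrast_to_noise_ratio(tumor, liver_roi))

# The reported CE-T1W summary of 8.5 ± 7.5 corresponds to a CV near 88%.
print(round(cv_from_summary(8.5, 7.5)))
```

    Applying `cv_from_summary` to the reported CE-T1W mean and standard deviation gives roughly 88%, in line with the inter-patient CV the abstract states for that sequence.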

  4. SU-F-J-103: Assessment of Liver Tumor Contrast for Radiation Therapy: Inter-Patient and Inter-Sequence Variability

    International Nuclear Information System (INIS)

    Moore, B; Yin, F; Cai, J; Czito, B; Palta, M

    2016-01-01

    Purpose: To determine the variation in tumor contrast between different MRI sequences and between patients for the purpose of MRI-based treatment planning. Methods: Multiple MRI scans of 11 patients with cancer(s) in the liver were included in this IRB-approved study. Imaging sequences consisted of T1W MRI, Contrast-Enhanced T1W MRI, T2W MRI, and T2*/T1W MRI. MRI images were acquired on a 1.5T GE Signa scanner with a four-channel torso coil. We calculated the tumor-to-tissue contrast to noise ratio (CNR) for each MR sequence by contouring the tumor and a region of interest (ROI) in a homogeneous region of the liver using the Eclipse treatment planning software. CNR was calculated as (I_Tum − I_ROI)/SD_ROI, where I_Tum and I_ROI are the mean values of the tumor and the ROI respectively, and SD_ROI is the standard deviation of the ROI. The same tumor and ROI structures were used in all measurements for different MR sequences. Inter-patient coefficient of variation (CV) and inter-sequence CV were determined. In addition, mean and standard deviation of CNR were calculated and compared between different MR sequences. Results: Our preliminary results showed large inter-patient CV (range: 37.7% to 88%) and inter-sequence CV (range 5.3% to 104.9%) of liver tumor CNR, indicating great variations in tumor CNR between MR sequences and between patients. Tumor CNR was found to be largest in CE-T1W (8.5±7.5), followed by T2W (4.2±2.4), T1W (3.4±2.2), and T2*/T1W (1.7±0.6) MR scans. The inter-patient CV of tumor CNR was also the largest in CE-T1W (88%), followed by T1W (64.3%), T2W (56.2%), and T2*/T1W (37.7%) MR scans. Conclusion: Large inter-sequence and inter-patient variations were observed in liver tumor CNR. CE-T1W MR images on average provided the best tumor CNR. Efforts are needed to optimize tumor contrast and its consistency for MRI-based treatment planning of cancer in the liver. This project is supported by NIH grant: 1R21CA165384

  5. Validation of the spiritual distress assessment tool in older hospitalized patients

    Directory of Open Access Journals (Sweden)

    Monod Stefanie

    2012-03-01

    Full Text Available Abstract Background: The Spiritual Distress Assessment Tool (SDAT) is a 5-item instrument developed to assess unmet spiritual needs in hospitalized elderly patients and to determine the presence of spiritual distress. The objective of this study was to investigate the SDAT psychometric properties. Methods: This cross-sectional study was performed in a Geriatric Rehabilitation Unit. Patients (N = 203), aged 65 years and over with Mini Mental State Exam score ≥ 20, were consecutively enrolled over a 6-month period. Data on health, functional, cognitive, affective and spiritual status were collected upon admission. Interviews using the SDAT (score from 0 to 15, higher scores indicating higher distress) were conducted by a trained chaplain. Factor analysis, measures of internal consistency (inter-item and item-to-total correlations, Cronbach α), and reliability (intra-rater and inter-rater) were performed. Criterion-related validity was assessed using the Functional Assessment of Chronic Illness Therapy-Spiritual well-being (FACIT-Sp) and the question "Are you at peace?" as criterion-standard. Concurrent and predictive validity were assessed using the Geriatric Depression Scale (GDS), occurrence of a family meeting, hospital length of stay (LOS) and destination at discharge. Results: SDAT scores ranged from 1 to 11 (mean 5.6 ± 2.4). Overall, 65.0% (132/203) of the patients reported some spiritual distress on SDAT total score and 22.2% (45/203) reported at least one severe unmet spiritual need. A two-factor solution explained 60% of the variance. Inter-item correlations ranged from 0.11 to 0.41 (eight out of ten with P ... Conclusions: SDAT has acceptable psychometric properties and appears to be a valid and reliable instrument to assess spiritual distress in elderly hospitalized patients.

  6. Water baths for farmed mink: intra-individual consistency and inter-individual variation in swimming behaviour, and effects on stereotyped behaviour

    Directory of Open Access Journals (Sweden)

    J. MONONEN

    2008-12-01

    Full Text Available Swimming behaviour and effects of water baths on stereotyped behaviour in farmed mink (Mustela vison) were studied in three experiments. The singly-housed mink had access from their home cages to extra cages with 20.5 litre water baths. Two short-term experiments aimed to investigate how quickly adult and juvenile mink start using water baths and how consistently they use them over 10 days, and whether the extent of the use correlates between dams and their female kits. A four-month experiment was designed to compare the development of stereotyped behaviour in juvenile mink housed with and without swimming opportunity. The behavioural analyses were based on several 24-hour video recordings carried out in all three experiments. There were obvious inter-individual differences and intra-individual consistency in swimming frequency and time. Farmed mink’s motivation to swim can be assessed in short-term experiments, and measurement of water losses from the swimming baths and use of instantaneous sampling with 10 min sampling intervals provide quite reliable measures of the amount of swimming. The bath use of the juveniles correlated with that of their dams, indicating that an individual mink’s eagerness to swim may have a genetic component. The lower amount of stereotyped behaviour in mink housed with water baths indicates that long-term access to baths may alleviate frustration in singly-housed juvenile farmed mink.

  7. COMPETITION AND FACILITATION EFFECTS OF DIFFERENTIAL INTRA- AND INTER-ROW WEED MANAGEMENT IN SUGARCANE

    OpenAIRE

    Martin, J; Chabalier, M; Letourmy, P; Chopart, J.-L; Arhiman, E; Marion, D

    2013-01-01

    Differential intra- and inter-row weed management can be a means to reduce herbicide use in sugarcane. In 2011, a field experiment was conducted in La Reunion Island to assess inter-row weed competition. Four inter-row weed competition treatments for a duration of one (T1), two (T2), three (T3) and four (T4) months after planting were compared in a randomized complete block design with 5 replicates; treatment plots were paired with non-weeded inter-row control plots. All...

  8. Guidelines for Inter-Enterprise Management (IEM), GLOBEMEN Deliverable D23

    DEFF Research Database (Denmark)

    Tølle, Martin; Vesterager, Johan

    2002-01-01

    This document is a deliverable of Work package 2 of the IMS Globemen (GMN) project: D23 Guidelines for Inter-Enterprise Management (IEM). IMS Globemen is an inter-regional project aiming to develop methods, tools and architectures to support inter-enterprise operations in one-of-kind industries......-Project, the developed solution for Inter-Enterprise Management. The structure of the deliverable is as follows: - Chapter 1 introduces the guidelines and outlines the structure of the deliverable - Chapter 2 defines key terms along with a list of acronyms used in the deliverable - Chapter 3 gives a general introduction...... for inter-enterprise management (IEM). - Chapter 5 contains the actual Guidelines The chapter contains guidelines for how to prepare enterprise network in being able to set up and manage virtual enterprises. The section consists of a set of activities an enterprise should/could consider when preparing...

  9. Factor Structure, Internal Consistency, and Screening Sensitivity of the GARS-2 in a Developmental Disabilities Sample

    Directory of Open Access Journals (Sweden)

    Martin A. Volker

    2016-01-01

    Full Text Available The Gilliam Autism Rating Scale-Second Edition (GARS-2) is a widely used screening instrument that assists in the identification and diagnosis of autism. The purpose of this study was to examine the factor structure, internal consistency, and screening sensitivity of the GARS-2 using ratings from special education teaching staff for a sample of 240 individuals with autism or other significant developmental disabilities. Exploratory factor analysis yielded a correlated three-factor solution similar to that found in 2005 by Lecavalier for the original GARS. Though the three factors appeared to be reasonably consistent with the intended constructs of the three GARS-2 subscales, the analysis indicated that more than a third of the GARS-2 items were assigned to the wrong subscale. Internal consistency estimates met or exceeded standards for screening and were generally higher than those in previous studies. Screening sensitivity was .65 and specificity was .81 for the Autism Index using a cut score of 85. Based on these findings, recommendations are made for instrument revision.
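    Screening sensitivity and specificity at a cut score, as reported above, follow from a simple tally of screening outcomes against the reference diagnosis. The sketch below assumes that a score at or above the cut counts as a positive screen; the abstract does not state the direction of the comparison, and the function name and toy data are hypothetical.

```python
def screening_metrics(scores, has_condition, cut_score=85):
    """Return (sensitivity, specificity) for a screen that flags scores >= cut_score."""
    true_pos = false_neg = true_neg = false_pos = 0
    for score, condition_present in zip(scores, has_condition):
        flagged = score >= cut_score  # assumed direction of the cut
        if condition_present and flagged:
            true_pos += 1
        elif condition_present:
            false_neg += 1
        elif flagged:
            false_pos += 1
        else:
            true_neg += 1
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one case with and one without the condition")
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Toy data: three individuals with the condition, two without.
index_scores = [90, 80, 100, 70, 88]
diagnosed = [True, True, True, False, False]
print(screening_metrics(index_scores, diagnosed))
```

    Raising the cut score generally trades sensitivity for specificity, so both figures are only meaningful alongside the cut score they were computed at.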

  10. SU-C-210-07: Assessment of Intra-/Inter-Fractional Internal Tumor and Organ Movement in Radiotherapy of Head and Neck Cancer Using On-Board Cine MRI

    Energy Technology Data Exchange (ETDEWEB)

    Chen, H; Dolly, S; Anastasio, M; Fischer-Valuck, B; Kashani, R; Green, O; Rodriguez, V; Mutic, S; Gay, H; Thorstad, W; Li, H [Washington University School of Medicine, Saint Louis, MO (United States); Victoria, J; Dempsey, J [ViewRay Incorporated, Oakwood Village, OH (United States); Ruan, S [University of Rouen, QuantIF - EA 4108 LITIS, Rouen (France); Low, D [University of California Los Angeles, Los Angeles, CA (United States)

    2015-06-15

    Purpose: Head and neck (H&N) internal organ motion has previously been determined with low frequency and temporary nature based on population-based pre- and post-treatment studies. Using immobilization masks and adding a 4–6 mm planning-tumor-volume margin, geometric uncertainties of patients are routinely considered clinically inconsequential in H&N radiotherapy. Using the first commercially-available MR-IGRT system, we conducted the first quantitative study on inter-patient, intra- and inter-fractional H&N internal motion patterns to evaluate the necessity of individualized asymmetric internal margins. Methods: Ninety cine sagittal MR image sequences were acquired during the entire treatment course (6–7 weeks) of three H&N cancer patients using the ViewRay™ MR-IGRT system. The images were 5 mm thick and acquired at 4 frames per second. One of the patients had a tracheostomy tube. The cross-sectional H&N airway (nasopharynx, oropharynx, and laryngopharynx portions) movement was analyzed comprehensively using in-house developed motion detection software. Results: Large inter-patient variations of swallowing frequency (0–1 times per fraction), swallowing duration (1–3 seconds), and pharyngeal cross-sectional area (238–2516 mm²) were observed. Extensive pharyngeal motion occurred during swallowing, while nonzero and periodic change of airway geometry was observed at rest. For patient 1 with tracheostomy tube replacement, 30.3%, 30.0%, 48.7% and 0.3% of total frames showed ≥ 4 mm displacements in the anterior, posterior, inferior, and superior airway boundaries, respectively; similarly, (5.7%, 0.0%, 0.0%, 0.3%) and (23.3%, 0.0%, 35.7%, 1.7%) occurred for patients 2 and 3. Area overlapping coefficients with respect to the first frame were 76.3 ± 6.4%, 90.3 ± 0.6%, and 92.3 ± 1.2% for the three patients, respectively. Conclusion: Both the resting and swallowing motions varied in frequency and amplitude among the patients and across fractions of a

  11. SU-C-210-07: Assessment of Intra-/Inter-Fractional Internal Tumor and Organ Movement in Radiotherapy of Head and Neck Cancer Using On-Board Cine MRI

    International Nuclear Information System (INIS)

    Chen, H; Dolly, S; Anastasio, M; Fischer-Valuck, B; Kashani, R; Green, O; Rodriguez, V; Mutic, S; Gay, H; Thorstad, W; Li, H; Victoria, J; Dempsey, J; Ruan, S; Low, D

    2015-01-01

    Purpose: Head and neck (H&N) internal organ motion has previously been determined with low frequency and temporary nature based on population-based pre- and post-treatment studies. Using immobilization masks and adding a 4–6 mm planning-tumor-volume margin, geometric uncertainties of patients are routinely considered clinically inconsequential in H&N radiotherapy. Using the first commercially-available MR-IGRT system, we conducted the first quantitative study on inter-patient, intra- and inter-fractional H&N internal motion patterns to evaluate the necessity of individualized asymmetric internal margins. Methods: Ninety cine sagittal MR image sequences were acquired during the entire treatment course (6–7 weeks) of three H&N cancer patients using the ViewRay™ MR-IGRT system. The images were 5 mm thick and acquired at 4 frames per second. One of the patients had a tracheostomy tube. The cross-sectional H&N airway (nasopharynx, oropharynx, and laryngopharynx portions) movement was analyzed comprehensively using in-house developed motion detection software. Results: Large inter-patient variations of swallowing frequency (0–1 times per fraction), swallowing duration (1–3 seconds), and pharyngeal cross-sectional area (238–2516 mm²) were observed. Extensive pharyngeal motion occurred during swallowing, while nonzero and periodic change of airway geometry was observed at rest. For patient 1 with tracheostomy tube replacement, 30.3%, 30.0%, 48.7% and 0.3% of total frames showed ≥ 4 mm displacements in the anterior, posterior, inferior, and superior airway boundaries, respectively; similarly, (5.7%, 0.0%, 0.0%, 0.3%) and (23.3%, 0.0%, 35.7%, 1.7%) occurred for patients 2 and 3. Area overlapping coefficients with respect to the first frame were 76.3 ± 6.4%, 90.3 ± 0.6%, and 92.3 ± 1.2% for the three patients, respectively. Conclusion: Both the resting and swallowing motions varied in frequency and amplitude among the patients and across fractions of a

  12. Critical thinking evaluation in reflective writing: Development and testing of Carter Assessment of Critical Thinking in Midwifery (Reflection).

    Science.gov (United States)

    Carter, Amanda G; Creedy, Debra K; Sidebotham, Mary

    2017-11-01

    Develop and test a tool designed for use by academics to evaluate pre-registration midwifery students' critical thinking skills in reflective writing. A descriptive cohort design was used. A random sample (n = 100) of archived student reflective writings based on a clinical event or experience during 2014 and 2015. A staged model for tool development was used to develop a fifteen-item scale involving item generation; mapping of draft items to critical thinking concepts and expert review to test content validity; inter-rater reliability testing; pilot testing of the tool on 100 reflective writings; and psychometric testing. Item scores were analysed for mean, range and standard deviation. Internal reliability, content and construct validity were assessed. Expert review of the tool revealed a high content validity index score of 0.98. Using two independent raters to establish inter-rater reliability, good absolute agreement of 72% was achieved with a Kappa coefficient K = 0.43 (p...... critical thinking in reflective writing. Validation with large diverse samples is warranted. Reflective practice is a key learning and teaching strategy in undergraduate Bachelor of Midwifery programmes and essential for safe, competent practice. There is the potential to enhance critical thinking development by assessing reflective writing with the CACTiM (reflection) tool to provide formative and summative feedback to students and inform teaching strategies. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.

  13. Reliability of the CMT neuropathy score (second version) in Charcot-Marie-Tooth disease.

    LENUS (Irish Health Repository)

    Murphy, Sinéad M

    2011-09-01

    The Charcot-Marie-Tooth neuropathy score (CMTNS) is a reliable and valid composite score comprising symptoms, signs, and neurophysiological tests, which has been used in natural history studies of CMT1A and CMT1X and as an outcome measure in treatment trials of CMT1A. Following an international workshop on outcome measures in Charcot-Marie-Tooth disease (CMT), the CMTNS was modified to attempt to reduce floor and ceiling effects and to standardize patient assessment, aiming to improve its sensitivity for detecting change over time and the effect of an intervention. After agreeing on the modifications made to the CMTNS (CMTNS2), three examiners evaluated 16 patients to determine inter-rater reliability; one examiner evaluated 18 patients twice within 8 weeks to determine intra-rater reliability. Three examiners evaluated 63 patients using the CMTNS and the CMTNS2 to determine how the modifications altered scoring. For inter- and intra-rater reliability, intra-class correlation coefficients (ICCs) were ≥0.96 for the CMT symptom score and the CMT examination score. There were small but significant differences in some of the individual components of the CMTNS compared with the CMTNS2, mainly in the components that had been modified the most. A longitudinal study is in progress to determine whether the CMTNS2 is more sensitive than the CMTNS for detecting change over time.

  14. Dental students consistency in applying the ICDAS system within paediatric dentistry.

    Science.gov (United States)

    Foley, J I

    2012-12-01

    To examine dental students' consistency in utilising the International Caries Detection and Assessment System (ICDAS) one and three months after training. A prospective study. All clinical dental students (Year Two: BDS2; Year Three: BDS3; Year Four: BDS4) as part of their education in Paediatric Dentistry at Aberdeen Dental School (n = 56) received baseline training by two "gold-standard" examiners and were advised to complete the 90-minute ICDAS e-learning program. Study One: One month later, the occlusal surface of 40 extracted primary and permanent molar teeth were examined and assigned both a caries (0-6 scale) and restorative code (0-9 scale). Study Two: The same teeth were examined three months later. Kappa statistics were used to determine inter- and intra-examiner reliability at baseline and after three months. In total, 31 students (BDS2: n = 9; BDS3: n = 8; BDS4: n = 14) completed both examinations. The inter-examiner reliability kappa scores for restoration codes for Study One and Study Two were: BDS2: 0.47 and 0.38; BDS3: 0.61 and 0.52 and BDS4: 0.56 and 0.52. The caries scores for the two studies were: BDS2: 0.31 and 0.20; BDS3: 0.45 and 0.32 and BDS4: 0.35 and 0.34. The intra-examiner reliability range for restoration codes were: BDS2: 0.20 to 0.55; BDS3: 0.34 to 0.72 and BDS4: 0.28 to 0.80. The intra-examiner reliability range for caries codes were: BDS2: 0.35 to 0.62; BDS3: 0.22 to 0.53 and BDS4: 0.22 to 0.65. The consistency of ICDAS codes varied between students and also, between year groups. In general, consistency was greater for restoration codes.
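
The kappa scores quoted above correct raw agreement for the agreement expected by chance. As a minimal sketch, assuming the unweighted two-rater form (Cohen's kappa) and using invented codes rather than the study's data, the calculation looks like this; the function name is hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters coding the same items."""
    n = len(rater_a)
    # Proportion of items on which both raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal code frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Invented caries codes for ten teeth, scored by two examiners.
examiner_1 = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
examiner_2 = [0, 0, 1, 2, 2, 2, 0, 1, 1, 0]
kappa = cohen_kappa(examiner_1, examiner_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the raters agree on 8 of 10 teeth, yet kappa is lower than 0.80 because part of that agreement would be expected by chance alone.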

  15. An inter-laboratory comparison of Si isotope reference materials

    NARCIS (Netherlands)

    Reynolds, B.C.; Aggarwal, J.; André, L.; Baxter, B.; Beucher, C.; Brzezinski, M.A.; Engström, E.; Georg, R.B.; Land, M.; Leng, M.J.; Opfergelt, S.; Rodushkin, I.; Sloane, H.J.; Van den Boorn, S.H.J.M.; Vroon, P.Z.; Cardinal, D.

    2007-01-01

    Three Si isotope materials have been used for an inter-laboratory comparison exercise to ensure reproducibility between international laboratories investigating natural Si isotope variations using a variety of chemical preparation methods and mass spectrometric techniques. These proposed standard

  16. Ageing midface: The impact of surgeon's experience on the consistency in the assessment and proposed management.

    Science.gov (United States)

    Hazrati, Ali; Izadpanah, Ali; Zadeh, Teanoosh; Gosman, Amanda; Chao, James J; Dobke, Marek K

    2011-02-01

    An individual's face undergoes numerous changes throughout life. Since mid-face aesthetic units are key areas for rejuvenation procedures, their comprehensive assessment is essential for the development of any aesthetic management plan. Despite the availability of many evaluation criteria for treatment of mid-face ageing, there are discrepancies existing in both assessment and management approaches. The goal of this study was to determine if there are any identifiable profiles of clinical judgements and approaches related to the level of surgeon's experience. Forty seven standardised non-digital and not altered natural size photographic images of patients' faces (front and profile) were presented to eight senior board certified plastic surgeons, eight junior non-board certified plastic surgeons and eight plastic surgery residents from an independent program. Surveyed physicians were 'blinded' from each other and asked to assess five different major features characterising ageing mid-face. An interclass correlation data analysis was performed and the Cronbach coefficient alpha values were computed for each category. Responses obtained from senior plastic surgeons were consistently characterised by higher Cronbach coefficient alpha values indicating higher concordance. The highest agreement levels were obtained for the assessment of rhytids and jowls across all groups and the lowest agreement levels were obtained for the assessment and recommendation of upper lip management. This study illustrated that discrepancies in clinical assessments and surgical management exist among surgeons involved in the aesthetic surgery of the mid-face ageing. It appears that the level of surgeon's experience significantly impacts the inter-rater reliability and consensus in assessment and treatment of mid-face ageing. The most senior plastic surgeons' assessment and recommendations had the highest level of concordance while the junior non-board certified plastic surgeons and the

  17. The reliability and validity of the Turkish version of Fullerton Advanced Balance (FAB-T) scale.

    Science.gov (United States)

    Iyigun, Gozde; Kirmizigil, Berkiye; Angin, Ender; Oksuz, Sevim; Can, Filiz; Eker, Levent; Rose, Debra J

    2018-06-04

    The aim of this study was to evaluate the reliability and validity of the Turkish version of the FAB (FAB-T) scale in older Turkish adults. The reliability and validity of the scale were tested on 200 community-dwelling older adults. The FAB-T scale was scored by different physiotherapists on different days to evaluate inter-rater and intra-rater reliability. The Berg Balance Scale (BBS) was used for the evaluation of convergent validity, and the content validity of the FAB-T scale was investigated. The FAB-T scale showed very high inter- and intra-rater reliability. For inter-rater agreement, the ICC values for the individual test items and total score were 0.92 (95% CI: 0.90-0.94) and 0.96 (95% CI: 0.95-0.97), respectively. For intra-rater agreement, the ICC values for the individual test items and total score were 0.93 (95% CI: 0.91-0.95) and 0.96 (95% CI: 0.95-0.97), respectively. There was good agreement between the FAB-T and BBS scales. A high correlation was found between the BBS and FAB-T scales [rho = 0.70 (95% CI: 0.62-0.76)], indicating good convergent validity. Considering the content validity of the FAB-T scale, no floor (floor score: 0%) or ceiling (ceiling score: 6.5%) effect was detected. The FAB-T scale was successfully translated from the original English version (FAB) and demonstrated strong psychometric features. It was found that the FAB-T scale has very high inter-rater and intra-rater reliability. Considering the convergent validity, the scale has high correlation with the BBS. The FAB-T has no floor or ceiling effect. Copyright © 2018 Elsevier B.V. All rights reserved.

  18. OPTION(5) versus OPTION(12) instruments to appreciate the extent to which healthcare providers involve patients in decision-making.

    Science.gov (United States)

    Stubenrouch, Fabienne E; Pieterse, Arwen H; Falkenberg, Rijan; Santema, T Katrien B; Stiggelbout, Anne M; van der Weijden, Trudy; Aarts, J Annemijn W M; Ubbink, Dirk T

    2016-06-01

    The 12-item "observing patient involvement" (OPTION(12))-instrument is commonly used to assess the extent to which healthcare providers involve patients in health-related decision-making. The five-item version (OPTION(5)) claims to be a more efficient measure. In this study we compared the Dutch versions of the OPTION-instruments in terms of inter-rater agreement and correlation in outpatient doctor-patient consultations in various settings, to learn if we can safely switch to the shorter OPTION(5)-instrument. Two raters coded 60 audiotaped vascular surgery and oncology patient consultations using OPTION(12) and OPTION(5). Unweighted Cohen's kappa was used to compute inter-rater agreement on item-level. The association between the total scores of the two OPTION-instruments was investigated using Pearson's correlation coefficient (r) and a Bland & Altman plot. After fine-tuning the OPTION-manuals, inter-rater agreement for OPTION(12) and OPTION(5) was good to excellent (kappa range 0.69-0.85 and 0.63-0.72, respectively). Mean total scores were 23.7 (OPTION(12); SD=7.8) and 39.3 (OPTION(5); SD=12.7). Correlation between the total scores was high (r=0.71; p=0.01). OPTION(5) scored systematically higher with a wider range than OPTION(12). Both OPTION-instruments had a good inter-rater agreement and correlated well. OPTION(5) seems to differentiate better between various levels of patient involvement. The OPTION(5)-instrument is recommended for clinical application. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
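
The abstract above reports both a high correlation and a systematic difference between the two instruments, which is the situation a Bland & Altman analysis is designed to expose. The following is a rough sketch with invented total scores (not the study's data) and hypothetical function names; it assumes the usual limits of agreement of bias ± 1.96 standard deviations of the paired differences.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson's correlation coefficient between paired scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def bland_altman(x, y):
    """Return (bias, lower limit, upper limit) for the differences y - x."""
    diffs = [b - a for a, b in zip(x, y)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Invented total scores for five consultations on two instruments.
scores_12_item = [20, 25, 30, 15, 28]
scores_5_item = [35, 40, 50, 25, 45]

r = pearson_r(scores_12_item, scores_5_item)
bias, lower, upper = bland_altman(scores_12_item, scores_5_item)
print(f"r = {r:.2f}, bias = {bias:.1f}, limits = ({lower:.1f}, {upper:.1f})")
```

Here the two score lists correlate almost perfectly, while the bias shows that the second instrument scores consistently higher, so correlation alone would not reveal the offset.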

  19. Reliability of widefield capillary microscopy to measure nailfold capillary density in systemic sclerosis.

    Science.gov (United States)

    Hudson, M; Masetto, A; Steele, R; Arthurs, E; Baron, M

    2010-01-01

    To determine intra- and inter-observer reliability of widefield microscopy to measure nailfold capillary density in patients with systemic sclerosis (SSc). Five SSc patients were examined with a STEMV-8 Zeiss biomicroscope with 50x magnification. The nailfold of the second, third, fourth and fifth fingers of both hands of each patient were photographed twice by each of two observers, once in the morning and again in the afternoon (total of 32 pictures). Two raters reviewed the photographs to produce capillary density readings. Intra- and inter-rater reliability of the readings was computed using intra-class correlations (ICC). Additional analyses were undertaken to determine the impact of other sources of variability in the data, namely patient, finger, technician and time. Intra- and inter-rater reliability were substantial (ICC 0.72-0.84) when raters were reading the same photographs or photographs taken at the same time of day. Agreement was only fair between morning and afternoon density readings (ICC 0.30-0.37). Patients, individual fingers and technician accounted for a large part of the variability in the data (combined variance component of 7.69 out of the total 12.23). The coefficient of variation of widefield microscopy was 24%. Although intra- and inter-rater reliability of nailfold capillary density measurements using widefield microscopy are good, proper standardisation of the conditions under which capillaroscopy is done and better imaging of nailfold capillary abnormalities should be considered if nailfold capillary density is to be used as an outcome measure in multi-centre clinical trials in SSc.
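
An intra-class correlation expresses how much of the total variance in a set of readings is attributable to differences between the things being measured rather than to rater disagreement. The abstract does not state which ICC form was used, so the sketch below assumes the one-way random-effects form, ICC(1,1), and uses invented readings; the function name is hypothetical. The last lines simply restate the reported variance components as a proportion.

```python
from statistics import mean


def icc_one_way(ratings):
    """One-way random-effects ICC(1,1); ratings is a list of per-target rating lists."""
    n = len(ratings)       # number of targets (e.g. photographs)
    k = len(ratings[0])    # number of ratings per target
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-target and within-target mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Invented capillary density readings: five photographs, two raters each.
readings = [[9, 10], [6, 7], [8, 8], [4, 6], [10, 9]]
icc = icc_one_way(readings)
print(f"ICC(1,1) = {icc:.2f}")

# Share of total variance attributed to patient, finger and technician,
# using the two figures quoted in the abstract.
share = 7.69 / 12.23
print(f"variance share = {share:.1%}")
```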

  20. Diagnosis of Esophageal Motility Disorders: Esophageal Pressure Topography vs. Conventional Line Tracing.

    Science.gov (United States)

    Carlson, Dustin A; Ravi, Karthik; Kahrilas, Peter J; Gyawali, C Prakash; Bredenoord, Arjan J; Castell, Donald O; Spechler, Stuart J; Halland, Magnus; Kanuri, Navya; Katzka, David A; Leggett, Cadman L; Roman, Sabine; Saenz, Jose B; Sayuk, Gregory S; Wong, Alan C; Yadlapati, Rena; Ciolino, Jody D; Fox, Mark R; Pandolfino, John E

    2015-07-01

    Enhanced characterization of esophageal peristaltic and sphincter function provided by esophageal pressure topography (EPT) offers a potential diagnostic advantage over conventional line tracings (CLT). However, high-resolution manometry (HRM) and EPT require increased equipment costs over conventional systems and evidence demonstrating a significant diagnostic advantage of EPT over CLT is limited. Our aim was to investigate whether the inter-rater agreement and/or accuracy of esophageal motility diagnosis differed between EPT and CLT. Forty previously completed patient HRM studies were selected for analysis using a customized software program developed to perform blinded independent interpretation in either EPT or CLT (six pressure sensors) format. Six experienced gastroenterologists with a clinical focus in esophageal disease (attendings) and six gastroenterology trainees with minimal manometry experience (fellows) from three academic centers interpreted each of the 40 studies using both EPT and CLT formats. Rater diagnoses were assessed for inter-rater agreement and diagnostic accuracy, both for exact diagnosis and for correct identification of a major esophageal motility disorder. The total group agreement was moderate (κ=0.57; 95% CI: 0.56-0.59) for EPT and fair (κ=0.32; 0.30-0.33) for CLT. Inter-rater agreement between attendings was good (κ=0.68; 0.65-0.71) for EPT and moderate (κ=0.46; 0.43-0.50) for CLT. Inter-rater agreement between fellows was moderate (κ=0.48; 0.45-0.50) for EPT and poor to fair (κ=0.20; 0.17-0.24) for CLT. Among all raters, the odds of an incorrect exact esophageal motility diagnosis were 3.3 times higher with CLT assessment than with EPT (OR: 3.3; 95% CI: 2.4-4.5; P...... CLT than with EPT (OR: 3.4; 2.4-5.0; P...... CLT among our selected raters. On the basis of these findings, EPT may be the preferred assessment modality of esophageal motility.

  1. Emotional Bias in Classroom Observations: Within-Rater Positive Emotion Predicts Favorable Assessments of Classroom Quality

    Science.gov (United States)

    Floman, James L.; Hagelskamp, Carolin; Brackett, Marc A.; Rivers, Susan E.

    2017-01-01

    Classroom observations increasingly inform high-stakes decisions and research in education, including the allocation of school funding and the evaluation of school-based interventions. However, trends in rater scoring tendencies over time may undermine the reliability of classroom observations. Accordingly, the present investigations, grounded in…

  2. An overview of coefficient alpha and a reliability matrix for estimating adequacy of internal consistency coefficients with psychological research measures.

    Science.gov (United States)

    Ponterotto, Joseph G; Ruckdeschel, Daniel E

    2007-12-01

    The present article addresses issues in reliability assessment that are often neglected in psychological research such as acceptable levels of internal consistency for research purposes, factors affecting the magnitude of coefficient alpha (alpha), and considerations for interpreting alpha within the research context. A new reliability matrix anchored in classical test theory is introduced to help researchers judge adequacy of internal consistency coefficients with research measures. Guidelines and cautions in applying the matrix are provided.
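
Coefficient alpha, the subject of the article above, compares the sum of the individual item variances with the variance of the total score. A minimal sketch of the standard formula, alpha = k/(k-1) × (1 − Σ item variances / total-score variance), follows; the responses are invented and the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Coefficient alpha; responses is a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented responses: five respondents, three items on a 1-5 scale.
responses = [
    [3, 4, 3],
    [4, 5, 5],
    [2, 2, 3],
    [5, 5, 4],
    [1, 2, 2],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Because the formula includes the factor k/(k-1) and the total-score variance grows with the number of positively correlated items, alpha tends to rise as items are added, which is one of the factors affecting its magnitude that the article discusses.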

  3. The Student Actions Coding Sheet (SACS): An instrument for illuminating the shifts toward student-centered science classrooms

    Science.gov (United States)

    Erdogan, Ibrahim; Campbell, Todd; Hashidah Abd-Hamid, Nor

    2011-07-01

    This study describes the development of an instrument to investigate the extent to which student-centered actions are occurring in science classrooms. The instrument was developed through the following five stages: (1) student action identification, (2) use of both national and international content experts to establish content validity, (3) refinement of the item pool based on reviewer comments, (4) pilot testing of the instrument, and (5) statistical reliability and item analysis leading to additional refinement and finalization of the instrument. In the field test, the instrument consisted of 26 items separated into four categories originally derived from student-centered instruction literature and used by the authors to sort student actions in previous research. The SACS was administered across 22 Grade 6-8 classrooms by 22 groups of observers, with a total of 67 SACS ratings completed. The finalized instrument was found to be internally consistent, with acceptable estimates from inter-rater intraclass correlation reliability coefficients at the p...... Observation Protocol. Based on the analyses completed, the SACS appears to be a useful instrument for inclusion in comprehensive assessment packages for illuminating the extent to which student-centered actions are occurring in science classrooms.

  4. Models of Inter-Organizational Logistics Management in Slovenia

    Directory of Open Access Journals (Sweden)

    Sašo Murtič

    2015-03-01

    Throughout history, the transportation of goods and related logistics have played an important role in human development and existence. This pertains to numerous interlinked processes, whose management is often linked to the social system, international linkages, the development of industry, and the market and its specifics. In modern times, the management of these processes is increasingly bound to the globalization of production and markets, the relocation of production to countries with cheaper labour, and environmental protection. The present Slovenian economy depends to a large extent on the economies and corporate relations of the European Union and the world. Such inter-connectedness demands frequent transportation of semi-finished and finished goods. By providing timely delivery of goods, transportation consequently enables inter-organizational linkages and individual production, economic, market and other processes. Organizational and inter-organizational management of transport logistics demands a profound understanding of transport flows, freight forwarding expertise and knowledge of transport, tax, environmental and other related regulations. Adequate knowledge and mastery of cultural, linguistic, national and other differences is important as well. The presented analysis and evaluation form the basis for constructing an inter-organizational model of logistics management in Slovenia.

  5. ICD-11 (JLMMS) and SCT Inter-Operation.

    Science.gov (United States)

    Mamou, Marzouk; Rector, Alan; Schulz, Stefan; Campbell, James; Solbrig, Harold; Rodrigues, Jean-Marie

    2016-01-01

    The goal of this work is to contribute to a smooth and semantically sound inter-operability between the ICD-11 (International Classification of Diseases-11th revision Joint Linearization for Mortality, Morbidity and Statistics) and SNOMED CT (SCT). To guarantee such inter-operation between a classification, characterized by a single hierarchy of mutually exclusive and exhaustive classes, as is the JLMMS successor of ICD-10 on the one hand, and the multi-hierarchical, ontology-based clinical terminology SCT on the other hand, we use ontology axioms that logically express generalizable truths. This is expressed by the compositional grammar of SCT, together with queries on axioms of SCT. We test the feasibility of the method on the circulatory chapter of ICD-11 JLMMS and present limitations and results.

  6. Inter-decadal change of the lagged inter-annual relationship between local sea surface temperature and tropical cyclone activity over the western North Pacific

    Science.gov (United States)

    Zhao, Haikun; Wu, Liguang; Raga, G. B.

    2018-02-01

    This study documents the inter-decadal change of the lagged inter-annual relationship between the TC frequency (TCF) and the local sea surface temperature (SST) in the western North Pacific (WNP) during 1979-2014. An abrupt shift of the lagged relationship between them is observed to occur in 1998. Before the shift (1979-1997), a moderately positive correlation (0.35) between previous-year local SST and TCF is found, while a significantly negative correlation (-0.71) is found since the shift (1998-2014). The inter-decadal change of the lagged relationship between TCF and local SST over the WNP is also accompanied by an inter-decadal change in the lagged inter-annual relationship between large-scale factors affecting TCs and local SST over the WNP. During 1998-2014, the previous-year local SST shows a significant negative correlation with the mid-level moisture and a significant positive correlation with the vertical wind shear over the main development region of WNP TC genesis. Almost opposite relationships are seen during 1979-1997, with a smaller magnitude of the correlation coefficients. These changes are consistent with the changes of the lagged inter-annual relationship between upper- and lower-level winds and local SST over the WNP. Analyses further suggest that the inter-decadal shift of the lagged inter-annual relationship between WNP TCF and local SST may be closely linked to the inter-decadal change of inter-annual SST transition over the tropical central-eastern Pacific associated with the climate regime shift in the late 1990s. Details on the underlying physical process need further investigation using observations and simulations.

  7. The development of the Quality Indicator for Rehabilitative Care (QuIRC): a measure of best practice for facilities for people with longer term mental health problems

    Directory of Open Access Journals (Sweden)

    Visser Ellen

    2011-03-01

    Background: Despite the progress over recent decades in developing community mental health services internationally, many people still receive treatment and care in institutional settings. Those most likely to reside longest in these facilities have the most complex mental health problems and are at most risk of potential abuses of care and exploitation. This study aimed to develop an international, standardised toolkit to assess the quality of care in longer term hospital and community based mental health units, including the degree to which human rights, social inclusion and autonomy are promoted. Method: The domains of care included in the toolkit were identified from a systematic literature review, international expert Delphi exercise, and review of care standards in ten European countries. The draft toolkit comprised 154 questions for unit managers. Inter-rater reliability was tested in 202 units across ten countries at different stages of deinstitutionalisation and development of community mental health services. Exploratory factor analysis was used to corroborate the allocation of items to domains. Feedback from those using the toolkit was collected about its usefulness and ease of completion. Results: The toolkit had excellent inter-rater reliability and few items with narrow spread of response. Unit managers found the content highly relevant and were able to complete it in around 90 minutes. Minimal refinement was required and the final version comprised 145 questions assessing seven domains of care. Conclusions: Triangulation of qualitative and quantitative evidence directed the development of a robust and comprehensive international quality assessment toolkit for units in highly variable socioeconomic and political contexts.

  8. The interrater and test-retest reliability of the Home Falls and Accidents Screening Tool (HOME FAST) in Malaysia: Using raters with a range of professional backgrounds.

    Science.gov (United States)

    Romli, Muhammad Hibatullah; Mackenzie, Lynette; Lovarini, Meryl; Tan, Maw Pin; Clemson, Lindy

    2017-06-01

    Falls can be a devastating issue for older people living in the community, including those living in Malaysia. Health professionals and community members have a responsibility to ensure that older people have a safe home environment to reduce the risk of falls. Using a standardised screening tool is beneficial to intervene early with this group. The Home Falls and Accidents Screening Tool (HOME FAST) should be considered for this purpose; however, its use in Malaysia has not been studied. Therefore, the aim of this study was to evaluate the interrater and test-retest reliability of the HOME FAST with multiple professionals in the Malaysian context. A cross-sectional design was used to evaluate interrater reliability where the HOME FAST was used simultaneously in the homes of older people by 2 raters and a prospective design was used to evaluate test-retest reliability with a separate group of older people at different times in their homes. Both studies took place in an urban area of Kuala Lumpur. Professionals from 9 professional backgrounds participated as raters in this study, and a group of 51 community older people were recruited for the interrater reliability study and another group of 30 for the test-retest reliability study. The overall agreement was moderate for interrater reliability and good for test-retest reliability. The HOME FAST was consistently rated by different professionals, and no bias was found among the multiple raters. The HOME FAST can be used with confidence by a variety of professionals across different settings. The HOME FAST can become a universal tool to screen for home hazards related to falls. © 2017 John Wiley & Sons, Ltd.

  9. Internal Consistency of the easyCBM© CCSS Reading Measures: Grades 3-8. Technical Report #1407

    Science.gov (United States)

    Guerreiro, Meg; Alonzo, Julie; Tindal, Gerald

    2014-01-01

    This technical report documents findings from a study of the internal consistency and split-half reliability of the easyCBM© CCSS Reading measures, grades 3-8. Data, drawn from an extant data set gathered in school year 2013-2014, include scores from over 150,000 students' fall and winter benchmark assessments. Findings suggest that the easyCBM©…

  10. Evaluation of inter-fraction error during prostate radiotherapy

    International Nuclear Information System (INIS)

    Komiyama, Takafumi; Nakamura, Koji; Motoyama, Tsuyoshi; Onishi, Hiroshi; Sano, Naoki

    2008-01-01

    The purpose of this study was to evaluate inter-fraction error (inter-fraction set-up error plus inter-fraction internal organ motion) between treatment planning and delivery during radiotherapy for localized prostate cancer. Twenty-three prostate cancer patients underwent image-guided radical irradiation with the CT-linac system. All patients were treated in the supine position. After set-up with external skin markers, pretherapy CT images were obtained with the CT-linac system and isocenter displacement was measured. The mean displacement of the isocenter was 1.8 mm, 3.3 mm, and 1.7 mm in the left-right, ventral-dorsal, and cranial-caudal directions, respectively. The maximum displacement of the isocenter was 7 mm, 12 mm, and 9 mm in the left-right, ventral-dorsal, and cranial-caudal directions, respectively. The mean interquartile range of displacement of the isocenter was 1.8 mm, 3.7 mm, and 2.0 mm in the left-right, ventral-dorsal, and cranial-caudal directions, respectively. In radiotherapy for localized prostate cancer, inter-fraction error was largest in the ventral-dorsal direction. Errors in the ventral-dorsal direction influence both local control and late adverse effects. Our study suggested that set-up with external skin markers alone was not sufficient for radical radiotherapy for localized prostate cancer, and that a system such as the CT-linac is required to correct inter-fraction error. (author)

  11. How Well Does the Sum Score Summarize the Test? Summability as a Measure of Internal Consistency

    NARCIS (Netherlands)

    Goeman, J.J.; De Jong, N.H.

    2018-01-01

    Many researchers use Cronbach's alpha to demonstrate internal consistency, even though it has been shown numerous times that Cronbach's alpha is not suitable for this. Because the intention of questionnaire and test constructers is to summarize the test by its overall sum score, we advocate

  12. Psychometric properties of a sign language version of the Mini International Neuropsychiatric Interview (MINI).

    Science.gov (United States)

    Øhre, Beate; Saltnes, Hege; von Tetzchner, Stephen; Falkum, Erik

    2014-05-22

    There is a need for psychiatric assessment instruments that enable reliable diagnoses in persons with hearing loss who have sign language as their primary language. The objective of this study was to assess the validity of the Norwegian Sign Language (NSL) version of the Mini International Neuropsychiatric Interview (MINI). The MINI was translated into NSL. Forty-one signing patients consecutively referred to two specialised psychiatric units were assessed with a diagnostic interview by clinical experts and with the MINI. Inter-rater reliability was assessed with Cohen's kappa and "observed agreement". There was 65% agreement between MINI diagnoses and clinical expert diagnoses. Kappa values indicated fair to moderate agreement, and observed agreement was above 76% for all diagnoses. The MINI diagnosed more co-morbid conditions than did the clinical expert interview (mean diagnoses: 1.9 versus 1.2). Kappa values indicated moderate to substantial agreement, and "observed agreement" was above 88%. The NSL version performs similarly to other MINI versions and demonstrates adequate reliability and validity as a diagnostic instrument for assessing mental disorders in persons who have sign language as their primary and preferred language.
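
    The abstract above reports both Cohen's kappa and "observed agreement", and the two can diverge: kappa discounts the agreement expected by chance from the raters' marginal rates. The following is a minimal sketch of that relationship for a single yes/no diagnosis. The counts are invented for illustration and are not taken from the study; the function name `cohen_kappa_2x2` is likewise hypothetical.

```python
def cohen_kappa_2x2(both_pos, a_only, b_only, both_neg):
    """Return (observed agreement, Cohen's kappa) for two raters and a binary rating.

    both_pos: cases both raters marked positive
    a_only:   cases only rater A marked positive
    b_only:   cases only rater B marked positive
    both_neg: cases both raters marked negative
    """
    n = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / n

    # Chance agreement from each rater's marginal positive/negative rates.
    a_pos = both_pos + a_only
    b_pos = both_pos + b_only
    expected = (a_pos * b_pos + (n - a_pos) * (n - b_pos)) / n ** 2

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts (NOT from the study): 100 patients, one diagnosis.
observed, kappa = cohen_kappa_2x2(both_pos=20, a_only=12, b_only=8, both_neg=60)
print(f"observed agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

    With these invented counts, observed agreement is 80% while kappa is only about 0.52, which is the kind of gap between the two statistics that the abstract describes.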

  13. Psychometric properties of a sign language version of the Mini International Neuropsychiatric Interview (MINI)

    Science.gov (United States)

    2014-01-01

    Background: There is a need for psychiatric assessment instruments that enable reliable diagnoses in persons with hearing loss who have sign language as their primary language. The objective of this study was to assess the validity of the Norwegian Sign Language (NSL) version of the Mini International Neuropsychiatric Interview (MINI). Methods: The MINI was translated into NSL. Forty-one signing patients consecutively referred to two specialised psychiatric units were assessed with a diagnostic interview by clinical experts and with the MINI. Inter-rater reliability was assessed with Cohen’s kappa and “observed agreement”. Results: There was 65% agreement between MINI diagnoses and clinical expert diagnoses. Kappa values indicated fair to moderate agreement, and observed agreement was above 76% for all diagnoses. The MINI diagnosed more co-morbid conditions than did the clinical expert interview (mean diagnoses: 1.9 versus 1.2). Kappa values indicated moderate to substantial agreement, and “observed agreement” was above 88%. Conclusion: The NSL version performs similarly to other MINI versions and demonstrates adequate reliability and validity as a diagnostic instrument for assessing mental disorders in persons who have sign language as their primary and preferred language. PMID:24886297

  14. The Importance of Geographical Proximity for New Product Development Activities within Inter-firm Linkages

    DEFF Research Database (Denmark)

    Dahlgren, Johan Henrich

    important as a resource and where collaboration partners are important. Hypotheses are tested by means of a quantitative analysis of a data set containing information about 4842 domestic and international inter-firm linkages of Danish firms in manufacturing industries. The findings in this analysis exhibit...... for international linkages. It is further suggested closer geographical distance for inter-firm linkages with medium and high level of interaction, suppliers or customers accounting for more than one third of total purchases or sales, and for linkages lasting for at least 10 years. Key words: capabilities, economics...

  15. Psychometric analysis of the TRANSIT quality indicators for cardiovascular disease prevention in primary care.

    Science.gov (United States)

    Khanji, Cynthia; Bareil, Céline; Hudon, Eveline; Goudreau, Johanne; Duhamel, Fabie; Lussier, Marie-Thérèse; Perreault, Sylvie; Lalonde, Gilles; Turcotte, Alain; Berbiche, Djamal; Martin, Élisabeth; Lévesque, Lise; Gagnon, Marie-Mireille; Lalonde, Lyne

    2017-12-01

    To assess a selection of psychometric properties of the TRANSIT indicators. Using medical records, indicators were documented retrospectively during the 14 months preceding the end of the TRANSIT study. Primary care in Quebec, Canada. Indicators were documented in a random subsample (n = 123 patients) of the TRANSIT study population (n = 759). For every patient, the mean compliance to all indicators of a category (subscale score) and to the complete set of indicators (overall scale score) were established. To evaluate test-retest and inter-rater reliabilities, indicators were applied twice, two months apart, by the same evaluator and independently by different evaluators, respectively. To evaluate convergent validity, correlations between TRANSIT indicators, Burge et al. indicators and Institut national d'excellence en santé et en services sociaux (INESSS) indicators were examined. Test-retest reliability, inter-rater reliability, and convergent validity. Test-retest reliability, as measured by intraclass correlation coefficients (ICCs), was equal to 0.99 (0.99-0.99) for the overall scale score, while inter-rater reliability was equal to 0.95 (0.93-0.97) for the overall scale score. Convergent validity, as measured by Pearson's correlation coefficients, was equal to 0.77 (P < …) when TRANSIT indicators were compared to Burge et al. indicators and to 0.82 (P < …) when TRANSIT indicators were compared to INESSS indicators. Reliability was excellent except for eleven indicators, while convergent validity was strong except for domains related to the management of CVD risk factors. © The Author 2017. Published by Oxford University Press in association with the International Society for Quality in Health Care. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

  16. Spanish translation, cross-cultural adaptation, and validation of the Questionnaire for Diabetes-Related Foot Disease (Q-DFD).

    Science.gov (United States)

    Castillo-Tandazo, Wilson; Flores-Fortty, Adolfo; Feraud, Lourdes; Tettamanti, Daniel

    2013-01-01

    To translate, cross-culturally adapt, and validate the Questionnaire for Diabetes-Related Foot Disease (Q-DFD), originally created and validated in Australia, for its use in Spanish-speaking patients with diabetes mellitus. The translation and cross-cultural adaptation were based on international guidelines. The Spanish version of the survey was applied to a community-based (sample A) and a hospital clinic-based sample (samples B and C). Samples A and B were used to determine criterion and construct validity comparing the survey findings with clinical evaluation and medical records, respectively; while sample C was used to determine intra- and inter-rater reliability. After completing the rigorous translation process, only four items were considered problematic and required a new translation. In total, 127 patients were included in the validation study: 76 to determine criterion and construct validity and 41 to establish intra- and inter-rater reliability. For an overall diagnosis of diabetes-related foot disease, a substantial level of agreement was obtained when we compared the Q-DFD with the clinical assessment (kappa 0.77, sensitivity 80.4%, specificity 91.5%, positive likelihood ratio [LR+] 9.46, negative likelihood ratio [LR-] 0.21); while an almost perfect level of agreement was obtained when it was compared with medical records (kappa 0.88, sensitivity 87%, specificity 97%, LR+ 29.0, LR- 0.13). Survey reliability showed substantial levels of agreement, with kappa scores of 0.63 and 0.73 for intra- and inter-rater reliability, respectively. The translated and cross-culturally adapted Q-DFD showed good psychometric properties (validity, reproducibility, and reliability) that allow its use in Spanish-speaking diabetic populations.
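
    The likelihood ratios quoted in the abstract above follow from the reported sensitivity and specificity through the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The short sketch below recomputes them from the abstract's figures as a consistency check; the function name `likelihood_ratios` is an illustrative choice, not something from the study.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test; inputs are proportions in (0, 1)."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Sensitivity and specificity as reported in the abstract.
vs_clinical = likelihood_ratios(0.804, 0.915)  # Q-DFD vs clinical assessment
vs_records = likelihood_ratios(0.87, 0.97)     # Q-DFD vs medical records

print("vs clinical assessment: LR+ = %.2f, LR- = %.2f" % vs_clinical)
print("vs medical records:     LR+ = %.2f, LR- = %.2f" % vs_records)
```

    Both pairs agree with the reported values (9.46 and 0.21; 29.0 and 0.13) to rounding.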

  17. The memory failures of everyday questionnaire (MFE): internal consistency and reliability.

    Science.gov (United States)

    Montejo Carrasco, Pedro; Montenegro, Peña Mercedes; Sueiro, Manuel J

    2012-07-01

    The Memory Failures of Everyday Questionnaire (MFE) is one of the most widely used instruments to assess memory failures in daily life. The original scale has nine response options, making it difficult to apply; we created a three-point scale (0-1-2) with response choices that make it easier to administer. We examined the two versions' equivalence in a sample of 193 participants between 19 and 64 years of age. The test-retest reliability and internal consistency of the version we propose were also computed in a sample of 113 people. Several indicators attest to the two forms' equivalence: the correlation between the items' means (r = .94; p < … MFE 1-9. The MFE 0-2 provides a brief, simple evaluation, so we recommend it for use in clinical practice as well as research.

  18. An initial limited biodosimetry inter-comparison exercise: FOI and DRDC Ottawa

    International Nuclear Information System (INIS)

    Stricklin, D.; Wilkinson, D.; Arvidsson, E.; Prud'homme-Lalonde, L.; Thorleifson, E.; Mullins, D.; Lachapelle, S.

    2007-01-01

    While biodosimetry is a valuable tool in radiation dose assessment, the dicentric assay, which is the most validated method to date, requires some degree of technical competence. Recently published ISO guidelines indicate the need for documenting competence and establishing quality control programs. Inter-laboratory comparisons are required to document the ability to perform reproducible and accurate assessments. FOI and DRDC Ottawa have conducted an initial limited biodosimetry inter-comparison exercise for quality assurance purposes. The exercise involved blinded exchange of three previously prepared slides from each laboratory from samples that had been evaluated for each lab's dose-response curve. Approximately 100 cells from each slide were evaluated and aberration frequencies reported and compared to the expected frequencies. The limited number of cells evaluated for each sample did not permit statistically distinguishing a 20% difference in all the samples. However, the results indicated reasonable agreement in analyses for all samples for triage purposes. Comparison of aberration frequencies, rather than dose estimates, further illustrates consistent scoring criteria between the two laboratories. The exercise conducted by FOI and DRDC Ottawa provided an efficient means of documenting expertise. Such cooperation further establishes the international biodosimetry network and ensures our readiness for emergency response.

  19. Questionnaire-based assessment of executive functioning: Psychometrics.

    Science.gov (United States)

    Castellanos, Irina; Kronenberger, William G; Pisoni, David B

    2018-01-01

    The psychometric properties of the Learning, Executive, and Attention Functioning (LEAF) scale were investigated in an outpatient clinical pediatric sample. As a part of clinical testing, the LEAF scale, which broadly measures neuropsychological abilities related to executive functioning and learning, was administered to parents of 118 children and adolescents referred for psychological testing at a pediatric psychology clinic; 85 teachers also completed LEAF scales to assess reliability across different raters and settings. Scores on neuropsychological tests of executive functioning and academic achievement were abstracted from charts. Psychometric analyses of the LEAF scale demonstrated satisfactory internal consistency, parent-teacher inter-rater reliability in the small to large effect size range, and test-retest reliability in the large effect size range, similar to values for other executive functioning checklists. Correlations between corresponding subscales on the LEAF and other behavior checklists were large, while most correlations with neuropsychological tests of executive functioning and achievement were significant but in the small to medium range. Results support the utility of the LEAF as a reliable and valid questionnaire-based assessment of delays and disturbances in executive functioning and learning. Applications and advantages of the LEAF and other questionnaire measures of executive functioning in clinical neuropsychology settings are discussed.

  20. [Discomfort associated with dental extraction surgery and development of a questionnaire (QCirDental). Part I: Impacts and internal consistency].

    Science.gov (United States)

    Bortoluzzi, Marcelo Carlos; Martins, Luciana Dorochenko; Takahashi, André; Ribeiro, Bianca; Martins, Ligiane; Pinto, Marcia Helena Baldani

    2018-01-01

    The scope of this study was to develop and validate a questionnaire (QCirDental) to measure the impacts associated with dental extraction surgery. The QCirDental questionnaire was developed in two steps: (1) question and item generation and selection, and (2) pretest of the questionnaire with evaluation of its measurement properties (internal consistency and responsiveness). The sample was composed of 123 patients. None of the patients had any difficulty in understanding the QCirDental. The instrument was found to have excellent internal consistency with Cronbach's alpha reliability coefficient of 0.83. The principal component analysis (Kaiser-Meyer-Olkin Measure of Sampling Adequacy 0.72 and Bartlett's Test of Sphericity with p < 0.001) showed six (6) dimensions explaining 67.5% of the variance. The QCirDental presented excellent internal consistency, being a questionnaire that is easy to read and understand with adequate semantic and content validity. More than 80% of the patients who underwent dental extraction reported some degree of discomfort within the perioperative period, which highlights the necessity to assess the quality of care and impacts of dental extraction surgery.