WorldWideScience

Sample records for single item assessing

  1. Assessing the validity of single-item life satisfaction measures: results from three large samples.

    Science.gov (United States)

    Cheung, Felix; Lucas, Richard E

    2014-12-01

    The present paper assessed the validity of single-item life satisfaction measures by comparing single-item measures to the Satisfaction with Life Scale (SWLS), a more psychometrically established measure. Two large samples from Washington (N = 13,064) and Oregon (N = 2,277) recruited by the Behavioral Risk Factor Surveillance System and a representative German sample (N = 1,312) recruited by the Germany Socio-Economic Panel were included in the present analyses. Single-item life satisfaction measures and the SWLS were correlated with theoretically relevant variables, such as demographics, subjective health, domain satisfaction, and affect. The correlations between the two life satisfaction measures and these variables were examined to assess the construct validity of single-item life satisfaction measures. Consistent across three samples, single-item life satisfaction measures demonstrated a substantial degree of criterion validity with the SWLS (zero-order r = 0.62-0.64; disattenuated r = 0.78-0.80). Patterns of statistical significance for correlations with theoretically relevant variables were the same across single-item measures and the SWLS. Single-item measures did not produce systematically different correlations compared to the SWLS (average difference = 0.001-0.005). The average absolute difference in the magnitudes of the correlations produced by single-item measures and the SWLS was very small (average absolute difference = 0.015-0.042). Single-item life satisfaction measures performed very similarly compared to the multiple-item SWLS. Social scientists would get virtually identical answers to substantive questions regardless of which measure they use.
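    The abstract above reports both zero-order and disattenuated correlations. As a minimal sketch of how such a correction is commonly computed, the snippet below applies Spearman's correction for attenuation, which divides the observed correlation by the square root of the product of the two reliabilities. The function name and the reliability values are illustrative assumptions, not figures from the paper, and the abstract does not state which correction the authors used.

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation.

    r_xy  : observed (zero-order) correlation between two measures
    rel_x : reliability of the first measure, in (0, 1]
    rel_y : reliability of the second measure, in (0, 1]
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical reliabilities, chosen only for illustration.
corrected = disattenuate(0.63, rel_x=0.72, rel_y=0.88)
print(f"disattenuated r = {corrected:.2f}")
```

    With perfectly reliable measures (both reliabilities equal to 1) the corrected value equals the observed correlation; lower reliabilities push the corrected value upward.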

  2. Assessing the Validity of Single-item Life Satisfaction Measures: Results from Three Large Samples

    Science.gov (United States)

    Cheung, Felix; Lucas, Richard E.

    2014-01-01

    Purpose The present paper assessed the validity of single-item life satisfaction measures by comparing single-item measures to the Satisfaction with Life Scale (SWLS), a more psychometrically established measure. Methods Two large samples from Washington (N=13,064) and Oregon (N=2,277) recruited by the Behavioral Risk Factor Surveillance System (BRFSS) and a representative German sample (N=1,312) recruited by the Germany Socio-Economic Panel (GSOEP) were included in the present analyses. Single-item life satisfaction measures and the SWLS were correlated with theoretically relevant variables, such as demographics, subjective health, domain satisfaction, and affect. The correlations between the two life satisfaction measures and these variables were examined to assess the construct validity of single-item life satisfaction measures. Results Consistent across three samples, single-item life satisfaction measures demonstrated a substantial degree of criterion validity with the SWLS (zero-order r = 0.62 – 0.64; disattenuated r = 0.78 – 0.80). Patterns of statistical significance for correlations with theoretically relevant variables were the same across single-item measures and the SWLS. Single-item measures did not produce systematically different correlations compared to the SWLS (average difference = 0.001 – 0.005). The average absolute difference in the magnitudes of the correlations produced by single-item measures and the SWLS was very small (average absolute difference = 0.015 – 0.042). Conclusions Single-item life satisfaction measures performed very similarly compared to the multiple-item SWLS. Social scientists would get virtually identical answers to substantive questions regardless of which measure they use. PMID:24890827

  3. Single-Item Measurement of Suicidal Behaviors: Validity and Consequences of Misclassification.

    Directory of Open Access Journals (Sweden)

    Alexander J Millner

    Suicide is a leading cause of death worldwide. Although research has made strides in better defining suicidal behaviors, there has been less focus on accurate measurement. Currently, the widespread use of self-report, single-item questions to assess suicide ideation, plans and attempts may contribute to measurement problems and misclassification. We examined the validity of single-item measurement and the potential for statistical errors. Over 1,500 participants completed an online survey containing single-item questions regarding a history of suicidal behaviors, followed by questions with more precise language, multiple response options and narrative responses to examine the validity of single-item questions. We also conducted simulations to test whether common statistical tests are robust against the degree of misclassification produced by the use of single items. We found that 11.3% of participants who endorsed a single-item suicide attempt measure engaged in behavior that would not meet the standard definition of a suicide attempt. Similarly, 8.8% of those who endorsed a single-item measure of suicide ideation endorsed thoughts that would not meet standard definitions of suicide ideation. Statistical simulations revealed that this level of misclassification substantially decreases statistical power and increases the likelihood of false conclusions from statistical tests. Providing a wider range of response options for each item reduced the misclassification rate by approximately half. Overall, the use of single-item, self-report questions to assess the presence of suicidal behaviors leads to misclassification, increasing the likelihood of statistical decision errors. Improving the measurement of suicidal behaviors is critical to increase understanding and prevention of suicide.

  4. Using Item Response Theory to Describe the Nonverbal Literacy Assessment (NVLA)

    Science.gov (United States)

    Fleming, Danielle; Wilson, Mark; Ahlgrim-Delzell, Lynn

    2018-01-01

    The Nonverbal Literacy Assessment (NVLA) is a literacy assessment designed for students with significant intellectual disabilities. The 218-item test was initially examined using confirmatory factor analysis. This method showed that the test worked as expected, but the items loaded onto a single factor. This article uses item response theory to…

  5. A psychometric comparison of three scales and a single-item measure to assess sexual satisfaction.

    Science.gov (United States)

    Mark, Kristen P; Herbenick, Debby; Fortenberry, J Dennis; Sanders, Stephanie; Reece, Michael

    2014-01-01

    This study was designed to systematically compare and contrast the psychometric properties of three scales developed to measure sexual satisfaction and a single-item measure of sexual satisfaction. The Index of Sexual Satisfaction (ISS), Global Measure of Sexual Satisfaction (GMSEX), and the New Sexual Satisfaction Scale-Short (NSSS-S) were compared to one another and to a single-item measure of sexual satisfaction. Conceptualization of the constructs, distribution of scores, internal consistency, convergent validity, test-retest reliability, and factor structure were compared between the measures. A total of 211 men and 214 women completed the scales and a measure of relationship satisfaction, with 33% (n = 139) of the sample reassessed two months later. All scales demonstrated appropriate distribution of scores and adequate internal consistency. The GMSEX, NSSS-S, and the single-item measure demonstrated convergent validity. Test-retest reliability was demonstrated by the ISS, GMSEX, and NSSS-S, but not the single-item measure. Taken together, the GMSEX received the strongest psychometric support in this sample for a unidimensional measure of sexual satisfaction and the NSSS-S received the strongest psychometric support in this sample for a bidimensional measure of sexual satisfaction.

  6. Single-item measure for assessing quality of life in children with drug-resistant epilepsy.

    Science.gov (United States)

    Conway, Lauryn; Widjaja, Elysa; Smith, Mary Lou

    2018-03-01

    The current study investigated the psychometric properties of a single-item quality of life (QOL) measure, the Global Quality of Life in Childhood Epilepsy question (G-QOLCE), in children with drug-resistant epilepsy. Data came from the Impact of Pediatric Epilepsy Surgery on Health-Related Quality of Life Study (PESQOL), a multicenter prospective cohort study (n = 118) with observations collected at baseline and at 6 months of follow-up on children aged 4-18 years. QOL was measured with the QOLCE-76 and KIDSCREEN-27. The G-QOLCE was an overall QOL question derived from the QOLCE-76. Construct validity and reliability were assessed with Spearman's correlation and intraclass correlation coefficient (ICC). Responsiveness was examined through distribution-based and anchor-based methods. The G-QOLCE showed moderate (r ≥ 0.30) to strong (r ≥ 0.50) correlations with composite scores, and most subscales of the QOLCE-76 and KIDSCREEN-27 at baseline and 6-month follow-up. The G-QOLCE had moderate test-retest reliability (ICC range: 0.49-0.72) and was able to detect clinically important change in patients' QOL (standardized response mean: 0.38; probability of change: 0.65; Guyatt's responsiveness statistics: 0.62 and 0.78). Caregiver anxiety and family functioning contributed most strongly to G-QOLCE scores over time. Results offer promising preliminary evidence regarding the validity, reliability, and responsiveness of the proposed single-item QOL measure. The G-QOLCE is a potentially useful tool that can be feasibly administered in a busy clinical setting to evaluate clinical status and impact of treatment outcomes in pediatric epilepsy.
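    The responsiveness statistics quoted above include a standardized response mean (SRM). As a rough sketch, the SRM is usually defined as the mean change score divided by the standard deviation of the change scores. The function name and the paired scores below are hypothetical and are not data from the PESQOL study.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean of the paired change scores divided by their sample SD."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical single-item ratings for five children (illustration only).
baseline_scores = [3, 2, 4, 3, 2]
follow_up_scores = [4, 2, 4, 5, 3]
srm = standardized_response_mean(baseline_scores, follow_up_scores)
print(f"SRM = {srm:.2f}")
```

    Because the denominator is the variability of change rather than of baseline scores, the SRM differs from an effect size computed against the baseline standard deviation.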

  7. The utility of single-item readiness screeners in middle school.

    Science.gov (United States)

    Lewis, Crystal G; Herman, Keith C; Huang, Francis L; Stormont, Melissa; Grossman, Caroline; Eddy, Colleen; Reinke, Wendy M

    2017-10-01

    This study examined the benefit of utilizing one-item academic and one-item behavior readiness teacher-rated screeners at the beginning of the school year to predict end-of-school year outcomes for middle school students. The Middle School Academic and Behavior Readiness (M-ABR) screeners were developed to provide an efficient and effective way to assess readiness in students. Participants included 889 students in 62 middle school classrooms in an urban Missouri school district. Concurrent validity with the M-ABR items and other indicators of readiness in the fall were evaluated using Pearson product-moment correlation coefficients, with the academic readiness item having medium to strong correlations with other baseline academic indicators (r=±0.56 to 0.91) and the behavior readiness item having low to strong correlations with baseline behavior items (r=±0.20 to 0.79). Next, the predictive validity of the M-ABR items was analyzed with hierarchical linear regressions using end-of-year outcomes as the dependent variable. The academic and behavior readiness items demonstrated adequate validity for all outcomes with moderate effects (β=±0.31 to 0.73 for academic outcomes and β=±0.24 to 0.59 for behavioral outcomes) after controlling for baseline demographics. Even after controlling for baseline scores, the M-ABR items predicted unique variance in almost all outcome variables. Four conditional probability indices were calculated to obtain an optimal cut score, to determine ready vs. not ready, for both single-item M-ABR scales. The cut point of "fair" yielded the most acceptable values for the indices. The odds ratios (OR) of experiencing negative outcomes given a "fair" or lower readiness rating (2 or below on the M-ABR screeners) at the beginning of the year were significant and strong for all outcomes (OR=2.29 to OR=14.46), except for internalizing problems. These findings suggest promise for using single readiness items to screen for varying negative end

  8. Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions.

    Science.gov (United States)

    Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee

    2013-07-01

    Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.

  9. Work-related stress assessed by a text message single-item stress question.

    Science.gov (United States)

    Arapovic-Johansson, B; Wåhlin, C; Kwak, L; Björklund, C; Jensen, I

    2017-12-02

    Given the prevalence of work stress-related ill-health in the Western world, it is important to find cost-effective, easy-to-use and valid measures which can be used both in research and in practice. The aim was to examine the validity and reliability of the single-item stress question (SISQ), distributed weekly by short message service (SMS) and used for measurement of work-related stress. The convergent validity was assessed through associations between the SISQ and subscales of the Job Demand-Control-Support model, the Effort-Reward Imbalance model and scales measuring depression, exhaustion and sleep. The predictive validity was assessed using SISQ data collected through SMS. The reliability was analysed by the test-retest procedure. Correlations between the SISQ and all the subscales except for job strain and esteem reward were significant, ranging from -0.186 to 0.627. The SISQ could also predict sick leave, depression and exhaustion at 12-month follow-up. The analysis on reliability revealed a satisfactory stability with a weighted kappa between 0.804 and 0.868. The SISQ, administered through SMS, can be used for the screening of stress levels in a working population. © The Author 2017. Published by Oxford University Press on behalf of the Society of Occupational Medicine.

  10. Work ability as prognostic risk marker of disability pension : Single-item work ability score versus multi-item work ability index

    NARCIS (Netherlands)

    Roelen, C.A.M.; Rhenen, van W.; Groothoff, J.W.; Klink, van der J.J.L.; Twisk, W.R.; Heymans, M.W.

    2014-01-01

    Work ability predicts future disability pension (DP). A single-item work ability score (WAS) is emerging as a measure for work ability. This study compared single-item WAS with the multi-item work ability index (WAI) in its ability to identify workers at risk of DP.

  11. Work ability as prognostic risk marker of disability pension: single-item work ability score versus multi-item work ability index

    NARCIS (Netherlands)

    Roelen, C.A.M.; van Rhenen, W.; Groothoff, J.W.; van der Klink, J.J.L.; Twisk, J.W.R.; Heymans, M.W.

    2014-01-01

    Objectives Work ability predicts future disability pension (DP). A single-item work ability score (WAS) is emerging as a measure for work ability. This study compared single-item WAS with the multi-item work ability index (WAI) in its ability to identify workers at risk of DP. Methods This

  12. Work ability as prognostic risk marker of disability pension : single-item work ability score versus multi-item work ability index

    NARCIS (Netherlands)

    Roelen, Corne A. M.; van Rhenen, Willem; Groothoff, Johan W.; van der Klink, Jac J. L.; Twisk, Jos W. R.; Heymans, Martijn W.

    Objectives Work ability predicts future disability pension (DP). A single-item work ability score (WAS) is emerging as a measure for work ability. This study compared single-item WAS with the multi-item work ability index (WAI) in its ability to identify workers at risk of DP. Methods This

  13. The development of a single-item Food Choice Questionnaire

    NARCIS (Netherlands)

    Onwezen, M.C.; Reinders, M.J.; Verain, M.C.D.; Snoek, H.M.

    2019-01-01

    Based on the multi-item Food Choice Questionnaire (FCQ) originally developed by Steptoe and colleagues (1995), the current study developed a single-item FCQ that provides an acceptable balance between practical needs and psychometric concerns. Studies 1 (N = 1851) and 2 (2a (N = 3290), 2b (N =

  14. Applying Item Response Theory methods to design a learning progression-based science assessment

    Science.gov (United States)

    Chen, Jing

    Learning progressions are used to describe how students' understanding of a topic progresses over time and to classify the progress of students into steps or levels. This study applies Item Response Theory (IRT) based methods to investigate how to design learning progression-based science assessments. The research questions of this study are: (1) how to use items in different formats to classify students into levels on the learning progression, (2) how to design a test to give good information about students' progress through the learning progression of a particular construct and (3) what characteristics of test items support their use for assessing students' levels. Data used for this study were collected from 1500 elementary and secondary school students during 2009-2010. The written assessment was developed in several formats, such as Constructed Response (CR), Ordered Multiple Choice (OMC) and Multiple True or False (MTF) items. The following are the main findings from this study. The OMC, MTF and CR items might measure different components of the construct. A single construct explained most of the variance in students' performances. However, additional dimensions in terms of item format can explain a certain amount of the variance in student performance. So additional dimensions need to be considered when we want to capture the differences in students' performances on different types of items targeting the understanding of the same underlying progression. Items in each item format need to be improved in certain ways to classify students more accurately into the learning progression levels. This study establishes some general steps that can be followed to design other learning progression-based tests as well. For example, first, the boundaries between levels on the IRT scale can be defined by using the means of the item thresholds across a set of good items. Second, items in multiple formats can be selected to achieve the information criterion at all

  15. Modeling Composite Assessment Data Using Item Response Theory

    Science.gov (United States)

    Ueckert, Sebastian

    2018-01-01

    Composite assessments aim to combine different aspects of a disease in a single score and are utilized in a variety of therapeutic areas. The data arising from these evaluations are inherently discrete with distinct statistical properties. This tutorial presents the framework of the item response theory (IRT) for the analysis of this data type in a pharmacometric context. The article considers both conceptual (terms and assumptions) and practical questions (modeling software, data requirements, and model building). PMID:29493119

  16. Forced-Choice Assessment of Work-Related Maladaptive Personality Traits: Preliminary Evidence From an Application of Thurstonian Item Response Modeling.

    Science.gov (United States)

    Guenole, Nigel; Brown, Anna A; Cooper, Andrew J

    2018-06-01

    This article describes an investigation of whether Thurstonian item response modeling is a viable method for assessment of maladaptive traits. Forced-choice responses from 420 working adults to a broad-range personality inventory assessing six maladaptive traits were considered. The Thurstonian item response model's fit to the forced-choice data was adequate, while the fit of a counterpart item response model to responses to the same items but arranged in a single-stimulus design was poor. Monotrait heteromethod correlations indicated corresponding traits in the two formats overlapped substantially, although they did not measure equivalent constructs. A better goodness of fit and higher factor loadings for the Thurstonian item response model, coupled with a clearer conceptual alignment to the theoretical trait definitions, suggested that the single-stimulus item responses were influenced by biases that the independent clusters measurement model did not account for. Researchers may wish to consider forced-choice designs and appropriate item response modeling techniques such as Thurstonian item response modeling for personality questionnaire applications in industrial psychology, especially when assessing maladaptive traits. We recommend further investigation of this approach in actual selection situations and with different assessment instruments.

  17. MIMIC Methods for Assessing Differential Item Functioning in Polytomous Items

    Science.gov (United States)

    Wang, Wen-Chung; Shih, Ching-Lin

    2010-01-01

    Three multiple indicators-multiple causes (MIMIC) methods, namely, the standard MIMIC method (M-ST), the MIMIC method with scale purification (M-SP), and the MIMIC method with a pure anchor (M-PA), were developed to assess differential item functioning (DIF) in polytomous items. In a series of simulations, it appeared that all three methods…

  18. Cross-National Prevalence of Traditional Bullying, Traditional Victimization, Cyberbullying and Cyber-Victimization: Comparing Single-Item and Multiple-Item Approaches of Measurement

    Science.gov (United States)

    Yanagida, Takuya; Gradinger, Petra; Strohmeier, Dagmar; Solomontos-Kountouri, Olga; Trip, Simona; Bora, Carmen

    2016-01-01

    Many large-scale cross-national studies rely on a single-item measurement when comparing prevalence rates of traditional bullying, traditional victimization, cyberbullying, and cyber-victimization between countries. However, the reliability and validity of single-item measurement approaches are highly problematic and might be biased. Data from…

  19. Diagnostic Value of Subjective Memory Complaints Assessed with a Single Item in Dominantly Inherited Alzheimer’s Disease: Results of the DIAN Study

    Directory of Open Access Journals (Sweden)

    Christoph Laske

    2015-01-01

    Objective. We examined the diagnostic value of subjective memory complaints (SMCs) assessed with a single item in a large cross-sectional cohort consisting of families with autosomal dominant Alzheimer’s disease (ADAD) participating in the Dominantly Inherited Alzheimer Network (DIAN). Methods. The baseline sample of 183 mutation carriers (MCs) and 117 noncarriers (NCs) was divided according to Clinical Dementia Rating (CDR) scale into preclinical (CDR 0; MCs: n=107; NCs: n=109), early symptomatic (CDR 0.5; MCs: n=48; NCs: n=8), and dementia stage (CDR ≥ 1; MCs: n=28; NCs: n=0). These groups were subdivided by the presence or absence of SMCs. Results. At CDR 0, SMCs were present in 12.1% of MCs and 9.2% of NCs (P=0.6). At CDR 0.5, SMCs were present in 66.7% of MCs and 62.5% of NCs (P=1.0). At CDR ≥ 1, SMCs were present in 96.4% of MCs. SMCs in MCs were significantly associated with CDR, logical memory scores, Geriatric Depression Scale, education, and estimated years to onset. Conclusions. The present study shows that SMCs assessed by a single-item scale have no diagnostic value to identify preclinical ADAD in asymptomatic individuals. These results demonstrate the need for further improvement of SMC measures that should be examined in large clinical trials.

  20. Evaluation of a single-item screening question to detect limited health literacy in peritoneal dialysis patients.

    Science.gov (United States)

    Jain, Deepika; Sheth, Heena; Bender, Filitsa H; Weisbord, Steven D; Green, Jamie A

    2014-01-01

    Studies have shown that a single-item question might be useful in identifying patients with limited health literacy. However, the utility of the approach has not been studied in patients receiving maintenance peritoneal dialysis (PD). We assessed health literacy in a cohort of 31 PD patients by administering the Rapid Estimate of Adult Literacy in Medicine (REALM) and a single-item health literacy (SHL) screening question "How confident are you filling out medical forms by yourself?" (Extremely, Quite a bit, Somewhat, A little bit, or Not at all). To determine the accuracy of the single-item question for detecting limited health literacy, we performed sensitivity and specificity analyses of the SHL and plotted the area under the receiver operating characteristic (AUROC) curve using the REALM as a reference standard. Using a cut-off of "Somewhat" or less confident, the sensitivity of the SHL for detecting limited health literacy was 80%, and the specificity was 88%. The positive likelihood ratio was 6.9. The SHL had an AUROC of 0.79 (95% confidence interval: 0.52 to 1.00). Our results show that the SHL could be effective in detecting limited health literacy in PD patients.
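    The accuracy figures above (sensitivity, specificity, positive likelihood ratio) all derive from a 2x2 table of screening result against reference standard. The sketch below shows that arithmetic. The cell counts are an assumption chosen to be consistent with the reported percentages and the sample size of 31; the abstract itself does not give them, and the function name is illustrative.

```python
import math


def diagnostic_accuracy(tp: int, fn: int, tn: int, fp: int):
    """Return (sensitivity, specificity, positive likelihood ratio).

    tp/fn: reference-positive patients flagged / missed by the screener
    tn/fp: reference-negative patients cleared / flagged by the screener
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_positive_rate = 1 - specificity
    # LR+ is undefined (infinite) when the screener has no false positives.
    if false_positive_rate == 0:
        lr_positive = math.inf
    else:
        lr_positive = sensitivity / false_positive_rate
    return sensitivity, specificity, lr_positive


# Assumed counts (not stated in the abstract): 5 patients with limited
# health literacy on the REALM and 26 without, 31 in total.
sens, spec, lr_pos = diagnostic_accuracy(tp=4, fn=1, tn=23, fp=3)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} LR+={lr_pos:.1f}")
```

    Computing the ratio from unrounded proportions matters: using the rounded 80% and 88% directly gives roughly 6.7 rather than the reported 6.9.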

  1. A Model-Free Diagnostic for Single-Peakedness of Item Responses Using Ordered Conditional Means

    Science.gov (United States)

    Polak, Marike; De Rooij, Mark; Heiser, Willem J.

    2012-01-01

    In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) "criterion…

  2. Assessing difference between classical test theory and item ...

    African Journals Online (AJOL)

    Assessing difference between classical test theory and item response theory methods in scoring primary four multiple choice objective test items. ... All research participants were ranked on the CTT number correct scores and the corresponding IRT item pattern scores from their performance on the PRISMADAT. Wilcoxon ...

  3. Item Response Theory for Peer Assessment

    Science.gov (United States)

    Uto, Masaki; Ueno, Maomi

    2016-01-01

    As an assessment method based on a constructivist approach, peer assessment has become popular in recent years. However, in peer assessment, a problem remains that reliability depends on the rater characteristics. For this reason, some item response models that incorporate rater parameters have been proposed. Those models are expected to improve…

  4. The Single-Item Math Anxiety Scale: An Alternative Way of Measuring Mathematical Anxiety

    Science.gov (United States)

    Núñez-Peña, M. Isabel; Guilera, Georgina; Suárez-Pellicioni, Macarena

    2014-01-01

    This study examined whether the Single-Item Math Anxiety Scale (SIMA), based on the item suggested by Ashcraft, provided valid and reliable scores of mathematical anxiety. A large sample of university students (n = 279) was administered the SIMA and the 25-item Shortened Math Anxiety Rating Scale (sMARS) to evaluate the relation between the scores…

  5. The work ability index and single-item question: associations with sick leave, symptoms, and health--a prospective study of women on long-term sick leave.

    Science.gov (United States)

    Ahlstrom, Linda; Grimby-Ekman, Anna; Hagberg, Mats; Dellve, Lotta

    2010-09-01

    This study investigated the association between the work ability index (WAI) and the single-item question on work ability among women working in human service organizations (HSO) currently on long-term sick leave. It also examined the association between the WAI and the single-item question in relation to sick leave, symptoms, and health. Predictive values of the WAI, the changed WAI, the single-item question and the changed single-item question were investigated for degree of sick leave, symptoms, and health. This cohort study comprised 324 HSO female workers on long-term (>60 days) sick leave, with follow-ups at 6 and 12 months. Participants responded to questionnaires. Data on work ability, sick leave, health, and symptoms were analyzed with regard to associations and predictability. Spearman correlation and mixed-model analysis were performed for repeated measurements over time. The study showed a very strong association between the WAI and the single-item question among all participants. Both the WAI and the single-item question showed similar patterns of associations with sick leave, health, and symptoms. The predictive value for the degree of sick leave and health-related quality of life (HRQoL) was strong for both the WAI and the single-item question, and slightly less strong for vitality, neck pain, both self-rated general and mental health, and behavioral and current stress. This study suggests that the single-item question on work ability could be used as a simple indicator for assessing the status and progress of work ability among women on long-term sick leave.

  6. Work ability as prognostic risk marker of disability pension: single-item work ability score versus multi-item work ability index.

    Science.gov (United States)

    Roelen, Corné A M; van Rhenen, Willem; Groothoff, Johan W; van der Klink, Jac J L; Twisk, Jos W R; Heymans, Martijn W

    2014-07-01

    Work ability predicts future disability pension (DP). A single-item work ability score (WAS) is emerging as a measure for work ability. This study compared single-item WAS with the multi-item work ability index (WAI) in its ability to identify workers at risk of DP. This prospective cohort study comprised 11 537 male construction workers, who completed the WAI at baseline and reported DP after a mean 2.3 years of follow-up. WAS and WAI were calibrated for DP risk predictions with the Hosmer-Lemeshow (H-L) test and their ability to discriminate between high- and low-risk construction workers was investigated with the area under the receiver operating characteristic curve (AUC). At follow-up, 336 (3%) construction workers reported DP. Both WAS [odds ratio (OR) 0.72, 95% confidence interval (95% CI) 0.66-0.78] and WAI (OR 0.57, 95% CI 0.52-0.63) scores were associated with DP at follow-up. The WAS showed miscalibration (H-L model χ²=10.60; df=3; P=0.01) and poorly discriminated between high- and low-risk construction workers (AUC 0.67, 95% CI 0.64-0.70). In contrast, calibration (H-L model χ²=8.20; df=8; P=0.41) and discrimination (AUC 0.78, 95% CI 0.75-0.80) were both adequate for the WAI. Although associated with the risk of future DP, the single-item WAS poorly identified male construction workers at risk of DP. We recommend using the multi-item WAI to screen for risk of DP in occupational health practice.

  7. Item reduction and psychometric validation of the Oily Skin Self Assessment Scale (OSSAS) and the Oily Skin Impact Scale (OSIS).

    Science.gov (United States)

    Arbuckle, Robert; Clark, Marci; Harness, Jane; Bonner, Nicola; Scott, Jane; Draelos, Zoe; Rizer, Ronald; Yeh, Yating; Copley-Merriman, Kati

    2009-01-01

    Developed using focus groups, the Oily Skin Self Assessment Scale (OSSAS) and Oily Skin Impact Scale (OSIS) are patient-reported outcome measures of oily facial skin. The aim of this study was to finalize the item-scale structure of the instruments and perform psychometric validation in adults with self-reported oily facial skin. The OSSAS and OSIS were administered to 202 adult subjects with oily facial skin in the United States. A subgroup of 152 subjects returned, 4 to 10 days later, for test–retest reliability evaluation. Of the 202 participants, 72.8% were female; 64.4% had self-reported nonsevere acne. Item reduction resulted in a 14-item OSSAS with Sensation (five items), Tactile (four items) and Visual (four items) domains, a single blotting item, and an overall oiliness item. The OSIS was reduced to two three-item domains assessing Annoyance and Self-Image. Confirmatory factor analysis supported the construct validity of the final item-scale structures. The OSSAS and OSIS scales had acceptable item convergent validity (item-scale correlations >0.40) and floor and ceiling effects […] as assessments of self-reported oily facial skin severity and its emotional impact, respectively.

  8. Better assessment of physical function: item improvement is neglected but essential.

    Science.gov (United States)

    Bruce, Bonnie; Fries, James F; Ambrosini, Debbie; Lingala, Bharathi; Gandek, Barbara; Rose, Matthias; Ware, John E

    2009-01-01

    Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. 
IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models

  9. Item Response Theory with Covariates (IRT-C): Assessing Item Recovery and Differential Item Functioning for the Three-Parameter Logistic Model

    Science.gov (United States)

    Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.

    2016-01-01

    In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…

  10. Writing, Evaluating and Assessing Data Response Items in Economics.

    Science.gov (United States)

    Trotman-Dickenson, D. I.

    1989-01-01

    Describes some of the problems in writing data response items in economics for use by A Level and General Certificate of Secondary Education (GCSE) students. Examines the experience of two series of workshops on writing items, evaluating them and assessing responses from schools. Offers suggestions for producing packages of data response items as…

  11. Missouri Assessment Program (MAP), Spring 2000: Secondary Science, Released Items, Grade 10.

    Science.gov (United States)

    Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

    This assessment sample provides information on the Missouri Assessment Program (MAP) for grade 10 science. The sample consists of six items taken from the test booklet and scoring guides for the six items. The items assess ecosystems, mechanics, and data analysis. (MM)

  12. Developing a Model for Optimizing Inventory of Repairable Items at Single Operating Base

    OpenAIRE

    Le, Tin

    2016-01-01

    The use of the EOQ model in inventory management is popular. However, EOQ models have many disadvantages, especially when the model is applied to manage repairable items. In order to deal with high-cost and repairable items, Craig C. Sherbrooke introduced a model in his book “Optimal Inventory Modeling of Systems: Multi-Echelon Techniques”. The research focus is to implement and develop a program to execute the single-site inventory model for repairable items. The model helps to significantl...

  13. Concurrent Validity and Sensitivity to Change of Direct Behavior Rating Single-Item Scales (DBR-SIS) within an Elementary Sample

    Science.gov (United States)

    Smith, Rhonda L.; Eklund, Katie; Kilgus, Stephen P.

    2018-01-01

    The purpose of this study was to evaluate the concurrent validity, sensitivity to change, and teacher acceptability of Direct Behavior Rating single-item scales (DBR-SIS), a brief progress monitoring measure designed to assess student behavioral change in response to intervention. Twenty-four elementary teacher-student dyads implemented a daily…

  14. The 12-item World Health Organization Disability Assessment Schedule II (WHO-DAS II): a nonparametric item response analysis

    Directory of Open Access Journals (Sweden)

    Fernandez Ana

    2010-05-01

    Background: Previous studies have analyzed the psychometric properties of the World Health Organization Disability Assessment Schedule II (WHO-DAS II) using classical omnibus measures of scale quality. These analyses are sample dependent and do not model item responses as a function of the underlying trait level. The main objective of this study was to examine the effectiveness of the WHO-DAS II items and their options in discriminating between changes in the underlying disability level by means of item response analyses. We also explored differential item functioning (DIF) in men and women. Methods: The participants were 3615 adult general practice patients from 17 regions of Spain, with a first diagnosed major depressive episode. The 12-item WHO-DAS II was administered by the general practitioners during the consultation. We used a non-parametric item response method (Kernel-Smoothing) implemented with the TestGraf software to examine the effectiveness of each item (item characteristic curves) and their options (option characteristic curves) in discriminating between changes in the underlying disability level. We examined composite DIF to determine whether women had a higher probability than men of endorsing each item. Results: Item response analyses indicated that the twelve items forming the WHO-DAS II perform very well. All items were determined to provide good discrimination across varying standardized levels of the trait. The items also had option characteristic curves that showed good discrimination, given that each increasing option became more likely than the previous as a function of increasing trait level. No gender-related DIF was found on any of the items. Conclusions: All WHO-DAS II items were very good at assessing overall disability. Our results supported the appropriateness of the weights assigned to response option categories and showed an absence of gender differences in item functioning.

  15. Developing an African youth psychosocial assessment: an application of item response theory.

    Science.gov (United States)

    Betancourt, Theresa S; Yang, Frances; Bolton, Paul; Normand, Sharon-Lise

    2014-06-01

    This study aimed to refine a dimensional scale for measuring psychosocial adjustment in African youth using item response theory (IRT). A 60-item scale derived from qualitative data was administered to 667 war-affected adolescents (55% female). Exploratory factor analysis (EFA) determined the dimensionality of items based on goodness-of-fit indices. Items with loadings less than 0.4 were dropped. Confirmatory factor analysis (CFA) was used to confirm the scale's dimensionality found under the EFA. Item discrimination and difficulty were estimated using a graded response model for each subscale using weighted least squares means and variances. Predictive validity was examined through correlations between IRT scores (θ) for each subscale and ratings of functional impairment. All models were assessed using goodness-of-fit and comparative fit indices. Fisher's Information curves examined item precision at different underlying ranges of each trait. Original scale items were optimized and reconfigured into an empirically-robust 41-item scale, the African Youth Psychosocial Assessment (AYPA). Refined subscales assess internalizing and externalizing problems, prosocial attitudes/behaviors and somatic complaints without medical cause. The AYPA is a refined dimensional assessment of emotional and behavioral problems in African youth with good psychometric properties. Validation studies in other cultures are recommended. Copyright © 2014 John Wiley & Sons, Ltd.

  16. An Investigation of Item Type in a Standards-Based Assessment.

    Directory of Open Access Journals (Sweden)

    Liz Hollingworth

    2007-12-01

    Large-scale state assessment programs use both multiple-choice and open-ended items on tests for accountability purposes. Certainly, there is an intuitive belief among some educators and policy makers that open-ended items measure something different than multiple-choice items. This study examined two item formats in custom-built, standards-based tests of achievement in Reading and Mathematics at grades 3-8. In this paper, we raise questions about the value of including open-ended items, given scoring costs, time constraints, and the higher probability of missing data from test-takers.

  17. Single-item memory, associative memory, and the human hippocampus

    OpenAIRE

    Gold, Jeffrey J.; Hopkins, Ramona O.; Squire, Larry R.

    2006-01-01

    We tested recognition memory for items and associations in memory-impaired patients with bilateral lesions thought to be limited to the hippocampal region. In Experiment 1 (Combined memory test), participants studied words and then took a memory test in which studied words, new words, studied word pairs, and recombined word pairs were presented in a mixed order. In Experiment 2 (Separated memory test), participants studied single words and then took a memory test involving studied word and ne...

  18. A confirmative clinimetric analysis of the 36-item Family Assessment Device.

    Science.gov (United States)

    Timmerby, Nina; Cosci, Fiammetta; Watson, Maggie; Csillag, Claudio; Schmitt, Florence; Steck, Barbara; Bech, Per; Thastum, Mikael

    2018-02-07

    The Family Assessment Device (FAD) is a 60-item questionnaire widely used to evaluate self-reported family functioning. However, the factor structure as well as the number of items has been questioned. A shorter and more user-friendly version of the original FAD-scale, the 36-item FAD, has therefore previously been proposed, based on findings in a nonclinical population of adults. We aimed in this study to evaluate the brief 36-item version of the FAD in a clinical population. Data from a European multinational study, examining factors associated with levels of family functioning in adult cancer patients' families, were used. Both healthy and ill parents completed the 60-item version FAD. The psychometric analyses conducted were Principal Component Analysis and Mokken-analysis. A total of 564 participants were included. Based on the psychometric analysis we confirmed that the 36-item version of the FAD has robust psychometric properties and can be used in clinical populations. The present analysis confirmed that the 36-item version of the FAD (18 items assessing 'well-being' and 18 items assessing 'dysfunctional' family function) is a brief scale where the summed total score is a valid measure of the dimensions of family functioning. This shorter version of the FAD is, in accordance with the concept of 'measurement-based care', an easy to use scale that could be considered when the aim is to evaluate self-reported family functioning.

  19. 48 CFR 245.7101-3 - DD Form 1348-1, DoD Single Line Item Release/Receipt Document.

    Science.gov (United States)

    2010-10-01

    ... 48 Federal Acquisition Regulations System 3 2010-10-01 2010-10-01 false DD Form 1348-1, DoD Single Line Item Release/Receipt Document. 245.7101-3 Section 245.7101-3 Federal Acquisition Regulations... PROPERTY Plant Clearance Forms 245.7101-3 DD Form 1348-1, DoD Single Line Item Release/Receipt Document...

  20. Development of the Assessment Items of Debris Flow Using the Delphi Method

    Science.gov (United States)

    Byun, Yosep; Seong, Joohyun; Kim, Mingi; Park, Kyunghan; Yoon, Hyungkoo

    2016-04-01

    In recent years in Korea, typhoons and localized extreme rainfall caused by the abnormal climate have increased. Accordingly, debris flow is becoming one of the most dangerous natural disasters. This study aimed to develop assessment items that can be used for conducting damage investigations of debris flow. The Delphi method was applied to classify the realms of assessment items. As a result, 29 assessment items, which can be classified into 6 groups, were determined.

  1. Psychometric properties of a single-item scale to assess sleep quality among individuals with fibromyalgia

    Directory of Open Access Journals (Sweden)

    Sadosky Alesia B

    2009-06-01

    Background: Sleep disturbances are a common and bothersome symptom of fibromyalgia (FM). This study reports psychometric properties of a single-item scale to assess sleep quality among individuals with FM. Methods: Analyses were based on data from two randomized, double-blind, placebo-controlled trials of pregabalin (studies 1056 and 1077). In a daily diary, patients reported the quality of their sleep on a numeric rating scale ranging from 0 ("best possible sleep") to 10 ("worst possible sleep"). Test re-test reliability of the Sleep Quality Scale was evaluated by computing intraclass correlation coefficients. Pearson correlation coefficients were computed between baseline Sleep Quality scores and baseline pain diary and Medical Outcomes Study (MOS) Sleep scores. Responsiveness to treatment was evaluated by standardized effect sizes computed as the difference between least squares mean changes in Sleep Quality scores in the pregabalin and placebo groups divided by the standard deviation of Sleep Quality scores across all patients at baseline. Results: Studies 1056 and 1077 included 748 and 745 patients, respectively. Most patients were female (study 1056: 94.4%; study 1077: 94.5%) and white (study 1056: 90.2%; study 1077: 91.0%). Mean ages were 48.8 years (study 1056) and 50.1 years (study 1077). Test re-test reliability coefficients of the Sleep Quality Scale were 0.91 and 0.90 in the 1056 and 1077 studies, respectively. Pearson correlation coefficients between baseline Sleep Quality scores and baseline pain diary scores were 0.64 (p Conclusion: These results provide evidence of the reproducibility, convergent validity, and responsiveness to treatment of the Sleep Quality Scale and provide a foundation for its further use and evaluation in FM patients.
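    The standardized effect size described in the Methods above is a simple ratio, and a small sketch may make the definition concrete. The function and argument names are hypothetical, and the numbers in the usage line are invented for illustration; they are not results from studies 1056 or 1077.

```python
def standardized_effect_size(ls_mean_change_treatment,
                             ls_mean_change_placebo,
                             baseline_sd_all_patients):
    """Difference in least squares mean change (treatment minus placebo)
    divided by the baseline standard deviation across all patients."""
    if baseline_sd_all_patients <= 0:
        raise ValueError("baseline standard deviation must be positive")
    difference = ls_mean_change_treatment - ls_mean_change_placebo
    return difference / baseline_sd_all_patients


# Invented illustration: sleep scores fall by 2.0 points on treatment and
# by 1.0 point on placebo, with a baseline SD of 2.0 points.
example_effect = standardized_effect_size(-2.0, -1.0, 2.0)
```

    On a scale where 10 is the worst possible sleep, a negative value indicates greater improvement in the treatment group than in the placebo group.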

  2. Item response theory analysis of the life orientation test-revised: age and gender differential item functioning analyses.

    Science.gov (United States)

    Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina

    2015-06-01

    This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism. © The Author(s) 2014.

  3. Assessment of Differential Item Functioning in the Experiences of Discrimination Index

    Science.gov (United States)

    Cunningham, Timothy J.; Berkman, Lisa F.; Gortmaker, Steven L.; Kiefe, Catarina I.; Jacobs, David R.; Seeman, Teresa E.; Kawachi, Ichiro

    2011-01-01

    The psychometric properties of instruments used to measure self-reported experiences of discrimination in epidemiologic studies are rarely assessed, especially regarding construct validity. The authors used 2000–2001 data from the Coronary Artery Risk Development in Young Adults (CARDIA) Study to examine differential item functioning (DIF) in 2 versions of the Experiences of Discrimination (EOD) Index, an index measuring self-reported experiences of racial/ethnic and gender discrimination. DIF may confound interpretation of subgroup differences. Large DIF was observed for 2 of 7 racial/ethnic discrimination items: White participants reported more racial/ethnic discrimination for the “at school” item, and black participants reported more racial/ethnic discrimination for the “getting housing” item. The large DIF by race/ethnicity in the index for racial/ethnic discrimination probably reflects item impact and is the result of valid group differences between blacks and whites regarding their respective experiences of discrimination. The authors also observed large DIF by race/ethnicity for 3 of 7 gender discrimination items. This is more likely to have been due to item bias. Users of the EOD Index must consider the advantages and disadvantages of DIF adjustment (omitting items, constructing separate measures, and retaining items). The EOD Index has substantial usefulness as an instrument that can assess self-reported experiences of discrimination. PMID:22038104

  4. The role of attention in item-item binding in visual working memory.

    Science.gov (United States)

    Peterson, Dwight J; Naveh-Benjamin, Moshe

    2017-09-01

    An important yet unresolved question regarding visual working memory (VWM) relates to whether or not binding processes within VWM require additional attentional resources compared with processing solely the individual components comprising these bindings. Previous findings indicate that binding of surface features (e.g., colored shapes) within VWM is not demanding of resources beyond what is required for single features. However, it is possible that other types of binding, such as the binding of complex, distinct items (e.g., faces and scenes), in VWM may require additional resources. In 3 experiments, we examined VWM item-item binding performance under no load, articulatory suppression, and backward counting using a modified change detection task. Binding performance declined to a greater extent than single-item performance under higher compared with lower levels of concurrent load. The findings from each of these experiments indicate that processing item-item bindings within VWM requires a greater amount of attentional resources compared with single items. These findings also highlight an important distinction between the role of attention in item-item binding within VWM and previous studies of long-term memory (LTM) where declines in single-item and binding test performance are similar under divided attention. The current findings provide novel evidence that the specific type of binding is an important determining factor regarding whether or not VWM binding processes require attention. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  5. Development and validation of the Single Item Trait Empathy Scale (SITES).

    Science.gov (United States)

    Konrath, Sara; Meier, Brian P; Bushman, Brad J

    2018-04-01

    Empathy involves feeling compassion for others and imagining how they feel. In this article, we develop and validate the Single Item Trait Empathy Scale (SITES), which contains only one item that takes seconds to complete. In seven studies (N=5,724), the SITES was found to be both reliable and valid. It correlated in expected ways with a wide variety of intrapersonal outcomes. For example, it is negatively correlated with narcissism, depression, anxiety, and alexithymia. In contrast, it is positively correlated with other measures of empathy, self-esteem, subjective well-being, and agreeableness. The SITES also correlates with a wide variety of interpersonal outcomes, especially compassion for others and helping others. The SITES is recommended in situations when time or question quantity is constrained.

  6. Robustness of two single-item self-esteem measures: cross-validation with a measure of stigma in a sample of psychiatric patients.

    Science.gov (United States)

    Bagley, Christopher

    2005-08-01

    Robins' Single-item Self-esteem Inventory was compared with a single item from the Coopersmith Self-esteem. Although a new scoring format was used, there was good evidence of cross-validation in 83 current and former psychiatric patients who completed Harvey's adapted measure of stigma felt and experienced by users of mental health services. Scores on the two single-item self-esteem measures correlated .76 (p self-esteem in users of mental health services.

  7. A single-item global job satisfaction measure is associated with quantitative blood immune indices in white-collar employees.

    Science.gov (United States)

    Nakata, Akinori; Irie, Masahiro; Takahashi, Masaya

    2013-01-01

    Although a single-item job satisfaction measure has been shown to be as reliable and inclusive as multiple-item scales in relation to health, studies including immunological data are few. The purpose of this study was to evaluate the validity of single-item job and family life satisfaction based on its association with immune indices. A total of 189 white-collar employees (70% men) underwent a blood draw for the measurement of natural killer (NK), total T, and B cell counts as well as plasma immunoglobulin (Ig) G concentrations and completed single-item job and family life satisfaction measures, respectively. The response options for satisfaction measures were 'dissatisfied' (coded 1) to 'satisfied' (coded 4). Spearman's partial correlations controlling for cofactors revealed that increased job satisfaction was positively associated with NK cells (rsp=0.201, p=0.007) and IgG (rsp=0.178, p=0.018), while family life satisfaction was unrelated to immune indices. Those who reported a combination of low job/low family life satisfaction had significantly lower NK and higher B cell counts than those with a high job/high family life satisfaction. Our study suggests that the single-item summary measure of job satisfaction, but not family life satisfaction, may be a valid tool to evaluate immune status in healthy white-collar employees.

  8. Recommended core items to assess e-cigarette use in population-based surveys.

    Science.gov (United States)

    Pearson, Jennifer L; Hitchman, Sara C; Brose, Leonie S; Bauld, Linda; Glasser, Allison M; Villanti, Andrea C; McNeill, Ann; Abrams, David B; Cohen, Joanna E

    2018-05-01

    A consistent approach using standardised items to assess e-cigarette use in both youth and adult populations will aid cross-survey and cross-national comparisons of the effect of e-cigarette (and tobacco) policies and improve our understanding of the population health impact of e-cigarette use. Focusing on adult behaviour, we propose a set of e-cigarette use items, discuss their utility and potential adaptation, and highlight e-cigarette constructs that researchers should avoid without further item development. Reliable and valid items will strengthen the emerging science and inform knowledge synthesis for policy-making. Building on informal discussions at a series of international meetings of 65 experts from 15 countries, the authors provide recommendations for assessing e-cigarette use behaviour, relative perceived harm, device type, presence of nicotine, flavours and reasons for use. We recommend items assessing eight core constructs: e-cigarette ever use, frequency of use and former daily use; relative perceived harm; device type; primary flavour preference; presence of nicotine; and primary reason for use. These items should be standardised or minimally adapted for the policy context and target population. Researchers should be prepared to update items as e-cigarette device characteristics change. A minimum set of e-cigarette items is proposed to encourage consensus around items to allow for cross-survey and cross-jurisdictional comparisons of e-cigarette use behaviour. These proposed items are a starting point. We recognise room for continued improvement, and welcome input from e-cigarette users and scientific colleagues. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  9. Normative data for the 12 item WHO Disability Assessment Schedule 2.0.

    Directory of Open Access Journals (Sweden)

    Gavin Andrews

    BACKGROUND: The World Health Organization Disability Assessment Schedule (WHODAS 2.0) measures disability due to health conditions including diseases, illnesses, injuries, mental or emotional problems, and problems with alcohol or drugs. METHOD: The 12 Item WHODAS 2.0 was used in the second Australian Survey of Mental Health and Well-being. We report the overall factor structure and the distribution of scores and normative data (means and SDs) for people with any physical disorder, any mental disorder and for people with neither. FINDINGS: A single second order factor justifies the use of the scale as a measure of global disability. People with mental disorders had high scores (mean 6.3, SD 7.1); people with physical disorders had lower scores (mean 4.3, SD 6.1). People with no disorder covered by the survey had low scores (mean 1.4, SD 3.6). INTERPRETATION: The provision of normative data from a population sample of adults will facilitate use of the WHODAS 2.0 12 item scale in clinical and epidemiological research.

  10. A single-item self-report medication adherence question predicts hospitalisation and death in patients with heart failure.

    Science.gov (United States)

    Wu, Jia-Rong; DeWalt, Darren A; Baker, David W; Schillinger, Dean; Ruo, Bernice; Bibbins-Domingo, Kristen; Macabasco-O'Connell, Aurelia; Holmes, George M; Broucksou, Kimberly A; Erman, Brian; Hawk, Victoria; Cene, Crystal W; Jones, Christine DeLong; Pignone, Michael

    2014-09-01

    To determine whether a single-item self-report medication adherence question predicts hospitalisation and death in patients with heart failure. Poor medication adherence is associated with increased morbidity and mortality. Having a simple means of identifying suboptimal medication adherence could help identify at-risk patients for interventions. We performed a prospective cohort study in 592 participants with heart failure within a four-site randomised trial. Self-report medication adherence was assessed at baseline using a single-item question: 'Over the past seven days, how many times did you miss a dose of any of your heart medication?' Participants who reported no missing doses were defined as fully adherent, and those missing more than one dose were considered less than fully adherent. The primary outcome was combined all-cause hospitalisation or death over one year and the secondary endpoint was heart failure hospitalisation. Outcomes were assessed with blinded chart reviews, and heart failure outcomes were determined by a blinded adjudication committee. We used negative binomial regression to examine the relationship between medication adherence and outcomes. Fifty-two percent of participants were male, mean age was 61 years, and 31% were of New York Heart Association class III/IV at enrolment; 72% of participants reported full adherence to their heart medicine at baseline. Participants with full medication adherence had a lower rate of all-cause hospitalisation and death (0·71 events/year) compared with those with any nonadherence (0·86 events/year): adjusted-for-site incidence rate ratio was 0·83, fully adjusted incidence rate ratio 0·68. Incidence rate ratios were similar for heart failure hospitalisations. A single medication adherence question at baseline predicts hospitalisation and death over one year in heart failure patients. Medication adherence is associated with all-cause and heart failure-related hospitalisation and death in heart
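    As a rough arithmetic check on the rates quoted above, an unadjusted rate ratio can be formed directly from the two event rates. This is only a sketch: the function name is hypothetical, and the crude ratio is not the authors' negative binomial model, which produced the site-adjusted (0·83) and fully adjusted (0·68) estimates.

```python
def crude_rate_ratio(rate_exposed, rate_reference):
    """Unadjusted ratio of two event rates (events per person-year)."""
    if rate_reference <= 0:
        raise ValueError("reference rate must be positive")
    return rate_exposed / rate_reference


# Rates quoted in the abstract: 0.71 events/year with full adherence,
# 0.86 events/year with any nonadherence.
crude_ratio = crude_rate_ratio(0.71, 0.86)
rounded_ratio = round(crude_ratio, 2)
```

    The crude ratio is close to the site-adjusted estimate, while the fully adjusted estimate additionally depends on covariates that cannot be reconstructed from the abstract.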

  11. Development and evaluation of CAHPS survey items assessing how well healthcare providers address health literacy.

    Science.gov (United States)

    Weidmer, Beverly A; Brach, Cindy; Hays, Ron D

    2012-09-01

    The complexity of health information often exceeds patients' skills to understand and use it. To develop survey items assessing how well healthcare providers communicate health information. Domains and items for the Consumer Assessment of Healthcare Providers and Systems (CAHPS) Item Set for Addressing Health Literacy were identified through an environmental scan and input from stakeholders. The draft item set was translated into Spanish and pretested in both English and Spanish. The revised item set was field tested with a randomly selected sample of adult patients from 2 sites using mail and telephonic data collection. Item-scale correlations, confirmatory factor analysis, and internal consistency reliability estimates were estimated to assess how well the survey items performed and identify composite measures. Finally, we regressed the CAHPS global rating of the provider item on the CAHPS core communication composite and the new health literacy composites. A total of 601 completed surveys were obtained (52% response rate). Two composite measures were identified: (1) Communication to Improve Health Literacy (16 items); and (2) How Well Providers Communicate About Medicines (6 items). These 2 composites were significantly uniquely associated with the global rating of the provider (communication to improve health literacy: PLiteracy composite accounted for 90% of the variance of the original 16-item composite. This study provides support for reliability and validity of the CAHPS Item Set for Addressing Health Literacy. These items can serve to assess whether healthcare providers have communicated effectively with their patients and as a tool for quality improvement.

  12. Psychometric properties of the Global Operative Assessment of Laparoscopic Skills (GOALS) using item response theory.

    Science.gov (United States)

    Watanabe, Yusuke; Madani, Amin; Ito, Yoichi M; Bilgic, Elif; McKendy, Katherine M; Feldman, Liane S; Fried, Gerald M; Vassiliou, Melina C

    2017-02-01

    The extent to which each item assessed using the Global Operative Assessment of Laparoscopic Skills (GOALS) contributes to the total score remains unknown. The purpose of this study was to evaluate the level of difficulty and discriminative ability of each of the 5 GOALS items using item response theory (IRT). A total of 396 GOALS assessments for a variety of laparoscopic procedures over a 12-year time period were included. Threshold parameters of item difficulty and discrimination power were estimated for each item using IRT. The higher slope parameters seen with "bimanual dexterity" and "efficiency" are indicative of greater discriminative ability than "depth perception", "tissue handling", and "autonomy". IRT psychometric analysis indicates that the 5 GOALS items do not demonstrate uniform difficulty and discriminative power, suggesting that they should not be scored equally. "Bimanual dexterity" and "efficiency" seem to have stronger discrimination. Weighted scores based on these findings could improve the accuracy of assessing individual laparoscopic skills. Copyright © 2016 Elsevier Inc. All rights reserved.

  13. Comparison of Alternate and Original Items on the Montreal Cognitive Assessment.

    Science.gov (United States)

    Lebedeva, Elena; Huang, Mei; Koski, Lisa

    2016-03-01

    The Montreal Cognitive Assessment (MoCA) is a screening tool for mild cognitive impairment (MCI) in elderly individuals. We hypothesized that measurement error when using the new alternate MoCA versions to monitor change over time could be related to the use of items that are not of comparable difficulty to their corresponding originals of similar content. The objective of this study was to compare the difficulty of the alternate MoCA items to the original ones. Five selected items from alternate versions of the MoCA were included with items from the original MoCA administered adaptively to geriatric outpatients (N = 78). Rasch analysis was used to estimate the difficulty level of the items. None of the five items from the alternate versions matched the difficulty level of their corresponding original items. This study demonstrates the potential benefits of a Rasch analysis-based approach for selecting items during the process of development of parallel forms. The results suggest that better match of the items from different MoCA forms by their difficulty would result in higher sensitivity to changes in cognitive function over time.

  14. Do people with and without medical conditions respond similarly to the short health anxiety inventory? An assessment of differential item functioning using item response theory.

    Science.gov (United States)

    LeBouthillier, Daniel M; Thibodeau, Michel A; Alberts, Nicole M; Hadjistavropoulos, Heather D; Asmundson, Gordon J G

    2015-04-01

    Individuals with medical conditions are likely to have elevated health anxiety; however, research has not demonstrated how medical status impacts response patterns on health anxiety measures. Measurement bias can undermine the validity of a questionnaire by overestimating or underestimating scores in groups of individuals. We investigated whether the Short Health Anxiety Inventory (SHAI), a widely-used measure of health anxiety, exhibits medical condition-based bias on item and subscale levels, and whether the SHAI subscales adequately assess the health anxiety continuum. Data were from 963 individuals with diabetes, breast cancer, or multiple sclerosis, and 372 healthy individuals. Mantel-Haenszel tests and item characteristic curves were used to classify the severity of item-level differential item functioning in all three medical groups compared to the healthy group. Test characteristic curves were used to assess scale-level differential item functioning and whether the SHAI subscales adequately assess the health anxiety continuum. Nine out of 14 items exhibited differential item functioning. Two items exhibited differential item functioning in all medical groups compared to the healthy group. In both Thought Intrusion and Fear of Illness subscales, differential item functioning was associated with mildly deflated scores in medical groups with very high levels of the latent traits. Fear of Illness items poorly discriminated between individuals with low and very low levels of the latent trait. While individuals with medical conditions may respond differentially to some items, clinicians and researchers can confidently use the SHAI with a variety of medical populations without concern of significant bias. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. Combining item response theory with multiple imputation to equate health assessment questionnaires.

    Science.gov (United States)

    Gu, Chenyang; Gutman, Roee

    2017-09-01

    The assessment of patients' functional status across the continuum of care requires a common patient assessment tool. However, assessment tools that are used in various health care settings differ and cannot be easily contrasted. For example, the Functional Independence Measure (FIM) is used to evaluate the functional status of patients who stay in inpatient rehabilitation facilities, the Minimum Data Set (MDS) is collected for all patients who stay in skilled nursing facilities, and the Outcome and Assessment Information Set (OASIS) is collected if they choose home health care provided by home health agencies. All three instruments or questionnaires include functional status items, but the specific items, rating scales, and instructions for scoring different activities vary between the different settings. We consider equating different health assessment questionnaires as a missing data problem, and propose a variant of predictive mean matching method that relies on Item Response Theory (IRT) models to impute unmeasured item responses. Using real data sets, we simulated missing measurements and compared our proposed approach to existing methods for missing data imputation. We show that, for all of the estimands considered, and in most of the experimental conditions that were examined, the proposed approach provides valid inferences, and generally has better coverages, relatively smaller biases, and shorter interval estimates. The proposed method is further illustrated using a real data set. © 2016, The International Biometric Society.

  16. Maslach Burnout Inventory and a Self-Defined, Single-Item Burnout Measure Produce Different Clinician and Staff Burnout Estimates.

    Science.gov (United States)

    Knox, Margae; Willard-Grace, Rachel; Huang, Beatrice; Grumbach, Kevin

    2018-06-04

    Clinicians and healthcare staff report high levels of burnout. Two common burnout assessments are the Maslach Burnout Inventory (MBI) and a single-item, self-defined burnout measure. Relatively little is known about how the measures compare. To identify the sensitivity, specificity, and concurrent validity of the self-defined burnout measure compared to the more established MBI measure. Cross-sectional survey (November 2016-January 2017). Four hundred forty-four primary care clinicians and 606 staff from three San Francisco Area healthcare systems. The MBI measure, calculated from a high score on either the emotional exhaustion or cynicism subscale, and a single-item measure of self-defined burnout. Concurrent validity was assessed using a validated, 7-item team culture scale as reported by Willard-Grace et al. (J Am Board Fam Med 27(2):229-38, 2014) and a standard question about workplace atmosphere as reported by Rassolian et al. (JAMA Intern Med 177(7):1036-8, 2017) and Linzer et al. (Ann Intern Med 151(1):28-36, 2009). Similar to other nationally representative burnout estimates, 52% of clinicians (95% CI: 47-57%) and 46% of staff (95% CI: 42-50%) reported high MBI emotional exhaustion or high MBI cynicism. In contrast, 29% of clinicians (95% CI: 25-33%) and 31% of staff (95% CI: 28-35%) reported "definitely burning out" or more severe symptoms on the self-defined burnout measure. The self-defined measure's sensitivity to correctly identify MBI-assessed burnout was 50.4% for clinicians and 58.6% for staff; specificity was 94.7% for clinicians and 92.3% for staff. Area under the receiver operator curve was 0.82 for clinicians and 0.81 for staff. Team culture and atmosphere were significantly associated with both self-defined burnout and the MBI, confirming concurrent validity. Point estimates of burnout notably differ between the self-defined and MBI measures. Compared to the MBI, the self-defined burnout measure misses half of high-burnout clinicians and more
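
    The sensitivity and specificity figures in the record above come from cross-classifying a screening measure against a reference measure. A minimal Python sketch of that calculation follows; the function name and the counts are hypothetical illustrations, not data from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) of a screen scored against a reference measure.

    tp/fn: reference-positive cases the screen flags / misses
    tn/fp: reference-negative cases the screen clears / wrongly flags
    """
    sensitivity = tp / (tp + fn)  # share of reference-positive cases detected
    specificity = tn / (tn + fp)  # share of reference-negative cases cleared
    return sensitivity, specificity


# Hypothetical counts chosen for illustration only (not taken from the study).
sens, spec = sensitivity_specificity(tp=50, fn=50, tn=90, fp=10)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

    A screen with sensitivity near 0.5 and high specificity, as reported above, rarely flags people the reference measure clears but misses about half of those it identifies.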

  17. Rats Remember Items in Context Using Episodic Memory.

    Science.gov (United States)

    Panoz-Brown, Danielle; Corbin, Hannah E; Dalecki, Stefan J; Gentry, Meredith; Brotheridge, Sydney; Sluka, Christina M; Wu, Jie-En; Crystal, Jonathon D

    2016-10-24

    Vivid episodic memories in people have been characterized as the replay of unique events in sequential order [1-3]. Animal models of episodic memory have successfully documented episodic memory of a single event (e.g., [4-8]). However, a fundamental feature of episodic memory in people is that it involves multiple events, and notably, episodic memory impairments in human diseases are not limited to a single event. Critically, it is not known whether animals remember many unique events using episodic memory. Here, we show that rats remember many unique events and the contexts in which the events occurred using episodic memory. We used an olfactory memory assessment in which new (but not old) odors were rewarded using 32 items. Rats were presented with 16 odors in one context and the same odors in a second context. To attain high accuracy, the rats needed to remember item in context because each odor was rewarded as a new item in each context. The demands on item-in-context memory were varied by assessing memory with 2, 3, 5, or 15 unpredictable transitions between contexts, and item-in-context memory survived a 45 min retention interval challenge. When the memory of item in context was put in conflict with non-episodic familiarity cues, rats relied on item in context using episodic memory. Our findings suggest that rats remember multiple unique events and the contexts in which these events occurred using episodic memory and support the view that rats may be used to model fundamental aspects of human cognition. Copyright © 2016 Elsevier Ltd. All rights reserved.

  18. Single-item screening for agoraphobic symptoms : validation of a web-based audiovisual screening instrument

    NARCIS (Netherlands)

    van Ballegooijen, Wouter; Riper, Heleen; Donker, Tara; Martin Abello, Katherina; Marks, Isaac; Cuijpers, Pim

    2012-01-01

    The advent of web-based treatments for anxiety disorders creates a need for quick and valid online screening instruments, suitable for a range of social groups. This study validates a single-item multimedia screening instrument for agoraphobia, part of the Visual Screener for Common Mental Disorders

  19. Factor Structure and Reliability of Test Items for Saudi Teacher Licence Assessment

    Science.gov (United States)

    Alsadaawi, Abdullah Saleh

    2017-01-01

    The Saudi National Assessment Centre administers the Computer Science Teacher Test for teacher certification. The aim of this study is to explore gender differences in candidates' scores, and investigate dimensionality, reliability, and differential item functioning using confirmatory factor analysis and item response theory. The confirmatory…

  20. Assessment of Preference for Edible and Leisure Items in Individuals with Dementia

    Science.gov (United States)

    Ortega, Javier Virues; Iwata, Brian A.; Nogales-Gonzalez, Celia; Frades, Belen

    2012-01-01

    We conducted 2 studies on reinforcer preference in patients with dementia. Results of preference assessments yielded differential selections by 14 participants. Unlike prior studies with individuals with intellectual disabilities, all participants showed a noticeable preference for leisure items over edible items. Results of a subsequent analysis…

  1. Assessing Differential Item Functioning on the Test of Relational Reasoning

    Directory of Open Access Journals (Sweden)

    Denis Dumas

    2018-03-01

    The test of relational reasoning (TORR) is designed to assess the ability to identify complex patterns within visuospatial stimuli. The TORR is designed for use in school and university settings, and therefore, its measurement invariance across diverse groups is critical. In this investigation, a large sample, representative of a major university on key demographic variables, was collected, and the resulting data were analyzed using a multi-group, multidimensional item-response theory model-comparison procedure. No significant differential item functioning was found on any of the TORR items across any of the demographic groups of interest. This finding is interpreted as evidence of the cultural fairness of the TORR, and potential test-development choices that may have contributed to that cultural fairness are discussed.

  2. Assessment of the Item Selection and Weighting in the Birmingham Vasculitis Activity Score for Wegener's Granulomatosis

    Science.gov (United States)

    MAHR, ALFRED D.; NEOGI, TUHINA; LAVALLEY, MICHAEL P.; DAVIS, JOHN C.; HOFFMAN, GARY S.; MCCUNE, W. JOSEPH; SPECKS, ULRICH; SPIERA, ROBERT F.; ST.CLAIR, E. WILLIAM; STONE, JOHN H.; MERKEL, PETER A.

    2013-01-01

    Objective To assess the Birmingham Vasculitis Activity Score for Wegener's Granulomatosis (BVAS/WG) with respect to its selection and weighting of items. Methods This study used the BVAS/WG data from the Wegener's Granulomatosis Etanercept Trial. The scoring frequencies of the 34 predefined items and any “other” items added by clinicians were calculated. Using linear regression with generalized estimating equations in which the physician global assessment (PGA) of disease activity was the dependent variable, we computed weights for all predefined items. We also created variables for clinical manifestations frequently added as other items, and computed weights for these as well. We searched for the model that included the items and their generated weights yielding an activity score with the highest R2 to predict the PGA. Results We analyzed 2,044 BVAS/WG assessments from 180 patients; 734 assessments were scored during active disease. The highest R2 with the PGA was obtained by scoring WG activity based on the following items: the 25 predefined items rated on ≥5 visits, the 2 newly created fatigue and weight loss variables, the remaining minor other and major other items, and a variable that signified whether new or worse items were present at a specific visit. The weights assigned to the items ranged from 1 to 21. Compared with the original BVAS/WG, this modified score correlated significantly more strongly with the PGA. Conclusion This study suggests possibilities to enhance the item selection and weighting of the BVAS/WG. These changes may increase this instrument's ability to capture the continuum of disease activity in WG. PMID:18512722

  3. Location Indices for Ordinal Polytomous Items Based on Item Response Theory. Research Report. ETS RR-15-20

    Science.gov (United States)

    Ali, Usama S.; Chang, Hua-Hua; Anderson, Carolyn J.

    2015-01-01

    Polytomous items are typically described by multiple category-related parameters; situations, however, arise in which a single index is needed to describe an item's location along a latent trait continuum. Situations in which a single index would be needed include item selection in computerized adaptive testing or test assembly. Therefore single…

  4. Face validity of the single work ability item

    DEFF Research Database (Denmark)

    Gupta, Nidhi; Jensen, Bjørn Søvsø; Søgaard, Karen

    2014-01-01

    PURPOSE: The purpose of this study was to investigate the face validity of the self-reported single item work ability with objectively measured heart rate reserve (%HRR) among blue-collar workers. METHODS: We utilized data from 127 blue-collar workers (Female = 53; Male = 74) aged 18-65 years from ... with a total of 5,810 h, including 2,640 working hours. RESULTS: A significant moderate correlation between work ability and %HRR was observed among males (R = -0.33, P = 0.005), but not among females (R = 0.11, P = 0.431). In a gender-stratified multi-adjusted logistic regression analysis, males with high %HRR were more likely to report a reduced work ability compared to males with low %HRR [OR = 4.75, 95% confidence interval (95% CI) = 1.31 to 17.25]. However, this association was not found among females (OR = 0.26, 95% CI 0.03 to 2.16), and a significant interaction between work ability, %HRR ...

  5. The impact of item order on ratings of cancer risk perception.

    Science.gov (United States)

    Taylor, Kathryn L; Shelby, Rebecca A; Schwartz, Marc D; Ackerman, Josh; LaSalle, V Holland; Gelmann, Edward P; McGuire, Colleen

    2002-07-01

    Although perceived risk is central to most theories of health behavior, there is little consensus on its measurement with regard to item wording, response set, or the number of items to include. In a methodological assessment of perceived risk, we assessed the impact of changing the order of three commonly used perceived risk items: quantitative personal risk, quantitative population risk, and comparative risk. Participants were 432 men and women enrolled in an ancillary study of the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. Three groups of consecutively enrolled participants responded to the three items in one of three question orders. Results indicated that item order was related to the perceived risk ratings of both ovarian (P Perceptions of risk were significantly lower when the comparative rating was made first. The findings suggest that compelling participants to consider their own risk relative to the risk of others results in lower ratings of perceived risk. Although the use of multiple items may provide more information than when only a single method is used, different conclusions may be reached depending on the context in which an item is assessed.

  6. Communicating Quantitative Literacy: An Examination of Open-Ended Assessment Items in TIMSS, NALS, IALS, and PISA

    Directory of Open Access Journals (Sweden)

    Karl W. Kosko

    2011-07-01

    Quantitative Literacy (QL) has been described as the skill set an individual uses when interacting with the world in a quantitative manner. A necessary component of this interaction is communication. To this end, assessments of QL have included open-ended items as a means of including communicative aspects of QL. The present study sought to examine whether such open-ended items typically measured aspects of quantitative communication, as compared to mathematical communication, or mathematical skills. We focused on public-released items and rubrics from four of the most widely referenced assessments: the Third International Mathematics and Science Study (TIMSS-95); the National Adult Literacy Survey (NALS; now the National Assessment of Adult Literacy, NAAL) in 1985 and 1992; the International Adult Literacy Skills (IALS) beginning in 1994; and the Program for International Student Assessment (PISA) beginning in 2000. We found that open-ended item rubrics in these QL assessments showed a strong tendency to assess answer-only responses. Therefore, while some open-ended items may have required certain levels of quantitative reasoning to find a solution, it is the solution rather than the reasoning that was often assessed.

  7. Overview of Classical Test Theory and Item Response Theory for Quantitative Assessment of Items in Developing Patient-Reported Outcome Measures

    Science.gov (United States)

    Cappelleri, Joseph C.; Lundy, J. Jason; Hays, Ron D.

    2014-01-01

    Introduction The U.S. Food and Drug Administration’s patient-reported outcome (PRO) guidance document defines content validity as “the extent to which the instrument measures the concept of interest” (FDA, 2009, p. 12). “Construct validity is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity” (Strauss & Smith, 2009, p. 7). Hence both qualitative and quantitative information are essential in evaluating the validity of measures. Methods We review classical test theory and item response theory approaches to evaluating PRO measures including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized “difficulty” (severity) order of items is represented by observed responses. Conclusion Classical test theory and item response theory can be useful in providing a quantitative assessment of items and scales during the content validity phase of patient-reported outcome measures. Depending on the particular type of measure and the specific circumstances, either one or both approaches should be considered to help maximize the content validity of PRO measures. PMID:24811753

  8. Development of an assessment tool to measure students' perceptions of respiratory care education programs: Item generation, item reduction, and preliminary validation

    Directory of Open Access Journals (Sweden)

    Ghazi Alotaibi

    2013-01-01

    Objectives: Students who perceive their learning environment positively are more likely to develop effective learning strategies and adopt a deep learning approach. Currently, there is no validated instrument for measuring the educational environment of educational programs on respiratory care (RC). The aim of this study was to develop an instrument to measure students' perception of the RC educational environment. Materials and Methods: Based on the literature review and an assessment of content validity by multiple focus groups of RC educationalists, potential items of the instrument relevant to the RC educational environment construct were generated by the research group. The initial 71-item questionnaire was then field-tested on all students from the 3 RC programs in Saudi Arabia and was subjected to multi-trait scaling analysis. Cronbach's alpha was used to assess internal consistency reliabilities. Results: Two hundred and twelve students (100%) completed the survey. The initial instrument of 71 items was reduced to 65 across 5 scales. Convergent and discriminant validity assessment demonstrated that the majority of items correlated more highly with their intended scale than a competing one. Cronbach's alpha exceeded the standard criterion of >0.70 in all scales except one. There was no floor or ceiling effect for scale or overall score. Conclusions: This instrument is the first assessment tool developed to measure the RC educational environment. There was evidence of its good feasibility, validity, and reliability. This first validation of the instrument supports its use by RC students to evaluate educational environment.
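
    The record above uses Cronbach's alpha as its internal-consistency criterion (>0.70). For readers unfamiliar with the statistic, a minimal Python sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), is given below. The function name and the score matrix are hypothetical and are not drawn from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 4 respondents x 3 items on a rating scale (illustration only).
scores = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
    [4, 5, 6],
]
print(f"alpha = {cronbach_alpha(scores):.2f}")
```

    In this toy matrix the items rise and fall together perfectly, so alpha sits at its upper bound; real scales fall below that.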

  9. Item level diagnostics and model - data fit in item response theory ...

    African Journals Online (AJOL)

    Item response theory (IRT) is a framework for modeling and analyzing item response data. Item-level modeling gives IRT advantages over classical test theory. The fit of an item score pattern to an item response theory (IRT) model is a necessary condition that must be assessed for further use of item and models that best fit ...

  10. Small group learning: effect on item analysis and accuracy of self-assessment of medical students.

    Science.gov (United States)

    Biswas, Shubho Subrata; Jain, Vaishali; Agrawal, Vandana; Bindra, Maninder

    2015-01-01

    Small group sessions are regarded as a more active and student-centered approach to learning. Item analysis provides objective evidence of whether such sessions improve comprehension and make the topic easier for students, in addition to assessing the relative benefit of the sessions to good versus poor performers. Self-assessment makes students aware of their deficiencies. Small group sessions can also help students develop the ability to self-assess. This study was carried out to assess the effect of small group sessions on item analysis and students' self-assessment. A total of 21 female and 29 male first year medical students participated in a small group session on topics covered by didactic lectures two weeks earlier. It was preceded and followed by two multiple choice question (MCQ) tests, in which students were asked to self-assess their likely score. The MCQs used were item analyzed in a previous group and were chosen of matching difficulty and discriminatory indices for the pre- and post-tests. The small group session improved the marks of both genders equally, but female performance was better. The session made the items easier, increasing the difficulty index significantly, but there was no significant alteration in the discriminatory index. There was overestimation in the self-assessment of both genders, but male overestimation was greater. The session improved the self-assessment of students in terms of expected marks and expectation of passing. Small group session improved the ability of students to self-assess their knowledge and increased the difficulty index of items, reflecting students' better performance.
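
    In the record above, a higher difficulty index means an easier item, which matches the common definition of the index as the proportion of students answering correctly. The Python sketch below computes that index together with an upper-minus-lower discrimination index. The 27% group fraction is a widely used convention assumed here, not something the abstract states, and the function name and response matrix are hypothetical.

```python
def item_indices(responses, item, group_fraction=0.27):
    """Return (difficulty index, discrimination index) for one item.

    responses: list of per-student lists of 0/1 item scores.
    group_fraction: share of students in the upper and lower groups (assumed 27%).
    """
    n = len(responses)
    # Difficulty index: proportion correct, so higher values mean easier items.
    difficulty = sum(r[item] for r in responses) / n
    # Rank students by total score, best first.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(n * group_fraction))
    upper_correct = sum(r[item] for r in ranked[:k])
    lower_correct = sum(r[item] for r in ranked[-k:])
    # Discrimination index: difference in proportion correct between the groups.
    discrimination = (upper_correct - lower_correct) / k
    return difficulty, discrimination


# Hypothetical 4 students x 3 items (illustration only).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(item_indices(responses, item=0))
```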

  11. Advanced Marketing Core Curriculum. Test Items and Assessment Techniques.

    Science.gov (United States)

    Smith, Clifton L.; And Others

    This document contains duties and tasks, multiple-choice test items, and other assessment techniques for Missouri's advanced marketing core curriculum. The core curriculum begins with a list of 13 suggested textbook resources. Next, nine duties with their associated tasks are given. Under each task appears one or more citations to appropriate…

  12. Development of a questionnaire to assess patient satisfaction with allergen-specific immunotherapy in adults: item generation, item reduction, and preliminary validation

    Directory of Open Access Journals (Sweden)

    Justícia JL

    2011-05-01

    Jose Luis Justícia (1), Eva Baró (2), Victoria Cardona (3), Pedro Guardia (4), Pedro Ojeda (5), José Maria Olaguíbel (6), José Maria Vega (7), Carmen Vidal (8). (1) Medical Department, Stallergenes Ibérica, Barcelona, Spain; (2) Health Outcomes Research Department, 3D Health Research, Barcelona, Spain; (3) Hospital Vall d'Hebron, Barcelona, Spain; (4) Hospital Virgen Macarena, Sevilla, Spain; (5) Clínica de Asma y Alergia Dres. Ojeda, Madrid, Spain; (6) Complejo Hospitalario de Navarra, Pamplona, Spain; (7) Hospital Regional Universitario Carlos Haya, Málaga, Spain; (8) Complejo Hospitalario Universitario de Santiago, Santiago de Compostela, Spain. Background: Allergen-specific immunotherapy (SIT) is a treatment capable of modifying the natural course of allergy, so ensuring good adherence to SIT is fundamental. Up until now there has not existed an instrument specifically developed to measure patient satisfaction with SIT, although its assessment could help us to comprehend better and improve treatment adherence and effectiveness. The aim of this study was to develop an instrument to measure adult patient satisfaction with SIT. Methods: Items were generated from a literature review, focus groups with allergic adult patients undergoing SIT, and a meeting with experts. Potential items were administered to allergic patients undergoing SIT in an observational, cross-sectional, multicenter study. Item reduction was based on quantitative and qualitative criteria. A preliminary assessment of feasibility, reliability, and validity of the retained items was performed. Results: An initial pool of 70 items was administered to 257 patients undergoing SIT. Fifty-four items were eliminated resulting in a provisional instrument with 16 items. Factor analysis yielded four factors that were identified as perceived efficacy, activities and environment, cost-benefit balance, and overall satisfaction, explaining 74.8% of variance. Ceiling and floor effects were negligible for overall score. Overall score was

  13. Item difficulty of multiple choice tests dependant on different item response formats – An experiment in fundamental research on psychological assessment

    Directory of Open Access Journals (Sweden)

    KLAUS D. KUBINGER

    2007-12-01

    Multiple choice response formats are problematical as an item is often scored as solved simply because the test-taker is a lucky guesser. Instead of applying pertinent IRT models which take guessing effects into account, a pragmatic approach of re-conceptualizing multiple choice response formats to reduce the chance of lucky guessing is considered. This paper compares the free response format with two different multiple choice formats. A common multiple choice format with a single correct response option and five distractors (“1 of 6”) is used, as well as a multiple choice format with five response options, of which any number of the five is correct and the item is only scored as mastered if all the correct response options and none of the wrong ones are marked (“x of 5”). An experiment was designed, using pairs of items with exactly the same content but different response formats. 173 test-takers were randomly assigned to two test booklets of 150 items altogether. Rasch model analyses adduced a fitting item pool, after the deletion of 39 items. The resulting item difficulty parameters were used for the comparison of the different formats. The multiple choice format “1 of 6” differs significantly from “x of 5”, with a relative effect of 1.63, while the multiple choice format “x of 5” does not significantly differ from the free response format. Therefore, the lower degree of difficulty of items with the “1 of 6” multiple choice format is an indicator of relevant guessing effects. In contrast the “x of 5” multiple choice format can be seen as an appropriate substitute for free response format.
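
    The difference in guessing exposure between the two formats described above can be illustrated with simple arithmetic. The Python sketch below assumes a purely random guesser: in “1 of 6” one option is picked uniformly, and in “x of 5” each option is marked or left blank with equal probability, so all 2^5 marking patterns are equally likely. These assumptions and the function names are illustrative and are not taken from the paper.

```python
from fractions import Fraction


def guess_prob_single_choice(n_options):
    """Chance of a lucky guess when exactly one of n_options is correct."""
    return Fraction(1, n_options)


def guess_prob_multiple_mark(n_options):
    """Chance of a lucky guess when the full marking pattern must match.

    Assumes each of the 2**n_options mark/blank patterns is equally likely.
    """
    return Fraction(1, 2 ** n_options)


print(guess_prob_single_choice(6))   # "1 of 6" format
print(guess_prob_multiple_mark(5))   # "x of 5" format
```

    Under these assumptions a blind guess succeeds far less often in the “x of 5” format, which is consistent with the abstract's finding that this format behaves like free response.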

  14. Goodness-of-Fit Assessment of Item Response Theory Models

    Science.gov (United States)

    Maydeu-Olivares, Alberto

    2013-01-01

    The article provides an overview of goodness-of-fit assessment methods for item response theory (IRT) models. It is now possible to obtain accurate "p"-values of the overall fit of the model if bivariate information statistics are used. Several alternative approaches are described. As the validity of inferences drawn on the fitted model…

  15. Overview of classical test theory and item response theory for the quantitative assessment of items in developing patient-reported outcomes measures.

    Science.gov (United States)

    Cappelleri, Joseph C; Jason Lundy, J; Hays, Ron D

    2014-05-01

    The US Food and Drug Administration's guidance for industry document on patient-reported outcomes (PRO) defines content validity as "the extent to which the instrument measures the concept of interest" (FDA, 2009, p. 12). According to Strauss and Smith (2009), construct validity "is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity" (p. 7). Hence, both qualitative and quantitative information are essential in evaluating the validity of measures. We review classical test theory and item response theory (IRT) approaches to evaluating PRO measures, including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized "difficulty" (severity) order of items is represented by observed responses. If a researcher has few qualitative data and wants to get preliminary information about the content validity of the instrument, then descriptive assessments using classical test theory should be the first step. As the sample size grows during subsequent stages of instrument development, confidence in the numerical estimates from Rasch and other IRT models (as well as those of classical test theory) would also grow. Classical test theory and IRT can be useful in providing a quantitative assessment of items and scales during the content-validity phase of PRO-measure development. Depending on the particular type of measure and the specific circumstances, the classical test theory and/or the IRT should be considered to help maximize the content validity of PRO measures. Copyright © 2014 Elsevier HS Journals, Inc. All rights reserved.

  16. Gender-Based Differential Item Performance in Mathematics Achievement Items.

    Science.gov (United States)

    Doolittle, Allen E.; Cleary, T. Anne

    1987-01-01

    Eight randomly equivalent samples of high school seniors were each given a unique form of the ACT Assessment Mathematics Usage Test (ACTM). Signed measures of differential item performance (DIP) were obtained for each item in the eight ACTM forms. DIP estimates were analyzed and a significant item category effect was found. (Author/LMO)

  17. Item response theory, computerized adaptive testing, and PROMIS: assessment of physical function.

    Science.gov (United States)

    Fries, James F; Witter, James; Rose, Matthias; Cella, David; Khanna, Dinesh; Morgan-DeWitt, Esi

    2014-01-01

    Patient-reported outcome (PRO) questionnaires record health information directly from research participants because observers may not accurately represent the patient perspective. Patient-reported Outcomes Measurement Information System (PROMIS) is a US National Institutes of Health cooperative group charged with bringing PRO to a new level of precision and standardization across diseases by item development and use of item response theory (IRT). With IRT methods, improved items are calibrated on an underlying concept to form an item bank for a "domain" such as physical function (PF). The most informative items can be combined to construct efficient "instruments" such as 10-item or 20-item PF static forms. Each item is calibrated on the basis of the probability that a given person will respond at a given level, and the ability of the item to discriminate people from one another. Tailored forms may cover any desired level of the domain being measured. Computerized adaptive testing (CAT) selects the best items to sharpen the estimate of a person's functional ability, based on prior responses to earlier questions. PROMIS item banks have been improved with experience from several thousand items, and are calibrated on over 21,000 respondents. In areas tested to date, PROMIS PF instruments are superior or equal to Health Assessment Questionnaire and Medical Outcome Study Short Form-36 Survey legacy instruments in clarity, translatability, patient importance, reliability, and sensitivity to change. Precise measures, such as PROMIS, efficiently incorporate patient self-report of health into research, potentially reducing research cost by lowering sample size requirements. The advent of routine IRT applications has the potential to transform PRO measurement.
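    The item calibration and adaptive selection described above can be sketched as follows. This is a simplified illustration that assumes a dichotomous two-parameter logistic (2PL) model and maximum-information item selection; polytomous items would need a different model, and the item names and parameter values are hypothetical rather than drawn from any PROMIS bank.

```python
import math


def p_2pl(theta, a, b):
    """Probability of endorsing an item under the 2PL model.

    theta: person's level on the latent trait
    a: item discrimination, b: item difficulty (location)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta, bank, administered):
    """Pick the unadministered item that is most informative at the current estimate."""
    candidates = [name for name in bank if name not in administered]
    return max(candidates, key=lambda name: information(theta, *bank[name]))


# Hypothetical item bank: name -> (discrimination a, difficulty b).
bank = {
    "walk_block": (1.8, -1.0),
    "climb_stairs": (1.5, 0.0),
    "run_mile": (2.0, 1.5),
}

print(next_item(0.0, bank, administered=set()))
print(next_item(1.5, bank, administered=set()))
```

    Because each item is most informative near its own difficulty, the selected item changes as the provisional trait estimate moves, which is how an adaptive test can reach a given precision with fewer questions than a fixed form.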

  18. International Assessment: A Rasch Model and Teachers' Evaluation of TIMSS Science Achievement Items

    Science.gov (United States)

    Glynn, Shawn M.

    2012-01-01

    The Trends in International Mathematics and Science Study (TIMSS) is a comparative assessment of the achievement of students in many countries. In the present study, a rigorous independent evaluation was conducted of a representative sample of TIMSS science test items because item quality influences the validity of the scores used to inform…

  19. Gender differences in national assessment of educational progress science items: What does "I don't know" really mean?

    Science.gov (United States)

    Linn, Marcia C.; de Benedictis, Tina; Delucchi, Kevin; Harris, Abigail; Stage, Elizabeth

    The National Assessment of Educational Progress Science Assessment has consistently revealed small gender differences on science content items but not on science inquiry items. This assessment differs from others in that respondents can choose "I don't know" rather than guessing. This paper examines explanations for the gender differences including (a) differential prior instruction, (b) differential response to uncertainty and use of the "I don't know" response, (c) differential response to figurally presented items, and (d) different attitudes towards science. Of these possible explanations, the first two received support. Females are more likely to use the "I don't know" response, especially for items with physical science content or masculine themes such as football. To ameliorate this situation we need more effective science instruction and more gender-neutral assessment items.

  20. Examining the Psychometric Quality of Multiple-Choice Assessment Items using Mokken Scale Analysis.

    Science.gov (United States)

    Wind, Stefanie A

    The concept of invariant measurement is typically associated with Rasch measurement theory (Engelhard, 2013). Concerned with the appropriateness of the parametric transformation upon which the Rasch model is based, Mokken (1971) proposed a nonparametric procedure for evaluating the quality of social science measurement that is theoretically and empirically related to the Rasch model. Mokken's nonparametric procedure can be used to evaluate the quality of dichotomous and polytomous items in terms of the requirements for invariant measurement. Despite these potential benefits, the use of Mokken scaling to examine the properties of multiple-choice (MC) items in education has not yet been fully explored. A nonparametric approach to evaluating MC items is promising in that this approach facilitates the evaluation of assessments in terms of invariant measurement without imposing potentially inappropriate transformations. Using Rasch-based indices of measurement quality as a frame of reference, data from an eighth-grade physical science assessment are used to illustrate and explore Mokken-based techniques for evaluating the quality of MC items. Implications for research and practice are discussed.

  1. Evaluation of psychometric properties and differential item functioning of 8-item Child Perceptions Questionnaires using item response theory.

    Science.gov (United States)

    Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman

    2015-08-19

    Four-factor structure of the two 8-item short forms of Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, the sum scores are typically reported in practice as a proxy of Oral health-related Quality of Life (OHRQoL), which implied a unidimensional structure. This study first assessed the unidimensionality of 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach of validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principal component analysis (PCA) and local dependency (LD) statistic. Graded response model was fitted to the data. Contribution of each item to the scale was assessed by item information function (IIF). Reliability of the scale was assessed by test information function (TIF). Differential item functioning (DIF) across gender was identified by Wald test and expected score functions. Both CPQ11-14 RSF:8 and ISF:8 did not deviate much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principal component explained >30 % of the total variation with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistic items suggesting little contribution of information to the scale and item removal caused little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to oral symptoms items, the item "Concerned with what other people think" demonstrated a uniform DIF (p Items related to oral symptoms were not informative to OHRQoL and deletion of these items is suggested. The impact of DIF across gender on the overall score was minimal. CPQ11-14 RSF:8 performed slightly better than ISF:8 in measurement precision. The 6-item short forms

  2. The validity of the Satisfaction with Life Scale in adolescents and a comparison with single-item life satisfaction measures: a preliminary study.

    Science.gov (United States)

    Jovanović, Veljko

    2016-12-01

    The validity of the life satisfaction measures commonly used among adults has been rarely examined in adolescent samples. The present research had two main goals: (1) to evaluate the structural validity of the Satisfaction with Life Scale (SWLS) among adolescents and to test measurement invariance across gender; (2) to compare the criterion and convergent validity of the SWLS and single-item life satisfaction measures among adolescents. Three samples of Serbian adolescents were recruited for the present research. Study 1 (N = 481, M age  = 17.01 years) examined the structure of the SWLS via confirmatory factor analysis (CFA) and evaluated measurement invariance of the SWLS across gender by a multi-group CFA. Study 2 (N = 283, M age  = 17.34 years) and Study 3 (N = 220, M age  = 16.73 years) compared the convergent validity of the SWLS and single-item life satisfaction measures. The results of Study 1 supported the original one-factor model of the SWLS among adolescents and provided evidence for strong measurement invariance of the SWLS across gender. The findings of Study 2 and Study 3 showed that the SWLS and single-item measures were equally valid and strongly associated (r = .734 in Study 2 and r = .668 in Study 3). No substantial differences in correlations with school success and well-being indicators were found between the SWLS and single-item measures. Our findings support the use of the SWLS among adolescents and indicate that single-item life satisfaction measures perform as well as the SWLS in adolescent samples.

  3. Comparison of Classical Test Theory and Item Response Theory in Individual Change Assessment

    NARCIS (Netherlands)

    Jabrayilov, Ruslan; Emons, Wilco H. M.; Sijtsma, Klaas

    2016-01-01

    Clinical psychologists are advised to assess clinical and statistical significance when assessing change in individual patients. Individual change assessment can be conducted using either the methodologies of classical test theory (CTT) or item response theory (IRT). Researchers have been optimistic

  4. Modeling the World Health Organization Disability Assessment Schedule II using non-parametric item response models.

    Science.gov (United States)

    Galindo-Garre, Francisca; Hidalgo, María Dolores; Guilera, Georgina; Pino, Oscar; Rojo, J Emilio; Gómez-Benito, Juana

    2015-03-01

    The World Health Organization Disability Assessment Schedule II (WHO-DAS II) is a multidimensional instrument developed for measuring disability. It comprises six domains (getting around, self-care, getting along with others, life activities and participation in society). The main purpose of this paper is the evaluation of the psychometric properties for each domain of the WHO-DAS II with parametric and non-parametric Item Response Theory (IRT) models. A secondary objective is to assess whether the WHO-DAS II items within each domain form a hierarchy of invariantly ordered severity indicators of disability. A sample of 352 patients with a schizophrenia spectrum disorder is used in this study. The 36-item WHO-DAS II was administered during the consultation. Partial Credit and Mokken scale models are used to study the psychometric properties of the questionnaire. The psychometric properties of the WHO-DAS II scale are satisfactory for all the domains. However, we identify a few items that do not discriminate satisfactorily between different levels of disability and cannot be invariantly ordered in the scale. In conclusion, the WHO-DAS II can be used to assess overall disability in patients with schizophrenia, but some domains are too general to assess functionality in these patients because they contain items that are not applicable to this pathology. Copyright © 2014 John Wiley & Sons, Ltd.

  5. Development and validation of a ten-item questionnaire with explanatory illustrations to assess upper extremity disorders: favorable effect of illustrations in the item reduction process.

    Science.gov (United States)

    Kurimoto, Shigeru; Suzuki, Mikako; Yamamoto, Michiro; Okui, Nobuyuki; Imaeda, Toshihiko; Hirata, Hitoshi

    2011-11-01

    The purpose of this study is to develop a short and valid measure for upper extremity disorders and to assess the effect of attached illustrations in item reduction of a self-administered disability questionnaire while retaining psychometric properties. A validated questionnaire used to assess upper extremity disorders, the Hand20, was reduced to ten items using two item-reduction techniques. The psychometric properties of the abbreviated form, the Hand10, were evaluated on an independent sample that was used for the shortening process. Validity, reliability, and responsiveness of the Hand10 were retained in the item reduction process. It was possible that the use of explanatory illustrations attached to the Hand10 helped with its reproducibility. The illustrations for the Hand10 promoted text comprehension and motivation to answer the items. These changes resulted in high acceptability; more than 99.3% of patients, including 98.5% of elderly patients, could complete the Hand10 properly. The illustrations had favorable effects on the item reduction process and made it possible to retain precision of the instrument. The Hand10 is a reliable and valid instrument for individual-level applications with the advantage of being compact and broadly applicable, even in elderly individuals.

  6. Assessing nicotine dependence in adolescent E-cigarette users: The 4-item Patient-Reported Outcomes Measurement Information System (PROMIS) Nicotine Dependence Item Bank for electronic cigarettes.

    Science.gov (United States)

    Morean, Meghan E; Krishnan-Sarin, Suchitra; O'Malley, Stephanie S

    2018-04-26

    Adolescent e-cigarette use (i.e., "vaping") likely confers risk for developing nicotine dependence. However, there have been no studies assessing e-cigarette nicotine dependence in youth. We evaluated the psychometric properties of the 4-item Patient-Reported Outcomes Measurement Information System Nicotine Dependence Item Bank for E-cigarettes (PROMIS-E) for assessing youth e-cigarette nicotine dependence and examined risk factors for experiencing stronger dependence symptoms. In 2017, 520 adolescent past-month e-cigarette users completed the PROMIS-E during a school-based survey (50.5% female, 84.8% White, 16.22[1.19] years old). Adolescents also reported on sex, grade, race, age at e-cigarette use onset, vaping frequency, nicotine e-liquid use, and past-month cigarette smoking. Analyses included conducting confirmatory factor analysis and examining the internal consistency of the PROMIS-E. Bivariate correlations and independent-samples t-tests were used to examine unadjusted relationships between e-cigarette nicotine dependence and the proposed risk factors. Regression models were run in which all potential risk factors were entered as simultaneous predictors of PROMIS-E scores. The single-factor structure of the PROMIS-E was confirmed and evidenced good internal consistency. Across models, larger PROMIS-E scores were associated with being in a higher grade, initiating e-cigarette use at an earlier age, vaping more frequently, using nicotine e-liquid (and higher nicotine concentrations), and smoking cigarettes. Adolescent e-cigarette users reported experiencing nicotine dependence, which was assessed using the psychometrically sound PROMIS-E. Experiencing stronger nicotine dependence symptoms was associated with characteristics that previously have been shown to confer risk for frequent vaping and tobacco cigarette dependence. Copyright © 2018 Elsevier B.V. All rights reserved.

  7. Measuring single constructs by single items: Constructing an even shorter version of the "Short Five" personality inventory.

    Directory of Open Access Journals (Sweden)

    Kenn Konstabel

    The aim of this study was to construct a short, 30-item personality questionnaire that would be, in terms of content and meaning of the scores, as comparable as possible with longer, well-established inventories such as NEO PI-R and its clones. To do this, we shortened the formerly constructed 60-item "Short Five" (S5) by half so that each subscale would be represented by a single item. We compared all possibilities of selecting 30 items (preserving balanced keying within each domain of the five-factor model) in terms of correlations with well-established scales, self-peer correlations, and clarity of meaning, and selected an optimal combination for each domain. The resulting shortened questionnaire, XS5, was compared to the original S5 using data from student samples in 6 different countries (Estonia, Finland, UK, Germany, Spain, and China), and a representative Finnish sample. The correlations between XS5 domain scales and their longer counterparts from well-established scales ranged from 0.74 to 0.84; the difference from the equivalent correlations for full version of S5 or from meta-analytic short-term dependability coefficients of NEO PI-R was not large. In terms of prediction of external criteria (emotional experience and self-reported behaviours), there were no important differences between XS5, S5, and the longer well-established scales. Controlling for acquiescence did not improve the prediction of criteria, self-peer correlations, or correlations with longer scales, but it did improve internal reliability and, in some analyses, comparability of the principal component structure. XS5 can be recommended as an economic measure of the five-factor model of personality at the level of domain scales; it has reasonable psychometric properties, fair correlations with longer well-established scales, and it can predict emotional experience and self-reported behaviours no worse than S5. When subscales are essential, we would still recommend using the

  8. Recommended core items to assess e-cigarette use in population-based surveys

    OpenAIRE

    Pearson, Jennifer L; Hitchman, Sara C; Brose, Leonie S; Bauld, Linda; Glasser, Allison M; Villanti, Andrea C; McNeill, Ann; Abrams, David B; Cohen, Joanna E

    2017-01-01

    Background: A consistent approach using standardized items to assess e-cigarette use in both youth and adult populations will aid cross-survey and cross-national comparisons of the effect of e-cigarette (and tobacco) policies and improve our understanding of the population health impact of e-cigarette use. Focusing on adult behavior, we propose a set of e-cigarette use items, discuss their utility and potential adaptation, and highlight e-cigarette constructs that researchers should avoid wit...

  9. Measuring everyday functional competence using the Rasch assessment of everyday activity limitations (REAL) item bank.

    Science.gov (United States)

    Oude Voshaar, Martijn A H; Ten Klooster, Peter M; Vonkeman, Harald E; van de Laar, Mart A F J

    2017-11-01

    Traditional patient-reported physical function instruments often poorly differentiate patients with mild-to-moderate disability. We describe the development and psychometric evaluation of a generic item bank for measuring everyday activity limitations in outpatient populations. Seventy-two items generated from patient interviews and mapped to the International Classification of Functioning, Disability and Health (ICF) domestic life chapter were administered to 1128 adults representative of the Dutch population. The partial credit model was fitted to the item responses and evaluated with respect to its assumptions, model fit, and differential item functioning (DIF). Measurement performance of a computerized adaptive testing (CAT) algorithm was compared with the SF-36 physical functioning scale (PF-10). A final bank of 41 items was developed. All items demonstrated acceptable fit to the partial credit model and measurement invariance across age, sex, and educational level. Five- and ten-item CAT simulations were shown to have high measurement precision, which exceeded that of SF-36 physical functioning scale across the physical function continuum. Floor effects were absent for a 10-item empirical CAT simulation, and ceiling effects were low (13.5%) compared with SF-36 physical functioning (38.1%). CAT also discriminated better than SF-36 physical functioning between age groups, number of chronic conditions, and respondents with or without rheumatic conditions. The Rasch assessment of everyday activity limitations (REAL) item bank will hopefully prove a useful instrument for assessing everyday activity limitations. T-scores obtained using derived measures can be used to benchmark physical function outcomes against the general Dutch adult population.

  10. Identifying the most efficient items from the Mini-Mental State Examination for cognitive function assessment in older Taiwanese patients.

    Science.gov (United States)

    Lou, Meei-Fang; Dai, Yu-Tzu; Huang, Guey-Shiun; Yu, Po-Jui

    2007-03-01

    The purpose of the study was to identify the most efficient items from the Mini-Mental State Examination for assessment of cognitive function. The Mini-Mental State Examination is the most frequently used cognitive screening instrument. However, the Mini-Mental State Examination has been criticized for insensitivity to mild cognitive dysfunction, limited memory assessment and variability in level of difficulty of the individual items. This study used secondary data analysis. Item response theory two-parameter model was used to analyse the data from the admission assessment of mental status by the Mini-Mental State Examination for 801 patients. By using item response analysis, 16 items were selected from the original 30-item Mini-Mental State Examination. The 16 items included mainly the measures of orientation, recall and attention and calculation. The internal consistency of the 16-item Mini-Mental State Examination was 0.84. The proposed new cut-off point for the 16-item Mini-Mental State Examination was 11. The correct classification rate was 0.94, the sensitivity was 100% and the specificity was 97.4%, when compared with the original 30-item Mini-Mental State Examination from the cut-off point of 24. This new cut-off point was determined for the purpose of over-identifying patients at risk so as to ensure early detection of and prevention from the onset of cognitive disturbance. Only a few items are needed to describe the subject's cognitive status. Using item response theory analysis, the study found that the Mini-Mental State Examination could be simplified. Deleting the items with less variation makes this assessment tool not only shorter, easier to administer and less strenuous for respondents, but also enables one to maintain validity as a cognitive function test for clinical setting.
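    The comparison of a shortened screen against the full-length test, as described above, rests on sensitivity, specificity, and the correct classification rate. A minimal sketch of those calculations follows. The paired scores are hypothetical and do not reproduce the study's data, and treating a score below the cut-off as a positive screen is an assumption, since the abstract does not state the direction of the rule.

```python
def screening_metrics(short_scores, reference_scores, short_cutoff, reference_cutoff):
    """Compare a short form with the full test, treating the full test as the reference.

    Assumption: a score strictly below its cut-off counts as a positive screen.
    """
    tp = fp = tn = fn = 0
    for short, reference in zip(short_scores, reference_scores):
        flagged = short < short_cutoff
        impaired = reference < reference_cutoff
        if flagged and impaired:
            tp += 1
        elif flagged:
            fp += 1
        elif impaired:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "correct_classification": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical paired scores for eight patients (16-item form, 30-item form).
short_scores = [16, 14, 10, 10, 9, 11, 15, 8]
reference_scores = [30, 27, 25, 22, 18, 24, 28, 15]

metrics = screening_metrics(short_scores, reference_scores,
                            short_cutoff=11, reference_cutoff=24)
print(metrics)
```

    Lowering the short-form cut-off trades sensitivity for specificity, so a cut-off chosen to over-identify patients at risk will tend to push sensitivity toward 100% at the cost of some false positives.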

  11. Rating the methodological quality of single-subject designs and n-of-1 trials: introducing the Single-Case Experimental Design (SCED) Scale.

    Science.gov (United States)

    Tate, Robyn L; McDonald, Skye; Perdices, Michael; Togher, Leanne; Schultz, Regina; Savage, Sharon

    2008-08-01

    Rating scales that assess methodological quality of clinical trials provide a means to critically appraise the literature. Scales are currently available to rate randomised and non-randomised controlled trials, but there are none that assess single-subject designs. The Single-Case Experimental Design (SCED) Scale was developed for this purpose and evaluated for reliability. Six clinical researchers who were trained and experienced in rating methodological quality of clinical trials developed the scale and participated in reliability studies. The SCED Scale is an 11-item rating scale for single-subject designs, of which 10 items are used to assess methodological quality and use of statistical analysis. The scale was developed and refined over a 3-year period. Content validity was addressed by identifying items to reduce the main sources of bias in single-case methodology as stipulated by authorities in the field, which were empirically tested against 85 published reports. Inter-rater reliability was assessed using a random sample of 20/312 single-subject reports archived in the Psychological Database of Brain Impairment Treatment Efficacy (PsycBITE). Inter-rater reliability for the total score was excellent, both for individual raters (overall ICC = 0.84; 95% confidence interval 0.73-0.92) and for consensus ratings between pairs of raters (overall ICC = 0.88; 95% confidence interval 0.78-0.95). Item reliability was fair to excellent for consensus ratings between pairs of raters (range k = 0.48 to 1.00). The results were replicated with two independent novice raters who were trained in the use of the scale (ICC = 0.88, 95% confidence interval 0.73-0.95). The SCED Scale thus provides a brief and valid evaluation of methodological quality of single-subject designs, with the total score demonstrating excellent inter-rater reliability using both individual and consensus ratings. Items from the scale can also be used as a checklist in the design, reporting and critical

  12. Explanatory item response modelling of an abstract reasoning assessment: A case for modern test design

    OpenAIRE

    Helland, Fredrik

    2016-01-01

    Assessment is an integral part of society and education, and for this reason it is important to know what you measure. This thesis is about explanatory item response modelling of an abstract reasoning assessment, with the objective to create a modern test design framework for automatic generation of valid and precalibrated items of abstract reasoning. Modern test design aims to strengthen the connections between the different components of a test, with a stress on strong theory, systematic it...

  13. ITEM LEVEL DIAGNOSTICS AND MODEL - DATA FIT IN ITEM ...

    African Journals Online (AJOL)

    Global Journal

    Item response theory (IRT) is a framework for modeling and analyzing item response ... data. Though, there is an argument that the evaluation of fit in IRT modeling has been ... National Council on Measurement in Education ... model data fit should be based on three types of ... prediction should be assessed through the.

  14. The development and discussion of computerized visual perception assessment tool for Chinese characters structures - Concurrent estimation of the overall ability and the domain ability in item response theory approach.

    Science.gov (United States)

    Wu, Huey-Min; Lin, Chin-Kai; Yang, Yu-Mao; Kuo, Bor-Chen

    2014-11-12

    Visual perception is the fundamental skill required for a child to recognize words, and to read and write. There was no visual perception assessment tool developed for preschool children based on Chinese characters in Taiwan. The purposes were to develop the computerized visual perception assessment tool for Chinese Characters Structures and to explore the psychometric characteristics of the assessment tool. This study adopted purposive sampling. The study evaluated 551 kindergarten-age children (293 boys, 258 girls) ranging from 46 to 81 months of age. The test instrument used in this study consisted of three subtests and 58 items, including tests of basic strokes, single-component characters, and compound characters. Based on the results of model fit analysis, the higher-order item response theory was used to estimate the performance in visual perception, basic strokes, single-component characters, and compound characters simultaneously. Analyses of variance were used to detect significant difference in age groups and gender groups. The difficulty of identifying items in a visual perception test ranged from -2 to 1. The visual perception ability of 4- to 6-year-old children ranged from -1.66 to 2.19. Gender did not have significant effects on performance. However, there were significant differences among the different age groups. The performance of 6-year-olds was better than that of 5-year-olds, which was better than that of 4-year-olds. This study obtained detailed diagnostic scores by using a higher-order item response theory model to understand the visual perception of basic strokes, single-component characters, and compound characters. Further statistical analysis showed that, for basic strokes and compound characters, girls performed better than did boys; there also were differences within each age group. For single-component characters, there was no difference in performance between boys and girls. However, again the performance of 6-year-olds was better than

  15. Measuring single constructs by single items: Constructing an even shorter version of the “Short Five” personality inventory

    Science.gov (United States)

    Konstabel, Kenn; Lönnqvist, Jan-Erik; Leikas, Sointu; García Velázquez, Regina; Qin, Hiaying; Verkasalo, Markku; Walkowitz, Gari

    2017-01-01

    The aim of this study was to construct a short, 30-item personality questionnaire that would be, in terms of content and meaning of the scores, as comparable as possible with longer, well-established inventories such as NEO PI-R and its clones. To do this, we shortened the formerly constructed 60-item “Short Five” (S5) by half so that each subscale would be represented by a single item. We compared all possibilities of selecting 30 items (preserving balanced keying within each domain of the five-factor model) in terms of correlations with well-established scales, self-peer correlations, and clarity of meaning, and selected an optimal combination for each domain. The resulting shortened questionnaire, XS5, was compared to the original S5 using data from student samples in 6 different countries (Estonia, Finland, UK, Germany, Spain, and China), and a representative Finnish sample. The correlations between XS5 domain scales and their longer counterparts from well-established scales ranged from 0.74 to 0.84; the difference from the equivalent correlations for full version of S5 or from meta-analytic short-term dependability coefficients of NEO PI-R was not large. In terms of prediction of external criteria (emotional experience and self-reported behaviours), there were no important differences between XS5, S5, and the longer well-established scales. Controlling for acquiescence did not improve the prediction of criteria, self-peer correlations, or correlations with longer scales, but it did improve internal reliability and, in some analyses, comparability of the principal component structure. XS5 can be recommended as an economic measure of the five-factor model of personality at the level of domain scales; it has reasonable psychometric properties, fair correlations with longer well-established scales, and it can predict emotional experience and self-reported behaviours no worse than S5. When subscales are essential, we would still recommend using the full version

  16. Using automatic item generation to create multiple-choice test items.

    Science.gov (United States)

    Gierl, Mark J; Lai, Hollis; Turner, Simon R

    2012-08-01

    Many tests of medical knowledge, from the undergraduate level to the level of certification and licensure, contain multiple-choice items. Although these are efficient in measuring examinees' knowledge and skills across diverse content areas, multiple-choice items are time-consuming and expensive to create. Changes in student assessment brought about by new forms of computer-based testing have created the demand for large numbers of multiple-choice items. Our current approaches to item development cannot meet this demand. We present a methodology for developing multiple-choice items based on automatic item generation (AIG) concepts and procedures. We describe a three-stage approach to AIG and we illustrate this approach by generating multiple-choice items for a medical licensure test in the content area of surgery. To generate multiple-choice items, our method requires a three-stage process. Firstly, a cognitive model is created by content specialists. Secondly, item models are developed using the content from the cognitive model. Thirdly, items are generated from the item models using computer software. Using this methodology, we generated 1248 multiple-choice items from one item model. Automatic item generation is a process that involves using models to generate items using computer technology. With our method, content specialists identify and structure the content for the test items, and computer technology systematically combines the content to generate new test items. By combining these outcomes, items can be generated automatically. © Blackwell Publishing Ltd 2012.
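    The third stage described above, in which software systematically combines content to generate items from an item model, can be sketched as a template filled with every combination of its variable elements. The template, element names, and values below are hypothetical placeholders, not the authors' item model. A real generator would also apply the constraints from the cognitive model and produce answer options, which this sketch omits.

```python
from itertools import product

# Hypothetical item model: a stem with named slots for the variable content.
ITEM_MODEL = (
    "A {age}-year-old patient presents with {finding} {timing} after surgery. "
    "What is the best next step?"
)

# Hypothetical content elements that content specialists would supply.
ELEMENTS = {
    "age": [25, 45, 70],
    "finding": ["fever", "abdominal pain"],
    "timing": ["two days", "one week"],
}


def generate_items(template, elements):
    """Fill the template with every combination of element values."""
    names = list(elements)
    value_lists = [elements[name] for name in names]
    return [
        template.format(**dict(zip(names, combo)))
        for combo in product(*value_lists)
    ]


items = generate_items(ITEM_MODEL, ELEMENTS)
print(len(items))
print(items[0])
```

    The number of generated items is the product of the number of values per element (3 x 2 x 2 = 12 here), which is why a single item model with a few more elements and values can yield over a thousand items.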

  17. The Stanford Leisure-Time Activity Categorical Item (L-Cat): a single categorical item sensitive to physical activity changes in overweight/obese women.

    Science.gov (United States)

    Kiernan, M; Schoffman, D E; Lee, K; Brown, S D; Fair, J M; Perri, M G; Haskell, W L

    2013-12-01

    Physical activity is essential for chronic disease prevention, yet Cat) is a single item comprising six descriptive categories ranging from inactive to very active. This novel methodological approach assesses national activity recommendations as well as multiple clinically relevant categories below and above the recommendations, and incorporates critical methodological principles that enhance psychometrics (reliability, validity and sensitivity to change). We evaluated the L-Cat's psychometrics among 267 overweight/obese women who were asked to meet the national activity recommendations in a randomized behavioral weight-loss trial. The L-Cat had excellent test-retest reliability (κ=0.64, PCat category at 6 months was associated with 1059 more daily pedometer steps (95% CI 712-1407, β=0.38, PCat categories differentiated from each other in a dose-response gradient for steps and weight loss (PsCat was sensitive to change in response to the trial's activity component. Women increased one L-Cat category at 6 months (M=1.0±1.4, PCat categories at 6 months lost more weight than those who did not (M=-4.6%, 95% CI -6.7 to -2.5, PCat has timely potential for clinical use such as tracking activity changes via electronic medical records, especially among overweight/obese populations who are unable or unlikely to reach national recommendations.

  18. Development of Rasch-based item banks for the assessment of work performance in patients with musculoskeletal diseases.

    Science.gov (United States)

    Mueller, Evelyn A; Bengel, Juergen; Wirtz, Markus A

    2013-12-01

    This study aimed to develop a self-description assessment instrument to measure work performance in patients with musculoskeletal diseases. In terms of the International Classification of Functioning, Disability and Health (ICF), work performance is defined as the degree of meeting the work demands (activities) at the actual workplace (environment). To account for the fact that work performance depends on the work demands of the job, we strived to develop item banks that allow a flexible use of item subgroups depending on the specific work demands of the patients' jobs. Item development included the collection of work tasks from literature and content validation through expert surveys and patient interviews. The resulting 122 items were answered by 621 patients with musculoskeletal diseases. Exploratory factor analysis to ascertain dimensionality and Rasch analysis (partial credit model) for each of the resulting dimensions were performed. Exploratory factor analysis resulted in four dimensions, and subsequent Rasch analysis led to the following item banks: 'impaired productivity' (15 items), 'impaired cognitive performance' (18), 'impaired coping with stress' (13) and 'impaired physical performance' (low physical workload 20 items, high physical workload 10 items). The item banks exhibited person separation indices (reliability) between 0.89 and 0.96. The assessment of work performance adds the activities component to the more commonly employed participation component of the ICF-model. The four item banks can be adapted to specific jobs where necessary without losing comparability of person measures, as the item banks are based on Rasch analysis.
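
    For readers unfamiliar with the partial credit model named above, the sketch below computes its category probabilities for one item. It uses the standard textbook form of the model; the function name, the person measure and the step difficulties are illustrative assumptions, not values from the study.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities under the partial credit model.

    theta: person measure in logits.
    thresholds: step difficulties delta_1..delta_m in logits.
    Returns m + 1 probabilities, one per response category 0..m.
    """
    # Cumulative sums of (theta - delta_j); the empty sum for category 0 is 0.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    weights = [math.exp(value) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical four-category item answered by a person at 0.5 logits.
probs = pcm_probabilities(theta=0.5, thresholds=[-1.0, 0.0, 1.0])
```

    Because person and item parameters sit on the same logit scale, person measures remain comparable when only a subset of a calibrated bank is administered, which is the property the abstract relies on.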

  19. Generalizability theory and item response theory

    OpenAIRE

    Glas, Cornelis A.W.; Eggen, T.J.H.M.; Veldkamp, B.P.

    2012-01-01

    Item response theory is usually applied to items with a selected-response format, such as multiple choice items, whereas generalizability theory is usually applied to constructed-response tasks assessed by raters. However, in many situations, raters may use rating scales consisting of items with a selected-response format. This chapter presents a short overview of how item response theory and generalizability theory were integrated to model such assessments. Further, the precision of the esti...

  20. What Form of Mathematics Are Assessments Assessing? The Case of Multiplication and Division in Fourth Grade NAEP Items

    Science.gov (United States)

    Kosko, Karl W.; Singh, Rashmi

    2018-01-01

    Multiplicative reasoning is a key concept in elementary school mathematics. Item statistics reported by the National Assessment of Educational Progress (NAEP) assessment provide the best current indicator for how well elementary students across the U.S. understand this, and other concepts. However, beyond expert reviews and statistical analysis,…

  1. Development and validation of the Single Item Narcissism Scale (SINS).

    Science.gov (United States)

    Konrath, Sara; Meier, Brian P; Bushman, Brad J

    2014-01-01

    The narcissistic personality is characterized by grandiosity, entitlement, and low empathy. This paper describes the development and validation of the Single Item Narcissism Scale (SINS). Although the use of longer instruments is superior in most circumstances, we recommend the SINS in some circumstances (e.g. under serious time constraints, online studies). In 11 independent studies (total N = 2,250), we demonstrate the SINS' psychometric properties. The SINS is significantly correlated with longer narcissism scales, but uncorrelated with self-esteem. It also has high test-retest reliability. We validate the SINS in a variety of samples (e.g., undergraduates, nationally representative adults), intrapersonal correlates (e.g., positive affect, depression), and interpersonal correlates (e.g., aggression, relationship quality, prosocial behavior). The SINS taps into the more fragile and less desirable components of narcissism. The SINS can be a useful tool for researchers, especially when it is important to measure narcissism with constraints preventing the use of longer measures.

  2. Development and Validation of the Single Item Narcissism Scale (SINS)

    Science.gov (United States)

    Konrath, Sara; Meier, Brian P.; Bushman, Brad J.

    2014-01-01

    Main Objectives The narcissistic personality is characterized by grandiosity, entitlement, and low empathy. This paper describes the development and validation of the Single Item Narcissism Scale (SINS). Although the use of longer instruments is superior in most circumstances, we recommend the SINS in some circumstances (e.g. under serious time constraints, online studies). Methods In 11 independent studies (total N = 2,250), we demonstrate the SINS' psychometric properties. Results The SINS is significantly correlated with longer narcissism scales, but uncorrelated with self-esteem. It also has high test-retest reliability. We validate the SINS in a variety of samples (e.g., undergraduates, nationally representative adults), intrapersonal correlates (e.g., positive affect, depression), and interpersonal correlates (e.g., aggression, relationship quality, prosocial behavior). The SINS taps into the more fragile and less desirable components of narcissism. Significance The SINS can be a useful tool for researchers, especially when it is important to measure narcissism with constraints preventing the use of longer measures. PMID:25093508

  3. Development and validation of the Single Item Narcissism Scale (SINS).

    Directory of Open Access Journals (Sweden)

    Sara Konrath

    MAIN OBJECTIVES: The narcissistic personality is characterized by grandiosity, entitlement, and low empathy. This paper describes the development and validation of the Single Item Narcissism Scale (SINS). Although the use of longer instruments is superior in most circumstances, we recommend the SINS in some circumstances (e.g. under serious time constraints, online studies). METHODS: In 11 independent studies (total N = 2,250), we demonstrate the SINS' psychometric properties. RESULTS: The SINS is significantly correlated with longer narcissism scales, but uncorrelated with self-esteem. It also has high test-retest reliability. We validate the SINS in a variety of samples (e.g., undergraduates, nationally representative adults), intrapersonal correlates (e.g., positive affect, depression), and interpersonal correlates (e.g., aggression, relationship quality, prosocial behavior). The SINS taps into the more fragile and less desirable components of narcissism. SIGNIFICANCE: The SINS can be a useful tool for researchers, especially when it is important to measure narcissism with constraints preventing the use of longer measures.

  4. Evolution of a Test Item

    Science.gov (United States)

    Spaan, Mary

    2007-01-01

    This article follows the development of test items (see "Language Assessment Quarterly", Volume 3 Issue 1, pp. 71-79 for the article "Test and Item Specifications Development"), beginning with a review of test and item specifications, then proceeding to writing and editing of items, pretesting and analysis, and finally selection of an item for a…

  5. Validity and usefulness of a single-item measure of patient-reported bother from side effects of cancer therapy.

    Science.gov (United States)

    Pearman, Timothy P; Beaumont, Jennifer L; Mroczek, Daniel; O'Connor, Mary; Cella, David

    2018-03-01

    The improving efficacy of cancer treatment has resulted in an increasing array of treatment-related symptoms and associated burdens imposed on individuals undergoing aggressive treatment of their disease. Often, clinical trials compare therapies that have different types, and severities, of adverse effects. Whether rated by clinicians or patients themselves, it can be difficult to know which side effect profile is more disruptive or bothersome to patients. A simple summary index of bother can help to adjudicate the variability in adverse effects across treatments being compared with each other. Across 4 studies, a total of 5765 patients enrolled in cooperative group studies and industry-sponsored clinical trials were the subjects of the current study. Patients were diagnosed with a range of primary cancer sites, including bladder, brain, breast, colon/rectum, head/neck, hepatobiliary, kidney, lung, ovary, pancreas, and prostate as well as leukemia and lymphoma. All patients were administered the Functional Assessment of Cancer Therapy-General version (FACT-G). The single item "I am bothered by side effects of treatment" (GP5), rated on a 5-point Likert scale, is part of the FACT-G. To determine its validity as a useful summary measure from the patient perspective, it was correlated with individual and aggregated clinician-rated adverse events and patient reports of their general ability to enjoy life. Analyses of pharmaceutical trials demonstrated that mean GP5 scores ("I am bothered by side effects of treatment") significantly differed by maximum adverse event grade (PEffect sizes ranged from 0.13 to 0.46. Analyses of cooperative group trials demonstrated a significant correlation between GP5 and item GF3 ("I am able to enjoy life") in the predicted direction. The single FACT-G item "I am bothered by side effects of treatment" is significantly associated with clinician-reported adverse events and with patients' ability to enjoy their lives. It has promise as an

  6. Evaluation of item candidates for a diabetic retinopathy quality of life item bank.

    Science.gov (United States)

    Fenwick, Eva K; Pesudovs, Konrad; Khadka, Jyoti; Rees, Gwyn; Wong, Tien Y; Lamoureux, Ecosse L

    2013-09-01

    We are developing an item bank assessing the impact of diabetic retinopathy (DR) on quality of life (QoL) using a rigorous multi-staged process combining qualitative and quantitative methods. We describe here the first two qualitative phases: content development and item evaluation. After a comprehensive literature review, items were generated from four sources: (1) 34 previously validated patient-reported outcome measures; (2) five published qualitative articles; (3) eight focus groups and 18 semi-structured interviews with 57 DR patients; and (4) seven semi-structured interviews with diabetes or ophthalmic experts. Items were then evaluated during 3 stages, namely binning (grouping) and winnowing (reduction) based on key criteria and panel consensus; development of item stems and response options; and pre-testing of items via cognitive interviews with patients. The content development phase yielded 1,165 unique items across 7 QoL domains. After 3 sessions of binning and winnowing, items were reduced to a minimally representative set (n = 312) across 9 domains of QoL: visual symptoms; ocular surface symptoms; activity limitation; mobility; emotional; health concerns; social; convenience; and economic. After 8 cognitive interviews, 42 items were amended resulting in a final set of 314 items. We have employed a systematic approach to develop items for a DR-specific QoL item bank. The psychometric properties of the nine QoL subscales will be assessed using Rasch analysis. The resulting validated item bank will allow clinicians and researchers to better understand the QoL impact of DR and DR therapies from the patient's perspective.

  7. Development of a Short Version of MSQOL-54 Using Factor Analysis and Item Response Theory.

    Directory of Open Access Journals (Sweden)

    Rosalba Rosato

    The Multiple Sclerosis Quality of Life-54 (MSQOL-54, 52 items grouped in 12 subscales plus two single items) is the most used MS-specific health-related quality of life inventory. The objective was to develop a shortened version of the MSQOL-54. MSQOL-54 dimensionality and metric properties were investigated by confirmatory factor analysis (CFA) and Rasch modelling (Partial Credit Model, PCM) on MSQOL-54s completed by 473 MS patients. Their mean age was 41 years, 65% were women, and median Expanded Disability Status Scale (EDSS) score was 2.0 (range 0-9.5). Differential item functioning (DIF) was evaluated for gender, age and EDSS. Dimensionality of the resulting short version was assessed by exploratory factor analysis (EFA) and CFA. Cognitive debriefing of the short instrument (vs. the original) was then performed on 12 MS patients. CFA of MSQOL-54 subscales showed that the data fitted the overall model well. Two subscales (Role Limitations--Physical, Role Limitations--Emotional) did not fit the PCM, and were removed; two other subscales (Health Perceptions, Social Function) did not fit the model, but were retained as single items. Sexual Satisfaction (single-item subscale) was also removed. The resulting MSQOL-29 consisted of 25 items grouped in 7 subscales, plus 4 single items. PCM fit statistics were within the acceptability range for all MSQOL-29 items except one which had significant DIF by age. EFA and CFA indicated adequate fit to the original two-factor (Physical and Mental Health Composites) hypothesis. Cognitive debriefing confirmed that MSQOL-29 was acceptable and had lost no key items. The proposed MSQOL-29 is 50% shorter than MSQOL-54, yet preserves key quality of life dimensions. Prospective validation on a large, independent MS patient sample is ongoing.

  8. Instructional Topics in Educational Measurement (ITEMS) Module: Using Automated Processes to Generate Test Items

    Science.gov (United States)

    Gierl, Mark J.; Lai, Hollis

    2013-01-01

    Changes to the design and development of our educational assessments are resulting in the unprecedented demand for a large and continuous supply of content-specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer…

  9. Using Item Analysis to Assess Objectively the Quality of the Calgary-Cambridge OSCE Checklist

    Directory of Open Access Journals (Sweden)

    Tyrone Donnon

    2011-06-01

    Background: The purpose of this study was to investigate the use of item analysis to assess objectively the quality of items on the Calgary-Cambridge Communications OSCE checklist. Methods: A total of 150 first year medical students were provided with extensive teaching on the use of the Calgary-Cambridge Guidelines for interviewing patients and participated in a final year end 20 minute communication OSCE station. Students were grouped into either the upper half (50%) or lower half (50%) communication skills performance groups, and discrimination, difficulty and point biserial values were calculated for each checklist item. Results: The mean score on the 33 item communication checklist was 24.09 (SD = 4.46) and the internal reliability coefficient was α = 0.77. Although most of the items were found to have moderate (k = 12, 36%) or excellent (k = 10, 30%) discrimination values, there were 6 (18%) identified as ‘fair’ and 3 (9%) as ‘poor’. A post-examination review focused on item analysis findings resulted in an increase in checklist reliability (α = 0.80). Conclusions: Item analysis has been used with MCQ exams extensively. In this study, it was also found to be an objective and practical approach to use in evaluating the quality of a standardized OSCE checklist.
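
    The three item statistics named in the Methods can be computed roughly as follows. This is a sketch under stated assumptions rather than the authors' procedure: it assumes dichotomous (0/1) checklist scores, a split of students into lower and upper halves by total score, and an uncorrected point biserial in which the total still includes the item. The function name and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def item_analysis(responses):
    """Per-item difficulty, discrimination and point biserial.

    responses: one list of 0/1 item scores per student.
    """
    totals = [sum(row) for row in responses]
    order = sorted(range(len(responses)), key=lambda i: totals[i])
    half = len(order) // 2
    lower, upper = order[:half], order[-half:]
    results = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        # Difficulty: proportion of students credited with the item.
        difficulty = mean(item)
        # Discrimination: upper-half proportion minus lower-half proportion.
        discrimination = mean([item[i] for i in upper]) - mean(
            [item[i] for i in lower]
        )
        # Point biserial: Pearson correlation of item score with total score.
        sd_item, sd_total = pstdev(item), pstdev(totals)
        if sd_item == 0 or sd_total == 0:
            point_biserial = float("nan")
        else:
            cov = mean([x * t for x, t in zip(item, totals)]) - mean(item) * mean(totals)
            point_biserial = cov / (sd_item * sd_total)
        results.append(
            {
                "difficulty": difficulty,
                "discrimination": discrimination,
                "point_biserial": point_biserial,
            }
        )
    return results


# Hypothetical scores for four students on a three-item checklist.
stats = item_analysis([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]])
```

    The labels the abstract attaches to discrimination values ('poor', 'fair', moderate, excellent) depend on cut-offs that it does not state, so the sketch reports the raw values only.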

  10. An Examination of Differential Item Functioning on the Vanderbilt Assessment of Leadership in Education

    Science.gov (United States)

    Polikoff, Morgan S.; May, Henry; Porter, Andrew C.; Elliott, Stephen N.; Goldring, Ellen; Murphy, Joseph

    2009-01-01

    The Vanderbilt Assessment of Leadership in Education is a 360-degree assessment of the effectiveness of principals' learning-centered leadership behaviors. In this report, we present results from a differential item functioning (DIF) study of the assessment. Using data from a national field trial, we searched for evidence of DIF on school level,…

  11. Face Validity of the Single Work Ability Item: Comparison with Objectively Measured Heart Rate Reserve over Several Days

    Science.gov (United States)

    Gupta, Nidhi; Jensen, Bjørn Søvsø; Søgaard, Karen; Carneiro, Isabella Gomes; Christiansen, Caroline Stordal; Hanisch, Christiana; Holtermann, Andreas

    2014-01-01

    Purpose: The purpose of this study was to investigate the face validity of the self-reported single item work ability with objectively measured heart rate reserve (%HRR) among blue-collar workers. Methods: We utilized data from 127 blue-collar workers (Female = 53; Male = 74) aged 18–65 years from the cross-sectional “New method for Objective Measurements of physical Activity in Daily living (NOMAD)” study. The workers reported their single item work ability and completed an aerobic capacity cycling test and objective measurements of heart rate reserve monitored with Actiheart for 3–4 days with a total of 5,810 h, including 2,640 working hours. Results: A significant moderate correlation between work ability and %HRR was observed among males (R = −0.33, P = 0.005), but not among females (R = 0.11, P = 0.431). In a gender-stratified multi-adjusted logistic regression analysis, males with high %HRR were more likely to report a reduced work ability compared to males with low %HRR [OR = 4.75, 95% confidence interval (95% CI) = 1.31 to 17.25]. However, this association was not found among females (OR = 0.26, 95% CI 0.03 to 2.16), and a significant interaction between work ability, %HRR and gender was observed (P = 0.03). Conclusions: The observed association between work ability and objectively measured %HRR over several days among male blue-collar workers supports the face validity of the single work ability item. It is a useful and valid measure of the relation between physical work demands and resources among male blue-collar workers. The contrasting association among females needs to be further investigated. PMID:24840350

  12. Face Validity of the Single Work Ability Item: Comparison with Objectively Measured Heart Rate Reserve over Several Days

    Directory of Open Access Journals (Sweden)

    Nidhi Gupta

    2014-05-01

    Purpose: The purpose of this study was to investigate the face validity of the self-reported single item work ability with objectively measured heart rate reserve (%HRR) among blue-collar workers. Methods: We utilized data from 127 blue-collar workers (Female = 53; Male = 74) aged 18–65 years from the cross-sectional “New method for Objective Measurements of physical Activity in Daily living (NOMAD)” study. The workers reported their single item work ability and completed an aerobic capacity cycling test and objective measurements of heart rate reserve monitored with Actiheart for 3–4 days with a total of 5,810 h, including 2,640 working hours. Results: A significant moderate correlation between work ability and %HRR was observed among males (R = −0.33, P = 0.005), but not among females (R = 0.11, P = 0.431). In a gender-stratified multi-adjusted logistic regression analysis, males with high %HRR were more likely to report a reduced work ability compared to males with low %HRR [OR = 4.75, 95% confidence interval (95% CI) = 1.31 to 17.25]. However, this association was not found among females (OR = 0.26, 95% CI 0.03 to 2.16), and a significant interaction between work ability, %HRR and gender was observed (P = 0.03). Conclusions: The observed association between work ability and objectively measured %HRR over several days among male blue-collar workers supports the face validity of the single work ability item. It is a useful and valid measure of the relation between physical work demands and resources among male blue-collar workers. The contrasting association among females needs to be further investigated.

  13. Concurrent Validation of the Clinical Opiate Withdrawal Scale (COWS) and Single-Item Indices against the Clinical Institute Narcotic Assessment (CINA) Opioid Withdrawal Instrument

    Science.gov (United States)

    Tompkins, D. Andrew; Bigelow, George E.; Harrison, Joseph A.; Johnson, Rolley E.; Fudala, Paul J.; Strain, Eric C.

    2009-01-01

    Introduction The Clinical Opiate Withdrawal Scale (COWS) is an 11-item clinician-administered scale assessing opioid withdrawal. Though commonly used in clinical practice, it has not been systematically validated. The present study validated the COWS in comparison to the validated Clinical Institute Narcotic Assessment (CINA) scale. Method Opioid-dependent volunteers were enrolled in a residential trial and stabilized on morphine 30 mg given subcutaneously four times daily. Subjects then underwent double-blind, randomized challenges of intramuscularly administered placebo and naloxone (0.4 mg) on separate days, during which the COWS, CINA, and visual analog scale (VAS) assessments were concurrently obtained. Subjects completing both challenges were included (N=46). Correlations between mean peak COWS and CINA scores as well as self-report VAS questions were calculated. Results Mean peak COWS and CINA scores of 7.6 and 24.4, respectively, occurred on average 30 minutes post-injection of naloxone. Mean COWS and CINA scores 30 minutes after placebo injection were 1.3 and 18.9, respectively. The Pearson correlation coefficient for peak COWS and CINA scores during the naloxone challenge session was 0.85 (p<0.001). Peak COWS scores also correlated well with peak VAS self-report scores of bad drug effect (r=0.57, p<0.001) and feeling sick (r=0.57, p<0.001), providing additional evidence of concurrent validity. Placebo was not associated with any significant elevation of COWS, CINA, or VAS scores, indicating discriminant validity. Cronbach’s alpha for the COWS was 0.78, indicating good internal consistency (reliability). Discussion COWS, CINA, and certain VAS items are all valid measurement tools for acute opiate withdrawal. PMID:19647958
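
    The internal-consistency figure reported above is Cronbach's alpha. A minimal sketch of the usual formula follows; the function name and the data layout (one list of item scores per subject) are assumptions made for illustration, and no study data are reproduced.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a multi-item scale.

    scores: one list of item scores per subject, all of equal length.
    """
    k = len(scores[0])
    # Variance of each item across subjects.
    item_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the summed scale score across subjects.
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical two-item scale whose items agree perfectly.
alpha_parallel = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

    Population variances are used throughout; sample variances would give the same alpha because the correction factor cancels in the ratio.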

  14. Item Response Data Analysis Using Stata Item Response Theory Package

    Science.gov (United States)

    Yang, Ji Seung; Zheng, Xiaying

    2018-01-01

    The purpose of this article is to introduce and review the capability and performance of the Stata item response theory (IRT) package that is available from Stata v.14, 2015. Using a simulated data set and a publicly available item response data set extracted from Programme of International Student Assessment, we review the IRT package from…

  15. Standard Errors for National Trends in International Large-Scale Assessments in the Case of Cross-National Differential Item Functioning

    Science.gov (United States)

    Sachse, Karoline A.; Haag, Nicole

    2017-01-01

    Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment's (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are…

  16. The 4-Item Negative Symptom Assessment (NSA-4) Instrument: A Simple Tool for Evaluating Negative Symptoms in Schizophrenia Following Brief Training.

    Science.gov (United States)

    Alphs, Larry; Morlock, Robert; Coon, Cheryl; van Willigenburg, Arjen; Panagides, John

    2010-07-01

    Objective. To assess the ability of mental health professionals to use the 4-item Negative Symptom Assessment instrument, derived from the Negative Symptom Assessment-16, to rapidly determine the severity of negative symptoms of schizophrenia. Design. Open participation. Setting. Medical education conferences. Participants. Attendees at two international psychiatry conferences. Measurements. Participants read a brief set of the 4-item Negative Symptom Assessment instructions and viewed a videotape of a patient with schizophrenia. Using the 1 to 6 4-item Negative Symptom Assessment severity rating scale, they rated four negative symptom items and the overall global negative symptoms. These ratings were compared with a consensus rating determination using frequency distributions and Chi-square tests for the proportion of participant ratings that were within one point of the expert rating. Results. More than 400 medical professionals (293 physicians, 50% with a European practice, and 55% who reported past utilization of schizophrenia ratings scales) participated. Between 82.1 and 91.1 percent of the 4-items and the global rating determinations by the participants were within one rating point of the consensus expert ratings. The differences between the percentage of participant rating scores that were within one point versus the percentage that were greater than one point different from those by the consensus experts was significant (pnegative symptoms using the 4-item Negative Symptom Assessment did not generally differ among the geographic regions of practice, the professional credentialing, or their familiarity with the use of schizophrenia symptom rating instruments. Conclusion. These findings suggest that clinicians from a variety of geographic practices can, after brief training, use the 4-item Negative Symptom Assessment effectively to rapidly assess negative symptoms in patients with schizophrenia.
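
    The agreement measure used here, the share of participant ratings within one point of the consensus rating, is simple to compute. The sketch below is illustrative only: the function name, the ratings and the consensus value are invented, and the Chi-square testing mentioned in the abstract is not reproduced.

```python
def share_within_one_point(ratings, consensus):
    """Fraction of ratings within one scale point of the consensus rating."""
    return sum(abs(r - consensus) <= 1 for r in ratings) / len(ratings)


# Hypothetical ratings on the 1 to 6 severity scale for a single item.
example_share = share_within_one_point([3, 4, 4, 5, 6, 2], consensus=4)
```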

  17. RT-based memory detection : Item saliency effects in the single-probe and the multiple-probe protocol

    NARCIS (Netherlands)

    Verschuere, B.; Kleinberg, B.; Theocharidou, K.

    RT-based memory detection may provide an efficient means to assess recognition of concealed information. There is, however, considerable heterogeneity in detection rates, and we explored two potential moderators: item saliency and test protocol. Participants tried to conceal low salient (e.g.,

  18. Generalizability theory and item response theory

    NARCIS (Netherlands)

    Glas, Cornelis A.W.; Eggen, T.J.H.M.; Veldkamp, B.P.

    2012-01-01

    Item response theory is usually applied to items with a selected-response format, such as multiple choice items, whereas generalizability theory is usually applied to constructed-response tasks assessed by raters. However, in many situations, raters may use rating scales consisting of items with a

  19. Working memory for sequences of temporal durations reveals a volatile single-item store

    Directory of Open Access Journals (Sweden)

    Sanjay G Manohar

    2016-10-01

    When a sequence is held in working memory, different items are retained with differing fidelity. Here we ask whether a sequence of brief time intervals that must be remembered show recency effects, similar to those observed in verbal and visuospatial working memory. It has been suggested that prioritising some items over others can be accounted for by a focus of attention, maintaining some items in a privileged state. We therefore also investigated whether such benefits are vulnerable to disruption by attention or expectation. Participants listened to sequences of one to five tones, of varying durations (200 ms to 2 s). Subsequently, the length of one of the tones in the sequence had to be reproduced by holding a key. The discrepancy between the reproduced and actual durations quantified the fidelity of memory for auditory durations. Recall precision decreased with the number of items that had to be remembered, and was better for the first and last items of sequences, in line with set-size and serial position effects seen in other modalities. To test whether attentional filtering demands might impair performance, an irrelevant variation in pitch was introduced in some blocks of trials. In those blocks, memory precision was worse for sequences that consisted of only one item, i.e. the smallest memory set size. Thus, when irrelevant information was present, the benefit of having only one item in memory is attenuated. Finally, we examined whether expectation could interfere with memory. On half the trials, the number of items in the upcoming sequence was cued. When the number of items was known in advance, performance was paradoxically worse when the sequence consisted of only one item. Thus the benefit of having only one item to remember is stronger when it is unexpectedly the only item. Our results suggest that similar mechanisms are used to hold auditory time durations in working memory, as for visual or verbal stimuli. Further, solitary items were

  20. Development of a simple 12-item theory-based instrument to assess the impact of continuing professional development on clinical behavioral intentions.

    Directory of Open Access Journals (Sweden)

    France Légaré

    Decision-makers in organizations providing continuing professional development (CPD) have identified the need for routine assessment of its impact on practice. We sought to develop a theory-based instrument for evaluating the impact of CPD activities on health professionals' clinical behavioral intentions. Our multipronged study had four phases. (1) We systematically reviewed the literature for instruments that used socio-cognitive theories to assess healthcare professionals' clinically-oriented behavioral intentions and/or behaviors; we extracted items relating to the theoretical constructs of an integrated model of healthcare professionals' behaviors and removed duplicates. (2) A committee of researchers and CPD decision-makers selected a pool of items relevant to CPD. (3) An international group of experts (n = 70) reached consensus on the most relevant items using electronic Delphi surveys. (4) We created a preliminary instrument with the items found most relevant and assessed its factorial validity, internal consistency and reliability (weighted kappa) over a two-week period among 138 physicians attending a CPD activity. Out of 72 potentially relevant instruments, 47 were analyzed. Of the 1218 items extracted from these, 16% were discarded as improperly phrased and 70% discarded as duplicates. Mapping the remaining items onto the constructs of the integrated model of healthcare professionals' behaviors yielded a minimum of 18 and a maximum of 275 items per construct. The partnership committee retained 61 items covering all seven constructs. Two iterations of the Delphi process produced consensus on a provisional 40-item questionnaire. Exploratory factorial analysis following test-retest resulted in a 12-item questionnaire. Cronbach's coefficients for the constructs varied from 0.77 to 0.85. A 12-item theory-based instrument for assessing the impact of CPD activities on health professionals' clinical behavioral intentions showed adequate validity and

  1. Single-item measures for depression and anxiety: Validation of the Screening Tool for Psychological Distress in an inpatient cardiology setting.

    Science.gov (United States)

    Young, Quincy-Robyn; Nguyen, Michelle; Roth, Susan; Broadberry, Ann; Mackay, Martha H

    2015-12-01

    Depression and anxiety are common among patients with cardiovascular disease (CVD) and confer significant cardiac risk, contributing to CVD morbidity and mortality. Unfortunately, due to the lack of screening tools that address the specific needs of hospitalized patients, few cardiac inpatient programs offer routine screening for these forms of psychological distress, despite recommendations to do so. The purpose of this study was to validate single-item measures for depression and anxiety among cardiac inpatients. Consecutive inpatients were recruited from the cardiology and cardiac surgery step-down units at a university-affiliated, quaternary-care hospital. Subjects completed a questionnaire that included: (a) demographics, (b) single-item-measures for depression and anxiety (from the Screening Tool for Psychological Distress (STOP-D)), and (c) Hospital Anxiety and Depression Scale (HADS). One hundred and five participants were recruited with a wide variety of cardiac diagnoses, having a mean age of 66 years, and 28% were women. Both STOP-D items were highly correlated with their corresponding validated measures and demonstrated robust receiver-operator characteristic curves. Severity scores on both items correlated well with established severity cut-off scores on the corresponding subscales of the HADS. The STOP-D is a self-administered, self-report measure using two independent items that provide severity scores for depression and anxiety. The tool performs very well compared with other previously validated measures. Requiring no additional scoring and being free, STOP-D offers a simple and valid method for identifying hospitalized cardiac patients who are experiencing psychological distress. This crucial first step triggers initiation of appropriate monitoring and intervention, thus reducing the likelihood of the adverse cardiac outcomes associated with psychological distress. © The European Society of Cardiology 2014.

  2. Assessment of Differential Item Functioning in Health-Related Outcomes: A Simulation and Empirical Analysis with Hierarchical Polytomous Data

    Directory of Open Access Journals (Sweden)

    Zahra Sharafi

    2017-01-01

    Full Text Available Background. The purpose of this study was to evaluate the effectiveness of two methods of detecting differential item functioning (DIF) in the presence of multilevel data and polytomously scored items. The assessment of DIF with multilevel data (e.g., patients nested within hospitals, hospitals nested within districts) from large-scale assessment programs has received considerable attention but very few studies evaluated the effect of hierarchical structure of data on DIF detection for polytomously scored items. Methods. The ordinal logistic regression (OLR) and hierarchical ordinal logistic regression (HOLR) were utilized to assess DIF in simulated and real multilevel polytomous data. Six factors (DIF magnitude, grouping variable, intraclass correlation coefficient, number of clusters, number of participants per cluster, and item discrimination parameter) with a fully crossed design were considered in the simulation study. Furthermore, data of Pediatric Quality of Life Inventory™ (PedsQL™) 4.0 collected from 576 healthy school children were analyzed. Results. Overall, results indicate that both methods performed equivalently in terms of controlling Type I error and detection power rates. Conclusions. The current study showed negligible difference between OLR and HOLR in detecting DIF with polytomously scored items in a hierarchical structure. Implications and considerations while analyzing real data were also discussed.

  3. The PROMIS fatigue item bank has good measurement properties in patients with fibromyalgia and severe fatigue.

    Science.gov (United States)

    Yost, Kathleen J; Waller, Niels G; Lee, Minji K; Vincent, Ann

    2017-06-01

    Efficient management of fibromyalgia (FM) requires precise measurement of FM-specific symptoms. Our objective was to assess the measurement properties of the Patient-Reported Outcome Measurement Information System (PROMIS) fatigue item bank (FIB) in people with FM. We applied classical psychometric and item response theory methods to cross-sectional PROMIS-FIB data from two samples. Data on the clinical FM sample were obtained at a tertiary medical center. Data for the U.S. general population sample were obtained from the PROMIS network. The full 95-item bank was administered to both samples. We investigated dimensionality of the item bank in both samples by separately fitting a bifactor model with two group factors: experience and impact. We assessed measurement invariance between samples, and we explored an alternate factor structure with the normative sample and subsequently confirmed that structure in the clinical sample. Finally, we assessed whether reporting FM subdomain scores added value over reporting a single total score. The item bank was dominated by a general fatigue factor. The fit of the initial bifactor model and evidence of measurement invariance indicated that the same constructs were measured across the samples. An alternative bifactor model with three group factors demonstrated slightly improved fit. Subdomain scores add value over a total score. We demonstrated that the PROMIS-FIB is appropriate for measuring fatigue in clinical samples of FM patients. The construct can be presented by a single score; however, subdomain scores for the three group factors identified in the alternative model may also be reported.

  4. Enactment versus observation: item-specific and relational processing in goal-directed action sequences (and lists of single actions).

    Directory of Open Access Journals (Sweden)

    Janette Schult

    Full Text Available What are the memory-related consequences of learning actions (such as "apply the patch") by enactment during study, as compared to action observation? Theories converge in postulating that enactment encoding increases item-specific processing, but not the processing of relational information. Typically, in the laboratory enactment encoding is studied for lists of unrelated single actions in which one action execution has no overarching purpose or relation with other actions. In contrast, real-life actions are usually carried out with the intention to achieve such a purpose. When actions are embedded in action sequences, relational information provides efficient retrieval cues. We contrasted memory for single actions with memory for action sequences in three experiments. We found more reliance on relational processing for action-sequences than single actions. To what degree can this relational information be used after enactment versus after the observation of an actor? We found indicators of superior relational processing after observation than enactment in ordered pair recall (Experiment 1A) and in emerging subjective organization of repeated recall protocols (recall runs 2-3, Experiment 2). An indicator of superior item-specific processing after enactment compared to observation was recognition (Experiment 1B, Experiment 2). Similar net recall suggests that observation can be as good a learning strategy as enactment. We discuss possible reasons why these findings only partly converge with previous research and theorizing.

  5. 'Do you think you suffer from depression?' Reevaluating the use of a single item question for the screening of depression in older primary care patients

    DEFF Research Database (Denmark)

    Ayalon, Liat; Goldfracht, Margalit; Bech, Per

    2010-01-01

    evaluated against a depression diagnosis made by the Structured Clinical Interview for DSM-IV. RESULTS: Overall, 3.9% of the sample was diagnosed with depression. The most notable finding was that the single-item question, 'do you think you suffer from depression?' had as good or better sensitivity (83......%) than all other screens. Nonetheless, its specificity of 83% suggested that it has to be followed up by a thorough diagnostic interview. Additional sensitivity analyses concerning the use of a single depression item taken directly from the depression screening measures supported this finding. CONCLUSIONS......: An easy way to detect depression in older primary care patients would be asking the single question, 'do you think you suffer from depression?'...
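
    The record's point that a positive screen needs a diagnostic follow-up can be illustrated with Bayes' rule. The sketch below uses the 3.9% prevalence and 83% specificity stated above; the sensitivity figure is cut off in this listing, so the 0.83 passed in is an assumption for illustration only, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screening test from its operating characteristics."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # P(depressed | positive screen)
    npv = true_neg / (true_neg + false_neg)   # P(not depressed | negative screen)
    return ppv, npv


# sensitivity=0.83 is an ASSUMED value (truncated in the listing);
# specificity and prevalence are the figures quoted in the abstract.
ppv, npv = predictive_values(sensitivity=0.83, specificity=0.83, prevalence=0.039)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

    Under these inputs only about one positive screen in six corresponds to a diagnosed case, because at low prevalence the false positives outnumber the true positives.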

  6. Development of a self-report physical function instrument for disability assessment: item pool construction and factor analysis.

    Science.gov (United States)

    McDonough, Christine M; Jette, Alan M; Ni, Pengsheng; Bogusz, Kara; Marfeo, Elizabeth E; Brandt, Diane E; Chan, Leighton; Meterko, Mark; Haley, Stephen M; Rasch, Elizabeth K

    2013-09-01

    To build a comprehensive item pool representing work-relevant physical functioning and to test the factor structure of the item pool. These developmental steps represent initial outcomes of a broader project to develop instruments for the assessment of function within the context of Social Security Administration (SSA) disability programs. Comprehensive literature review; gap analysis; item generation with expert panel input; stakeholder interviews; cognitive interviews; cross-sectional survey administration; and exploratory and confirmatory factor analyses to assess item pool structure. In-person and semistructured interviews and Internet and telephone surveys. Sample of SSA claimants (n=1017) and a normative sample of adults from the U.S. general population (n=999). Not applicable. Model fit statistics. The final item pool consisted of 139 items. Within the claimant sample, 58.7% were white; 31.8% were black; 46.6% were women; and the mean age was 49.7 years. Initial factor analyses revealed a 4-factor solution, which included more items and allowed separate characterization of: (1) changing and maintaining body position, (2) whole body mobility, (3) upper body function, and (4) upper extremity fine motor. The final 4-factor model included 91 items. Confirmatory factor analyses for the 4-factor models for the claimant and the normative samples demonstrated very good fit. Fit statistics for claimant and normative samples, respectively, were: Comparative Fit Index=.93 and .98; Tucker-Lewis Index=.92 and .98; and root mean square error approximation=.05 and .04. The factor structure of the physical function item pool closely resembled the hypothesized content model. The 4 scales relevant to work activities offer promise for providing reliable information about claimant physical functioning relevant to work disability. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.

  7. Item response theory analyses of the Delis-Kaplan Executive Function System card sorting subtest.

    Science.gov (United States)

    Spencer, Mercedes; Cho, Sun-Joo; Cutting, Laurie E

    2018-02-02

    In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Functioning System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.

  8. Geriatric Anxiety Scale: item response theory analysis, differential item functioning, and creation of a ten-item short form (GAS-10).

    Science.gov (United States)

    Mueller, Anne E; Segal, Daniel L; Gavett, Brandon; Marty, Meghan A; Yochim, Brian; June, Andrea; Coolidge, Frederick L

    2015-07-01

    The Geriatric Anxiety Scale (GAS; Segal et al. (Segal, D. L., June, A., Payne, M., Coolidge, F. L. and Yochim, B. (2010). Journal of Anxiety Disorders, 24, 709-714. doi:10.1016/j.janxdis.2010.05.002)) is a self-report measure of anxiety that was designed to address unique issues associated with anxiety assessment in older adults. This study is the first to use item response theory (IRT) to examine the psychometric properties of a measure of anxiety in older adults. A large sample of older adults (n = 581; mean age = 72.32 years, SD = 7.64 years, range = 60 to 96 years; 64% women; 88% European American) completed the GAS. IRT properties were examined. The presence of differential item functioning (DIF) or measurement bias by age and sex was assessed, and a ten-item short form of the GAS (called the GAS-10) was created. All GAS items had discrimination parameters of 1.07 or greater. Items from the somatic subscale tended to have lower discrimination parameters than items on the cognitive or affective subscales. Two items were flagged for DIF, but the impact of the DIF was negligible. Women scored significantly higher than men on the GAS and its subscales. Participants in the young-old group (60 to 79 years old) scored significantly higher on the cognitive subscale than participants in the old-old group (80 years old and older). Results from the IRT analyses indicated that the GAS and GAS-10 have strong psychometric properties among older adults. We conclude by discussing implications and future research directions.

  9. Validation of the Single-Factor Model of the Relationship Assessment Scale among Married and Cohabiting Persons from Monterrey, Mexico

    Directory of Open Access Journals (Sweden)

    José Moral de la Rubia

    2015-07-01

    Full Text Available The study of intimate partner relationships is particularly important because this union is the foundation of the family. Satisfaction with the relationship can be defined as the overall attitude to the relationship and the partner. Hendrick's Relationship Assessment Scale (RAS) is an instrument commonly used to assess the construct. Previous research papers have shown that this scale has high internal consistency and a single-factor structure. Although there are validation studies of the RAS, these studies used inappropriate statistical techniques to analyze its Likert-type items, and to determine the number of factors; likewise, its factor invariance across sex has not been previously contrasted. Therefore, this study posed the following research questions: Does the RAS have consistent and discriminating items? Basing the analysis on a polychoric correlation matrix, what is its level of internal consistency? How many factors emerge using rigorous empirical methods? Is the single-factor model invariant across sex? In order to answer these research questions, we used a random route probability sampling in this instrument validation study of the RAS. The sample was extracted from the population of married couples or those living in consensual union in Monterrey, Mexico. There were 431 female and 376 male participants in the study. The RAS’ items were consistent and discriminative. The internal consistency of the scale was excellent in the whole sample (ordinal α = .93), as well as among female (ordinal α = .94) and male participants (ordinal α = .92). Horn's parallel analysis and Velicer's minimum average partial test suggested a one factor solution. Moreover, the single-factor model (with one correlation between the residuals of the two negatively worded items) had a close fit to the data, and its properties of invariance across sex were very acceptable by the Unweighted Least Squares method. We conclude that the scale shows internal

  10. Quantitative Literacy on the Web of Science, 2 – Mining the Health Numeracy Literature for Assessment Items

    Directory of Open Access Journals (Sweden)

    H.L. Vacher

    2009-01-01

    Full Text Available A topic search of the Web of Science (WoS) database using the term “numeracy” produced a bibliography of 293 articles, reviews and editorial commentaries (Oct 2008). The citation graph of the bibliography clearly identifies five benchmark papers (1995-2001), four of which developed numeracy assessment instruments. Starting with the 80 papers that cite these benchmarks, we identified a set of 25 papers (1995-2008) in which the medical research community reports the development and/or application of health-numeracy assessments. In all we found 10 assessment instruments from which we have compiled a total of 48 assessment items. There are both general and context-specific tests, with the wide range in the latter illustrated by names such as the Diabetes Numeracy Test and the Asthma Numeracy Questionnaire. There is also a Medical Data Interpretation Test and a Subjective Numeracy Scale. Much of this literature discusses the validity and reliability of the test, and many papers include item-by-item results of the tests from when they were applied in the research reported in the papers. The research that used the tests was directed at exploring such subjects as the patients’ ability to evaluate risks and benefits in order to make informed decisions; to understand and carry out instructions in order to self-manage their medical conditions; and, in research settings, to understand what the researchers were asking in their assessments (e.g., quantified quality of life) that require comparison of numerical information. We present the collection of items as a potential resource for educators interested in numeracy assessments in context.

  11. Development of coordination system model on single-supplier multi-buyer for multi-item supply chain with probabilistic demand

    Science.gov (United States)

    Olivia, G.; Santoso, A.; Prayogo, D. N.

    2017-11-01

    Nowadays, the level of competition between supply chains is getting tighter, and a good coordination system between supply chain members is crucial in addressing the issue. This paper focused on developing a model of the coordination system between a single supplier and multiple buyers in a supply chain as a solution. The proposed optimization model was designed to determine the optimal number of deliveries from a supplier to buyers in order to minimize the total cost over a planning horizon. The total supply chain cost consists of transportation costs, handling costs of the supplier and buyers, and stock-out costs. In the proposed optimization model, the supplier can supply various types of items to retailers whose item demand patterns are probabilistic. Sensitivity analysis of the proposed model was conducted to test the effect of changes in transport costs, handling costs and production capacities of the supplier. The results of the sensitivity analysis showed that changes in the transportation cost, handling costs and production capacity significantly influence the decisions on the optimal number of product deliveries for each item to the buyers.

  12. Differential item functioning analysis of the Vanderbilt Expertise Test for cars.

    Science.gov (United States)

    Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W; Van Gulick, Ana Beth; Gauthier, Isabel

    2015-01-01

    The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge.
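
    For readers unfamiliar with the three-parameter logistic (3PL) model named above, the following is a minimal sketch of its item response function in its standard textbook form. The parameter values are hypothetical and are not estimates from the VETcar study.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: subject ability; a: item discrimination;
    b: item difficulty; c: lower asymptote (guessing parameter).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: a = 1.2, b = 0.0, guessing floor c = 1/3.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(p_correct_3pl(theta, a=1.2, b=0.0, c=1 / 3), 3))
```

    Differential item functioning, in these terms, means that subjects with the same theta but from different groups (for example, younger versus older) are better described by different a, b, or c values for the same item.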

  13. Improved Approximation Algorithms for Item Pricing with Bounded Degree and Valuation

    Science.gov (United States)

    Hamane, Ryoso; Itoh, Toshiya

    When a store sells items to customers, the store wishes to decide the prices of the items to maximize its profit. If the store sells the items with low (resp. high) prices, the customers buy more (resp. less) items, which provides less profit to the store. It would be hard for the store to decide the prices of items. Assume that a store has a set V of n items and there is a set C of m customers who wish to buy those items. The goal of the store is to decide the price of each item to maximize its profit. We refer to this maximization problem as an item pricing problem. We classify the item pricing problems according to how many items the store can sell or how the customers valuate the items. If the store can sell every item i with unlimited (resp. limited) amount, we refer to this as unlimited supply (resp. limited supply). We say that the item pricing problem is single-minded if each customer j∈C wishes to buy a set ej⊆V of items and assigns valuation w(ej)≥0. For the single-minded item pricing problems (in unlimited supply), Balcan and Blum regarded them as weighted k-hypergraphs and gave several approximation algorithms. In this paper, we focus on the (pseudo) degree of k-hypergraphs and the valuation ratio, i. e., the ratio between the smallest and the largest valuations. Then for the single-minded item pricing problems (in unlimited supply), we show improved approximation algorithms (for k-hypergraphs, general graphs, bipartite graphs, etc.) with respect to the maximum (pseudo) degree and the valuation ratio.
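
    The objective in the single-minded, unlimited-supply setting can be sketched as follows. The purchase rule used here (customer j buys the bundle ej exactly when its total price does not exceed w(ej)) is the usual convention in this literature but is not spelled out in the abstract, so treat it as an assumption; all names and numbers are hypothetical. The valuation ratio is computed as largest over smallest, which is also a convention chosen for this sketch.

```python
def profit(prices, customers):
    """Store profit for given item prices.

    prices: dict mapping item -> price.
    customers: list of (bundle, valuation) pairs, bundle being a set of items.
    Assumed rule: a customer buys the whole bundle iff its price <= valuation.
    Unlimited supply, so customers never compete for stock.
    """
    total = 0.0
    for bundle, valuation in customers:
        cost = sum(prices[item] for item in bundle)
        if cost <= valuation:
            total += cost
    return total


def valuation_ratio(customers):
    """Ratio between the largest and the smallest (positive) valuation."""
    values = [v for _, v in customers if v > 0]
    return max(values) / min(values)


customers = [({"a"}, 3), ({"a", "b"}, 4), ({"b"}, 2)]
print(profit({"a": 3, "b": 1}, customers))  # every customer buys
print(profit({"a": 3, "b": 2}, customers))  # the {a, b} customer is priced out
print(valuation_ratio(customers))
```

    The example shows the tension the abstract describes: raising the price of b gains revenue from the customer who wants only b but loses the customer who wants both items.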

  14. Using personality item characteristics to predict single-item reliability, retest reliability, and self-other agreement

    NARCIS (Netherlands)

    de Vries, Reinout Everhard; Realo, Anu; Allik, Jüri

    2016-01-01

    The use of reliability estimates is increasingly scrutinized as scholars become more aware that test–retest stability and self–other agreement provide a better approximation of the theoretical and practical usefulness of an instrument than its internal reliability. In this study, we investigate item

  15. Identifying Promising Items: The Use of Crowdsourcing in the Development of Assessment Instruments

    Science.gov (United States)

    Sadler, Philip M.; Sonnert, Gerhard; Coyle, Harold P.; Miller, Kelly A.

    2016-01-01

    The psychometrically sound development of assessment instruments requires pilot testing of candidate items as a first step in gauging their quality, typically a time-consuming and costly effort. Crowdsourcing offers the opportunity for gathering data much more quickly and inexpensively than from most targeted populations. In a simulation of a…

  16. Electronic assessment of clinical reasoning in clerkships: A mixed-methods comparison of long-menu key-feature problems with context-rich single best answer questions

    NARCIS (Netherlands)

    Huwendiek, S.; Reichert, F.; Duncker, C.; Leng, B.A. De; Vleuten, C.P.M. van der; Muijtjens, A.M.; Bosse, H.M.; Haag, M.; Hoffmann, G.F.; Tonshoff, B.; Dolmans, D.

    2017-01-01

    BACKGROUND: It remains unclear which item format would best suit the assessment of clinical reasoning: context-rich single best answer questions (crSBAs) or key-feature problems (KFPs). This study compared KFPs and crSBAs with respect to students' acceptance, their educational impact, and

  17. Using existing questionnaires in latent class analysis: should we use summary scores or single items as input? A methodological study using a cohort of patients with low back pain

    Directory of Open Access Journals (Sweden)

    Nielsen AM

    2016-04-01

    Full Text Available Anne Molgaard Nielsen,1 Werner Vach,2 Peter Kent,1,3 Lise Hestbaek,1,4 Alice Kongsted1,4 1Department of Sports Science and Clinical Biomechanics, University of Southern Denmark, Odense, Denmark; 2Center for Medical Biometry and Medical Informatics, Medical Center, University of Freiburg, Freiburg, Germany; 3School of Physiotherapy and Exercise Science, Curtin University, Perth, Australia; 4Nordic Institute of Chiropractic and Clinical Biomechanics, University of Southern Denmark, Odense, Denmark Background: Latent class analysis (LCA) is increasingly being used in health research, but optimal approaches to handling complex clinical data are unclear. One issue is that commonly used questionnaires are multidimensional, but expressed as summary scores. Using the example of low back pain (LBP), the aim of this study was to explore and descriptively compare the application of LCA when using questionnaire summary scores and when using single items to subgroup patients based on multidimensional data. Materials and methods: Baseline data from 928 LBP patients in an observational study were classified into four health domains (psychology, pain, activity, and participation) using the World Health Organization’s International Classification of Functioning, Disability, and Health framework. LCA was performed within each health domain using the strategies of summary-score and single-item analyses. The resulting subgroups were descriptively compared using statistical measures and clinical interpretability. Results: For each health domain, the preferred model solution ranged from five to seven subgroups for the summary-score strategy and seven to eight subgroups for the single-item strategy. There was considerable overlap between the results of the two strategies, indicating that they were reflecting the same underlying data structure. However, in three of the four health domains, the single-item strategy resulted in a more nuanced description, in terms

  18. Investigation of Science Inquiry Items for Use on an Alternate Assessment Based on Modified Achievement Standards Using Cognitive Lab Methodology

    Science.gov (United States)

    Dickenson, Tammiee S.; Gilmore, Joanna A.; Price, Karen J.; Bennett, Heather L.

    2013-01-01

    This study evaluated the benefits of item enhancements applied to science-inquiry items for incorporation into an alternate assessment based on modified achievement standards for high school students. Six items were included in the cognitive lab sessions involving both students with and without disabilities. The enhancements (e.g., use of visuals,…

  19. Item Modeling Concept Based on Multimedia Authoring

    Directory of Open Access Journals (Sweden)

    Janez Stergar

    2008-09-01

    Full Text Available This paper introduces a modern item design framework for computer-based assessment built on the Flash authoring environment. Question design is discussed, with emphasis on the multimedia authoring environment used for item modeling. Item type templates are a structured means of collecting and storing item information that can be used to improve the efficiency and security of the innovative item design process. Templates can modernize item design and enhance and speed up the development process. Along with content creation, multimedia has vast potential for use in innovative testing. The introduced item design template is based on a taxonomy of innovative items, which have great potential for expanding the content areas and construct coverage of an assessment. The presented item design approach is based on two GUIs: one for question design based on implemented item design templates and one for user interaction tracking/retrieval. The concept of user interfaces based on Flash technology is discussed, as well as the implementation of the item design forms with multimedia authoring. An innovative method for user interaction storage/retrieval based on PHP, extending Flash capabilities in the proposed framework, is also introduced.

  20. Item and test analysis to identify quality multiple choice questions (MCQs) from an assessment of medical students of Ahmedabad, Gujarat

    Directory of Open Access Journals (Sweden)

    Sanju Gajjar

    2014-01-01

    Full Text Available Background: Multiple choice questions (MCQs) are frequently used to assess students in different educational streams for their objectivity and wide reach of coverage in less time. However, the MCQs to be used must be of quality, which depends upon their difficulty index (DIF I), discrimination index (DI) and distracter efficiency (DE). Objective: To evaluate MCQs or items and develop a pool of valid items by assessing them with DIF I, DI and DE, and also to revise, store or discard items based on the obtained results. Settings: The study was conducted in a medical school of Ahmedabad. Materials and Methods: An internal examination in Community Medicine was conducted after 40 hours of teaching during 1st MBBS, which was attended by 148 out of 150 students. A total of 50 MCQs or items and 150 distractors were analyzed. Statistical Analysis: Data were entered and analyzed in MS Excel 2007; simple proportions, means, standard deviations and coefficients of variation were calculated, and the unpaired t test was applied. Results: Out of 50 items, 24 had "good to excellent" DIF I (31-60%) and 15 had "good to excellent" DI (> 0.25). Mean DE was 88.6%, considered ideal/acceptable, and non functional distractors (NFDs) were only 11.4%. Mean DI was 0.14. Poor DI (< 0.15), with negative DI in 10 items, indicates poor preparedness of students and some issues with the framing of at least some of the MCQs. An increased proportion of NFDs (incorrect alternatives selected by < 5% of students) in an item decreases DE and makes the item easier. There were 15 items with 17 NFDs, while the rest of the items did not have any NFD, with a mean DE of 100%. Conclusion: The study emphasizes the selection of quality MCQs which truly assess knowledge and are able to differentiate students of different abilities in a correct manner.
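
    The three indices named in the record above can be sketched as follows. The formulas are the ones commonly used in MCQ item-analysis studies (high and low scoring groups, with H and L correct answers out of N students in the two groups combined); the abstract does not state its exact formulas, so this is an assumption. Only the "< 5% of students" rule for non functional distractors is taken from the record. All counts in the example are hypothetical.

```python
def difficulty_index(high_correct, low_correct, group_total):
    """DIF I in percent: share of the high + low groups answering correctly."""
    return 100.0 * (high_correct + low_correct) / group_total


def discrimination_index(high_correct, low_correct, group_total):
    """DI: 2 * (H - L) / N; negative when the low group outperforms the high group."""
    return 2.0 * (high_correct - low_correct) / group_total


def distractor_efficiency(distractor_counts, n_students, threshold=0.05):
    """Return (DE in percent, number of non functional distractors).

    A distractor is non functional (NFD) if fewer than `threshold`
    of all students selected it.
    """
    nfd = sum(1 for count in distractor_counts if count / n_students < threshold)
    functional = len(distractor_counts) - nfd
    return 100.0 * functional / len(distractor_counts), nfd


# Hypothetical item: 25 students per group, 3 distractors, 148 examinees.
print(difficulty_index(20, 10, 50))               # percent correct
print(discrimination_index(20, 10, 50))           # high minus low, scaled
print(distractor_efficiency([30, 5, 20], 148))    # one distractor picked by < 5%
```

    With three distractors per item, as in the record (150 distractors over 50 items), DE can only take the values 100%, 66.7%, 33.3% or 0%.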

  1. Re-evaluating a vision-related quality of life questionnaire with item response theory (IRT) and differential item functioning (DIF) analyses

    Directory of Open Access Journals (Sweden)

    Knol Dirk L

    2011-09-01

    Full Text Available Abstract Background For the Low Vision Quality Of Life questionnaire (LVQOL), it is unknown whether the psychometric properties are satisfactory when an item response theory (IRT) perspective is considered. This study evaluates some essential psychometric properties of the LVQOL questionnaire in an IRT model, and investigates differential item functioning (DIF). Methods Cross-sectional data were used from an observational study among visually-impaired patients (n = 296). Calibration was performed for every dimension of the LVQOL in the graded response model. Item goodness-of-fit was assessed with the S-X² test. DIF was assessed on relevant background variables (i.e. age, gender, visual acuity, eye condition, rehabilitation type and administration type) with likelihood-ratio tests for DIF. The magnitude of DIF was interpreted by assessing the largest difference in expected scores between subgroups. Measurement precision was assessed by presenting test information curves; reliability with the index of subject separation. Results All items of the LVQOL dimensions fitted the model. There was significant DIF on several items. For two items the maximum difference between expected scores exceeded one point, and DIF was found on multiple relevant background variables. Item 1 'Vision in general' from the "Adjustment" dimension and item 24 'Using tools' from the "Reading and fine work" dimension were removed. Test information was highest for the "Reading and fine work" dimension. Indices for subject separation ranged from 0.83 to 0.94. Conclusions The items of the LVQOL showed satisfactory item fit to the graded response model; however, two items were removed because of DIF. The adapted LVQOL with 21 items is DIF-free and therefore seems highly appropriate for use in heterogeneous populations of visually impaired patients.
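
    The DIF magnitude criterion described above (the largest difference in expected item scores between subgroups) can be sketched under the graded response model in its standard logistic form. This is an illustration only: the parameter values, the ability grid, and the function names are hypothetical and do not come from the LVQOL study.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    thresholds must be increasing; with m thresholds the item has m + 1
    ordered categories scored 0..m.
    """
    # Cumulative probabilities P(X >= k), padded with P(X >= 0) = 1 and P(X >= m + 1) = 0.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def expected_score(theta, a, thresholds):
    """Expected item score at ability theta."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs))


def max_expected_score_gap(ref_params, focal_params, grid):
    """Largest absolute difference in expected score between two subgroups."""
    return max(
        abs(expected_score(t, *ref_params) - expected_score(t, *focal_params))
        for t in grid
    )


# Hypothetical item parameters (a, thresholds) estimated separately per subgroup.
reference = (1.5, [-1.0, 0.0, 1.0])
focal = (1.5, [-0.2, 0.8, 1.8])
grid = [x / 10.0 for x in range(-40, 41)]
print(round(max_expected_score_gap(reference, focal, grid), 3))
```

    A gap near zero across the whole grid suggests DIF of little practical consequence even if a likelihood-ratio test flags it; the record above treated a gap exceeding one score point as grounds for removing an item.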

  2. Matrix Sampling of Items in Large-Scale Assessments

    Directory of Open Access Journals (Sweden)

    Ruth A. Childs

    2003-07-01

    Matrix sampling of items, that is, division of a set of items into different versions of a test form, is used by several large-scale testing programs. Like other test designs, matrixed designs have both advantages and disadvantages. For example, testing time per student is less than if each student received all the items, but the comparability of student scores may decrease. Also, curriculum coverage is maintained, but reporting of scores becomes more complex. In this paper, matrixed designs are compared with more traditional designs in nine categories of costs: development costs, materials costs, administration costs, educational costs, scoring costs, reliability costs, comparability costs, validity costs, and reporting costs. In choosing among test designs, a testing program should examine the costs in light of its mandate(s), the content of the tests, and the financial resources available, among other considerations.
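
    As a rough illustration of the idea described in this abstract, the sketch below splits an item pool into disjoint test forms. The function name `matrix_sample`, the item labels, and the choice of three forms are assumptions made for the example, not details from the paper. Operational designs often add common or overlapping item blocks so that forms can be linked; that is not shown here.

```python
import random

def matrix_sample(items, n_forms, seed=0):
    """Split an item pool into n_forms disjoint forms of near-equal length."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    # Deal items round-robin so that form sizes differ by at most one.
    return [shuffled[i::n_forms] for i in range(n_forms)]

# Hypothetical pool of 30 items divided into 3 forms.
items = [f"item{i:02d}" for i in range(1, 31)]
forms = matrix_sample(items, n_forms=3)
for k, form in enumerate(forms, start=1):
    print(f"Form {k}: {len(form)} items")
```

    Each student answers only one form, which shortens testing time per student while the pool as a whole still covers the curriculum.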

  3. The Long-Term Conditions Questionnaire: conceptual framework and item development.

    Science.gov (United States)

    Peters, Michele; Potter, Caroline M; Kelly, Laura; Hunter, Cheryl; Gibbons, Elizabeth; Jenkinson, Crispin; Coulter, Angela; Forder, Julien; Towers, Ann-Marie; A'Court, Christine; Fitzpatrick, Ray

    2016-01-01

    To identify the main issues of importance when living with long-term conditions to refine a conceptual framework for informing the item development of a patient-reported outcome measure for long-term conditions. Semi-structured qualitative interviews (n=48) were conducted with people living with at least one long-term condition. Participants were recruited through primary care. The interviews were transcribed verbatim and analyzed by thematic analysis. The analysis served to refine the conceptual framework, based on reviews of the literature and stakeholder consultations, for developing candidate items for a new measure for long-term conditions. Three main organizing concepts were identified: impact of long-term conditions, experience of services and support, and self-care. The findings helped to refine a conceptual framework, leading to the development of 23 items that represent issues of importance in long-term conditions. The 23 candidate items formed the first draft of the measure, currently named the Long-Term Conditions Questionnaire. The aim of this study was to refine the conceptual framework and develop items for a patient-reported outcome measure for long-term conditions, including single and multiple morbidities and physical and mental health conditions. Qualitative interviews identified the key themes for assessing outcomes in long-term conditions, and these underpinned the development of the initial draft of the measure. These initial items will undergo cognitive testing to refine the items prior to further validation in a survey.

  4. Calibration of Automatically Generated Items Using Bayesian Hierarchical Modeling.

    Science.gov (United States)

    Johnson, Matthew S.; Sinharay, Sandip

    For complex educational assessments, there is an increasing use of "item families," which are groups of related items. However, calibration or scoring for such an assessment requires fitting models that take into account the dependence structure inherent among the items that belong to the same item family. C. Glas and W. van der Linden…

  5. Guideline appraisal with AGREE II: online survey of the potential influence of AGREE II items on overall assessment of guideline quality and recommendation for use.

    Science.gov (United States)

    Hoffmann-Eßer, Wiebke; Siering, Ulrich; Neugebauer, Edmund A M; Brockhaus, Anne Catharina; McGauran, Natalie; Eikermann, Michaela

    2018-02-27

    The AGREE II instrument is the most commonly used guideline appraisal tool. It includes 23 appraisal criteria (items) organized within six domains. AGREE II also includes two overall assessments (overall guideline quality, recommendation for use). Our aim was to investigate how strongly the 23 AGREE II items influence the two overall assessments. An online survey of authors of publications on guideline appraisals with AGREE II and guideline users from a German scientific network was conducted between 10th February 2015 and 30th March 2015. Participants were asked to rate the influence of the AGREE II items on a Likert scale (0 = no influence to 5 = very strong influence). The frequencies of responses and their dispersion were presented descriptively. Fifty-eight of the 376 persons contacted (15.4%) participated in the survey and the data of the 51 respondents with prior knowledge of AGREE II were analysed. Items 7-12 of Domain 3 (rigour of development) and both items of Domain 6 (editorial independence) had the strongest influence on the two overall assessments. In addition, Items 15-17 (clarity of presentation) had a strong influence on the recommendation for use. Great variations were shown for the other items. The main limitation of the survey is the low response rate. In guideline appraisals using AGREE II, items representing rigour of guideline development and editorial independence seem to have the strongest influence on the two overall assessments. In order to ensure a transparent approach to reaching the overall assessments, we suggest the inclusion of a recommendation in the AGREE II user manual on how to consider item and domain scores. For instance, the manual could include an a-priori weighting of those items and domains that should have the strongest influence on the two overall assessments. The relevance of these assessments within AGREE II could thereby be further specified.

  6. Dissociating the neural correlates of intra-item and inter-item working-memory binding.

    Directory of Open Access Journals (Sweden)

    Carinne Piekema

    BACKGROUND: Integration of information streams into a unitary representation is an important task of our cognitive system. Within working memory, the medial temporal lobe (MTL) has been conceptually linked to the maintenance of bound representations. In a previous fMRI study, we have shown that the MTL is indeed more active during working-memory maintenance of spatial associations as compared to non-spatial associations or single items. There are two explanations for this result: the mere presence of the spatial component activates the MTL, or the MTL is recruited to bind associations between neurally non-overlapping representations. METHODOLOGY/PRINCIPAL FINDINGS: The current fMRI study investigates this issue further by directly comparing intrinsic intra-item binding (object/colour), extrinsic intra-item binding (object/location), and inter-item binding (object/object). The three binding conditions resulted in differential activation of brain regions. Specifically, we show that the MTL is important for establishing extrinsic intra-item associations and inter-item associations, in line with the notion that binding of information processed in different brain regions depends on the MTL. CONCLUSIONS/SIGNIFICANCE: Our findings indicate that different forms of working-memory binding rely on specific neural structures. In addition, these results extend previous reports indicating that the MTL is implicated in working-memory maintenance, challenging the classic distinction between short-term and long-term memory systems.

  7. Negative affect impairs associative memory but not item memory.

    OpenAIRE

    Bisby, J. A.; Burgess, N.

    2014-01-01

    The formation of associations between items and their context has been proposed to rely on mechanisms distinct from those supporting memory for a single item. Although emotional experiences can profoundly affect memory, our understanding of how it interacts with different aspects of memory remains unclear. We performed three experiments to examine the effects of emotion on memory for items and their associations. By presenting neutral and negative items with background contexts, Experiment 1 ...

  8. An Anthropologist among the Psychometricians: Assessment Events, Ethnography, and Differential Item Functioning in the Mongolian Gobi

    Science.gov (United States)

    Maddox, Bryan; Zumbo, Bruno D.; Tay-Lim, Brenda; Qu, Demin

    2015-01-01

    This article explores the potential for ethnographic observations to inform the analysis of test item performance. In 2010, a standardized, large-scale adult literacy assessment took place in Mongolia as part of the United Nations Educational, Scientific and Cultural Organization Literacy Assessment and Monitoring Programme (LAMP). In a novel form…

  9. An approach for estimating item sensitivity to within-person change over time: An illustration using the Alzheimer's Disease Assessment Scale-Cognitive subscale (ADAS-Cog).

    Science.gov (United States)

    Dowling, N Maritza; Bolt, Daniel M; Deng, Sien

    2016-12-01

    When assessments are primarily used to measure change over time, it is important to evaluate items according to their sensitivity to change, specifically. Items that demonstrate good sensitivity to between-person differences at baseline may not show good sensitivity to change over time, and vice versa. In this study, we applied a longitudinal factor model of change to a widely used cognitive test designed to assess global cognitive status in dementia, and contrasted the relative sensitivity of items to change. Statistically nested models were estimated introducing distinct latent factors related to initial status differences between test-takers and within-person latent change across successive time points of measurement. Models were estimated using all available longitudinal item-level data from the Alzheimer's Disease Assessment Scale-Cognitive subscale, including participants representing the full-spectrum of disease status who were enrolled in the multisite Alzheimer's Disease Neuroimaging Initiative. Five of the 13 Alzheimer's Disease Assessment Scale-Cognitive items demonstrated noticeably higher loadings with respect to sensitivity to change. Attending to performance change on only these 5 items yielded a clearer picture of cognitive decline more consistent with theoretical expectations in comparison to the full 13-item scale. Items that show good psychometric properties in cross-sectional studies are not necessarily the best items at measuring change over time, such as cognitive decline. Applications of the methodological approach described and illustrated in this study can advance our understanding regarding the types of items that best detect fine-grained early pathological changes in cognition. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  10. Feed mechanism and method for feeding minute items

    Science.gov (United States)

    Stringer, Timothy Kent [Bucyrus, KS]; Yerganian, Simon Scott [Lee's Summit, MO]

    2009-10-20

    A feeding mechanism and method for feeding minute items, such as capacitors, resistors, or solder preforms. The mechanism is adapted to receive a plurality of the randomly-positioned and randomly-oriented extremely small or minute items, and to isolate, orient, and position one or more of the items in a specific repeatable pickup location wherefrom they may be removed for use by, for example, a computer-controlled automated assembly machine. The mechanism comprises a sliding shelf adapted to receive and support the items; a wiper arm adapted to achieve a single even layer of the items; and a pushing arm adapted to push the items into the pickup location. The mechanism can be adapted for providing the items with a more exact orientation, and can also be adapted for use in a liquid environment.

  11. Examination of the PROMIS upper extremity item bank.

    Science.gov (United States)

    Hung, Man; Voss, Maren W; Bounsanga, Jerry; Crum, Anthony B; Tyser, Andrew R

    Clinical measurement. The psychometric properties of the PROMIS v1.2 UE item bank were tested on various samples prior to its release, but have not been fully evaluated among the orthopaedic population. This study assesses the performance of the UE item bank within the UE orthopaedic patient population. The UE item bank was administered to 1197 adult patients presenting to a tertiary orthopaedic clinic specializing in hand and UE conditions and was examined using traditional statistics and Rasch analysis. The UE item bank fits a unidimensional model (outfit MNSQ range from 0.64 to 1.70) and has adequate reliabilities (person = 0.84; item = 0.82) and local independence (item residual correlations range from -0.37 to 0.34). Only one item exhibits gender differential item functioning. Most items target low levels of function. The UE item bank is a useful clinical assessment tool. Additional items covering higher functions are needed to enhance validity. Supplemental testing is recommended for patients at higher levels of function until more high function UE items are developed. 2c. Copyright © 2016 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.

  12. Improved utilization of ADAS-cog assessment data through item response theory based pharmacometric modeling.

    Science.gov (United States)

    Ueckert, Sebastian; Plan, Elodie L; Ito, Kaori; Karlsson, Mats O; Corrigan, Brian; Hooker, Andrew C

    2014-08-01

    This work investigates improved utilization of ADAS-cog data (the primary outcome in Alzheimer's disease (AD) trials of mild and moderate AD) by combining pharmacometric modeling and item response theory (IRT). A baseline IRT model characterizing the ADAS-cog was built based on data from 2,744 individuals. Pharmacometric methods were used to extend the baseline IRT model to describe longitudinal ADAS-cog scores from an 18-month clinical study with 322 patients. Sensitivity of the ADAS-cog items in different patient populations, as well as the power to detect a drug effect relative to total-score-based methods, was assessed with the IRT-based model. IRT analysis was able to describe both total and item-level baseline ADAS-cog data. Longitudinal data were also well described. Differences in the information content of the item-level components could be quantitatively characterized and ranked for mild cognitive impairment and mild AD populations. Based on clinical trial simulations with a theoretical drug effect, the IRT method demonstrated a significantly higher power to detect drug effect compared to the traditional method of analysis. A combined framework of IRT and pharmacometric modeling permits a more effective and precise analysis than total-score-based methods and therefore increases the value of ADAS-cog data.

  13. Improving the Memory Sections of the Standardized Assessment of Concussion Using Item Analysis

    Science.gov (United States)

    McElhiney, Danielle; Kang, Minsoo; Starkey, Chad; Ragan, Brian

    2014-01-01

    The purpose of the study was to improve the immediate and delayed memory sections of the Standardized Assessment of Concussion (SAC) by identifying a list of more psychometrically sound items (words). A total of 200 participants with no history of concussion in the previous six months (aged 19.60 ± 2.20 years; N = 93 men, N = 107 women)…

  14. Psychometric assessment of the Adult-Adolescent Parenting Inventory in a sample of low-income single mothers.

    Science.gov (United States)

    Lutenbacher, M

    2001-01-01

    The Adult-Adolescent Parenting Inventory (AAPI) is a 32-item inventory widely used to identify adolescents and adults at risk for inadequate parenting behaviors. It includes four subscales representing the most frequent patterns associated with abusive parenting: (a) Inappropriate Expectations; (b) Lack of Empathy; (c) Parental Value of Corporal Punishment; and (d) Parent-Child Role Reversal. Although it has been used in a variety of samples, the psychometric properties of the AAPI have not been examined in low-income single mothers. The purposes of this study were to: (a) examine the reliability and validity of the AAPI in a sample of 206 low-income single mothers; (b) assess the mothers' risk for inadequate parenting by comparing their AAPI subscale scores with normative subscale scores on the AAPI; (c) assess the construct validity of the AAPI by testing the hypothesis that mothers with lower AAPI scores have a higher level of depressive symptoms and lower self-esteem in comparison to mothers with higher AAPI scores; and (d) determine whether the 4-factor structure proposed by Bavolek (1984) could be replicated. AAPI scores indicated these mothers were at high risk for child abuse when compared with normative data for parents with no known history of abuse. Higher risk for abusive parenting was associated with a higher level of depressive symptoms, less education, and unemployment. The subscales Inappropriate Expectations and Parental Value of Corporal Punishment demonstrated poor internal consistency, with Cronbach's alphas of .40 and .54, respectively. Hypothesis testing supported the construct validity of the AAPI. Bavolek's 4-factor structure was not supported. A 19-item modified version of the AAPI with three dimensions was identified. This modified version of the AAPI may provide a more efficacious tool for use with low-income single mothers.

  15. Language-related differential item functioning between English and German PROMIS Depression items is negligible.

    Science.gov (United States)

    Fischer, H Felix; Wahl, Inka; Nolte, Sandra; Liegl, Gregor; Brähler, Elmar; Löwe, Bernd; Rose, Matthias

    2017-12-01

    To investigate differential item functioning (DIF) of PROMIS Depression items between US and German samples, we compared data from the US PROMIS calibration sample (n = 780), a German general population survey (n = 2,500) and a German clinical sample (n = 621). DIF was assessed in an ordinal logistic regression framework, with 0.02 as criterion for R² change and 0.096 for Raju's non-compensatory DIF. Item parameters were initially fixed to the PROMIS Depression metric; we used plausible values to account for uncertainty in depression estimates. Only four items showed DIF. Accounting for DIF led to negligible effects for the full item bank as well as a post hoc simulated computer-adaptive test […] German general population sample was considerably lower than the US reference value of 50. Overall, we found little evidence for language DIF between US and German samples, which could be addressed by either replacing the DIF items by items not showing DIF or by scoring the short form in German samples with the corrected item parameters reported. Copyright © 2016 John Wiley & Sons, Ltd.

  16. Assessing the Straightforwardly-Worded Brief Fear of Negative Evaluation Scale for Differential Item Functioning Across Gender and Ethnicity.

    Science.gov (United States)

    Harpole, Jared K; Levinson, Cheri A; Woods, Carol M; Rodebaugh, Thomas L; Weeks, Justin W; Brown, Patrick J; Heimberg, Richard G; Menatti, Andrew R; Blanco, Carlos; Schneier, Franklin; Liebowitz, Michael

    2015-06-01

    The Brief Fear of Negative Evaluation Scale (BFNE; Leary, Personality and Social Psychology Bulletin, 9, 371-375, 1983) assesses fear and worry about receiving negative evaluation from others. Rodebaugh et al. (Psychological Assessment, 16, 169-181, 2004) found that the BFNE is composed of a reverse-worded factor (BFNE-R) and a straightforwardly-worded factor (BFNE-S). Further, they found the BFNE-S to have better psychometric properties and provide more information than the BFNE-R. Currently there is a lack of research regarding the measurement invariance of the BFNE-S across gender and ethnicity with respect to item thresholds. The present study uses item response theory (IRT) to test the BFNE-S for differential item functioning (DIF) related to gender and ethnicity (White, Asian, and Black). Six data sets consisting of clinical, community, and undergraduate participants were utilized (N = 2,109). The factor structure of the BFNE-S was confirmed using categorical confirmatory factor analysis, IRT model assumptions were tested, and the BFNE-S was evaluated for DIF. Item nine demonstrated significant non-uniform DIF between White and Black participants. No other items showed significant uniform or non-uniform DIF across gender or ethnicity. Results suggest the BFNE-S can be used reliably with men and women and Asian and White participants. More research is needed to understand the implications of using the BFNE-S with Black participants.

  17. A 67-Item Stress Resilience item bank showing high content validity was developed in a psychosomatic sample.

    Science.gov (United States)

    Obbarius, Nina; Fischer, Felix; Obbarius, Alexander; Nolte, Sandra; Liegl, Gregor; Rose, Matthias

    2018-04-10

    To develop the first item bank to measure Stress Resilience (SR) in clinical populations. Qualitative item development resulted in an initial pool of 131 items covering a broad theoretical SR concept. These items were tested in n = 521 patients at a psychosomatic outpatient clinic. Exploratory and Confirmatory Factor Analysis (CFA), as well as other state-of-the-art item analyses and IRT, were used for item evaluation and calibration of the final item bank. Out of the initial item pool of 131 items, we excluded 64 items (54 factor loading < .3, 2 non-discriminative Item Response Curves, 4 Differential Item Functioning). The final set of 67 items indicated sufficient model fit in CFA and IRT analyses. Additionally, a 10-item short form with high measurement precision (SE ≤ .32 in a theta range between -1.8 and +1.5) was derived. Both the SR item bank and the SR short form were highly correlated with an existing static legacy tool (Connor-Davidson Resilience Scale). The final SR item bank and 10-item short form showed good psychometric properties. When further validated, they will be ready to be used within a framework of Computer-Adaptive Tests for a comprehensive assessment of the Stress-Construct. Copyright © 2018. Published by Elsevier Inc.

  18. [Impact of passing items above the ceiling on the assessment results of Peabody developmental motor scales].

    Science.gov (United States)

    Zhao, Gai; Bian, Yang; Li, Ming

    2013-12-18

    To analyze the impact of passing items above the ceiling level in the subtests of the Peabody Developmental Motor Scales (PDMS-2) on its assessment results. The PDMS-2 subtests were administered to 124 children aged 1.2 to 71 months. In addition to the original scoring method, a new scoring method that includes passing items above the ceiling was developed. The standard scores and quotients of the two scoring methods were compared using the independent-samples t test. Only one child could pass items above the ceiling in the stationary subtest, 19 children in the locomotion subtest, and 17 children in the visual-motor integration subtest. When the scores of these passing items were included in the raw scores, the total raw scores increased by 1-12 points, the standard scores by 0-1 points and the motor quotients by 0-3 points. The diagnostic classification changed in only two children. There was no significant difference between the two methods in motor quotients or in standard scores for the specific subtests (P>0.05). Passing items above the ceiling of the PDMS-2 is not rare. It usually occurs in the locomotion subtest and the visual-motor integration subtest. Including these passing items in the scoring system does not make a significant difference in the standard scores of the subtests or the developmental motor quotients (DMQ), which supports the original setting of a ceiling established by failing to pass 3 items in a row. However, adding the passing items above the ceiling to the raw score will improve tracking of children's developmental trajectory and intervention effects.

  19. An introduction to Item Response Theory and Rasch Analysis of the Eating Assessment Tool (EAT-10).

    Science.gov (United States)

    Kean, Jacob; Brodke, Darrel S; Biber, Joshua; Gross, Paul

    2018-03-01

    Item response theory has its origins in educational measurement and is now commonly applied in health-related measurement of latent traits, such as function and symptoms. This application is due in large part to gains in the precision of measurement attributable to item response theory and corresponding decreases in response burden, study costs, and study duration. The purpose of this paper is twofold: introduce basic concepts of item response theory and demonstrate this analytic approach in a worked example, a Rasch model (1PL) analysis of the Eating Assessment Tool (EAT-10), a commonly used measure for oropharyngeal dysphagia. The results of the analysis were largely concordant with previous studies of the EAT-10 and illustrate for brain impairment clinicians and researchers how IRT analysis can yield greater precision of measurement.
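
    The Rasch (1PL) model named in this abstract gives the probability of endorsing or correctly answering an item as a logistic function of the difference between person level (theta) and item difficulty (b). The sketch below is a generic illustration of that formula for a dichotomous item; the function names and the example values are assumptions, and nothing here reproduces the EAT-10 analysis itself.

```python
import math

def rasch_probability(theta, b):
    """P(X = 1) under the dichotomous Rasch model: logistic(theta - b)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at theta: P * (1 - P)."""
    p = rasch_probability(theta, b)
    return p * (1.0 - p)

# Hypothetical item of difficulty 0.5 logits, evaluated at three person levels.
for theta in (-1.0, 0.5, 2.0):
    p = rasch_probability(theta, b=0.5)
    info = item_information(theta, b=0.5)
    print(f"theta={theta:+.1f}  P={p:.3f}  information={info:.3f}")
```

    An item is most informative where theta equals b, which is why item banks aim to cover the range of trait levels found in the target population.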

  20. The Single Item Literacy Screener: Evaluation of a brief instrument to identify limited reading ability

    Directory of Open Access Journals (Sweden)

    Chew Lisa D

    2006-03-01

    Background: Reading skills are important for accessing health information, using health care services, managing one's health and achieving desirable health outcomes. Our objective was to assess the diagnostic accuracy of the Single Item Literacy Screener (SILS) to identify limited reading ability, one component of health literacy, as measured by the S-TOFHLA. Methods: Cross-sectional interview with 999 adults with diabetes residing in Vermont and bordering states. Participants were randomly recruited from Primary Care practices in the Vermont Diabetes Information System from June 2003 to December 2004. The main outcome was limited reading ability. The primary predictor was the SILS. Results: Of the 999 persons screened, 169 (17%) had limited reading ability. The sensitivity of the SILS in detecting limited reading ability was 54% [95% CI: 47%, 61%] and the specificity was 83% [95% CI: 81%, 86%], with an area under the Receiver Operating Characteristics Curve (ROC) of 0.73 [95% CI: 0.69, 0.78]. Seven hundred seventy (77%) screened negative on the SILS and 692 of these subjects had adequate reading skills (negative predictive value = 0.90 [95% CI: 0.88, 0.92]). Of the 229 who scored positive on the SILS, 92 had limited reading ability (positive predictive value = 0.4 [95% CI: 0.34, 0.47]). Conclusion: The SILS is a simple instrument designed to identify patients with limited reading ability who need help reading health-related materials. The SILS performs moderately well at ruling out limited reading ability in adults and allows providers to target additional assessment of health literacy skills to those most in need. Further study of the use of the SILS in clinical settings and with more diverse populations is warranted.
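
    The reported accuracy figures can be recomputed from the screening counts given in the Results. The sketch below builds the 2x2 table from those counts (229 screened positive, of whom 92 had limited reading ability; 770 screened negative, of whom 692 had adequate reading skills). The variable names are chosen for this example. The counts imply 170 persons with limited reading ability rather than the 169 stated in the abstract, so they do not reconcile exactly; the rounded rates still match.

```python
# 2x2 table derived from the counts reported in the abstract.
true_pos = 92                 # SILS positive, limited reading ability
false_pos = 229 - true_pos    # SILS positive, adequate reading skills
true_neg = 692                # SILS negative, adequate reading skills
false_neg = 770 - true_neg    # SILS negative, limited reading ability

sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (true_neg + false_pos)
ppv = true_pos / (true_pos + false_pos)   # positive predictive value
npv = true_neg / (true_neg + false_neg)   # negative predictive value

print(f"sensitivity = {sensitivity:.2f}")  # 0.54
print(f"specificity = {specificity:.2f}")  # 0.83
print(f"PPV         = {ppv:.2f}")          # 0.40
print(f"NPV         = {npv:.2f}")          # 0.90
```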

  1. Reliability and validity of the Spanish version of the 10-item Connor-Davidson Resilience Scale (10-item CD-RISC) in young adults

    Directory of Open Access Journals (Sweden)

    García-Campayo Javier

    2011-08-01

    Background: The 10-item Connor-Davidson Resilience Scale (10-item CD-RISC) is an instrument for measuring resilience that has shown good psychometric properties in its original version in English. The aim of this study was to evaluate the validity and reliability of the Spanish version of the 10-item CD-RISC in young adults and to verify whether it is structured in a single dimension as in the original English version. Findings: Cross-sectional observational study including 681 university students ranging in age from 18 to 30 years. The number of latent factors in the 10 items of the scale was analyzed by exploratory factor analysis. Confirmatory factor analysis was used to verify whether a single factor underlies the 10 items of the scale as in the original version in English. The convergent validity was analyzed by testing whether the mean of the scores of the mental component of SF-12 (MCS) and the quality of sleep as measured with the Pittsburgh Sleep Index (PSQI) were higher in subjects with better levels of resilience. The internal consistency of the 10-item CD-RISC was estimated using Cronbach's α, and test-retest reliability was estimated with the intraclass correlation coefficient. The Cronbach α coefficient was 0.85 and the test-retest intraclass correlation coefficient was 0.71. The mean MCS score and the level of quality of sleep in both men and women were significantly worse in subjects with lower resilience scores. Conclusions: The Spanish version of the 10-item CD-RISC showed good psychometric properties in young adults and thus can be used as a reliable and valid instrument for measuring resilience. Our study confirmed that a single factor underlies the resilience construct, as was the case of the original scale in English.
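
    Cronbach's α, used above to estimate internal consistency, is computed from the item variances and the variance of the total score: α = k/(k-1) · (1 - Σ item variances / total-score variance), where k is the number of items. The sketch below implements that standard formula; the function name and the small response matrix are invented for illustration and are not data from the study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses: 5 respondents x 4 items.
responses = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
    [1, 2, 1, 2],
    [3, 3, 4, 4],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```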

  2. Science Library of Test Items. Volume Eighteen. A Collection of Multiple Choice Test Items Relating Mainly to Chemistry.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  3. Science Library of Test Items. Volume Seventeen. A Collection of Multiple Choice Test Items Relating Mainly to Biology.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  4. Science Library of Test Items. Volume Nineteen. A Collection of Multiple Choice Test Items Relating Mainly to Geology.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  5. Identifying predictors of physics item difficulty: A linear regression approach

    Science.gov (United States)

    Mesic, Vanes; Muratovic, Hasnija

    2011-06-01

    Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge.
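
    The "61.2% of item difficulty variance" figure is a coefficient of determination (R²) from a regression of item difficulty on item features. The sketch below shows how R² is obtained in the simplest case, ordinary least squares with a single predictor. It is a simplified stand-in: the study used several predictors, and the predictor name (`complexity`) and all numbers here are invented for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: a coded complexity rating and a Rasch difficulty per item.
complexity = [1, 2, 2, 3, 4, 5]
difficulty = [-1.2, -0.5, -0.1, 0.3, 0.9, 1.4]

slope, intercept = fit_line(complexity, difficulty)
r2 = r_squared(complexity, difficulty, slope, intercept)
print(f"slope={slope:.3f}  intercept={intercept:.3f}  R^2={r2:.3f}")
```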

  6. Identifying predictors of physics item difficulty: A linear regression approach

    Directory of Open Access Journals (Sweden)

    Hasnija Muratovic

    2011-06-01

    Large-scale assessments of student achievement in physics are often approached with an intention to discriminate between students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal

  7. Item-focussed Trees for the Identification of Items in Differential Item Functioning.

    Science.gov (United States)

    Tutz, Gerhard; Berger, Moritz

    2016-09-01

    A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.

  8. Vegetable parenting practices scale: Item response modeling analyses

    Science.gov (United States)

    Our objective was to evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We al...

  9. Item response theory at subject- and group-level

    NARCIS (Netherlands)

    Tobi, Hilde

    1990-01-01

    This paper reviews the literature about item response models for the subject level and aggregated level (group level). Group-level item response models (IRMs) are used in the United States in large-scale assessment programs such as the National Assessment of Educational Progress and the California

  10. Examination of validity of fall risk assessment items for screening high fall risk elderly among the healthy community-dwelling Japanese population

    OpenAIRE

    DEMURA, Shinichi; SATO, Susumu; YAMAJI, Shunsuke; KASUGA, Kosho; NAGASAWA, Yoshinori

    2010-01-01

    We aimed to examine the validity of fall risk assessment items for the healthy community-dwelling elderly Japanese population. Participants were 1122 healthy elderly individuals aged 60 years and over (380 males and 742 females). The percentage who had experienced a fall was 15.8%. This study used fall experience and 50 fall risk assessment items representing the five risk factors (symptoms of falling, physical function, disease and physical symptom, environment, and behavior and character), ...

  11. Assessment of chromium(VI) release from 848 jewellery items by use of a diphenylcarbazide spot test

    DEFF Research Database (Denmark)

    Bregnbak, David; Johansen, Jeanne D.; Hamann, Dathan

    2016-01-01

    We recently evaluated and validated a diphenylcarbazide (DPC)-based screening spot test that can detect the release of chromium(VI) ions (≥0.5 ppm) from various metallic items and leather goods (1). We then screened a selection of metal screws, leather shoes, and gloves, as well as 50 earrings, and identified chromium(VI) release from one earring. In the present study, we used the DPC spot test to assess chromium(VI) release in a much larger sample of jewellery items (n=848), 160 (19%) of which had previously been shown to contain chromium when analysed with X-ray fluorescence spectroscopy (2)...

  12. Spare Items validation

    International Nuclear Information System (INIS)

    Fernandez Carratala, L.

    1998-01-01

    There is increasing difficulty in purchasing safety-related spare items with manufacturer certifications that maintain the original qualification of the destination equipment. The main reasons are, on top of the logical evolution of technology applied to newly manufactured components, the discontinuation of nuclear-specific production lines and the evolution of manufacturers' quality systems, originally based on nuclear codes and standards, toward conventional industry standards. To address this problem, different dedication processes have been implemented for many years to verify whether a commercial-grade element is acceptable for use in safety-related applications. Likewise, because of our particular position regarding spare part supplies, which come mainly from markets other than the American one, C.N. Trillo has developed a methodology called Spare Items Validation. This methodology, which is originally based on dedication processes, is not a single process but a group of coordinated processes involving engineering, quality, and management activities. These are performed on the spare item itself, its design control, its fabrication, and its supply, to allow its use in destinations with specific requirements. The scope of application covers not only safety-related items but also components of complex design, high cost, or relevance to plant reliability. The implementation at C.N. Trillo has mainly been carried out by merging, modifying, and making the most of processes and activities that were already being performed in the company. (Author)

  13. Science Library of Test Items. Volume Twenty-Two. A Collection of Multiple Choice Test Items Relating Mainly to Skills.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  14. Science Library of Test Items. Volume Twenty. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 1.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  15. Which Statistic Should Be Used to Detect Item Preknowledge When the Set of Compromised Items Is Known?

    Science.gov (United States)

    Sinharay, Sandip

    2017-09-01

    Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. Belov suggested the posterior shift statistic for detection of item preknowledge and showed its performance to be better on average than that of seven other statistics for detection of item preknowledge for a known set of compromised items. Sinharay suggested a statistic based on the likelihood ratio test for detection of item preknowledge; the advantage of the statistic is that its null distribution is known. Results from simulated and real data and adaptive and nonadaptive tests are used to demonstrate that the Type I error rate and power of the statistic based on the likelihood ratio test are very similar to those of the posterior shift statistic. Thus, the statistic based on the likelihood ratio test appears promising in detecting item preknowledge when the set of compromised items is known.

  16. Negative Affect Impairs Associative Memory but Not Item Memory

    Science.gov (United States)

    Bisby, James A.; Burgess, Neil

    2014-01-01

    The formation of associations between items and their context has been proposed to rely on mechanisms distinct from those supporting memory for a single item. Although emotional experiences can profoundly affect memory, our understanding of how it interacts with different aspects of memory remains unclear. We performed three experiments to examine…

  17. Comparison of single questions and brief questionnaire with longer validated food frequency questionnaire to assess adequate fruit and vegetable intake.

    Science.gov (United States)

    Cook, Amelia; Roberts, Kia; O'Leary, Fiona; Allman-Farinelli, Margaret Anne

    2015-01-01

    The aim of this study was to determine if a single question (SQ) for fruit and a SQ or five-item questionnaire for vegetable consumption (VFQ) could replace a longer food frequency questionnaire (FFQ) to screen for inadequate versus adequate intakes in populations. Participants (n = 109) completed three test screeners: fruit SQ, vegetable SQ, and a five-item VFQ, followed by the reference 74-item FFQ (version 2 of the Dietary Questionnaire for Epidemiological Studies [DQESv2]) including 13 fruit and 25 vegetable items. The five-item VFQ asked about intake of salad vegetables, cooked vegetables, white potatoes, legumes, and vegetable juice. The screeners were compared with the reference (DQESv2 FFQ) for sensitivity, specificity, and positive and negative predictive values (PPV, NPV) to detect intakes of two or more servings of fruit and three or more servings of vegetables. Relative validity was examined using Bland-Altman statistics. The fruit SQ showed a PPV of 56% and an NPV of 83%. The PPV for the vegetable SQ was 30% and the NPV was 89%. For the five-item VFQ, the PPV was 39% and the NPV was 85%. Bland-Altman plots and linear regression equations showed that, although the screener showed good agreement for fruit (unstandardized b1 coefficient = 0.04), for vegetable intake the difference between methods increased at higher intake levels (unstandardized b1 coefficients = -0.3 for the SQ, b1 = -0.6 for the five-item VFQ). The fruit SQ and the five-item VFQ are suitable replacements for longer FFQs to detect inadequate intake and assess population mean but not individual intakes.
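
    The four screening statistics named in this abstract can be sketched from a 2x2 table of screener result versus reference result. This is a minimal illustration only: the function name `screening_metrics` and the counts in the usage line are hypothetical and are not taken from the study.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute screening statistics from a 2x2 table.

    tp: screener positive, reference positive
    fp: screener positive, reference negative
    fn: screener negative, reference positive
    tn: screener negative, reference negative
    """
    return {
        # Share of reference positives that the screener flags
        "sensitivity": tp / (tp + fn),
        # Share of reference negatives that the screener clears
        "specificity": tn / (tn + fp),
        # Share of screener positives that are reference positives
        "ppv": tp / (tp + fp),
        # Share of screener negatives that are reference negatives
        "npv": tn / (tn + fn),
    }


# Hypothetical counts, chosen only to exercise the function
metrics = screening_metrics(tp=20, fp=10, fn=5, tn=65)
print(metrics)
```

    Unlike sensitivity and specificity, PPV and NPV depend on how common the condition is in the sample, so values from one population may not transfer to another.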

  18. Science Library of Test Items. Volume Twenty-One. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 2.

    Science.gov (United States)

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  19. An NCME Instructional Module on Polytomous Item Response Theory Models

    Science.gov (United States)

    Penfield, Randall David

    2014-01-01

    A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of…

  20. Can Item Keyword Feedback Help Remediate Knowledge Gaps?

    Science.gov (United States)

    Feinberg, Richard A; Clauser, Amanda L

    2016-10-01

    In graduate medical education, assessment results can effectively guide professional development when both assessment and feedback support a formative model. When individuals cannot directly access the test questions and responses, a way of using assessment results formatively is to provide item keyword feedback. The purpose of the following study was to investigate whether exposure to item keyword feedback aids in learner remediation. Participants included 319 trainees who completed a medical subspecialty in-training examination (ITE) in 2012 as first-year fellows, and then 1 year later in 2013 as second-year fellows. Performance on 2013 ITE items in which keywords were, or were not, exposed as part of the 2012 ITE score feedback was compared across groups based on the amount of time studying (preparation). For the same items common to both 2012 and 2013 ITEs, response patterns were analyzed to investigate changes in answer selection. Test takers who indicated greater amounts of preparation on the 2013 ITE did not perform better on the items in which keywords were exposed compared to those who were not exposed. The response pattern analysis substantiated overall growth in performance from the 2012 ITE. For items with incorrect responses on both attempts, examinees selected the same option 58% of the time. Results from the current study were unsuccessful in supporting the use of item keywords in aiding remediation. Unfortunately, the results did provide evidence of examinees retaining misinformation.

  1. Effects of Reducing the Cognitive Load of Mathematics Test Items on Student Performance

    Directory of Open Access Journals (Sweden)

    Susan C. Gillmor

    2015-01-01

    This study explores a new item-writing framework for improving the validity of math assessment items. The authors transfer insights from Cognitive Load Theory (CLT), traditionally used in instructional design, to educational measurement. Fifteen multiple-choice math assessment items were modified using research-based strategies for reducing extraneous cognitive load. An experimental design with 222 middle-school students tested the effects of the reduced cognitive load items on student performance and anxiety. Significant findings confirm the main research hypothesis that reducing the cognitive load of math assessment items improves student performance. Three load-reducing item modifications are identified as particularly effective for reducing item difficulty: signalling important information, aesthetic item organization, and removing extraneous content. Load reduction was not shown to impact student anxiety. Implications for classroom assessment and future research are discussed.

  2. Concurrent validity and sensitivity to change of Direct Behavior Rating Single-Item Scales (DBR-SIS) within an elementary sample.

    Science.gov (United States)

    Smith, Rhonda L; Eklund, Katie; Kilgus, Stephen P

    2018-03-01

    The purpose of this study was to evaluate the concurrent validity, sensitivity to change, and teacher acceptability of Direct Behavior Rating single-item scales (DBR-SIS), a brief progress monitoring measure designed to assess student behavioral change in response to intervention. Twenty-four elementary teacher-student dyads implemented a daily report card intervention to promote positive student behavior during prespecified classroom activities. During both baseline and intervention, teachers completed DBR-SIS ratings of 2 target behaviors (i.e., Academic Engagement, Disruptive Behavior) whereas research assistants collected systematic direct observation (SDO) data in relation to the same behaviors. Five change metrics (i.e., absolute change, percent of change from baseline, improvement rate difference, Tau-U, and standardized mean difference; Gresham, 2005) were calculated for both DBR-SIS and SDO data, yielding estimates of the change in student behavior in response to intervention. Mean DBR-SIS scores were predominantly moderately to highly correlated with SDO data within both baseline and intervention, demonstrating evidence of the former's concurrent validity. DBR-SIS change metrics were also significantly correlated with SDO change metrics for both Disruptive Behavior and Academic Engagement, yielding evidence of the former's sensitivity to change. In addition, teacher Usage Rating Profile-Assessment (URP-A) ratings indicated they found DBR-SIS to be acceptable and usable. Implications for practice, study limitations, and areas of future research are discussed.

  3. Sensitivity and specificity of the 3-item memory test in the assessment of post traumatic amnesia.

    NARCIS (Netherlands)

    Andriessen, T.M.J.C.; Jong, B. de; Jacobs, B.; Werf, S.P. van der; Vos, P.E.

    2009-01-01

    PRIMARY OBJECTIVE: To investigate how the type of stimulus (pictures or words) and the method of reproduction (free recall or recognition after a short or a long delay) affect the sensitivity and specificity of a 3-item memory test in the assessment of post traumatic amnesia (PTA). METHODS: Daily

  4. Item validity vs. item discrimination index: a redundancy?

    Science.gov (United States)

    Panjaitan, R. L.; Irawati, R.; Sujana, A.; Hanifah, N.; Djuanda, D.

    2018-03-01

    In the literature on evaluation and test analysis, it is common to find item validity and the item discrimination index (D) calculated with a different formula for each. Other sources, however, state that the item discrimination index can be obtained by calculating the correlation between the testee’s score on a particular item and the testee’s score on the overall test, which is actually the same concept as item validity. Some research reports, especially undergraduate theses, tend to include both item validity and the item discrimination index in the instrument analysis. These concepts might overlap, because both reflect the quality of the test in measuring the examinees’ ability. In this paper, examples of data-processing results for item validity and the item discrimination index are compared. We discuss whether the two can be represented by only one of them or whether it is better to present both calculations for simple test analysis, especially in undergraduate theses that include test analyses.
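
    The two quantities contrasted in this abstract can be sketched side by side. The code below assumes one common textbook convention, which the abstract itself does not specify: D is the difference in proportion correct between the top and bottom 27% of examinees by total score, and item validity is the Pearson correlation between the 0/1 item score and the total score. The function names and the data are hypothetical.

```python
from math import sqrt


def discrimination_index(item, total, fraction=0.27):
    """Upper-lower D index: p(correct | top group) - p(correct | bottom group)."""
    n = len(item)
    k = max(1, round(n * fraction))  # size of each extreme group
    order = sorted(range(n), key=lambda i: total[i])  # ascending by total score
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item[i] for i in upper) / k
    p_lower = sum(item[i] for i in lower) / k
    return p_upper - p_lower


def item_total_correlation(item, total):
    """Pearson correlation between item scores and total scores.

    Undefined (division by zero) if either variable has no variance.
    """
    n = len(item)
    mean_x, mean_y = sum(item) / n, sum(total) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(item, total))
    sxx = sum((x - mean_x) ** 2 for x in item)
    syy = sum((y - mean_y) ** 2 for y in total)
    return sxy / sqrt(sxx * syy)


# Hypothetical responses of ten examinees to one item, with their total scores
item_scores = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
total_scores = [2, 3, 4, 5, 5, 6, 7, 8, 9, 10]
print(discrimination_index(item_scores, total_scores))
print(item_total_correlation(item_scores, total_scores))
```

    Both statistics rise when high scorers answer the item correctly more often than low scorers, which is the overlap the abstract points to; D uses only the extreme groups, while the correlation uses every examinee.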

  5. Assessing errors related to characteristics of the items measured

    International Nuclear Information System (INIS)

    Liggett, W.

    1980-01-01

    Errors that are related to some intrinsic property of the items measured are often encountered in nuclear material accounting. An example is the error in nondestructive assay measurements caused by uncorrected matrix effects. Nuclear material accounting requires for each materials type one measurement method for which bounds on these errors can be determined. If such a method is available, a second method might be used to reduce costs or to improve precision. If the measurement error for the first method is longer-tailed than Gaussian, then precision might be improved by measuring all items by both methods. 8 refs

  6. Analysis of Item-Level Bias in the Bayley-III Language Subscales: The Validity and Utility of Standardized Language Assessment in a Multilingual Setting.

    Science.gov (United States)

    Goh, Shaun K Y; Tham, Elaine K H; Magiati, Iliana; Sim, Litwee; Sanmugam, Shamini; Qiu, Anqi; Daniel, Mary L; Broekman, Birit F P; Rifkin-Graboi, Anne

    2017-09-18

    The purpose of this study was to improve standardized language assessments among bilingual toddlers by investigating and removing the effects of bias due to unfamiliarity with cultural norms or a distributed language system. The Expressive and Receptive Bayley-III language scales were adapted for use in a multilingual country (Singapore). Differential item functioning (DIF) was applied to data from 459 two-year-olds without atypical language development. This involved investigating if the probability of success on each item varied according to language exposure while holding latent language ability, gender, and socioeconomic status constant. Associations with language, behavioral, and emotional problems were also examined. Five of 16 items showed DIF, 1 of which may be attributed to cultural bias and another to a distributed language system. The remaining 3 items favored toddlers with higher bilingual exposure. Removal of DIF items reduced associations between language scales and emotional and language problems, but improved the validity of the expressive scale from poor to good. Our findings indicate the importance of considering cultural and distributed language bias in standardized language assessments. We discuss possible mechanisms influencing performance on items favoring bilingual exposure, including the potential role of inhibitory processing.

  7. Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

    Directory of Open Access Journals (Sweden)

    Suttida Rakkapao

    2016-10-01

    This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test’s distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.

  8. Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

    Science.gov (United States)

    Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

    2016-12-01

    This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
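
    For orientation, the three-parameter logistic (3PL) model mentioned in this abstract is usually written as P(theta) = c + (1 - c) / (1 + exp(-a (theta - b))), with discrimination a, difficulty b, and guessing parameter c. The sketch below evaluates that standard form; the function name and parameter values are hypothetical, the optional scaling constant (often 1.7) is omitted, and nothing here reproduces the study's software or estimates.

```python
import math


def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a: item discrimination (slope)
    b: item difficulty (location)
    c: lower asymptote (pseudo-guessing parameter)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: moderately discriminating, slightly hard, 20% guessing floor
for theta in (-2.0, 0.5, 3.0):
    print(theta, round(p_correct_3pl(theta, a=1.2, b=0.5, c=0.2), 3))
```

    At theta = b the probability is c + (1 - c) / 2, halfway between the guessing floor and 1, which is why b is read as the item's location on the ability scale.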

  9. Improving Measurement Efficiency of the Inner EAR Scale with Item Response Theory.

    Science.gov (United States)

    Jessen, Annika; Ho, Andrew D; Corrales, C Eduardo; Yueh, Bevan; Shin, Jennifer J

    2018-02-01

    Objectives: (1) To assess the 11-item Inner Effectiveness of Auditory Rehabilitation (Inner EAR) instrument with item response theory (IRT). (2) To determine whether the underlying latent ability could also be accurately represented by a subset of the items for use in high-volume clinical scenarios. (3) To determine whether the Inner EAR instrument correlates with pure tone thresholds and word recognition scores. Design: IRT evaluation of prospective cohort data. Setting: Tertiary care academic ambulatory otolaryngology clinic. Subjects and Methods: Modern psychometric methods, including factor analysis and IRT, were used to assess unidimensionality and item properties. Regression methods were used to assess prediction of word recognition and pure tone audiometry scores. Results: The Inner EAR scale is unidimensional, and items varied in their location and information. Information parameter estimates ranged from 1.63 to 4.52, with higher values indicating more useful items. The IRT model provided a basis for identifying 2 sets of items with relatively lower information parameters. Item information functions demonstrated which items added insubstantial value over and above other items; these were removed in stages, creating 8- and 3-item Inner EAR scales for more efficient assessment. The 8-item version accurately reflected the underlying construct. All versions correlated moderately with word recognition scores and pure tone averages. Conclusion: The 11-, 8-, and 3-item versions of the Inner EAR scale have strong psychometric properties, and there is correlational validity evidence for the observed scores. Modern psychometric methods can help streamline care delivery by maximizing relevant information per item administered.

  10. Alzheimer's Disease Assessment: A Review and Illustrations Focusing on Item Response Theory Techniques.

    Science.gov (United States)

    Balsis, Steve; Choudhury, Tabina K; Geraci, Lisa; Benge, Jared F; Patrick, Christopher J

    2018-04-01

    Alzheimer's disease (AD) affects neurological, cognitive, and behavioral processes. Thus, to accurately assess this disease, researchers and clinicians need to combine and incorporate data across these domains. This presents not only distinct methodological and statistical challenges but also unique opportunities for the development and advancement of psychometric techniques. In this article, we describe relatively recent research using item response theory (IRT) that has been used to make progress in assessing the disease across its various symptomatic and pathological manifestations. We focus on applications of IRT to improve scoring, test development (including cross-validation and adaptation), and linking and calibration. We conclude by describing potential future multidimensional applications of IRT techniques that may improve the precision with which AD is measured.

  11. Development of the PROMIS positive emotional and sensory expectancies of smoking item banks.

    Science.gov (United States)

    Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando; Stucky, Brian D; Li, Zhen; Hansen, Mark; Cai, Li

    2014-09-01

    The positive emotional and sensory expectancies of cigarette smoking include improved cognitive abilities, positive affective states, and pleasurable sensorimotor sensations. This paper describes development of Positive Emotional and Sensory Expectancies of Smoking item banks that will serve to standardize the assessment of this construct among daily and nondaily cigarette smokers. Data came from daily (N = 4,201) and nondaily (N = 1,183) smokers who completed an online survey. To identify a unidimensional set of items, we conducted item factor analyses, item response theory analyses, and differential item functioning analyses. Additionally, we evaluated the performance of fixed-item short forms (SFs) and computer adaptive tests (CATs) to efficiently assess the construct. Eighteen items were included in the item banks (15 common across daily and nondaily smokers, 1 unique to daily, 2 unique to nondaily). The item banks are strongly unidimensional, highly reliable (reliability = 0.95 for both), and perform similarly across gender, age, and race/ethnicity groups. A SF common to daily and nondaily smokers consists of 6 items (reliability = 0.86). Results from simulated CATs indicated that, on average, fewer than 8 items are needed to assess the construct with adequate precision using the item banks. These analyses identified a new set of items that can assess the positive emotional and sensory expectancies of smoking in a reliable and standardized manner. Considerable efficiency in assessing this construct can be achieved by using the item bank SF, employing computer adaptive tests, or selecting subsets of items tailored to specific research or clinical purposes.

  12. The comparability of English, French and Dutch scores on the Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F): an assessment of differential item functioning in patients with systemic sclerosis.

    Directory of Open Access Journals (Sweden)

    Linda Kwakkenbos

    The Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F) is commonly used to assess fatigue in rheumatic diseases, and has been shown to discriminate better across levels of the fatigue spectrum than other commonly used measures. The aim of this study was to assess the cross-language measurement equivalence of the English, French, and Dutch versions of the FACIT-F in systemic sclerosis (SSc) patients. The FACIT-F was completed by 871 English-speaking Canadian, 238 French-speaking Canadian and 230 Dutch SSc patients. Confirmatory factor analysis was used to assess the factor structure in the three samples. The Multiple-Indicator Multiple-Cause (MIMIC) model was utilized to assess differential item functioning (DIF), comparing English versus French and English versus Dutch patient responses separately. A unidimensional factor model showed good fit in all samples. Comparing French versus English patients, statistically significant but small-magnitude DIF was found for 3 of 13 items. French patients had 0.04 of a standard deviation (SD) lower latent fatigue scores than English patients and there was an increase of only 0.03 SD after accounting for DIF. For the Dutch versus English comparison, 4 items showed small, but statistically significant, DIF. Dutch patients had 0.20 SD lower latent fatigue scores than English patients. After correcting for DIF, there was a reduction of 0.16 SD in this difference. There was statistically significant DIF in several items, but the overall effect on fatigue scores was minimal. English, French and Dutch versions of the FACIT-F can be reasonably treated as having equivalent scoring metrics.

  13. Does the Assessment of Recovery Capital scale reflect a single or multiple domains?

    Science.gov (United States)

    Arndt, Stephan; Sahker, Ethan; Hedden, Suzy

    2017-01-01

    The goal of this study was to determine whether the 50-item Assessment of Recovery Capital scale represents a single general measure or whether multiple domains might be psychometrically useful for research or clinical applications. Data are from a cross-sectional de-identified existing program evaluation information data set with 1,138 clients entering substance use disorder treatment. Principal components and iterated factor analysis were used on the domain scores. Multiple group factor analysis provided a quasi-confirmatory factor analysis. The solution accounted for 75.24% of the total variance, suggesting that 10 factors provide a reasonably good fit. However, Tucker's congruence coefficients between the factor structure and defining weights (0.41-0.52) suggested a poor fit to the hypothesized 10-domain structure. Principal components of the 10-domain scores yielded one factor whose eigenvalue was greater than one (5.93), accounting for 75.8% of the common variance. A few domains had perceptible but small unique variance components suggesting that a few of the domains may warrant enrichment. Our findings suggest that there is one general factor, with a caveat. Using the 10 measures inflates the chance for Type I errors. Using one general measure avoids this issue, is simple to interpret, and could reduce the number of items. However, those seeking to maximally predict later recovery success may need to use the full instrument and all 10 domains.
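
    The Tucker's congruence coefficients quoted in this abstract compare two vectors of factor loadings. As a rough illustration only, the sketch below computes the coefficient as the sum of cross-products divided by the square root of the product of the sums of squares. The function name and the loading values are hypothetical and are not taken from the study.

```python
import math


def tucker_congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    if len(x) != len(y):
        raise ValueError("loading vectors must have the same length")
    cross = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if norm == 0:
        raise ValueError("loading vectors must not be all zeros")
    return cross / norm


# Hypothetical loadings (illustration only, not data from the study).
observed = [0.8, 0.7, 0.1, 0.0]   # estimated loadings on one factor
target = [1.0, 1.0, 0.0, 0.0]     # hypothesized defining weights
phi = tucker_congruence(observed, target)
print(round(phi, 3))
```

    Values close to 1 indicate that the two loading patterns are nearly proportional, which is why coefficients in the 0.41-0.52 range are read in the abstract as a poor match to the hypothesized structure.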

  14. Item response modeling: a psychometric assessment of the children's fruit, vegetable, water, and physical activity self-efficacy scales among Chinese children.

    Science.gov (United States)

    Wang, Jing-Jing; Chen, Tzu-An; Baranowski, Tom; Lau, Patrick W C

    2017-09-16

    This study aimed to evaluate the psychometric properties of four self-efficacy scales (i.e., self-efficacy for fruit (FSE), vegetable (VSE), and water (WSE) intakes, and physical activity (PASE)) and to investigate their differences in item functioning across sex, age, and body weight status groups using item response modeling (IRM) and differential item functioning (DIF). Four self-efficacy scales were administered to 763 Hong Kong Chinese children (55.2% boys) aged 8-13 years. Classical test theory (CTT) was used to examine the reliability and factorial validity of scales. IRM was conducted and DIF analyses were performed to assess the characteristics of item parameter estimates on the basis of children's sex, age and body weight status. All self-efficacy scales demonstrated adequate to excellent internal consistency reliability (Cronbach's α: 0.79-0.91). One FSE misfit item and one PASE misfit item were detected. Small DIF was found for all the scale items across children's age groups. Items with medium to large DIF were detected in different sex and body weight status groups, which will require modification. A Wright map revealed that items covered the range of the distribution of participants' self-efficacy for each scale except VSE. Several self-efficacy scales' items functioned differently by children's sex and body weight status. Additional research is required to modify the four self-efficacy scales to minimize these moderating influences for application.
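
    For readers unfamiliar with the internal consistency statistic reported here, the following is a minimal sketch of Cronbach's α using the standard formula: k / (k - 1) times one minus the ratio of summed item variances to the variance of the total score. The function name and the response matrix are hypothetical and serve only as an illustration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    columns = list(zip(*scores))                      # one tuple per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: 5 respondents x 3 items (illustration only).
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 3],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

    The same variance estimator is used in the numerator and denominator, so the choice between sample and population variance does not change the result.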

  15. Proposta de um instrumento de medida para avaliar a satisfação de clientes de bancos utilizando a Teoria da Resposta ao Item / Proposal of tool to assess the satisfaction of bank customers using the Item Response Theory

    Directory of Open Access Journals (Sweden)

    Alceu Balbim Junior

    2011-01-01

    This paper presents a measurement instrument for assessing the satisfaction of bank customers using Item Response Theory (IRT). Organizations constantly strive to satisfy customers in order to remain competitive. Studies have reported a relationship between the quality perceived by customers, satisfaction, and loyalty. Satisfaction can be assessed through perceived quality, and assessment tools should address the specific features of the activity in question. Based on articles that assess the satisfaction of bank customers, this study proposes an instrument consisting of 29 items. The items were administered to 240 customers to assess their satisfaction with the bank they deal with most. Using Item Response Theory, the item parameters and the information curve were identified. Analysis of item discrimination indicated that all items are appropriate. The information curve showed the range in which the instrument gives the best estimates of satisfaction levels. The study reports the mean satisfaction level of the sample and the concentration of customers at the different satisfaction levels of the scale.

  16. Negative affect impairs associative memory but not item memory.

    Science.gov (United States)

    Bisby, James A; Burgess, Neil

    2013-12-17

    The formation of associations between items and their context has been proposed to rely on mechanisms distinct from those supporting memory for a single item. Although emotional experiences can profoundly affect memory, our understanding of how it interacts with different aspects of memory remains unclear. We performed three experiments to examine the effects of emotion on memory for items and their associations. By presenting neutral and negative items with background contexts, Experiment 1 demonstrated that item memory was facilitated by emotional affect, whereas memory for an associated context was reduced. In Experiment 2, arousal was manipulated independently of the memoranda, by a threat of shock, whereby encoding trials occurred under conditions of threat or safety. Memory for context was equally impaired by the presence of negative affect, whether induced by threat of shock or a negative item, relative to retrieval of the context of a neutral item in safety. In Experiment 3, participants were presented with neutral and negative items as paired associates, including all combinations of neutral and negative items. The results showed both above effects: compared to a neutral item, memory for the associate of a negative item (a second item here, context in Experiments 1 and 2) is impaired, whereas retrieval of the item itself is enhanced. Our findings suggest that negative affect impairs associative memory while recognition of a negative item is enhanced. They support dual-processing models in which negative affect or stress impairs hippocampal-dependent associative memory while the storage of negative sensory/perceptual representations is spared or even strengthened.

  17. Assessment of acquired capability for suicide in clinical practice.

    Science.gov (United States)

    Rimkeviciene, Jurgita; Hawgood, Jacinta; O'Gorman, John; De Leo, Diego

    2016-12-01

    The Interpersonal Psychological Theory of suicide proposes that the interaction between Thwarted Belongingness, Perceived Burdensomeness, and Acquired Capability for Suicide (ACS) predicts proximal risk of death by suicide. Instruments to assess all three constructs are available. However, research on the validity of one of them, the acquired capability for suicide scale (ACSS), has been limited, especially in terms of its clinical relevance. This study aimed to explore the utility of the different versions of the ACSS in clinical assessment. Three versions of the scale were investigated, the full 20-item version, a 7-item version and a single item version representing self-perceived capability for suicide. In a sample of patients recruited from a clinic specialising in the treatment of suicidality and in a community sample, all versions of the ACSS were found to show reasonable levels of reliability and to correlate as expected with reports of suicidal ideation, self-harm, and attempted suicide. The item assessing self-perceived acquired capacity for suicide showed highest correlations with all levels of suicidal behaviour. However, no version of the ACSS on its own showed a capacity to indicate suicide attempts in the combined sample. It is concluded that the versions of the scale have construct validity, but their clinical utility is limited. An assessment using a single item on self-perceived ACS outperforms the full and shortened versions of ACSS in clinical settings and can be recommended with caution for clinicians interested in assessing this characteristic.

  18. Analyzing force concept inventory with item response theory

    Science.gov (United States)

    Wang, Jing; Bao, Lei

    2010-10-01

    Item response theory is a popular assessment method used in education. It rests on the assumption of a probability framework that relates students' innate ability and their performance on test questions. Item response theory transforms students' raw test scores into a scaled proficiency score, which can be used to compare results obtained with different test questions. The scaled score also addresses the issues of ceiling effects and guessing, which commonly exist in quantitative assessment. We used item response theory to analyze the force concept inventory (FCI). Our results show that item response theory can be useful for analyzing physics concept surveys such as the FCI and produces results about the individual questions and student performance that are beyond the capability of classical statistics. The theory yields detailed measurement parameters regarding the difficulty, discrimination features, and probability of correct guess for each of the FCI questions.

  19. 'Do you think you suffer from depression?' Reevaluating the use of a single item question for the screening of depression in older primary care patients

    DEFF Research Database (Denmark)

    Ayalon, Liat; Goldfracht, Margalit; Bech, Per

    2010-01-01

    OBJECTIVES: The majority of older adults seek depression treatment in primary care. Despite impressive efforts to integrate depression treatment into primary care, depression often remains undetected. The overall goal of the present study was to compare a single item screening for depression...... to existing depression screening tools. METHODS: A cross sectional sample of 153 older primary care patients. Participants completed several depression-screening measures (e.g. a single depression screen, Patient Health Questionnaire-9, Major Depression Inventory, Visual Analogue Scale). Measures were......: An easy way to detect depression in older primary care patients would be asking the single question, 'do you think you suffer from depression?'...

  20. The Comparability of English, French and Dutch Scores on the Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F): An Assessment of Differential Item Functioning in Patients with Systemic Sclerosis

    Science.gov (United States)

    Kwakkenbos, Linda; Willems, Linda M.; Baron, Murray; Hudson, Marie; Cella, David; van den Ende, Cornelia H. M.; Thombs, Brett D.

    2014-01-01

    Objective: The Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F) is commonly used to assess fatigue in rheumatic diseases, and has been shown to discriminate better across levels of the fatigue spectrum than other commonly used measures. The aim of this study was to assess the cross-language measurement equivalence of the English, French, and Dutch versions of the FACIT-F in systemic sclerosis (SSc) patients. Methods: The FACIT-F was completed by 871 English-speaking Canadian, 238 French-speaking Canadian and 230 Dutch SSc patients. Confirmatory factor analysis was used to assess the factor structure in the three samples. The Multiple-Indicator Multiple-Cause (MIMIC) model was utilized to assess differential item functioning (DIF), comparing English versus French and versus Dutch patient responses separately. Results: A unidimensional factor model showed good fit in all samples. Comparing French versus English patients, statistically significant, but small-magnitude DIF was found for 3 of 13 items. French patients had 0.04 of a standard deviation (SD) lower latent fatigue scores than English patients and there was an increase of only 0.03 SD after accounting for DIF. For the Dutch versus English comparison, 4 items showed small, but statistically significant, DIF. Dutch patients had 0.20 SD lower latent fatigue scores than English patients. After correcting for DIF, there was a reduction of 0.16 SD in this difference. Conclusions: There was statistically significant DIF in several items, but the overall effect on fatigue scores was minimal. English, French and Dutch versions of the FACIT-F can be reasonably treated as having equivalent scoring metrics. PMID: 24638101

  1. Criterion validity of the Short Mood and Feelings Questionnaire and one- and two-item depression screens in young adolescents

    Directory of Open Access Journals (Sweden)

    McCauley Elizabeth

    2010-02-01

    Background: The use of short screening questionnaires may be a promising option for identifying children at risk for depression in a community setting. The objective of this study was to assess the validity of the Short Mood and Feelings Questionnaire (SMFQ) and one- and two-item screening instruments for depressive disorders in a school-based sample of young adolescents. Methods: Participants were 521 sixth-grade students attending public middle schools. Child and parent versions of the SMFQ were administered to evaluate the child's depressive symptoms. The presence of any depressive disorder during the previous month was assessed using the Diagnostic Interview Schedule for Children (DISC) as the criterion standard. First, we assessed the diagnostic accuracy of child, parent, and combined scores of the full 13-item SMFQ by calculating the area under the receiver operating characteristic curve (AUC), sensitivity and specificity. The same approach was then used to evaluate the accuracy of a two-item scale consisting of only depressed mood and anhedonia items, and a single depressed mood item. Results: The combined child + parent SMFQ score showed the highest accuracy (AUC = 0.86). Diagnostic accuracy was lower for child (AUC = 0.73) and parent (AUC = 0.74) SMFQ versions. Corresponding versions of one- and two-item screens had lower AUC estimates, but the combined versions of the brief screens each still showed moderate accuracy. Furthermore, child and combined versions of the two-item screen demonstrated higher sensitivity (although lower specificity) than either the one-item screen or the full SMFQ. Conclusions: Under conditions where parents accompany children to screening settings (e.g. primary care), use of a child + parent version of the SMFQ is recommended. However, when parents are not available, and the cost of a false positive result is minimal, then a one- or two-item screen may be useful for initial identification of at-risk youth.
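
    The accuracy measures named in this abstract can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' procedure: it assumes that higher screening scores indicate greater risk, treats a score at or above a cutoff as a positive screen, and computes the AUC as the proportion of (case, non-case) pairs in which the case has the higher score, with ties counted as one half. All names and data are hypothetical.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Return (sensitivity, specificity) when score >= cutoff counts as a positive screen."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    cases = [s for s, y in zip(scores, labels) if y]
    non_cases = [s for s, y in zip(scores, labels) if not y]
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in non_cases
    )
    return wins / (len(cases) * len(non_cases))


# Hypothetical questionnaire scores and criterion diagnoses (1 = disorder present).
scores = [2, 5, 7, 9, 11, 14]
labels = [0, 0, 1, 0, 1, 1]

sens, spec = sensitivity_specificity(scores, labels, cutoff=8)
area = auc(scores, labels)
print(sens, spec, area)
```

    Lowering the cutoff raises sensitivity at the cost of specificity, which is the trade-off the abstract describes for the two-item screen.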

  2. Validity and reliability of the TED-QOL: a new three-item questionnaire to assess quality of life in thyroid eye disease.

    Science.gov (United States)

    Fayers, Tessa; Dolman, Peter J

    2011-12-01

    To develop and test a user-friendly questionnaire for rapidly assessing quality of life (QOL) in thyroid eye disease (TED). A three-item questionnaire, the TED-QOL, was designed and compared to the 16-item Graves Ophthalmopathy (GO)-QOL and the nine-item GO-Quality of Life Scale (QLS). 100 patients with TED were administered all three questionnaires on two occasions. Results were compared to clinical severity scores (Vision, Inflammation, Strabismus, Appearance (VISA) classification). Main outcomes were construct and criterion validity, test-retest reliability, duration, comprehension and completion rates. TED-QOL correlated strongly with the other questionnaires for corresponding items (Pearson correlation: appearance 0.71, 0.62; functioning 0.69, 0.66; overall QOL 0.53). Test-retest analysis demonstrated good reliability for all three questionnaires (intraclass correlations: TED-QOL 0.81, 0.74, 0.87; GO-QOL 0.81, 0.82; GO-QLS 0.74, 0.86, 0.67). TED-QOL was significantly faster to complete (1.6 min vs GO-QOL 3.1 min, GO-QLS 2.7 min, p<0.0001) and had a higher completion rate (100% vs GO-QOL 78%, GO-QLS 94%). There was only moderate correlation between items on all three questionnaires and VISA scores. The TED-QOL is rapid and easy to complete and analyse and has similar validity and reliability to longer questionnaires. All questionnaires showed only moderate correlation with disease severity, emphasising the discrepancy between objective and subjective assessments and the importance of measuring both.

  3. Methodology for the development and calibration of the SCI-QOL item banks.

    Science.gov (United States)

    Tulsky, David S; Kisala, Pamela A; Victorson, David; Choi, Seung W; Gershon, Richard; Heinemann, Allen W; Cella, David

    2015-05-01

    To develop a comprehensive, psychometrically sound, and conceptually grounded patient reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Individual interviews (n=44) and focus groups (n=65 individuals with SCI and n=42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e. PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n=877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n=245) to assess test-retest reliability and stability. A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury-Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment Center(SM).

  4. Sources of interference in item and associative recognition memory.

    Science.gov (United States)

    Osth, Adam F; Dennis, Simon

    2015-04-01

    A powerful theoretical framework for exploring recognition memory is the global matching framework, in which a cue's memory strength reflects the similarity of the retrieval cues being matched against the contents of memory simultaneously. Contributions at retrieval can be categorized as matches and mismatches to the item and context cues, including the self match (match on item and context), item noise (match on context, mismatch on item), context noise (match on item, mismatch on context), and background noise (mismatch on item and context). We present a model that directly parameterizes the matches and mismatches to the item and context cues, which enables estimation of the magnitude of each interference contribution (item noise, context noise, and background noise). The model was fit within a hierarchical Bayesian framework to 10 recognition memory datasets that use manipulations of strength, list length, list strength, word frequency, study-test delay, and stimulus class in item and associative recognition. Estimates of the model parameters revealed at most a small contribution of item noise that varies by stimulus class, with virtually no item noise for single words and scenes. Despite the unpopularity of background noise in recognition memory models, background noise estimates dominated at retrieval across nearly all stimulus classes with the exception of high frequency words, which exhibited equivalent levels of context noise and background noise. These parameter estimates suggest that the majority of interference in recognition memory stems from experiences acquired before the learning episode. (c) 2015 APA, all rights reserved.

  5. A more general model for testing measurement invariance and differential item functioning.

    Science.gov (United States)

    Bauer, Daniel J

    2017-09-01

    The evaluation of measurement invariance is an important step in establishing the validity and comparability of measurements across individuals. Most commonly, measurement invariance has been examined using one of two primary latent variable modeling approaches: the multiple groups model or the multiple-indicator multiple-cause (MIMIC) model. Both approaches offer opportunities to detect differential item functioning within multi-item scales, and thereby to test measurement invariance, but both approaches also have significant limitations. The multiple groups model allows one to examine the invariance of all model parameters but only across levels of a single categorical individual difference variable (e.g., ethnicity). In contrast, the MIMIC model permits both categorical and continuous individual difference variables (e.g., sex and age) but permits only a subset of the model parameters to vary as a function of these characteristics. The current article argues that moderated nonlinear factor analysis (MNLFA) constitutes an alternative, more flexible model for evaluating measurement invariance and differential item functioning. We show that the MNLFA subsumes and combines the strengths of the multiple group and MIMIC models, allowing for a full and simultaneous assessment of measurement invariance and differential item functioning across multiple categorical and/or continuous individual difference variables. The relationships between the MNLFA model and the multiple groups and MIMIC models are shown mathematically and via an empirical demonstration. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  6. Item response theory - A first approach

    Science.gov (United States)

    Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar

    2017-07-01

    Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years due to increasing computer power and due to a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related to developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also led to numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (van der Linden & Hambleton, 1997). As stated before, Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
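
    The one-, two- and three-parameter logistic models mentioned in this abstract can be written as a single function. The sketch below is a minimal illustration and not the paper's own notation: it assumes the common parameterization with discrimination a, difficulty b and pseudo-guessing c, and it omits the optional scaling constant (often written D) that some texts include. The function name and parameter values are hypothetical.

```python
import math


def irf_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the three-parameter logistic model.

    theta: latent ability, a: discrimination, b: difficulty, c: pseudo-guessing.
    With c = 0 this is the two-parameter model; with c = 0 and a held fixed
    across items it is the one-parameter model.
    """
    if not 0.0 <= c < 1.0:
        raise ValueError("c must lie in [0, 1)")
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item parameters (illustration only).
p_1pl = irf_3pl(theta=0.0)                         # ability equals difficulty
p_2pl = irf_3pl(theta=1.5, a=2.0, b=1.0)           # steeper item
p_3pl = irf_3pl(theta=1.0, a=2.0, b=1.0, c=0.2)    # with a guessing floor
print(p_1pl, round(p_2pl, 3), p_3pl)
```

    When ability equals difficulty, the probability is halfway between the guessing floor c and 1, which is why a nonzero c shifts that midpoint above 0.5.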

  7. Item response theory analysis of the Pain Self-Efficacy Questionnaire.

    Science.gov (United States)

    Costa, Daniel S J; Asghari, Ali; Nicholas, Michael K

    2017-01-01

    The Pain Self-Efficacy Questionnaire (PSEQ) is a 10-item instrument designed to assess the extent to which a person in pain believes s/he is able to accomplish various activities despite their pain. There is strong evidence for the validity and reliability of both the full-length PSEQ and a 2-item version. The purpose of this study is to further examine the properties of the PSEQ using an item response theory (IRT) approach. We used the two-parameter graded response model to examine the category probability curves, and location and discrimination parameters of the 10 PSEQ items. In item response theory, responses to a set of items are assumed to be probabilistically determined by a latent (unobserved) variable. In the graded-response model specifically, item response threshold (the value of the latent variable for which adjacent response categories are equally likely) and discrimination parameters are estimated for each item. Participants were 1511 mixed, chronic pain patients attending for initial assessment at a tertiary pain management centre. All items except item 7 ('I can cope with my pain without medication') performed well in IRT analysis, and the category probability curves suggested that participants used the 7-point response scale consistently. Items 6 ('I can still do many of the things I enjoy doing, such as hobbies or leisure activity, despite pain'), 8 ('I can still accomplish most of my goals in life, despite the pain') and 9 ('I can live a normal lifestyle, despite the pain') captured higher levels of the latent variable with greater precision. The results from this IRT analysis add to the body of evidence based on classical test theory illustrating the strong psychometric properties of the PSEQ. Despite the relatively poor performance of Item 7, its clinical utility warrants its retention in the questionnaire. The strong psychometric properties of the PSEQ support its use as an effective tool for assessing self-efficacy in people with pain.
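
    To make the category probability curves concrete, here is a minimal sketch of a graded response model for a single item. It assumes the widely used cumulative-logit formulation, in which each threshold is the latent value where the probability of responding in that category or higher is 0.5, and the probability of a single category is the difference between adjacent cumulative probabilities. The function name, the discrimination value and the six thresholds for a 7-point item are hypothetical and are not estimates from the study.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a graded response model.

    theta: latent trait value, a: discrimination,
    thresholds: strictly increasing thresholds (K - 1 values for K categories).
    """
    if any(hi <= lo for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities of responding in category k or higher,
    # padded with 1 (lowest category or higher) and 0 (above the highest).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical parameters for a 7-point item (illustration only).
probs = grm_category_probs(theta=0.0, a=2.0,
                           thresholds=[-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])
print([round(p, 3) for p in probs])
```

    Evaluating the function over a grid of theta values and plotting each category's probability gives the category probability curves referred to in the abstract.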

  8. A study of the psychometric properties of 12-item World Health Organization Disability Assessment Schedule 2.0 in a large population of people with chronic musculoskeletal pain.

    Science.gov (United States)

    Saltychev, Mikhail; Bärlund, Esa; Mattie, Ryan; McCormick, Zachary; Paltamaa, Jaana; Laimi, Katri

    2017-02-01

    To assess the validity of the Finnish translation of the 12-item World Health Organization Disability Assessment Schedule (WHODAS 2.0). Cross-sectional cohort survey study. Physical and Rehabilitation Medicine outpatient university clinic. The 501 consecutive patients with chronic musculoskeletal pain. Exploratory factor analysis and a graded response model using item response theory analysis were used to assess the constructs and discrimination ability of WHODAS 2.0. The exploratory factor analysis revealed two retained factors with eigenvalues 5.15 and 1.04. Discrimination ability of all items was high or perfect, varying from 1.2 to 2.5. The difficulty levels of seven out of 12 items were shifted towards the elevated disability level. As a result, the entire test characteristic curve showed a shift towards higher levels of disability, placing it at the point of disability level of +1 (where 0 indicates the average level of disability within the sample). The present data indicate that the Finnish translation of the 12-item WHODAS 2.0 is a valid instrument for measuring restrictions of activity and participation among patients with chronic musculoskeletal pain.

  9. Sensitivity and specificity of the 3-item memory test in the assessment of post traumatic amnesia.

    Science.gov (United States)

    Andriessen, Teuntje M J C; de Jong, Ben; Jacobs, Bram; van der Werf, Sieberen P; Vos, Pieter E

    2009-04-01

    To investigate how the type of stimulus (pictures or words) and the method of reproduction (free recall or recognition after a short or a long delay) affect the sensitivity and specificity of a 3-item memory test in the assessment of post traumatic amnesia (PTA). Daily testing was performed in 64 consecutively admitted traumatic brain injured patients, 22 orthopedically injured patients and 26 healthy controls until criteria for resolution of PTA were reached. Subjects were randomly assigned to a test with visual or verbal stimuli. Short delay reproduction was tested after an interval of 3-5 minutes, long delay reproduction was tested after 24 hours. Sensitivity and specificity were calculated over the first 4 test days. The 3-word test showed higher sensitivity than the 3-picture test, while specificity of the two tests was equally high. Free recall was a more effortful task than recognition for both patients and controls. In patients, a longer delay between registration and recall resulted in a significant decrease in the number of items reproduced. Presence of PTA is best assessed with a memory test that incorporates the free recall of words after a long delay.

  10. Item response theory analysis of Centers for Disease Control and Prevention Health-Related Quality of Life (CDC HRQOL) items in adults with arthritis.

    Science.gov (United States)

    Mielenz, Thelma J; Callahan, Leigh F; Edwards, Michael C

    2016-03-12

    Examine the feasibility of performing an item response theory (IRT) analysis on two of the Centers for Disease Control and Prevention health-related quality of life (CDC HRQOL) modules - the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy Days Symptoms Module (HDSM). Previous principal components analyses confirm that the two scales both assess a mix of mental (CDC-MH) and physical health (CDC-PH). The purpose is to conduct item response theory (IRT) analysis on the CDC-MH and CDC-PH scales separately. 2182 patients with self-reported or physician-diagnosed arthritis completed a cross-sectional survey including HDCM and HDSM items. Besides global health, the other 8 items ask the number of days that some statement was true; we chose to recode the data into 8 categories based on observed clustering. The IRT assumptions were assessed using confirmatory factor analysis and the data could be modeled using a unidimensional IRT model. The graded response model was used for IRT analyses and CDC-MH and CDC-PH scales were analyzed separately in flexMIRT. The IRT parameter estimates for the five-item CDC-PH all appeared reasonable. The three-item CDC-MH did not have reasonable parameter estimates. The CDC-PH scale is amenable to IRT analysis but the existing CDC-MH scale is not. We suggest either using the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy Days Symptoms Module (HDSM) as they currently stand or the CDC-PH scale alone if the primary goal is to measure physical health related HRQOL.

  11. Quantitative Analysis of Complex Multiple-Choice Items in Science Technology and Society: Item Scaling

    Directory of Open Access Journals (Sweden)

    Ángel Vázquez Alonso

    2005-05-01

    The scarce attention to assessment and evaluation in science education research has been especially harmful for Science-Technology-Society (STS) education, due to the dialectic, tentative, value-laden, and controversial nature of most STS topics. To overcome the methodological pitfalls of the STS assessment instruments used in the past, an empirically developed instrument (VOSTS, Views on Science-Technology-Society) has been suggested. Some methodological proposals, namely the multiple response models and the computing of a global attitudinal index, were suggested to improve the item implementation. The final step of these methodological proposals requires the categorization of STS statements. This paper describes the process of categorization through a scaling procedure ruled by a panel of experts, acting as judges, according to the body of knowledge from history, epistemology, and sociology of science. The statement categorization allows for the sound foundation of STS items, which is useful in educational assessment and science education research, and may also increase teachers’ self-confidence in the development of the STS curriculum for science classrooms.

  12. Gender Invariance of the Gambling Behavior Scale for Adolescents (GBS-A): An Analysis of Differential Item Functioning Using Item Response Theory.

    Science.gov (United States)

    Donati, Maria Anna; Chiesi, Francesca; Izzo, Viola A; Primi, Caterina

    2017-01-01

    As there is a lack of evidence attesting to equivalent item functioning across genders for the instruments most commonly used to measure pathological gambling in adolescence, the present study aimed to test the gender invariance of the Gambling Behavior Scale for Adolescents (GBS-A), a new measurement tool to assess the severity of Gambling Disorder (GD) in adolescents. The equivalence of the items across genders was assessed by analyzing Differential Item Functioning within an Item Response Theory framework. The GBS-A was administered to 1,723 adolescents, and the graded response model was employed. The results attested to the measurement equivalence of the GBS-A when administered to male and female adolescent gamblers. Overall, findings provided evidence that the GBS-A is an effective measurement tool of the severity of GD in male and female adolescents and that the scale was unbiased and able to reveal true gender differences. As such, the GBS-A can be profitably used in educational interventions and clinical treatments with young people.

  13. The Role of Item Models in Automatic Item Generation

    Science.gov (United States)

    Gierl, Mark J.; Lai, Hollis

    2012-01-01

    Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists create item models, which are comparable to templates…

  14. Selection of multiple cued items is possible during visual short-term memory maintenance.

    Science.gov (United States)

    Matsukura, Michi; Vecera, Shaun P

    2015-07-01

    Recent neuroimaging studies suggest that maintenance of a selected object feature held in visual short-term/working memory (VSTM/VWM) is supported by the same neural mechanisms that encode the sensory information. If VSTM operates by retaining "reasonable copies" of scenes constructed during sensory processing (Serences, Ester, Vogel, & Awh, 2009, p. 207, the sensory recruitment hypothesis), then attention should be able to select multiple items represented in VSTM as long as the number of these attended items does not exceed the typical VSTM capacity. It is well known that attention can select at least two noncontiguous locations at the same time during sensory processing. However, empirical reports from the studies that examined this possibility are inconsistent. In the present study, we demonstrate that (1) attention can indeed select more than a single item during VSTM maintenance when observers are asked to recognize a set of items in the manner that these items were originally attended, and (2) attention can select multiple cued items regardless of whether these items are perceptually organized into a single group (contiguous locations) or not (noncontiguous locations). The results also replicate and extend the recent finding that selective attention that operates during VSTM maintenance is sensitive to the observers' goal and motivation to use the cueing information.

  15. Problems with the factor analysis of items: Solutions based on item response theory and item parcelling

    Directory of Open Access Journals (Sweden)

    Gideon P. De Bruin

    2004-10-01

    The factor analysis of items often produces spurious results in the sense that unidimensional scales appear multidimensional. This may be ascribed to failure in meeting the assumptions of linearity and normality on which factor analysis is based. Item response theory is explicitly designed for the modelling of the non-linear relations between ordinal variables and provides a strong alternative to the factor analysis of items. Items may also be combined in parcels that are more likely to satisfy the assumptions of factor analysis than do the items. The use of the Rasch rating scale model and the factor analysis of parcels is illustrated with data obtained with the Locus of Control Inventory. The results of these analyses are compared with the results obtained through the factor analysis of items. It is shown that the Rasch rating scale model and the factoring of parcels produce superior results to the factor analysis of items. Recommendations for the analysis of scales are made.

  16. Assessment of the psychometrics of a PROMIS item bank: self-efficacy for managing daily activities.

    Science.gov (United States)

    Hong, Ickpyo; Velozo, Craig A; Li, Chih-Ying; Romero, Sergio; Gruber-Baldini, Ann L; Shulman, Lisa M

    2016-09-01

    The aim of this study is to investigate the psychometrics of the Patient-Reported Outcomes Measurement Information System self-efficacy for managing daily activities item bank. The item pool was field tested on a sample of 1087 participants via internet (n = 250) and in-clinic (n = 837) surveys. All participants reported having at least one chronic health condition. The 35-item pool was investigated for dimensionality (confirmatory factor analysis, CFA, and exploratory factor analysis, EFA), item-total correlations, local independence, precision, and differential item functioning (DIF) across gender, race, ethnicity, age groups, data collection modes, and neurological chronic conditions (McFadden pseudo-R² less than 10%). The item pool met two of the four CFA fit criteria (CFI = 0.952 and SRMR = 0.07). EFA found a dominant first factor (eigenvalue = 24.34) and the ratio of first to second eigenvalue was 12.4. The item pool demonstrated good item-total correlations (0.59-0.85) and acceptable internal consistency (Cronbach's alpha = 0.97). The item pool maintained its precision (reliability over 0.90) across a wide range of theta (3.70), and there was no significant DIF. The findings indicated the item pool has sound psychometric properties and the test items are eligible for development of computerized adaptive testing and short forms.
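
    As a reading aid for the internal-consistency figure quoted above, the following is a minimal sketch of how Cronbach's alpha is conventionally computed from a respondents-by-items score table. The function name and the toy data are hypothetical and are not taken from the study.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the summed (total) score across respondents.
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data (4 respondents x 3 items), invented for illustration.
toy = [[1, 2, 1], [2, 3, 2], [3, 3, 3], [4, 5, 4]]
alpha = cronbach_alpha(toy)
```

    Population variances are used for both numerator and denominator; using sample variances throughout would give the same ratio.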

  17. Item selection via Bayesian IRT models.

    Science.gov (United States)

    Arima, Serena

    2015-02-10

    With reference to a questionnaire that aimed to assess the quality of life for dysarthric speakers, we investigate the usefulness of a model-based procedure for reducing the number of items. We propose a mixed cumulative logit model, which is known in the psychometrics literature as the graded response model: responses to different items are modelled as a function of individual latent traits and as a function of item characteristics, such as their difficulty and their discrimination power. We jointly model the discrimination and the difficulty parameters by using a k-component mixture of normal distributions. Mixture components correspond to disjoint groups of items. Items that belong to the same groups can be considered equivalent in terms of both difficulty and discrimination power. According to decision criteria, we select a subset of items such that the reduced questionnaire is able to provide the same information that the complete questionnaire provides. The model is estimated by using a Bayesian approach, and the choice of the number of mixture components is justified according to information criteria. We illustrate the proposed approach on the basis of data that are collected for 104 dysarthric patients by local health authorities in Lecce and in Milan. Copyright © 2014 John Wiley & Sons, Ltd.
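
    The graded response model named in this abstract can be sketched for a single item as follows. This shows only the standard cumulative-logit form with one discrimination parameter and ordered thresholds; the mixture prior and the Bayesian estimation described by the author are not reproduced, and all names and parameter values are illustrative assumptions.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one graded-response item.

    theta: latent trait; a: discrimination; thresholds: increasing
    difficulty thresholds b_1 < ... < b_{K-1} for K ordered categories.
    """
    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Cumulative probabilities P(X >= k), padded with P(X >= 0) = 1 and P(X >= K) = 0.
    cum = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    # Each category probability is a difference of adjacent cumulative probabilities.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


# Illustrative values only (assumed, not estimates from the paper).
probs = grm_category_probs(theta=0.5, a=1.2, thresholds=[-1.0, 0.0, 1.5])
```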

  18. Item information and discrimination functions for trinary PCM items

    NARCIS (Netherlands)

    Akkermans, Wies; Muraki, Eiji

    1997-01-01

    For trinary partial credit items the shape of the item information and the item discrimination function is examined in relation to the item parameters. In particular, it is shown that these functions are unimodal if δ2 – δ1 < 4 ln 2 and bimodal otherwise. The locations and values of the maxima are
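
    The unimodality condition stated above can be checked numerically. The sketch below assumes a three-category partial credit item with unit discrimination and step parameters δ1 and δ2, computes item information as the variance of the item score, and counts local maxima on a grid. The function names and grid settings are assumptions made for illustration.

```python
import math


def pcm_information(theta, d1, d2):
    """Item information of a three-category partial credit item (scores 0, 1, 2)."""
    # Unnormalised category weights: exp of cumulative sums of (theta - delta_j).
    w = [1.0, math.exp(theta - d1), math.exp(2 * theta - d1 - d2)]
    total = sum(w)
    p = [x / total for x in w]
    mean = p[1] + 2 * p[2]
    # With unit discrimination, information equals the variance of the item score.
    return p[1] + 4 * p[2] - mean ** 2


def count_modes(d1, d2, lo=-10.0, hi=10.0, n=2001):
    """Count strict local maxima of the information function on a grid."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    info = [pcm_information(t, d1, d2) for t in grid]
    return sum(1 for i in range(1, n - 1) if info[i - 1] < info[i] > info[i + 1])


threshold = 4 * math.log(2)  # the critical step distance, about 2.77
```

    With step distances on either side of the threshold, for example `count_modes(-1.0, 1.0)` (distance 2) and `count_modes(-2.0, 2.0)` (distance 4), the grid search finds one and two maxima respectively, in line with the stated condition.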

  19. A tool for assessing case history and feedback skills in audiology students working with simulated patients.

    Science.gov (United States)

    Hughes, Jane; Wilson, Wayne J; MacBean, Naomi; Hill, Anne E

    2016-12-01

    To develop a tool for assessing audiology students taking a case history and giving feedback with simulated patients (SP). Single observation, single group design. Twenty-four first-year audiology students, five simulated patients, two clinical educators, and three evaluators. The Audiology Simulated Patient Interview Rating Scale (ASPIRS) was developed consisting of six items assessing specific clinical skills, non-verbal communication, verbal communication, interpersonal skills, interviewing skills, and professional practice skills. These items are applied once for taking a case history and again for giving feedback. The ASPIRS showed very high internal consistency (α = 0.91-0.97; mean inter-item r = 0.64-0.85) and fair-to-moderate agreement between evaluators (29.2-54.2% exact and 79.2-100% near agreement; κ weighted up to 0.60). It also showed fair-to-moderate absolute agreement amongst evaluators for single evaluator scores (intraclass correlation coefficient [ICC] r = 0.35-0.59) and substantial consistency of agreement amongst evaluators for three-evaluator averaged scores (ICC r = 0.62-0.81). Factor analysis showed the ASPIRS' 12 items fell into two components, one containing all feedback items and one containing all case history items. The ASPIRS shows promise as the first published tool for assessing audiology students taking a case history and giving feedback with an SP.

  20. Methods for Assessing Item, Step, and Threshold Invariance in Polytomous Items Following the Partial Credit Model

    Science.gov (United States)

    Penfield, Randall D.; Myers, Nicholas D.; Wolfe, Edward W.

    2008-01-01

    Measurement invariance in the partial credit model (PCM) can be conceptualized in several different but compatible ways. In this article the authors distinguish between three forms of measurement invariance in the PCM: step invariance, item invariance, and threshold invariance. Approaches for modeling these three forms of invariance are proposed,…

  1. Checking Equity: Why Differential Item Functioning Analysis Should Be a Routine Part of Developing Conceptual Assessments

    Czech Academy of Sciences Publication Activity Database

    Martinková, Patrícia; Drabinová, Adéla; Liaw, Y.L.; Sanders, E.A.; McFarland, J.L.; Price, R.M.

    2017-01-01

    Vol. 16, No. 2 (2017), article no. rm2. ISSN 1931-7913. R&D Projects: GA ČR GJ15-15856Y. Grant - others: NSF (US) DUE-1043443. Institutional support: RVO:67985807. Keywords: differential item functioning * fairness * conceptual assessments * concept inventory * undergraduate education * bias. Subject RIV: AM - Education. OECD field: Education, special (to gifted persons, those with learning disabilities). Impact factor: 3.930, year: 2016

  2. Evaluating an Automated Number Series Item Generator Using Linear Logistic Test Models

    Directory of Open Access Journals (Sweden)

    Bao Sheng Loe

    2018-04-01

    This study investigates the item properties of a newly developed Automatic Number Series Item Generator (ANSIG). The foundation of the ANSIG is based on five hypothesised cognitive operators. Thirteen item models were developed using the numGen R package and eleven were evaluated in this study. The 16-item ICAR (International Cognitive Ability Resource) short form ability test was used to evaluate construct validity. The Rasch Model and two Linear Logistic Test Models (LLTM) were employed to estimate and predict the item parameters. Results indicate that a single factor determines the performance on tests composed of items generated by the ANSIG. Under the LLTM approach, all the cognitive operators were significant predictors of item difficulty. Moderate to high correlations were evident between the number series items and the ICAR test scores, with high correlation found for the ICAR Letter-Numeric-Series type items, suggesting adequate nomothetic span. Extended cognitive research is, nevertheless, essential for the automatic generation of an item pool with predictable psychometric properties.
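
    The idea behind the LLTM mentioned above is that a Rasch item difficulty is written as a weighted sum of the cognitive operations an item requires. The sketch below illustrates only that decomposition. The five operator weights and the Q-matrix rows are invented for illustration and are not the estimates reported in the study.

```python
import math


def lltm_difficulty(q_row, eta):
    """Item difficulty as a weighted sum of the operations the item requires."""
    return sum(q * e for q, e in zip(q_row, eta))


def rasch_prob(theta, beta):
    """Rasch probability of a correct response for ability theta and difficulty beta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


# Hypothetical operator weights (one per cognitive operator) and Q-matrix rows
# (1 = the item model uses that operator). All values are assumptions.
eta = [0.4, 0.7, 1.1, 0.3, 0.9]
q_matrix = [
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1],
]
betas = [lltm_difficulty(row, eta) for row in q_matrix]
probs = [rasch_prob(0.0, b) for b in betas]
```

    Items that combine more operators receive higher predicted difficulty, so the probability of success at a fixed ability decreases down the list.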

  3. Item Response Theory at Subject- and Group-Level. Research Report 90-1.

    Science.gov (United States)

    Tobi, Hilde

    This paper reviews the literature about item response models for the subject level and aggregated level (group level). Group-level item response models (IRMs) are used in the United States in large-scale assessment programs such as the National Assessment of Educational Progress and the California Assessment Program. In the Netherlands, these…

  4. Assessing the Equivalence of Paper, Mobile Phone, and Tablet Survey Responses at a Community Mental Health Center Using Equivalent Halves of a 'Gold-Standard' Depression Item Bank.

    Science.gov (United States)

    Brodey, Benjamin B; Gonzalez, Nicole L; Elkin, Kathryn Ann; Sasiela, W Jordan; Brodey, Inger S

    2017-09-06

    The computerized administration of self-report psychiatric diagnostic and outcomes assessments has risen in popularity. If results are similar enough across different administration modalities, then new administration technologies can be used interchangeably and the choice of technology can be based on other factors, such as convenience in the study design. An assessment based on item response theory (IRT), such as the Patient-Reported Outcomes Measurement Information System (PROMIS) depression item bank, offers new possibilities for assessing the effect of technology choice upon results. To create equivalent halves of the PROMIS depression item bank and to use these halves to compare survey responses and user satisfaction among administration modalities (paper, mobile phone, or tablet) with a community mental health care population. The 28 PROMIS depression items were divided into 2 halves based on content and simulations with an established PROMIS response data set. A total of 129 participants were recruited from an outpatient public sector mental health clinic based in Memphis. All participants took both nonoverlapping halves of the PROMIS IRT-based depression items (Part A and Part B): once using paper and pencil, and once using either a mobile phone or tablet. An 8-cell randomization was done on technology used, order of technologies used, and order of PROMIS Parts A and B. Both Parts A and B were administered as fixed-length assessments and both were scored using published PROMIS IRT parameters and algorithms. All 129 participants received either Part A or B via paper assessment. Participants were also administered the opposite assessment, 63 using a mobile phone and 66 using a tablet. There was no significant difference in item response scores for Part A versus B. All 3 of the technologies yielded essentially identical assessment results and equivalent satisfaction levels. Our findings show that the PROMIS depression assessment can be divided into 2 equivalent

  5. Development of a lack of appetite item bank for computer-adaptive testing (CAT)

    DEFF Research Database (Denmark)

    Thamsborg, Lise Laurberg Holst; Petersen, Morten Aa; Aaronson, Neil K

    2015-01-01

    to 12 lack of appetite items. CONCLUSIONS: Phases 1-3 resulted in 12 lack of appetite candidate items. Based on a field testing (phase 4), the psychometric characteristics of the items will be assessed and the final item bank will be generated. This CAT item bank is expected to provide precise...

  6. Open Single Item of Perceived Risk Factors (OSIPRF) toward Cardiovascular Diseases Is an Appropriate Instrument for Evaluating Psychological Symptoms

    Directory of Open Access Journals (Sweden)

    Mozhgan Saeidi

    2016-12-01

    Psychological symptoms are considered one of the aspects and consequences of cardiovascular diseases (CVDs), management of which can precipitate and facilitate the process of recovery. Evaluation of the psychological symptoms can increase the treatment team's awareness of patients’ mental health, which can be beneficial for designing treatment programs (1). However, the time-consuming process of interviews and questionnaire-based assessment leads to fatigue and lack of patient cooperation, which may be problematic for healthcare evaluators. Therefore, the use of brief and suitable alternatives is always recommended. The use of practical and easy-to-implement instruments is constantly emphasized. A practical method for assessing patients' psychological status is examining causal beliefs and attitudes about the disease. The causal beliefs and perceived risk factors by patients, which are significantly related to the actual risk factors for CVDs (2), are not only related to psychological adjustment and mental health but also have an impact on patients’ compliance with treatment recommendations (3). It seems that several risk factors are at play regarding the perceived risk factors for CVDs, such as gender (4), age (5), and most importantly, patients’ psychological status (3). Accordingly, evaluation of causal beliefs and perceived risk factors by patients could probably be a shortcut method for evaluation of patients’ psychological health. In recent years, Saeidi and Komasi (5) proposed a question and investigated the perceived risk factors with an open single item: “What do you think is the main cause of your illness?”. According to the authors, the perceived risk factors are recorded in five categories including biological (age, gender, and family history), environmental (dust, smoke, passive smoking, toxic substances, and effects of war), physiological (diabetes, hypertension, hyperlipidemia, and obesity), behavioral (lack of exercise, nutrition

  7. Non-ignorable missingness item response theory models for choice effects in examinee-selected items.

    Science.gov (United States)

    Liu, Chen-Wei; Wang, Wen-Chung

    2017-11-01

    Examinee-selected item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set, always yields incomplete data (i.e., when only the selected items are answered, data are missing for the others) that are likely non-ignorable in likelihood inference. Standard item response theory (IRT) models become infeasible when ESI data are missing not at random (MNAR). To solve this problem, the authors propose a two-dimensional IRT model that posits one unidimensional IRT model for observed data and another for nominal selection patterns. The two latent variables are assumed to follow a bivariate normal distribution. In this study, the mirt freeware package was adopted to estimate parameters. The authors conduct an experiment to demonstrate that ESI data are often non-ignorable and to determine how to apply the new model to the data collected. Two follow-up simulation studies are conducted to assess the parameter recovery of the new model and the consequences for parameter estimation of ignoring MNAR data. The results of the two simulation studies indicate good parameter recovery of the new model and poor parameter recovery when non-ignorable missing data were mistakenly treated as ignorable. © 2017 The British Psychological Society.

  8. Inventory control in multi-item production systems

    NARCIS (Netherlands)

    Bruin, J.

    2010-01-01

    This thesis focusses on the analysis and construction of control policies in multi-item production systems. In such systems, multiple items can be made to stock, but they have to share the finite capacity of a single machine. This machine can only produce one unit at a time and if it is set-up for

  9. Calibration of context-specific survey items to assess youth physical activity behaviour.

    Science.gov (United States)

    Saint-Maurice, Pedro F; Welk, Gregory J; Bartee, R Todd; Heelan, Kate

    2017-05-01

    This study tests calibration models to re-scale context-specific physical activity (PA) items to accelerometer-derived PA. A total of 195 children in grades 4-12 wore an Actigraph monitor and completed the Physical Activity Questionnaire (PAQ) one week later. The relative time spent in moderate-to-vigorous PA (MVPA%) obtained from the Actigraph at recess, PE, lunch, after-school, evening and weekend was matched with the respective item score obtained from the PAQ. Item scores from 145 participants were calibrated against objective MVPA% using multiple linear regression with age and sex as additional predictors. Predicted minutes of MVPA for school, out-of-school and total week were tested in the remaining sample (n = 50) using equivalence testing. The results showed that PAQ β-weights ranged from 0.06 (lunch) to 4.94 (PE) MVPA% (P [...] PAQ and accelerometer MVPA at school and out-of-school ranged from -15.6 to +3.8 min and the PAQ was within 10-15% of accelerometer measured activity. This study demonstrated that context-specific items can be calibrated to predict minutes of MVPA in groups of youth during in- and out-of-school periods.

  10. A New Functional Health Literacy Scale for Japanese Young Adults Based on Item Response Theory.

    Science.gov (United States)

    Tsubakita, Takashi; Kawazoe, Nobuo; Kasano, Eri

    2017-03-01

    Health literacy predicts health outcomes. Despite concerns surrounding the health of Japanese young adults, to date there has been no objective assessment of health literacy in this population. This study aimed to develop a Functional Health Literacy Scale for Young Adults (funHLS-YA) based on item response theory. Each item in the scale requires participants to choose the most relevant term from 3 choices in relation to a target item, thus assessing objective rather than perceived health literacy. The 20-item scale was administered to 1816 university students and 1751 responded. Cronbach's α coefficient was .73. Difficulty and discrimination parameters of each item were estimated, resulting in the exclusion of 1 item. Some items showed different difficulty parameters for male and female participants, reflecting that some aspects of health literacy may differ by gender. The current 19-item version of funHLS-YA can reliably assess the objective health literacy of Japanese young adults.

  11. Individuals with knee impairments identify items in need of clarification in the Patient Reported Outcomes Measurement Information System (PROMIS®) pain interference and physical function item banks - a qualitative study.

    Science.gov (United States)

    Lynch, Andrew D; Dodds, Nathan E; Yu, Lan; Pilkonis, Paul A; Irrgang, James J

    2016-05-11

    The content and wording of the Patient Reported Outcome Measurement Information System (PROMIS) Physical Function and Pain Interference item banks have not been qualitatively assessed by individuals with knee joint impairments. The purpose of this investigation was to identify items in the PROMIS Physical Function and Pain Interference Item Banks that are irrelevant, unclear, or otherwise difficult to respond to for individuals with impairment of the knee and to suggest modifications based on cognitive interviews. Twenty-nine individuals with knee joint impairments qualitatively assessed items in the Pain Interference and Physical Function Item Banks in a mixed-methods cognitive interview. Field notes were analyzed to identify themes and frequency counts were calculated to identify items not relevant to individuals with knee joint impairments. Issues with clarity were identified in 23 items in the Physical Function Item Bank, resulting in the creation of 43 new or modified items, typically changing words within the item to be clearer. Interpretation issues included whether or not the knee joint played a significant role in overall health and age/gender differences in items. One quarter of the original items (31 of 124) in the Physical Function Item Bank were identified as irrelevant to the knee joint. All 41 items in the Pain Interference Item Bank were identified as clear, although individuals without significant pain substituted other symptoms which interfered with their life. The Physical Function Item Bank would benefit from additional items that are relevant to individuals with knee joint impairments and, by extension, to other lower extremity impairments. Several issues in clarity were identified that are likely to be present in other patient cohorts as well.

  12. A Third-Order Item Response Theory Model for Modeling the Effects of Domains and Subdomains in Large-Scale Educational Assessment Surveys

    Science.gov (United States)

    Rijmen, Frank; Jeon, Minjeong; von Davier, Matthias; Rabe-Hesketh, Sophia

    2014-01-01

    Second-order item response theory models have been used for assessments consisting of several domains, such as content areas. We extend the second-order model to a third-order model for assessments that include subdomains nested in domains. Using a graphical model framework, it is shown how the model does not suffer from the curse of…

  13. Automated Item Generation with Recurrent Neural Networks.

    Science.gov (United States)

    von Davier, Matthias

    2018-03-12

    Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource and testing agencies incur high costs in the process of continuous renewal of item banks to sustain testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types such as those found in language-free intelligence tests (e.g., Raven's progressive matrices) or on an extensive analysis of task components and derivation of schemata to produce items with pre-specified variability that are hoped to have predictable levels of difficulty. It is somewhat unlikely that researchers utilizing these previous approaches would look at the proposed approach with favor; however, recent applications of machine learning show success in solving tasks that seemed impossible for machines not too long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike what Google Brain and Amazon Alexa use for language processing and generation.

  14. Brief Sensation Seeking Scale: Latent structure of 8-item and 4-item versions in Peruvian adolescents.

    Science.gov (United States)

    Merino-Soto, Cesar; Salas Blas, Edwin

    2018-01-01

    This research sought to validate two brief sensation seeking scales with Peruvian adolescents: the eight-item scale (BSSS8; Hoyle, Stephenson, Palmgreen, Lorch, & Donohew, 2002) and the four-item scale (BSSS4; Stephenson, Hoyle, Slater, & Palmgreen, 2003). Questionnaires were administered to 618 voluntary participants, with an average age of 13.6 years, from different levels of high school in state and private schools in a district in the south of Lima. The internal structure of both short versions was analyzed using three models: a) unidimensional (M1), b) oblique or related dimensions (M2), and c) the bifactor model (M3). Results show that both instruments have a single dimension which best represents the variability of the items; a fact that can be explained both by the complexity of the concept and by the small number of items representing each factor, which is more noticeable in the BSSS4. Reliability is within levels found by previous studies: alpha was .745 for the BSSS8 and .643 for the BSSS4; the omega coefficient was .747 for the BSSS8 and .651 for the BSSS4. These are considered suitable for the type of instruments studied. Based on the correlation between the two instruments, it was found that there are satisfactory levels of equivalence between the BSSS8 and BSSS4. However, it is recommended that the BSSS4 is mainly used for research and for the purpose of describing populations.

  15. Semiparametric Item Response Functions in the Context of Guessing

    Science.gov (United States)

    Falk, Carl F.; Cai, Li

    2016-01-01

    We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood-based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…
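
    For orientation, the three-parameter logistic (3PL) model that this abstract takes as its point of departure can be sketched as below. This is the standard 3PL baseline only; the authors' extension replaces the linear term inside the logistic with a monotonic polynomial, which is not implemented here. The function name and parameter values are illustrative assumptions.

```python
import math


def three_pl(theta, a, b, c):
    """Three-parameter logistic item response function.

    a: discrimination, b: difficulty, c: lower asymptote (guessing level).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Illustrative values (assumed): a four-option multiple-choice item.
p_low = three_pl(theta=-6.0, a=1.5, b=0.0, c=0.25)
p_mid = three_pl(theta=0.0, a=1.5, b=0.0, c=0.25)
p_high = three_pl(theta=6.0, a=1.5, b=0.0, c=0.25)
```

    At very low ability the probability approaches the lower asymptote c rather than zero, which is how the model accommodates guessing.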

  16. A Bifactor Multidimensional Item Response Theory Model for Differential Item Functioning Analysis on Testlet-Based Items

    Science.gov (United States)

    Fukuhara, Hirotaka; Kamata, Akihito

    2011-01-01

    A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into…

  17. Quality of life in infants and children with atopic dermatitis: Addressing issues of differential item functioning across countries in multinational clinical trials

    Directory of Open Access Journals (Sweden)

    Tennant Alan

    2007-07-01

    Full Text Available Background: A previous study had identified 45 items assessing the impact of atopic dermatitis (AD) on the whole family. From these it was intended to develop two separate scales, one assessing impact on carers and the other determining the effect on the child. Methods: The 45 items were included in three clinical trials designed to test the efficacy of a new topical treatment (pimecrolimus, Elidel cream 1%) in the treatment of AD in infants and children, and in validation studies in the UK, US, Germany, France and the Netherlands. Rasch analyses were undertaken to determine whether an internationally valid, unidimensional scale could be developed that would inform on the direct impact of AD on the child. Results: Rasch analyses applied to the data from the trials indicated that the draft measure consisted of two scales, one assessing the QoL of the carer and the other (consisting of 12 items) measuring the impact of AD on the child. Three of the 12 potential items failed to fit the measurement model in Europe and five in the US. In addition, four items exhibiting differential item functioning (DIF) by country were identified. After removing the misfitting items and controlling for DIF it was possible to derive a scale, the Childhood Impact of Atopic Dermatitis (CIAD), with good item fit for each trial analysis. Analysis of the validation data from each of the different countries confirmed that the CIAD had adequate internal consistency, reproducibility and construct validity. The CIAD demonstrated the benefits of treatment with Elidel over placebo in the European trial. A similar (non-significant) trend was found for the US trials. Conclusion: The study represents a novel method of dealing with the problem of DIF associated with different cultures. Such problems are likely to arise in any multinational study involving patient-reported outcome measures, as items in the scales are likely to be valued differently in different cultures.
However, where

  18. Use of differential item functioning analysis to assess the equivalence of translations of a questionnaire

    NARCIS (Netherlands)

    Petersen, Morten Aa; Groenvold, Mogens; Bjorner, Jakob B.; Aaronson, Neil; Conroy, Thierry; Cull, Ann; Fayers, Peter; Hjermstad, Marianne; Sprangers, Mirjam; Sullivan, Marianne

    2003-01-01

    In cross-national comparisons based on questionnaires, accurate translations are necessary to obtain valid results. Differential item functioning (DIF) analysis can be used to test whether translations of items in multi-item scales are equivalent to the original. In data from 10,815 respondents

  19. A unified factor-analytic approach to the detection of item and test bias: Illustration with the effect of providing calculators to students with dyscalculia

    Directory of Open Access Journals (Sweden)

    Lee, M. K.

    2016-01-01

    Full Text Available An absence of measurement bias against distinct groups is a prerequisite for the use of a given psychological instrument in scientific research or high-stakes assessment. Factor analysis is the framework explicitly adopted for the identification of such bias when the instrument consists of a multi-test battery, whereas item response theory is employed when the focus narrows to a single test composed of discrete items. Item response theory can be treated as a mild nonlinearization of the standard factor model, and thus the essential unity of bias detection at the two levels merits greater recognition. Here we illustrate the benefits of a unified approach with a real-data example, which comes from a statewide test of mathematics achievement where examinees diagnosed with dyscalculia were accommodated with calculators. We found that items that can be solved by explicit arithmetical computation became easier for the accommodated examinees, but the quantitative magnitude of this differential item functioning (measurement bias) was small.

  20. Differential item functioning of the UWES-17 in South Africa

    Directory of Open Access Journals (Sweden)

    Leanne Goliath-Yarde

    2011-11-01

    Research purpose: This study assesses the Differential Item Functioning (DIF) of the Utrecht Work Engagement Scale (UWES-17) for different South African cultural groups in a South African company. Motivation for the study: Organisations are increasingly using the UWES-17 in South Africa to assess work engagement. Therefore, research evidence from psychologists or assessment practitioners on its DIF across different cultural groups is necessary. Research design, approach and method: The researchers conducted a Secondary Data Analysis (SDA) on the UWES-17 sample (n = 2429) that they obtained from a cross-sectional survey undertaken in a South African Information and Communication Technology (ICT) sector company (n = 24 134). Quantitative item data on the UWES-17 scale enabled the authors to address the research question. Main findings: The researchers found uniform and/or non-uniform DIF on five of the vigour items, four of the dedication items and two of the absorption items. This also showed possible Differential Test Functioning (DTF) on the vigour and dedication dimensions. Practical/managerial implications: Based on the DIF, the researchers suggested that organisations should not use the UWES-17 comparatively for different cultural groups or employment decisions in South Africa. Contribution/value add: The study provides evidence on DIF and possible DTF for the UWES-17. However, it also raises questions about possible interaction effects that need further investigation.

  1. Cleaning and disinfection of patient care items, in relation to small animals.

    Science.gov (United States)

    Weese, J Scott

    2015-03-01

    Patient care involves several medical and surgical items, including those that come into contact with sterile or other high-risk body sites and items that have been used on other patients. These situations create a risk for infection if items are contaminated, and the implications can range from single infections to large outbreaks. To minimize the risk, proper equipment cleaning, disinfection/sterilization, storage, and monitoring practices are required. Risks posed by different items; the required level of cleaning, disinfection, or sterilization; the methods that are available and appropriate; and how to ensure efficacy, must be considered when designing and implementing an infection control program. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. War Reserve Analysis and Secondary Item Procureability Assessment of the AMCOM Supported Weapon Systems

    National Research Council Canada - National Science Library

    Maddux, Gary

    2000-01-01

    .... IOD evaluates the impacts of nonavailability of secondary items on the life cycle supportability of AMCOM weapon systems and evaluates the producibility of secondary items for war reserve requirements...

  3. Criteria for eliminating items of a Test of Figural Analogies

    Directory of Open Access Journals (Sweden)

    Diego Blum

    2013-12-01

    Full Text Available This paper describes the steps taken to eliminate two of the items in a Test of Figural Analogies (TFA). The main guidelines of psychometric analysis concerning Classical Test Theory (CTT) and Item Response Theory (IRT) are explained. The item elimination process was based on both the study of the CTT difficulty and discrimination indices and the unidimensionality analysis. The a, b, and c parameters of the Three Parameter Logistic Model of IRT were also considered for this purpose, as well as the assessment of each item's fit to this model. The unfavourable characteristics of a group of TFA items are detailed, and decisions leading to their possible elimination are discussed.
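
    For readers unfamiliar with the a, b, and c parameters mentioned in this abstract, the following is a minimal sketch of the standard three-parameter logistic (3PL) item response function, in which a is discrimination, b is difficulty, and c is the lower asymptote (pseudo-guessing). The function name and the numeric values are illustrative assumptions and are not taken from the paper.

```python
import math


def three_pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a: item discrimination
    b: item difficulty
    c: lower asymptote (pseudo-guessing)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: when ability equals difficulty, the probability
# sits halfway between the lower asymptote c and 1.
p_at_difficulty = three_pl(theta=0.0, a=1.2, b=0.0, c=0.2)
print(round(p_at_difficulty, 3))
```

    An item with a very low a or a very high c carries little information about ability, which is one reason such parameters can be weighed when deciding whether to drop an item.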

  4. The influence of item order on intentional response distortion in the assessment of high potentials: assessing pilot applicants.

    Science.gov (United States)

    Khorramdel, Lale; Kubinger, Klaus D; Uitz, Alexander

    2014-04-01

    An experiment was conducted to investigate the effects of item order and questionnaire content on faking good or intentional response distortion. It was hypothesized that intentional response distortion would either increase towards the end of a long questionnaire, as learning effects might make it easier to adjust responses to a faking good schema, or decrease because applicants' will to distort responses is reduced if the questionnaire lasts long enough. Furthermore, it was hypothesized that certain types of questionnaire content are especially vulnerable to response distortion. Eighty-four pre-selected pilot applicants filled out a questionnaire consisting of 516 items including items from the NEO five factor inventory (NEO FFI), NEO personality inventory revised (NEO PI-R) and business-focused inventory of personality (BIP). The positions of the items were varied within the applicant sample to test if responses are affected by item order, and applicants' response behaviour was additionally compared to that of volunteers. Applicants reported significantly higher mean scores than volunteers, and results provide some evidence of decreased faking tendencies towards the end of the questionnaire. Furthermore, it could be demonstrated that lower variances or standard deviations in combination with appropriate (often higher) mean scores can serve as an indicator for faking tendencies in group comparisons, even if effects are not significant. © 2013 International Union of Psychological Science.

  5. Effects of memantine on cognition in patients with moderate to severe Alzheimer's disease: post-hoc analyses of ADAS-cog and SIB total and single-item scores from six randomized, double-blind, placebo-controlled studies.

    Science.gov (United States)

    Mecocci, Patrizia; Bladström, Anna; Stender, Karina

    2009-05-01

    The post-hoc analyses reported here evaluate the specific effects of memantine treatment on ADAS-cog single-items or SIB subscales for patients with moderate to severe AD. Data from six multicentre, randomised, placebo-controlled, parallel-group, double-blind, 6-month studies were used as the basis for these post-hoc analyses. All patients with a Mini-Mental State Examination (MMSE) score of less than 20 were included. Analyses of patients with moderate AD (MMSE: 10-19), evaluated with the Alzheimer's disease Assessment Scale (ADAS-cog) and analyses of patients with moderate to severe AD (MMSE: 3-14), evaluated using the Severe Impairment Battery (SIB), were performed separately. The mean change from baseline showed a significant benefit of memantine treatment on both the ADAS-cog (p ADAS-cog single-item analyses showed significant benefits of memantine treatment, compared to placebo, for mean change from baseline for commands (p < 0.001), ideational praxis (p < 0.05), orientation (p < 0.01), comprehension (p < 0.05), and remembering test instructions (p < 0.05) for observed cases (OC). The SIB subscale analyses showed significant benefits of memantine, compared to placebo, for mean change from baseline for language (p < 0.05), memory (p < 0.05), orientation (p < 0.01), praxis (p < 0.001), and visuospatial ability (p < 0.01) for OC. Memantine shows significant benefits on overall cognitive abilities as well as on specific key cognitive domains for patients with moderate to severe AD. (c) 2009 John Wiley & Sons, Ltd.

  6. A Generalized Logistic Regression Procedure to Detect Differential Item Functioning among Multiple Groups

    Science.gov (United States)

    Magis, David; Raiche, Gilles; Beland, Sebastien; Gerard, Paul

    2011-01-01

    We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence…

  7. 47 CFR 76.985 - Subscriber bill itemization.

    Science.gov (United States)

    2010-10-01

    ...) The amount of the total bill assessed as a franchise fee and the identity of the franchising authority... fees and costs itemized pursuant to this section. (c) Local franchising authorities may adopt...

  8. Item-Level Psychometrics of the Glasgow Outcome Scale: Extended Structured Interviews.

    Science.gov (United States)

    Hong, Ickpyo; Li, Chih-Ying; Velozo, Craig A

    2016-04-01

    The Glasgow Outcome Scale-Extended (GOSE) structured interview captures critical components of activities and participation, including home, shopping, work, leisure, and family/friend relationships. Eighty-nine community-dwelling adults with mild-moderate traumatic brain injury (TBI) were recruited (average = 2.7 years post injury). Nine of the 19 items were used for the psychometric analysis. Factor analysis and item-level psychometrics were investigated using the Rasch partial-credit model. Although the principal components analysis of residuals suggests that a single measurement factor dominates the measure, the instrument did not meet the factor analysis criteria. Five items met the rating scale criteria. Eight items fit the Rasch model. The instrument demonstrated low person reliability (0.63), low person strata (2.07), and a slight ceiling effect. The GOSE demonstrated limitations in precisely measuring activities/participation for individuals after TBI. Future studies should examine the impact of the low precision of the GOSE on effect size. © The Author(s) 2016.

  9. Validation of a method for assessing resident physicians' quality improvement proposals.

    Science.gov (United States)

    Leenstra, James L; Beckman, Thomas J; Reed, Darcy A; Mundell, William C; Thomas, Kris G; Krajicek, Bryan J; Cha, Stephen S; Kolars, Joseph C; McDonald, Furman S

    2007-09-01

    Residency programs involve trainees in quality improvement (QI) projects to evaluate competency in systems-based practice and practice-based learning and improvement. Valid approaches to assess QI proposals are lacking. We developed an instrument for assessing resident QI proposals, the Quality Improvement Proposal Assessment Tool (QIPAT-7), and determined its validity and reliability. QIPAT-7 content was initially obtained from a national panel of QI experts. Through an iterative process, the instrument was refined, pilot-tested, and revised. Seven raters used the instrument to assess 45 resident QI proposals. Principal factor analysis was used to explore the dimensionality of instrument scores. Cronbach's alpha and intraclass correlations were calculated to determine internal consistency and interrater reliability, respectively. QIPAT-7 items comprised a single factor (eigenvalue = 3.4) suggesting a single assessment dimension. Interrater reliability for each item (range 0.79 to 0.93) and internal consistency reliability among the items (Cronbach's alpha = 0.87) were high. This method for assessing resident physician QI proposals is supported by content and internal structure validity evidence. QIPAT-7 is a useful tool for assessing resident QI proposals. Future research should determine the reliability of QIPAT-7 scores in other residency and fellowship training programs. Correlations should also be made between assessment scores and criteria for QI proposal success such as implementation of QI proposals, resident scholarly productivity, and improved patient outcomes.
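
    As an illustration of the internal-consistency statistic reported here, the sketch below computes Cronbach's alpha from a table of item scores using the usual formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the ratings are hypothetical; they are not the QIPAT-7 data, and the study's own computation may have differed in detail.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows, one row per rated object.

    Each row holds the k item scores for that object. Sample variances
    are used consistently for both the items and the total score.
    """
    k = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: four proposals scored on three items.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
]
print(round(cronbach_alpha(ratings), 3))
```

    Alpha approaches 1 when items rise and fall together across the rated objects, which is consistent with the single-factor structure the abstract describes.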

  10. Negative affectivity and social inhibition in cardiovascular disease: evaluating type-D personality and its assessment using item response theory.

    Science.gov (United States)

    Emons, Wilco H M; Meijer, Rob R; Denollet, Johan

    2007-07-01

    Individuals with increased levels of both negative affectivity (NA) and social inhibition (SI)-referred to as type-D personality-are at increased risk of adverse cardiac events. We used item response theory (IRT) to evaluate NA, SI, and type-D personality as measured by the DS14. The objectives of this study were (a) to evaluate the relative contribution of individual items to the measurement precision at the cutoff to distinguish type-D from non-type-D personality and (b) to investigate the comparability of NA, SI, and type-D constructs across the general population and clinical populations. Data from representative samples including 1316 respondents from the general population, 427 respondents diagnosed with coronary heart disease, and 732 persons suffering from hypertension were analyzed using the graded response IRT model. In Study 1, the information functions obtained in the IRT analysis showed that (a) all items had highest measurement precision around the cutoff and (b) items are most informative at the higher end of the scale. In Study 2, the IRT analysis showed that measurements were fairly comparable across the general population and clinical populations. The DS14 adequately measures NA and SI, with highest reliability in the trait range around the cutoff. The DS14 is a valid instrument to assess and compare type-D personality across clinical groups.

  11. Development of the Oxford Participation and Activities Questionnaire: constructing an item pool

    Directory of Open Access Journals (Sweden)

    Kelly L

    2015-05-01

    Full Text Available Laura Kelly, Crispin Jenkinson, Sarah Dummett, Jill Dawson, Ray Fitzpatrick, David Morley. Health Services Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK. Purpose: The Oxford Participation and Activities Questionnaire is a patient-reported outcome measure in development that is grounded on the World Health Organization International Classification of Functioning, Disability, and Health (ICF). The study reported here aimed to inform and generate an item pool for the new measure, which is specifically designed for the assessment of participation and activity in patients experiencing a range of health conditions. Methods: Items were informed through in-depth interviews conducted with 37 participants spanning a range of conditions. Interviews aimed to identify how their condition impacted their ability to participate in meaningful activities. Conditions included arthritis, cancer, chronic back pain, diabetes, motor neuron disease, multiple sclerosis, Parkinson's disease, and spinal cord injury. Transcripts were analyzed using the framework method. Statements relating to ICF themes were recast as questionnaire items and shown for review to an expert panel. Cognitive debrief interviews (n=13) were used to assess items for face and content validity. Results: ICF themes relevant to activities and participation in everyday life were explored, and a total of 222 items formed the initial item pool. This item pool was refined by the research team and 28 generic items were mapped onto all nine chapters of the ICF construct, detailing activity and participation. Cognitive interviewing confirmed the questionnaire instructions, items, and response options were acceptable to participants. Conclusion: Using a clear conceptual basis to inform item generation, 28 items have been identified as suitable to undergo further psychometric testing.
A large-scale postal survey will follow in order to refine the instrument further and

  12. Learning environment assessments of a single curriculum being taught at two medical schools 10,000 miles apart.

    Science.gov (United States)

    Tackett, Sean; Shochet, Robert; Shilkofski, Nicole A; Colbert-Getz, Jorie; Rampal, Krishna; Abu Bakar, Hamidah; Wright, Scott

    2015-06-17

    Perdana University Graduate School of Medicine (PUGSOM), the first graduate-entry medical school in Malaysia, was established in 2011 in collaboration with Johns Hopkins University School of Medicine (JHUSOM), an American medical school. This study compared learning environments (LE) at these two schools, which shared the same overarching curriculum, along with a comparator Malaysian medical school, Cyberjaya University College of Medical Sciences (CUCMS). As a secondary aim, we compared 2 LE assessment tools - the widely-used Dundee Ready Educational Environment Measure (DREEM) and the newer Johns Hopkins Learning Environment Scale (JHLES). Students responded anonymously at the end of their first year of medical school to surveys which included DREEM, JHLES, single-item global LE assessment variables, and demographics questions. Respondents included 24/24 (100 %) students at PUGSOM, 100/120 (83 %) at JHUSOM, and 79/83 (95 %) at CUCMS. PUGSOM had the highest overall LE ratings (p safety" domains. JHLES detected significant differences across schools in 5/7 domains and had stronger correlations than DREEM to each global LE assessment variable. The inaugural class of medical students at PUGSOM rated their LE exceptionally highly, providing evidence that transporting a medical school curriculum may be successful. The JHLES showed promise as a LE assessment tool for use in international settings.

  13. Assessing Psycho-social Barriers to Rehabilitation in Injured Workers with Chronic Musculoskeletal Pain: Development and Item Properties of the Yellow Flag Questionnaire (YFQ).

    Science.gov (United States)

    Salathé, Cornelia Rolli; Trippolini, Maurizio Alen; Terribilini, Livio Claudio; Oliveri, Michael; Elfering, Achim

    2018-06-01

    Purpose: To develop a multidimensional scale to assess psychosocial beliefs, the Yellow Flag Questionnaire (YFQ), aimed at guiding interventions for workers with chronic musculoskeletal (MSK) pain. Methods: Phase 1 consisted of item selection based on literature search, item development and expert consensus rounds. In phase 2, items were reduced by calculating a quality score per item, using structural equation modeling and confirmatory factor analysis on data from 666 workers. In phase 3, Cronbach's α and Pearson correlation coefficients were computed to compare the YFQ score with disability, anxiety, depression and self-efficacy, based on data from 253 injured workers. Regressions of YFQ total score on disability, anxiety, depression and self-efficacy were calculated. Results: After phase 1, the YFQ included 116 items and 15 domains. Further reductions in phase 2 by applying the item quality criteria reduced the total to 48 items. Factor analysis with structural equation modeling confirmed 32 items in seven domains: activity, work, emotions, harm & blame, diagnosis beliefs, co-morbidity and control. Cronbach's α was 0.91 for the total score and between 0.49 and 0.81 for the seven domain scores. Correlations of the YFQ total score with disability, anxiety, depression and self-efficacy were .58, .66, .73 and -.51, respectively. After controlling for age and gender, the YFQ total score explained between 27% and 53% (R²) of the variance in disability, anxiety, depression and self-efficacy. Conclusions: The YFQ, a multidimensional screening scale, is recommended for assessing psychosocial beliefs of workers with chronic MSK pain. Further evaluation of measurement properties such as test-retest reliability, responsiveness and prognostic validity is warranted.

  14. The measurement of tritium in Canadian food items

    International Nuclear Information System (INIS)

    Brown, R.M.

    1995-03-01

    Food items locally grown near Perth, Ontario and grocery store produce and locally grown items from the Pickering-Ajax area in the vicinity of the Pickering Nuclear Generating Station (PNGS) have been analyzed for free water tritium (HTO) and organically bound tritium (OBT). The technique of measuring ³He ingrowth in samples by mass spectrometry has been used because of its sensitivity and freedom from opportunity for contamination during processing and measurement. Concentrations observed at each site were of the order expected on the basis of known levels of tritium in the local atmosphere and precipitation. There was considerable variation between different materials and limited correlation between materials of a single type. (author). 10 refs., 8 tabs., 4 figs

  15. Using Explanatory Item Response Models to Evaluate Complex Scientific Tasks Designed for the Next Generation Science Standards

    Science.gov (United States)

    Chiu, Tina

    This dissertation includes three studies that analyze a new set of assessment tasks developed by the Learning Progressions in Middle School Science (LPS) Project. These assessment tasks were designed to measure science content knowledge on the structure of matter domain and scientific argumentation, while following the goals from the Next Generation Science Standards (NGSS). The three studies focus on the evidence available for the success of this design and its implementation, generally labelled as "validity" evidence. I use explanatory item response models (EIRMs) as the overarching framework to investigate these assessment tasks. These models can be useful when gathering validity evidence for assessments as they can help explain student learning and group differences. In the first study, I explore the dimensionality of the LPS assessment by comparing the fit of unidimensional, between-item multidimensional, and Rasch testlet models to see which is most appropriate for this data. By applying multidimensional item response models, multiple relationships can be investigated, and in turn, allow for a more substantive look into the assessment tasks. The second study focuses on person predictors through latent regression and differential item functioning (DIF) models. Latent regression models show the influence of certain person characteristics on item responses, while DIF models test whether one group is differentially affected by specific assessment items, after conditioning on latent ability. Finally, the last study applies the linear logistic test model (LLTM) to investigate whether item features can help explain differences in item difficulties.
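
    The linear logistic test model (LLTM) mentioned at the end of this abstract can be sketched as follows: each item's difficulty is modelled as a weighted sum of its features, and that difficulty then enters an ordinary Rasch response function. The Q-matrix, the feature effects, and the function names below are hypothetical illustrations, not values or code from the dissertation.

```python
import math


def lltm_difficulties(q_matrix, feature_effects):
    """Item difficulties as linear combinations of item features.

    q_matrix[i][k] says how strongly item i involves feature k;
    feature_effects[k] is the difficulty contribution of feature k.
    """
    return [
        sum(q * eta for q, eta in zip(row, feature_effects))
        for row in q_matrix
    ]


def rasch_probability(theta, difficulty):
    """Rasch probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# Hypothetical example: two items and two features.
q_matrix = [
    [1, 0],  # item 1 involves only feature 1
    [1, 1],  # item 2 involves both features
]
feature_effects = [0.5, 1.0]

difficulties = lltm_difficulties(q_matrix, feature_effects)
probabilities = [rasch_probability(0.5, b) for b in difficulties]
print(difficulties, probabilities)
```

    Because difficulties are constrained to follow the item features, the LLTM has fewer parameters than a Rasch model with one free difficulty per item, so comparing the two indicates how much of the difficulty variation the features explain.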

  16. Developing economic order quantity model for non-instantaneous deteriorating items in vendor-managed inventory (VMI) system

    Science.gov (United States)

    Tat, Roya; Allah Taleizadeh, Ata; Esmaeili, Maryam

    2015-05-01

    This paper develops an economic order quantity model for non-instantaneous deteriorating items with and without shortages to investigate the performance of the vendor-managed inventory (VMI) system. This model is developed for a two-level supply chain consisting of a single supplier and single retailer with a single non-instantaneous deteriorating item. A numerical example and sensitivity analysis are provided to illustrate how increasing or reducing the related parameters change the optimal values of the decision variables of the two proposed models. The results show that VMI works better and charges lower cost in all conditions.
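
    For context, the sketch below shows the textbook economic order quantity (EOQ) formula, Q* = sqrt(2DK/h), which models of this kind extend. It deliberately ignores deterioration, shortages, and the VMI arrangement, so it is not the model developed in the paper; the function names and numbers are illustrative assumptions.

```python
import math


def classic_eoq(demand_rate, order_cost, holding_cost):
    """Textbook EOQ: the order size that minimises ordering plus holding cost.

    demand_rate: units demanded per period (D)
    order_cost: fixed cost per order (K)
    holding_cost: cost of holding one unit for one period (h)
    """
    return math.sqrt(2.0 * demand_rate * order_cost / holding_cost)


def cost_per_period(quantity, demand_rate, order_cost, holding_cost):
    """Ordering cost plus average holding cost for a given order size."""
    return demand_rate / quantity * order_cost + holding_cost * quantity / 2.0


# Hypothetical inputs: D = 1200 units, K = 100 per order, h = 6 per unit.
q_star = classic_eoq(1200, 100, 6)
print(q_star, cost_per_period(q_star, 1200, 100, 6))
```

    At the optimum the ordering and holding components are equal, which gives a quick sanity check when a sensitivity analysis varies one parameter at a time.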

  17. Poisson and negative binomial item count techniques for surveys with sensitive question.

    Science.gov (United States)

    Tian, Guo-Liang; Tang, Man-Lai; Wu, Qin; Liu, Yin

    2017-04-01

    Although the item count technique is useful in surveys with sensitive questions, the privacy of respondents who possess the sensitive characteristic of interest may not be well protected due to a defect in its original design. In this article, we propose two new survey designs (namely the Poisson item count technique and negative binomial item count technique) which replace several independent Bernoulli random variables required by the original item count technique with a single Poisson or negative binomial random variable, respectively. The proposed models not only provide a closed-form variance estimate and a confidence interval within [0, 1] for the sensitive proportion, but also simplify the survey design of the original item count technique. Most importantly, the new designs do not leak respondents' privacy. Empirical results show that the proposed techniques perform satisfactorily in the sense that they yield accurate parameter estimates and confidence intervals.
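
    A rough sketch of the idea follows. It assumes a two-group design in which control respondents report only a non-sensitive count and treatment respondents report that count plus 1 if they have the sensitive characteristic, so the difference in group means estimates the sensitive proportion. This is a simple moment-based illustration under that assumption and not necessarily the estimator derived in the article (in particular, it does not force the estimate into [0, 1]); the function name and the data are hypothetical.

```python
from statistics import mean, variance


def item_count_estimate(control_counts, treatment_counts):
    """Difference-of-means estimate of the sensitive proportion.

    control_counts: reported non-sensitive counts (control group)
    treatment_counts: reported count plus sensitive indicator (treatment group)
    Returns (estimate, estimated variance of the estimate).
    """
    estimate = mean(treatment_counts) - mean(control_counts)
    estimated_variance = (
        variance(treatment_counts) / len(treatment_counts)
        + variance(control_counts) / len(control_counts)
    )
    return estimate, estimated_variance


# Hypothetical reported counts from two small groups.
control = [2, 1, 3, 2, 2, 0]
treatment = [2, 3, 2, 3, 1, 4]
print(item_count_estimate(control, treatment))
```

    Because only a single total is reported, an individual answer does not reveal whether the sensitive indicator was included, which is the privacy property the design relies on.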

  18. TWO-PARAMETER IRT MODEL APPLICATION TO ASSESS PROBABILISTIC CHARACTERISTICS OF PROHIBITED ITEMS DETECTION BY AVIATION SECURITY SCREENERS

    Directory of Open Access Journals (Sweden)

    Alexander K. Volkov

    2017-01-01

    Full Text Available Modern approaches to assessing the efficiency of aviation security screeners have been analyzed, and certain drawbacks have been considered. The main drawback is the difficulty of implementing the ICAO recommendations on taking x-ray image complexity factors into account when training screeners and evaluating how efficiently they detect prohibited items. X-ray image based factors are the specific properties of the x-ray image that influence the ability of aviation security screeners to detect prohibited items. The most important complexity factors are: geometric characteristics of a prohibited item; view difficulty of prohibited items; superposition of prohibited items by other objects in the bag; bag content complexity; and the color similarity of prohibited and usual items in the luggage. A one-dimensional two-parameter IRT model and a related criterion of aviation security screeners' qualification have been suggested. Within the suggested model the probabilistic detection characteristics of aviation security screeners are considered as functions of such parameters as the difference between the level of qualification and the level of x-ray image complexity, and also between the aviation security screeners' responsibility and the structure of their professional knowledge. On the basis of the given model it is possible to consider two characteristic functions: first, the characteristic function of qualification level, which describes the multi-complexity level of the x-ray image interpretation competency of the aviation security screener; second, the characteristic function of x-ray image complexity, which describes the range of x-ray image interpretation competency of aviation security screeners with various training levels in interpreting an x-ray image of a certain level of complexity. The suggested complex criterion to assess the level of the aviation security screener qualification makes it possible to evaluate his or

  19. Symptoms of anxiety in depression: assessment of item performance of the Hamilton Anxiety Rating Scale in patients with depression.

    Science.gov (United States)

    Vaccarino, Anthony L; Evans, Kenneth R; Sills, Terrence L; Kalali, Amir H

    2008-01-01

    Although diagnostically dissociable, anxiety is strongly co-morbid with depression. To examine further the clinical symptoms of anxiety in major depressive disorder (MDD), a non-parametric item response analysis on "blinded" data from four pharmaceutical company clinical trials was performed on the Hamilton Anxiety Rating Scale (HAMA) across levels of depressive severity. The severity of depressive symptoms was assessed using the 17-item Hamilton Depression Rating Scale (HAMD). HAMA and HAMD measures were supplied for each patient on each of two post-screen visits (n=1,668 observations). Option characteristic curves were generated for all 14 HAMA items to determine the probability of scoring a particular option on the HAMA in relation to the total HAMD score. Additional analyses were conducted using Pearson's product-moment correlations. Results showed that anxiety-related symptomatology generally increased as a function of overall depressive severity, though there were clear differences between individual anxiety symptoms in their relationship with depressive severity. In particular, anxious mood, tension, insomnia, difficulties in concentration and memory, and depressed mood were found to discriminate over the full range of HAMD scores, increasing continuously with increases in depressive severity. By contrast, many somatic-related symptoms, including muscular, sensory, cardiovascular, respiratory, gastro-intestinal, and genito-urinary were manifested primarily at higher levels of depression and did not discriminate well at lower HAMD scores. These results demonstrate anxiety as a core feature of depression, and the relationship between anxiety-related symptoms and depression should be considered in the assessment of depression and evaluation of treatment strategies and outcome.

  20. Australian Biology Test Item Bank, Years 11 and 12. Volume II: Year 12.

    Science.gov (United States)

    Brown, David W., Ed.; Sewell, Jeffrey J., Ed.

    This document consists of test items which are applicable to biology courses throughout Australia (irrespective of course materials used); assess key concepts within course statement (for both core and optional studies); assess a wide range of cognitive processes; and are relevant to current biological concepts. These items are arranged under…

  1. Australian Biology Test Item Bank, Years 11 and 12. Volume I: Year 11.

    Science.gov (United States)

    Brown, David W., Ed.; Sewell, Jeffrey J., Ed.

    This document consists of test items which are applicable to biology courses throughout Australia (irrespective of course materials used); assess key concepts within course statement (for both core and optional studies); assess a wide range of cognitive processes; and are relevant to current biological concepts. These items are arranged under…

  2. Hippocampal damage equally impairs memory for single items and memory for conjunctions.

    Science.gov (United States)

    Stark, Craig E L; Squire, Larry R

    2003-01-01

    single-item and associative memory.

  3. Retrieval of very large numbers of items in the Web of Science: an exercise to develop accurate search strategies

    NARCIS (Netherlands)

    Arencibia-Jorge, R.; Leydesdorff, L.; Chinchilla-Rodríguez, Z.; Rousseau, R.; Paris, S.W.

    2009-01-01

    The Web of Science interface counts at most 100,000 retrieved items from a single query. If the query results in a dataset containing more than 100,000 items the number of retrieved items is indicated as >100,000. The problem studied here is how to find the exact number of items in a query that

  4. Item response theory analysis of the mechanics baseline test

    Science.gov (United States)

    Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.

    2012-02-01

    Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.
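    The abstract names two item parameters, difficulty and discrimination, but not the model that was fitted. As a minimal sketch, assuming a two-parameter logistic (2PL) model, the item response function below shows how those parameters shape the probability of a correct answer. The function name and item values are hypothetical.

```python
import math

def irf_2pl(theta, a, b):
    """Probability of a correct response under an assumed 2PL model.

    theta: student skill, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical items (not taken from the Mechanics Baseline Test):
# name -> (discrimination a, difficulty b)
items = {"sharp_item": (2.0, -1.0), "flat_item": (0.3, 1.5)}

for name, (a, b) in items.items():
    # Evaluate each item at a low, average, and high skill level.
    probs = [round(irf_2pl(t, a, b), 3) for t in (-2.0, 0.0, 2.0)]
    print(name, probs)
```

    An item with low discrimination gives nearly the same probability across skill levels, which is the sense in which the abstract calls some questions ineffective at separating students of different abilities.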

  5. Reduced-Item Food Audits Based on the Nutrition Environment Measures Surveys.

    Science.gov (United States)

    Partington, Susan N; Menzies, Tim J; Colburn, Trina A; Saelens, Brian E; Glanz, Karen

    2015-10-01

    The community food environment may contribute to obesity by influencing food choice. Store and restaurant audits are increasingly common methods for assessing food environments, but are time consuming and costly. A valid, reliable brief measurement tool is needed. The purpose of this study was to develop and validate reduced-item food environment audit tools for stores and restaurants. Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed in 820 stores and 1,795 restaurants in West Virginia, San Diego, and Seattle. Data mining techniques (correlation-based feature selection and linear regression) were used to identify survey items highly correlated to total survey scores and produce reduced-item audit tools that were subsequently validated against full NEMS surveys. Regression coefficients were used as weights that were applied to reduced-item tool items to generate comparable scores to full NEMS surveys. Data were collected and analyzed in 2008-2013. The reduced-item tools included eight items for grocery, ten for convenience, seven for variety, and five for other stores; and 16 items for sit-down, 14 for fast casual, 19 for fast food, and 13 for specialty restaurants-10% of the full NEMS-S and 25% of the full NEMS-R. There were no significant differences in median scores for varying types of retail food outlets when compared to the full survey scores. Median in-store audit time was reduced 25%-50%. Reduced-item audit tools can reduce the burden and complexity of large-scale or repeated assessments of the retail food environment without compromising measurement quality. Copyright © 2015 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.

  6. A Single Conjunction Risk Assessment Metric: the F-Value

    Science.gov (United States)

    Frigm, Ryan Clayton; Newman, Lauri K.

    2009-01-01

    The Conjunction Assessment Team at NASA Goddard Space Flight Center provides conjunction risk assessment for many NASA robotic missions. These risk assessments are based on several figures of merit, such as miss distance, probability of collision, and orbit determination solution quality. However, these individual metrics do not singly capture the overall risk associated with a conjunction, making it difficult for someone without this complete understanding to take action, such as an avoidance maneuver. The goal of this analysis is to introduce a single risk index metric that can easily convey the level of risk without all of the technical details. The proposed index is called the conjunction "F-value." This paper presents the concept of the F-value and the tuning of the metric for use in routine Conjunction Assessment operations.

  7. Methodological quality of diagnostic accuracy studies on non-invasive coronary CT angiography: influence of QUADAS (Quality Assessment of Diagnostic Accuracy Studies included in systematic reviews) items on sensitivity and specificity

    International Nuclear Information System (INIS)

    Schueler, Sabine; Walther, Stefan; Schuetz, Georg M.; Schlattmann, Peter; Dewey, Marc

    2013-01-01

    To evaluate the methodological quality of diagnostic accuracy studies on coronary computed tomography (CT) angiography using the QUADAS (Quality Assessment of Diagnostic Accuracy Studies included in systematic reviews) tool. Each QUADAS item was individually defined to adapt it to the special requirements of studies on coronary CT angiography. Two independent investigators analysed 118 studies using 12 QUADAS items. Meta-regression and pooled analyses were performed to identify possible effects of methodological quality items on estimates of diagnostic accuracy. The overall methodological quality of coronary CT studies was merely moderate. They fulfilled a median of 7.5 out of 12 items. Only 9 of the 118 studies fulfilled more than 75% of possible QUADAS items. One QUADAS item ("Uninterpretable Results") showed a significant influence (P = 0.02) on estimates of diagnostic accuracy with "no fulfilment" increasing specificity from 86% to 90%. Furthermore, pooled analysis revealed that each QUADAS item that is not fulfilled has the potential to change estimates of diagnostic accuracy. The methodological quality of studies investigating the diagnostic accuracy of non-invasive coronary CT is only moderate and was found to affect the sensitivity and specificity. An improvement is highly desirable because good methodology is crucial for adequately assessing imaging technologies. (orig.)
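    For readers unfamiliar with the two accuracy estimates discussed in this abstract, the sketch below computes sensitivity and specificity from confusion-matrix counts using their standard definitions. The counts are hypothetical and are not data from the study.

```python
def sensitivity(tp, fn):
    """Share of diseased cases the test detects: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Share of non-diseased cases the test clears: TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical counts for one diagnostic accuracy study.
tp, fn, tn, fp = 90, 10, 172, 28

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```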

  8. Methodological quality of diagnostic accuracy studies on non-invasive coronary CT angiography: influence of QUADAS (Quality Assessment of Diagnostic Accuracy Studies included in systematic reviews) items on sensitivity and specificity

    Energy Technology Data Exchange (ETDEWEB)

    Schueler, Sabine; Walther, Stefan; Schuetz, Georg M. [Humboldt-Universitaet zu Berlin, Freie Universitaet Berlin, Charite Medical School, Department of Radiology, Berlin (Germany); Schlattmann, Peter [University Hospital of Friedrich Schiller University Jena, Department of Medical Statistics, Informatics, and Documentation, Jena (Germany); Dewey, Marc [Humboldt-Universitaet zu Berlin, Freie Universitaet Berlin, Charite Medical School, Department of Radiology, Berlin (Germany); Charite, Institut fuer Radiologie, Berlin (Germany)

    2013-06-15

    To evaluate the methodological quality of diagnostic accuracy studies on coronary computed tomography (CT) angiography using the QUADAS (Quality Assessment of Diagnostic Accuracy Studies included in systematic reviews) tool. Each QUADAS item was individually defined to adapt it to the special requirements of studies on coronary CT angiography. Two independent investigators analysed 118 studies using 12 QUADAS items. Meta-regression and pooled analyses were performed to identify possible effects of methodological quality items on estimates of diagnostic accuracy. The overall methodological quality of coronary CT studies was merely moderate. They fulfilled a median of 7.5 out of 12 items. Only 9 of the 118 studies fulfilled more than 75% of possible QUADAS items. One QUADAS item ("Uninterpretable Results") showed a significant influence (P = 0.02) on estimates of diagnostic accuracy with "no fulfilment" increasing specificity from 86% to 90%. Furthermore, pooled analysis revealed that each QUADAS item that is not fulfilled has the potential to change estimates of diagnostic accuracy. The methodological quality of studies investigating the diagnostic accuracy of non-invasive coronary CT is only moderate and was found to affect the sensitivity and specificity. An improvement is highly desirable because good methodology is crucial for adequately assessing imaging technologies. (orig.)

  9. The Body Appreciation Scale-2: item refinement and psychometric evaluation.

    Science.gov (United States)

    Tylka, Tracy L; Wood-Barcalow, Nichole L

    2015-01-01

    Considered a positive body image measure, the 13-item Body Appreciation Scale (BAS; Avalos, Tylka, & Wood-Barcalow, 2005) assesses individuals' acceptance of, favorable opinions toward, and respect for their bodies. While the BAS has accrued psychometric support, we improved it by rewording certain BAS items (to eliminate sex-specific versions and body dissatisfaction-based language) and developing additional items based on positive body image research. In three studies, we examined the reworded, newly developed, and retained items to determine their psychometric properties among college and online community (Amazon Mechanical Turk) samples of 820 women and 767 men. After exploratory factor analysis, we retained 10 items (five original BAS items). Confirmatory factor analysis upheld the BAS-2's unidimensionality and invariance across sex and sample type. Its internal consistency, test-retest reliability, and construct (convergent, incremental, and discriminant) validity were supported. The BAS-2 is a psychometrically sound positive body image measure applicable for research and clinical settings. Copyright © 2014 Elsevier Ltd. All rights reserved.

  10. Linking Existing Instruments to Develop an Activity of Daily Living Item Bank.

    Science.gov (United States)

    Li, Chih-Ying; Romero, Sergio; Bonilha, Heather S; Simpson, Kit N; Simpson, Annie N; Hong, Ickpyo; Velozo, Craig A

    2018-03-01

    This study examined dimensionality and item-level psychometric properties of an item bank measuring activities of daily living (ADL) across inpatient rehabilitation facilities and community living centers. Common person equating method was used in the retrospective veterans data set. This study examined dimensionality, model fit, local independence, and monotonicity using factor analyses and fit statistics, principal component analysis (PCA), and differential item functioning (DIF) using Rasch analysis. Following the elimination of invalid data, 371 veterans who completed both the Functional Independence Measure (FIM) and minimum data set (MDS) within 6 days were retained. The FIM-MDS item bank demonstrated good internal consistency (Cronbach's α = .98) and met three rating scale diagnostic criteria and three of the four model fit statistics (comparative fit index/Tucker-Lewis index = 0.98, root mean square error of approximation = 0.14, and standardized root mean residual = 0.07). PCA of Rasch residuals showed the item bank explained 94.2% variance. The item bank covered the range of θ from -1.50 to 1.26 (item), -3.57 to 4.21 (person) with person strata of 6.3. The findings indicated the ADL physical function item bank constructed from FIM and MDS measured a single latent trait with overall acceptable item-level psychometric properties, suggesting that it is an appropriate source for developing efficient test forms such as short forms and computerized adaptive tests.
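    The internal-consistency figure reported above is Cronbach's α. As a minimal sketch of the standard formula, α = k/(k-1) · (1 - Σ item variances / variance of total scores), the function below computes it from a respondents-by-items table. The function name and the toy scores are hypothetical and unrelated to the FIM-MDS data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical ratings: 5 respondents, 3 items.
toy_scores = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [1, 2, 1],
    [3, 3, 4],
]
print(round(cronbach_alpha(toy_scores), 3))
```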

  11. Psychometric evaluation of the 10-item Short Opiate Withdrawal Scale-Gossop (SOWS-Gossop) in patients undergoing opioid detoxification.

    Science.gov (United States)

    Vernon, Margaret K; Reinders, Stefan; Mannix, Sally; Gullo, Kristen; Gorodetzky, Charles W; Clinch, Thomas

    2016-09-01

    The Short Opiate Withdrawal Scale (SOWS)-Gossop is a 10-item questionnaire developed to evaluate opioid withdrawal symptom severity. The scale was derived from the original 32-item Opiate Withdrawal Scale in order to reduce redundancy while providing an equally sensitive measure of opioid withdrawal symptom severity appropriate for research and clinical practice. The objective of this study was to examine the psychometric properties and provide score interpretation guidelines for the SOWS-Gossop 10-item version. Blinded, pooled data from two trials assessing the efficacy of lofexidine hydrochloride in reducing withdrawal symptoms in patients undergoing opioid detoxification were used to evaluate the quantitative psychometric properties and score interpretation of the SOWS-Gossop. Five hundred fifty-five (N=555) observations were available at baseline with numbers decreasing to n=213 at day 7. Mean (standard deviation) SOWS-Gossop scores were 10.4 (6.86) at baseline, 8.7 (6.49) on day 1, 10.5 (7.21) on day 2, and 3.1 (3.95) on day 7. Confirmatory factor analysis indicated that the SOWS-Gossop items loaded on a single factor consistent with a single total score. Intra-class correlations (95% confidence interval) were 0.78 (0.70-0.85) between baseline and day 1, 0.84 (0.79-0.89) between days 4 and 5, and 0.88 (0.83-0.91) between days 6 and 7, demonstrating good test-retest reliability. Mean SOWS-Gossop scores varied significantly (p […] opioid withdrawal and has excellent psychometric properties. The SOWS-Gossop is an appropriate, precise, and sensitive measure to evaluate the symptoms of acute opioid withdrawal in research or clinical settings. Copyright © 2016 Elsevier Ltd. All rights reserved.

  12. Creating a brief rating scale for the assessment of learning disabilities using reliability and true score estimates of the scale's items based on the Rasch model.

    Science.gov (United States)

    Sideridis, Georgios; Padeliadu, Susana

    2013-01-01

    The purpose of the present studies was to provide the means to create brief versions of instruments that can aid the diagnosis and classification of students with learning disabilities and comorbid disorders (e.g., attention-deficit/hyperactivity disorder). A sample of 1,108 students with and without a diagnosis of learning disabilities took part in study 1. Using information from modern theory methods (i.e., the Rasch model), a scale was created that included fewer than one third of the original battery items designed to assess reading skills. This best item synthesis was then evaluated for its predictive and criterion validity with a valid external reading battery (study 2). Using a sample of 232 students with and without learning disabilities, results indicated that the brief version of the scale was equally effective as the original scale in predicting reading achievement. Analysis of the content of the brief scale indicated that the best item synthesis involved items from cognition, motivation, strategy use, and advanced reading skills. It is suggested that multiple psychometric criteria be employed in evaluating the psychometric adequacy of scales used for the assessment and identification of learning disabilities and comorbid disorders.

  13. Chip based single cell analysis for nanotoxicity assessment.

    Science.gov (United States)

    Shah, Pratikkumar; Kaushik, Ajeet; Zhu, Xuena; Zhang, Chengxiao; Li, Chen-Zhong

    2014-05-07

    Nanomaterials, because of their tunable properties and performances, have been utilized extensively in everyday life related consumable products and technology. On exposure, beyond the physiological range, nanomaterials cause health risks via affecting the function of organisms, genomic systems, and even the central nervous system. Thus, new analytical approaches for nanotoxicity assessment to verify the feasibility of nanomaterials for future use are in demand. The conventional analytical techniques, such as spectrophotometric assay-based techniques, usually require a lengthy and time-consuming process and often produce false positives, and often cannot be implemented at a single cell level measurement for studying cell behavior without interference from its surrounding environment. Hence, there is a demand for a precise, accurate, sensitive assessment for toxicity using single cells. Recently, due to the advantages of automation of fluids and minimization of human errors, the integration of a cell-on-a-chip (CoC) with a microfluidic system is in practice for nanotoxicity assessments. This review explains nanotoxicity and its assessment approaches with advantages/limitations and new approaches to overcome the confines of traditional techniques. Recent advances in nanotoxicity assessment using a CoC integrated with a microfluidic system are also discussed in this review, which may be of use for nanotoxicity assessment and diagnostics.

  14. Selection of material balance areas and item control areas

    International Nuclear Information System (INIS)

    1975-04-01

    Section 70.58, "Fundamental Nuclear Material Controls," of 10 CFR Part 70, "Special Nuclear Material," requires certain licensees authorized to possess more than one effective kilogram of special nuclear material to establish Material Balance Areas (MBAs) or Item Control Areas (ICAs) for the physical and administrative control of nuclear materials. This section requires that: (1) each MBA be an identifiable physical area such that the quantity of nuclear material being moved into or out of the MBA is represented by a measured value; (2) the number of MBAs be sufficient to localize nuclear material losses or thefts and identify the mechanisms; (3) the custody of all nuclear material within an MBA or ICA be the responsibility of a single designated individual; and (4) ICAs be established according to the same criteria as MBAs except that control into and out of such areas would be by item identity and count for previously determined special nuclear material quantities, the validity of which must be ensured by tamper-safing unless the items are sealed sources. This guide describes bases acceptable to the NRC staff for the selection of material balance areas and item control areas. (U.S.)

  15. An Application of Cognitive Diagnostic Assessment on TIMMS-2007 8th Grade Mathematics Items

    Science.gov (United States)

    Toker, Turker; Green, Kathy

    2012-01-01

    The least squares distance method (LSDM) was used in a cognitive diagnostic analysis of TIMSS (Trends in International Mathematics and Science Study) items administered to 4,498 8th-grade students from seven geographical regions of Turkey, extending analysis of attributes from content to process and skill attributes. Logit item positions were…

  16. Modelling non-ignorable missing data mechanisms with item response theory models

    NARCIS (Netherlands)

    Holman, Rebecca; Glas, Cornelis A.W.

    2005-01-01

    A model-based procedure for assessing the extent to which missing data can be ignored and handling non-ignorable missing data is presented. The procedure is based on item response theory modelling. As an example, the approach is worked out in detail in conjunction with item response data modelled

  17. Modelling non-ignorable missing-data mechanisms with item response theory models

    NARCIS (Netherlands)

    Holman, Rebecca; Glas, Cees A. W.

    2005-01-01

    A model-based procedure for assessing the extent to which missing data can be ignored and handling non-ignorable missing data is presented. The procedure is based on item response theory modelling. As an example, the approach is worked out in detail in conjunction with item response data modelled

  18. Translation and cross-cultural adaptation of the Detailed Assessment of Speed of Handwriting 17+ to Brazilian Portuguese: conceptual, item and semantic equivalence.

    Science.gov (United States)

    Cardoso, Monique Herrera; Capellini, Simone Aparecida

    2018-02-19

    Perform a cross-cultural adaptation of the Detailed Assessment of Speed of Handwriting 17+ (DASH 17+) for Brazilians. Evaluation of (1) conceptual, item and (2) semantic equivalence, with assistance of four translators and application of a pilot study to 36 students. (1) The concepts and items are equivalent in the British and Brazilian cultures. (2) Adaptations were made concerning the English language pangram used in copying tasks and selection of the lower-case, cursive handwriting in the alphabet-writing task. Application of the pilot study verified acceptability and understanding of the proposed tasks by the students. The Brazilian Portuguese version of the DASH 17+ was presented after finalization of the conceptual, item and semantic equivalence of the instrument. Further studies on psychometric properties should be conducted with the purpose of measuring the speed of handwriting in youngsters and adults with greater reliability and validity to the procedure.

  19. Students' proficiency scores within multitrait item response theory

    Science.gov (United States)

    Scott, Terry F.; Schumayer, Daniel

    2015-12-01

    In this paper we present a series of item response models of data collected using the Force Concept Inventory. The Force Concept Inventory (FCI) was designed to poll the Newtonian conception of force viewed as a multidimensional concept, that is, as a complex of distinguishable conceptual dimensions. Several previous studies have developed single-trait item response models of FCI data; however, we feel that multidimensional models are also appropriate given the explicitly multidimensional design of the inventory. The models employed in the research reported here vary in both the number of fitting parameters and the number of underlying latent traits assumed. We calculate several model information statistics to ensure adequate model fit and to determine which of the models provides the optimal balance of information and parsimony. Our analysis indicates that all item response models tested, from the single-trait Rasch model through to a model with ten latent traits, satisfy the standard requirements of fit. However, analysis of model information criteria indicates that the five-trait model is optimal. We note that an earlier factor analysis of the same FCI data also led to a five-factor model. Furthermore the factors in our previous study and the traits identified in the current work match each other well. The optimal five-trait model assigns proficiency scores to all respondents for each of the five traits. We construct a correlation matrix between the proficiencies in each of these traits. This correlation matrix shows strong correlations between some proficiencies, and strong anticorrelations between others. We present an interpretation of this correlation matrix.
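    The abstract does not say which model information criteria were used. As a sketch under the assumption that they are of the AIC/BIC type, the code below shows how such criteria trade goodness of fit (log-likelihood) against parsimony (number of parameters). All model names, log-likelihoods, and parameter counts are hypothetical and are not results from the paper.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fits: (model name, log-likelihood, number of parameters).
fits = [
    ("1-trait", -5200.0, 30),
    ("5-trait", -4600.0, 150),
    ("10-trait", -4550.0, 300),
]
n_obs = 2000  # hypothetical number of respondents

for name, ll, k in fits:
    print(name, round(aic(ll, k), 1), round(bic(ll, k, n_obs), 1))

# The preferred model is the one with the smallest criterion value.
best = min(fits, key=lambda f: bic(f[1], f[2], n_obs))
print("lowest BIC:", best[0])
```

    With these made-up numbers the richest model fits best but pays the largest penalty, so an intermediate model is preferred, which mirrors the kind of balance between information and parsimony the abstract describes.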

  20. Assessment of endogenous dopamine release by methylphenidate challenge using iodine-123 iodobenzamide single-photon emission tomography

    International Nuclear Information System (INIS)

    Booij, J.; Korn, P.; Linszen, D.H.; Royen, E.A. van

    1997-01-01

    This double-blind, placebo-controlled study assessed pharmacologically induced endogenous dopamine (DA) release in healthy male volunteers (n=12). Changes in endogenous DA release after injection of the psychostimulant drug methylphenidate were evaluated by single-photon emission tomography (SPET) and constant infusion of iodine-123 iodobenzamide ([123I]IBZM), a D2 receptor radioligand that is sensitive to endogenous DA release. Methylphenidate induced displacement of striatal [123I]IBZM binding, resulting in a significant decrease in the specific to non-specific [123I]IBZM uptake ratio (average: 8.6%) in comparison with placebo (average: -1.9%). Moreover, injection of methylphenidate induced significant behavioural responses on the following items: excitement, anxiety, tension, and mannerisms and posturing. The results of this study demonstrate the feasibility of using constant infusion of [123I]IBZM and SPET imaging to measure endogenous DA release after methylphenidate challenge and to investigate neurochemical aspects of behaviour. (orig.). With 2 figs., 1 tab

  1. Does the Assessment of Recovery Capital scale reflect a single or multiple domains?

    Directory of Open Access Journals (Sweden)

    Arndt S

    2017-07-01

    Stephan Arndt (1-3), Ethan Sahker (1, 4), Suzy Hedden (1). (1) Iowa Consortium for Substance Abuse Research and Evaluation; (2) Department of Psychiatry, Carver College of Medicine; (3) Department of Biostatistics, College of Public Health; (4) Department of Psychological and Quantitative Foundations, Counseling Psychology Program, College of Education; University of Iowa, Iowa City, IA, USA. Objective: The goal of this study was to determine whether the 50-item Assessment of Recovery Capital scale represents a single general measure or whether multiple domains might be psychometrically useful for research or clinical applications. Methods: Data are from a cross-sectional de-identified existing program evaluation information data set with 1,138 clients entering substance use disorder treatment. Principal components and iterated factor analysis were used on the domain scores. Multiple group factor analysis provided a quasi-confirmatory factor analysis. Results: The solution accounted for 75.24% of the total variance, suggesting that 10 factors provide a reasonably good fit. However, Tucker's congruence coefficients between the factor structure and defining weights (0.41–0.52) suggested a poor fit to the hypothesized 10-domain structure. Principal components of the 10-domain scores yielded one factor whose eigenvalue was greater than one (5.93), accounting for 75.8% of the common variance. A few domains had perceptible but small unique variance components, suggesting that a few of the domains may warrant enrichment. Conclusion: Our findings suggest that there is one general factor, with a caveat. Using the 10 measures inflates the chance for Type I errors. Using one general measure avoids this issue, is simple to interpret, and could reduce the number of items. However, those seeking to maximally predict later recovery success may need to use the full instrument and all 10 domains. Keywords: social support, psychometrics, quality of life

  2. Item analysis of the Spanish version of the Boston Naming Test with a Spanish speaking adult population from Colombia.

    Science.gov (United States)

    Kim, Stella H; Strutt, Adriana M; Olabarrieta-Landa, Laiene; Lequerica, Anthony H; Rivera, Diego; De Los Reyes Aragon, Carlos Jose; Utria, Oscar; Arango-Lasprilla, Juan Carlos

    2018-02-23

    The Boston Naming Test (BNT) is a widely used measure of confrontation naming ability that has been criticized for its questionable construct validity for non-English speakers. This study investigated item difficulty and construct validity of the Spanish version of the BNT to assess cultural and linguistic impact on performance. Subjects were 1298 healthy Spanish speaking adults from Colombia. They were administered the 60- and 15-item Spanish version of the BNT. A Rasch analysis was computed to assess dimensionality, item hierarchy, targeting, reliability, and item fit. Both versions of the BNT satisfied requirements for unidimensionality. Although internal consistency was excellent for the 60-item BNT, order of difficulty did not increase consistently with item number and there were a number of items that did not fit the Rasch model. For the 15-item BNT, a total of 5 items changed position on the item hierarchy with 7 poor fitting items. Internal consistency was acceptable. Construct validity of the BNT remains a concern when it is administered to non-English speaking populations. Similar to previous findings, the order of item presentation did not correspond with increasing item difficulty, and both versions were inadequate at assessing high naming ability.

  3. A validation study using a modified version of Postural Assessment Scale for Stroke Patients: Postural Stroke Study in Gothenburg (POSTGOT)

    Directory of Open Access Journals (Sweden)

    Danielsson Anna

    2011-10-01

    Background: A modified version of the Postural Assessment Scale for Stroke Patients (PASS) was created with some changes in the description of the items and clarifications in the manual (e.g. much help was defined as support from 2 persons). The aim of this validation study was to assess intrarater and interrater reliability using this modified version of PASS, at a stroke unit, for patients in the acute phase after their first event of stroke. Methods: In the intrarater reliability study 114 patients and in the interrater reliability study 15 patients were examined twice with the test within one to 24 hours in the first week after stroke. Spearman's rank correlation, Kappa coefficients, Percentage Agreement and the newer rank-invariant methods (Relative Position, Relative Concentration and Relative rank Variance) were used for the statistical analysis. Results: For the intrarater reliability, Spearman's rank correlations were 0.88-0.98 and kappa values were 0.70-0.93 for the individual items. Small, statistically significant differences were found for two items regarding Relative Position and for one item regarding Relative Concentration. There was no Relative rank Variance for any single item. For the interrater reliability, Spearman's rank correlations were 0.77-0.99 for individual items. For some items there was a possible, even if not proved, reliability problem regarding Relative Position and Relative Concentration. There was no Relative rank Variance for the single items, except for a small Relative rank Variance for one item. Conclusions: The high intrarater and interrater reliability shown for the modified Postural Assessment Scale for Stroke Patients, the Swedish version of Postural Assessment Scale for Stroke Patients, with traditional and newer statistical analyses, particularly for assessments performed by the same rater, support the use of the Swedish version of Postural Assessment Scale for Stroke Patients, in the acute stage after stroke both
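    Two of the agreement statistics named in this abstract, percentage agreement and the kappa coefficient, can be sketched as follows. The abstract does not state whether a weighted or unweighted kappa was used; the code assumes unweighted Cohen's kappa. The function names and the example ratings are hypothetical.

```python
from collections import Counter

def percentage_agreement(r1, r2):
    """Fraction of cases on which the two ratings are identical."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohen_kappa(r1, r2):
    """Unweighted Cohen's kappa: agreement corrected for chance."""
    n = len(r1)
    p_observed = percentage_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement from the marginal category frequencies of each rating.
    p_expected = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical scores (0-3) on one item for 8 patients, rated twice.
first = [0, 1, 2, 3, 3, 2, 1, 0]
second = [0, 1, 2, 3, 2, 2, 1, 1]

print(percentage_agreement(first, second))
print(round(cohen_kappa(first, second), 3))
```

    Kappa is lower than raw percentage agreement whenever some agreement would be expected by chance, which is why the two are often reported side by side.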

  4. An Arrangement of the Items Influencing Assessment of the Electrotechnical Technology Course / PROEJA, campuses Campos Centro and Itaperuna: The Learners’ View

    Directory of Open Access Journals (Sweden)

    Jorge Luíz Clemente Gomes

    2016-04-01

    Full Text Available This work aims to organize pre-defined items that affect the students’ answers when assessing the Electrotechnical Technology Course / PROEJA. The research was carried out from October 2011 to December 2012 with questionnaires administered to 1st to 6th period students. At campus Campos Centro, “Technical Visits” and “Internship” presented high levels of importance and low satisfaction, while “Personal Realization” and “Professional Achievement” presented high levels of relevance and satisfaction. At campus Itaperuna, “Job opportunities” and “Professional Achievement” presented high levels of relevance and satisfaction. The items “Faculty” and “New Technologies” presented high importance but low satisfaction. The research aims at improving the quality of the course.

  5. International Semiotics: Item Difficulty and the Complexity of Science Item Illustrations in the PISA-2009 International Test Comparison

    Science.gov (United States)

    Solano-Flores, Guillermo; Wang, Chao; Shade, Chelsey

    2016-01-01

    We examined multimodality (the representation of information in multiple semiotic modes) in the context of international test comparisons. Using Program of International Student Assessment (PISA)-2009 data, we examined the correlation of the difficulty of science items and the complexity of their illustrations. We observed statistically…

  6. Item analysis of single-peaked response data : the psychometric evaluation of bipolar measurement scales

    NARCIS (Netherlands)

    Polak, Maaike Geertruida

    2011-01-01

    The thesis explains the fundamental difference between unipolar and bipolar measurement scales for psychological characteristics. We explore the use of correspondence analysis (CA), a technique that is similar to principal component analysis and is available in SAS and SPSS, to select items that

  7. Content Validity and Psychometric Characteristics of the "Knowledge about Older Patients Quiz" for Nurses Using Item Response Theory.

    Science.gov (United States)

    Dikken, Jeroen; Hoogerduijn, Jita G; Kruitwagen, Cas; Schuurmans, Marieke J

    2016-11-01

    To assess the content validity and psychometric characteristics of the Knowledge about Older Patients Quiz (KOP-Q), which measures nurses' knowledge regarding older hospitalized adults and their certainty regarding this knowledge. Cross-sectional. Content validity: general hospitals. Psychometric characteristics: nursing school and general hospitals in the Netherlands. Content validity: 12 nurse specialists in geriatrics. Psychometric characteristics: 107 first-year and 78 final-year bachelor of nursing students, 148 registered nurses, and 20 nurse specialists in geriatrics. Content validity: The nurse specialists rated each item of the initial KOP-Q (52 items) on relevance. Ratings were used to calculate Item-Content Validity Index and average Scale-Content Validity Index (S-CVI/ave) scores. Items with insufficient content validity were removed. Psychometric characteristics: Ratings of students, nurses, and nurse specialists were used to test for differential item functioning (DIF) and unidimensionality before item characteristics (discrimination and difficulty) were examined using Item Response Theory. Finally, norm references were calculated and nomological validity was assessed. Content validity: Forty-three items remained after assessing content validity (S-CVI/ave = 0.90). Psychometric characteristics: Of the 43 items, two demonstrating ceiling effects and 11 distorting ability estimates (DIF) were subsequently excluded. Item characteristics were assessed for the remaining 30 items, all of which demonstrated good discrimination and difficulty parameters. Knowledge was positively correlated with certainty about this knowledge. The final 30-item KOP-Q is a valid, psychometrically sound, comprehensive instrument that can be used to assess the knowledge of nursing students, hospital nurses, and nurse specialists in geriatrics regarding older hospitalized adults.
It can identify knowledge and certainty deficits for research purposes or serve as a tool in educational
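
    The content validity indices mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes a 4-point relevance scale on which ratings of 3 or 4 count as "relevant" (a common convention that the abstract does not confirm), and the ratings are invented. The function names are hypothetical.

```python
def item_cvi(ratings, relevant_from=3):
    """Item-level index: proportion of experts who rated the item as relevant."""
    return sum(r >= relevant_from for r in ratings) / len(ratings)

def scale_cvi_ave(ratings_by_item):
    """Scale-level index (averaging method): mean of the item-level indices."""
    indices = [item_cvi(r) for r in ratings_by_item.values()]
    return sum(indices) / len(indices)

# Hypothetical relevance ratings (1-4) from 12 experts for three items.
ratings_by_item = {
    "item_a": [4, 4, 3, 4, 3, 4, 4, 3, 4, 4, 3, 4],  # 12 of 12 rate it relevant
    "item_b": [4, 3, 2, 4, 3, 4, 1, 3, 4, 4, 3, 2],  # 9 of 12 rate it relevant
    "item_c": [2, 1, 3, 2, 4, 1, 2, 3, 2, 1, 2, 3],  # 4 of 12 rate it relevant
}

for name, ratings in ratings_by_item.items():
    print(name, round(item_cvi(ratings), 2))
print("S-CVI/ave", round(scale_cvi_ave(ratings_by_item), 2))
```

    In a workflow like the one described, items with a low item-level index (such as the hypothetical item_c) would be candidates for removal before the scale-level average is reported.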

  8. Bibliometric studies on single journals: a review

    OpenAIRE

    Kevin Wan, Utap Anyi; Anuar, N.B.; Zainab, A.N

    2009-01-01

    This paper covers a total of 82 bibliometric studies on single journals (62 studies cover unique titles) published between 1998 and 2008 grouped into the following fields; Arts, Humanities and Social Sciences (12 items); Medical and Health Sciences (19 items); Sciences and Technology (30 items) and Library and Information Sciences (21 items). Under each field the studies are described in accordance to their geographical location in the following order, United Kingdom, United States and Americ...

  9. Psychometric Validation of the World Health Organization Disability Assessment Schedule 2.0-Twelve-Item Version in Persons with Spinal Cord Injuries

    Science.gov (United States)

    Smedema, Susan Miller; Ruiz, Derek; Mohr, Michael J.

    2017-01-01

    Purpose: To evaluate the factorial and concurrent validity and internal consistency reliability of the World Health Organization Disability Assessment Schedule 2.0 (WHODAS 2.0) 12-item version in persons with spinal cord injuries. Method: Two hundred forty-seven adults with spinal cord injuries completed an online survey consisting of the WHODAS…

  10. Assessing Psychopathy Among Justice Involved Adolescents with the PCL: YV: An Item Response Theory Examination Across Gender

    Science.gov (United States)

    Tsang, Siny; Schmidt, Karen M.; Vincent, Gina M.; Salekin, Randall T.; Moretti, Marlene M.; Odgers, Candice L.

    2014-01-01

    This study used an item response theory (IRT) model and a large adolescent sample of justice involved youth (N = 1,007, 38% female) to examine the item functioning of the Psychopathy Checklist – Youth Version (PCL: YV). Items that were most discriminating (or most sensitive to changes) of the latent trait (thought to be psychopathy) among adolescents included “Glibness/superficial charm”, “Lack of remorse”, and “Need for stimulation”, whereas items that were least discriminating included “Pathological lying”, “Failure to accept responsibility”, and “Lacks goals.” The items “Impulsivity” and “Irresponsibility” were the most likely to be rated high among adolescents, whereas “Parasitic lifestyle”, and “Glibness/superficial charm” were the most likely to be rated low. Evidence of differential item functioning (DIF) on four of the 13 items was found between boys and girls. “Failure to accept responsibility” and “Impulsivity” were endorsed more frequently to describe adolescent girls than boys at similar levels of the latent trait, and vice versa for “Grandiose sense of self-worth” and “Lacks goals.” The DIF findings suggest that four PCL: YV items function differently between boys and girls. PMID:25580672

  11. Dimensionality of the UWES-17: An item response modelling analysis

    Directory of Open Access Journals (Sweden)

    Deon P. de Bruin

    2013-10-01

    Research purpose: The main focus of this study was to use the Rasch model to provide insight into the dimensionality of the UWES-17, and to assess whether work engagement should be interpreted as one single overall score, three separate scores, or a combination. Motivation for the study: It is unclear whether a summative score is more representative of work engagement or whether scores are more meaningful when interpreted for each dimension separately. Previous work relied on confirmatory factor analysis; the potential of item response models has not been tapped. Research design: A quantitative cross-sectional survey design approach was used. Participants, 2429 employees of a South African Information and Communication Technology (ICT) company, completed the UWES-17. Main findings: Findings indicate that work engagement should be treated as a unidimensional construct: individual scores should be interpreted in a summative manner, giving a single global score. Practical/managerial implications: Users of the UWES-17 may interpret a single, summative score for work engagement. Findings of this study should also contribute towards standardising UWES-17 scores, allowing meaningful comparisons to be made. Contribution/value-add: The findings will benefit researchers, organisational consultants and managers. Clarity on dimensionality and interpretation of work engagement will assist researchers in future studies. Managers and consultants will be able to make better-informed decisions when using work engagement data.

  12. Diverse Food Items Are Similarly Categorized by 8- to 13-Year-Old Children

    Science.gov (United States)

    Beltran, Alicia; Knight Sepulveda, Karina; Watson, Kathy; Baranowski, Tom; Baranowski, Janice; Islam, Noemi; Missaghian, Mariam

    2008-01-01

    Objective: Assess how 8- to 13-year-old children categorized and labeled food items for possible use as part of a food search strategy in a computerized 24-hour dietary recall. Design: A set of 62 cards with pictures and names of food items from 18 professionally defined food groups was sorted by each child into piles of similar food items.…

  13. A novel multi-item joint replenishment problem considering multiple type discounts.

    Directory of Open Access Journals (Sweden)

    Ligang Cui

    Full Text Available In business replenishment, multi-item discount offers may provide either different discount schedules with a single discount type or schedules with multiple discount types. The paper investigates the joint effects of multiple discount schemes on multi-item joint replenishment decisions. A joint replenishment problem (JRP) model that considers three discount offers simultaneously (all-unit discount, incremental discount, and total volume discount) is constructed to determine the basic cycle time and the joint replenishment frequencies of the items. To solve the proposed problem, a heuristic algorithm is proposed to find the optimal solutions and the corresponding total cost of the JRP model. A numerical experiment is performed to test the algorithm, and the computational results of JRPs under different discount combinations show different significance in the replenishment cost reduction.
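
    For orientation, the two decision variables named in this abstract (the basic cycle time and the per-item replenishment frequencies) can be illustrated with the textbook discount-free JRP cost function. This sketch is not the paper's model: it omits all three discount types, and every name and number in it is a hypothetical chosen for illustration.

```python
import math

def jrp_total_cost(T, k, major_setup, minor_setups, demands, holding_costs):
    """Cost per unit time when item i is replenished every k[i] * T time units."""
    ordering = (major_setup + sum(s / ki for s, ki in zip(minor_setups, k))) / T
    holding = T / 2 * sum(ki * d * h for ki, d, h in zip(k, demands, holding_costs))
    return ordering + holding

def best_cycle_time(k, major_setup, minor_setups, demands, holding_costs):
    """Closed-form minimizer of the cost above for fixed frequencies k."""
    a = major_setup + sum(s / ki for s, ki in zip(minor_setups, k))
    b = sum(ki * d * h for ki, d, h in zip(k, demands, holding_costs))
    # The cost has the form a / T + b * T / 2, which is minimized at sqrt(2a / b).
    return math.sqrt(2 * a / b)

# Hypothetical two-item instance: item 2 joins every second replenishment.
k = [1, 2]
params = dict(major_setup=100.0, minor_setups=[20.0, 40.0],
              demands=[1000.0, 200.0], holding_costs=[1.0, 2.0])

T_star = best_cycle_time(k, **params)
print(T_star, jrp_total_cost(T_star, k, **params))
```

    A heuristic for the full problem would typically alternate between updating the frequencies and recomputing the cycle time; with discounts, the cost function is no longer this smooth and the closed form above does not apply.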

  14. Varying the item format improved the range of measurement in patient-reported outcome measures assessing physical function.

    Science.gov (United States)

    Liegl, Gregor; Gandek, Barbara; Fischer, H Felix; Bjorner, Jakob B; Ware, John E; Rose, Matthias; Fries, James F; Nolte, Sandra

    2017-03-21

    Physical function (PF) is a core patient-reported outcome domain in clinical trials in rheumatic diseases. Frequently used PF measures have ceiling effects, leading to large sample size requirements and low sensitivity to change. In most of these instruments, the response category that indicates the highest PF level is the statement that one is able to perform a given physical activity without any limitations or difficulty. This study investigates whether using an item format with an extended response scale, allowing respondents to state that the performance of an activity is easy or very easy, increases the range of precise measurement of self-reported PF. Three five-item PF short forms were constructed from the Patient-Reported Outcomes Measurement Information System (PROMIS®) wave 1 data. All forms included the same physical activities but varied in item stem and response scale: format A ("Are you able to …"; "without any difficulty"/"unable to do"); format B ("Does your health now limit you …"; "not at all"/"cannot do"); format C ("How difficult is it for you to …"; "very easy"/"impossible"). Each short-form item was answered by 2217-2835 subjects. We evaluated unidimensionality and estimated a graded response model for the 15 short-form items and remaining 119 items of the PROMIS PF bank to compare item and test information for the short forms along the PF continuum. We then used simulated data for five groups with different PF levels to illustrate differences in scoring precision between the short forms using different item formats. Sufficient unidimensionality of all short-form items and the original PF item bank was supported. Compared to formats A and B, format C increased the range of reliable measurement by about 0.5 standard deviations on the positive side of the PF continuum of the sample, provided more item information, and was more useful in distinguishing known groups with above-average functioning. Using an item format with an extended

  15. Exploratory factor analysis of the 12-item Functional Assessment of Chronic Illness Therapy-Spiritual Well-Being Scale in people newly diagnosed with advanced cancer.

    Science.gov (United States)

    Bai, Mei; Dixon, Jane K

    2014-01-01

    The purpose of this study was to reexamine the factor pattern of the 12-item Functional Assessment of Chronic Illness Therapy-Spiritual Well-Being Scale (FACIT-Sp-12) using exploratory factor analysis in people newly diagnosed with advanced cancer. Principal components analysis (PCA) and 3 common factor analysis methods were used to explore the factor pattern of the FACIT-Sp-12. Factorial validity was assessed in association with quality of life (QOL). Principal factor analysis (PFA), iterative PFA, and maximum likelihood suggested retrieving 3 factors: Peace, Meaning, and Faith. Both Peace and Meaning positively related to QOL, whereas only Peace uniquely contributed to QOL. This study supported the 3-factor model of the FACIT-Sp-12. Suggestions for revision of items and further validation of the identified factor pattern were provided.

  16. Behavioral decoding of working memory items inside and outside the focus of attention.

    Science.gov (United States)

    Mallett, Remington; Lewis-Peacock, Jarrod A

    2018-03-31

    How we attend to our thoughts affects how we attend to our environment. Holding information in working memory can automatically bias visual attention toward matching information. By observing attentional biases on reaction times to visual search during a memory delay, it is possible to reconstruct the source of that bias using machine learning techniques and thereby behaviorally decode the content of working memory. Can this be done when more than one item is held in working memory? There is some evidence that multiple items can simultaneously bias attention, but the effects have been inconsistent. One explanation may be that items are stored in different states depending on the current task demands. Recent models propose functionally distinct states of representation for items inside versus outside the focus of attention. Here, we use behavioral decoding to evaluate whether multiple memory items (including temporarily irrelevant items outside the focus of attention) exert biases on visual attention. Only the single item in the focus of attention was decodable. The other item showed a brief attentional bias that dissipated until it returned to the focus of attention. These results support the idea of dynamic, flexible states of working memory across time and priority. © 2018 New York Academy of Sciences.

  17. Differential item functioning magnitude and impact measures from item response theory models.

    Science.gov (United States)

    Kleinman, Marjorie; Teresi, Jeanne A

    2016-01-01

    Measures of magnitude and impact of differential item functioning (DIF) at the item and scale level, respectively are presented and reviewed in this paper. Most measures are based on item response theory models. Magnitude refers to item level effect sizes, whereas impact refers to differences between groups at the scale score level. Reviewed are magnitude measures based on group differences in the expected item scores and impact measures based on differences in the expected scale scores. The similarities among these indices are demonstrated. Various software packages are described that provide magnitude and impact measures, and new software presented that computes all of the available statistics conveniently in one program with explanations of their relationships to one another.
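
    The idea of an item-level magnitude measure based on group differences in expected item scores can be sketched for a dichotomous item. The sketch assumes a two-parameter logistic model and a standard normal trait distribution, and it computes a simple weighted average of unsigned differences; it is an illustration of the general idea, not one of the specific indices reviewed in the paper, and the parameter values are hypothetical.

```python
import math

def expected_score(theta, discrimination, difficulty):
    """Expected score on a dichotomous item under a two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

def average_unsigned_difference(reference, focal, grid_points=81):
    """Mean absolute gap between two groups' expected item scores.

    reference and focal are (discrimination, difficulty) pairs; the gap is
    averaged over a theta grid weighted by a standard normal density.
    """
    thetas = [-4 + 8 * i / (grid_points - 1) for i in range(grid_points)]
    weights = [math.exp(-0.5 * t * t) for t in thetas]
    gap = sum(w * abs(expected_score(t, *reference) - expected_score(t, *focal))
              for t, w in zip(thetas, weights))
    return gap / sum(weights)

# Hypothetical item that is harder for the focal group at equal trait levels.
print(average_unsigned_difference((1.2, 0.0), (1.2, 0.5)))
```

    A scale-level impact measure follows the same pattern, with the expected item scores summed over all items before the groups are compared.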

  18. Science Literacy: How do High School Students Solve PISA Test Items?

    Science.gov (United States)

    Wati, F.; Sinaga, P.; Priyandoko, D.

    2017-09-01

    The Programme for International Student Assessment (PISA) assesses students’ science literacy in real-life contexts and a wide variety of situations. Its results therefore do not give teachers adequate information for probing students’ science literacy, because the range of material taught at schools depends on the curriculum used. This study investigates how junior high school students in Indonesia solve PISA test items. Data were collected by administering the PISA test items of the greenhouse unit to 36 ninth-grade students. Students’ answers were analyzed qualitatively for each item, based on the competence tested in the problem. How students answer a problem reveals their ability in a particular competence, which is influenced by several factors: unfamiliarity with test construction, low reading performance, difficulty connecting the available information with the question, and limited ability to express ideas effectively and readably. As a remedy, selected PISA test items can be used alongside the corresponding teaching topics to familiarize students with science literacy.

  19. Adaptive screening for depression--recalibration of an item bank for the assessment of depression in persons with mental and somatic diseases and evaluation in a simulated computer-adaptive test environment.

    Science.gov (United States)

    Forkmann, Thomas; Kroehne, Ulf; Wirtz, Markus; Norra, Christine; Baumeister, Harald; Gauggel, Siegfried; Elhan, Atilla Halil; Tennant, Alan; Boecker, Maren

    2013-11-01

    This study conducted a simulation study for computer-adaptive testing based on the Aachen Depression Item Bank (ADIB), which was developed for the assessment of depression in persons with somatic diseases. Prior to computer-adaptive test simulation, the ADIB was newly calibrated. Recalibration was performed in a sample of 161 patients treated for a depressive syndrome, 103 patients from cardiology, and 103 patients from otorhinolaryngology (mean age 44.1, SD=14.0; 44.7% female) and was cross-validated in a sample of 117 patients undergoing rehabilitation for cardiac diseases (mean age 58.4, SD=10.5; 24.8% women). Unidimensionality of the item bank was checked and a Rasch analysis was performed that evaluated local dependency (LD), differential item functioning (DIF), item fit and reliability. CAT-simulation was conducted with the total sample and additional simulated data. Recalibration resulted in a strictly unidimensional item bank with 36 items, showing good Rasch model fit (item fit residualsLD. CAT simulation revealed that 13 items on average were necessary to estimate depression in the range of -2 and +2 logits when terminating at SE≤0.32 and 4 items if using SE≤0.50. Receiver Operating Characteristics analysis showed that θ estimates based on the CAT algorithm have good criterion validity with regard to depression diagnoses (Area Under the Curve≥.78 for all cut-off criteria). The recalibration of the ADIB succeeded and the simulation studies conducted suggest that it has good screening performance in the samples investigated and that it may reasonably add to the improvement of depression assessment. © 2013.
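
    The two stopping thresholds quoted in this abstract can be related to reliability with a short calculation. The sketch assumes the latent trait is scaled to a variance of 1, in which case reliability is approximately 1 minus the squared standard error; the abstract reports results in logits and does not state the trait variance, so the conversion is indicative only.

```python
def reliability_from_se(se, trait_variance=1.0):
    """Approximate reliability implied by a standard error of measurement."""
    return 1.0 - (se ** 2) / trait_variance

# Thresholds quoted in the abstract; trait variance of 1 is an assumption.
for se in (0.32, 0.50):
    print(f"SE <= {se:.2f} -> reliability >= {reliability_from_se(se):.2f}")
```

    Under that assumption, the stricter rule (SE≤0.32) corresponds to a reliability of about 0.90 and the looser rule (SE≤0.50) to about 0.75, which is consistent with the looser rule needing far fewer items.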

  20. Development of a subjective cognitive decline questionnaire using item response theory: a pilot study.

    Science.gov (United States)

    Gifford, Katherine A; Liu, Dandan; Romano, Raymond; Jones, Richard N; Jefferson, Angela L

    2015-12-01

    Subjective cognitive decline (SCD) may indicate unhealthy cognitive changes, but no standardized SCD measurement exists. This pilot study aims to identify reliable SCD questions. 112 cognitively normal (NC, 76±8 years, 63% female), 43 mild cognitive impairment (MCI; 77±7 years, 51% female), and 33 diagnostically ambiguous participants (79±9 years, 58% female) were recruited from a research registry and completed 57 self-report SCD questions. Psychometric methods were used for item-reduction. Factor analytic models assessed unidimensionality of the latent trait (SCD); 19 items were removed with extreme response distribution or trait-fit. Item response theory (IRT) provided information about question utility; 17 items with low information were dropped. Post-hoc simulation using computerized adaptive test (CAT) modeling selected the most commonly used items (n=9 of 21 items) that represented the latent trait well (r=0.94) and differentiated NC from MCI participants (F(1,146)=8.9, p=0.003). Item response theory and computerized adaptive test modeling identified nine reliable SCD items. This pilot study is a first step toward refining SCD assessment in older adults. Replication of these findings and validation with Alzheimer's disease biomarkers will be an important next step for the creation of a SCD screener.

  1. Evaluating the healthiness of chain-restaurant menu items using crowdsourcing: a new method.

    Science.gov (United States)

    Lesser, Lenard I; Wu, Leslie; Matthiessen, Timothy B; Luft, Harold S

    2017-01-01

    To develop a technology-based method for evaluating the nutritional quality of chain-restaurant menus to increase the efficiency and lower the cost of large-scale data analysis of food items. Using a Modified Nutrient Profiling Index (MNPI), we assessed chain-restaurant items from the MenuStat database with a process involving three steps: (i) testing 'extreme' scores; (ii) crowdsourcing to analyse fruit, nut and vegetable (FNV) amounts; and (iii) analysis of the ambiguous items by a registered dietitian. In applying the approach to assess 22 422 foods, only 3566 could not be scored automatically based on MenuStat data and required further evaluation to determine healthiness. Items for which there was low agreement between trusted crowd workers, or where the FNV amount was estimated to be >40 %, were sent to a registered dietitian. Crowdsourcing was able to evaluate 3199, leaving only 367 to be reviewed by the registered dietitian. Overall, 7 % of items were categorized as healthy. The healthiest category was soups (26 % healthy), while desserts were the least healthy (2 % healthy). An algorithm incorporating crowdsourcing and a dietitian can quickly and efficiently analyse restaurant menus, allowing public health researchers to analyse the healthiness of menu items.
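
    The counts reported in this abstract describe a three-stage funnel, and the arithmetic can be checked directly. Only the four counts come from the abstract; the variable names and the derived shares are illustrative.

```python
# Counts as reported in the abstract.
total_items = 22422
needed_review = 3566        # could not be scored automatically
resolved_by_crowd = 3199    # evaluated through crowdsourcing
reported_dietitian = 367    # left for the registered dietitian

# Derived quantities.
scored_automatically = total_items - needed_review
sent_to_dietitian = needed_review - resolved_by_crowd

print(f"scored automatically: {scored_automatically / total_items:.1%}")
print(f"sent to dietitian:    {sent_to_dietitian / total_items:.1%}")
```

    The derived dietitian count matches the reported 367, so the three stages account for every item.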

  2. Complement or Contamination: A Study of the Validity of Multiple-Choice Items when Assessing Reasoning Skills in Physics

    OpenAIRE

    Anders Jönsson; David Rosenlund; Fredrik Alvén

    2017-01-01

    The purpose of this study is to investigate the validity of using multiple-choice (MC) items as a complement to constructed-response (CR) items when making decisions about student performance on reasoning tasks. CR items from a national test in physics have been reformulated into MC items and students’ reasoning skills have been analyzed in two substudies. In the first study, 12 students answered the MC items and were asked to explain their answers orally. In the second study, 102 students fr...

  3. Evaluating rivastigmine in mild-to-moderate Parkinson's disease dementia using ADAS-cog items.

    Science.gov (United States)

    Schmitt, Frederick A; Aarsland, Dag; Brønnick, Kolbjørn S; Meng, Xiangyi; Tekin, Sibel; Olin, Jason T

    2010-08-01

    Rivastigmine has been shown to improve cognition in patients with Parkinson's disease dementia (PDD). To further explore the impact of anticholinesterase therapy on PDD, Alzheimer's Disease Assessment Scale-cognitive subscale (ADAS-cog) items were assessed in a retrospective analysis of a 24-week, double-blind, placebo-controlled trial of rivastigmine. Mean changes from baseline at week 24 were calculated for ADAS-cog item scores and for 3 cognitive domain scores. A total of 362 patients were randomized to 3 to 12 mg/d rivastigmine capsules and 179 to placebo. Patients with PDD receiving rivastigmine improved versus placebo on items: word recall, following commands, ideational praxis, remembering test instructions, and comprehension of spoken language (P ADAS-cog is sensitive to broad cognitive changes in PDD. Overall, rivastigmine was associated with improvements on individual cognitive items and general cognitive domains.

  4. Varying the item format improved the range of measurement in patient-reported outcome measures assessing physical function

    DEFF Research Database (Denmark)

    Liegl, Gregor; Gandek, Barbara; Fischer, H. Felix

    2017-01-01

    precision between the short forms using different item formats. Results: Sufficient unidimensionality of all short-form items and the original PF item bank was supported. Compared to formats A and B, format C increased the range of reliable measurement by about 0.5 standard deviations on the positive side...

  5. A Model of Batch Scheduling for a Single Batch Processor with Additional Setups to Minimize Total Inventory Holding Cost of Parts of a Single Item Requested at Multi-due-date

    Science.gov (United States)

    Hakim Halim, Abdul; Ernawati; Hidayat, Nita P. A.

    2018-03-01

    This paper deals with a model of batch scheduling for a single batch processor on which a number of parts of a single item are to be processed. The process needs two kinds of setups, i.e., main setups required before processing any batches, and additional setups required repeatedly after the batch processor completes a certain number of batches. The parts to be processed arrive at the shop floor at times coinciding with their respective starting times of processing, and the completed parts are to be delivered at multiple due dates. The objective adopted for the model is to minimize the total inventory holding cost, which consists of the holding cost per unit time for a part in completed batches and that for a part in in-process batches. The formulation of total inventory holding cost is derived from the so-called actual flow time, defined as the interval between the arrival times of parts at the production line and the delivery times of the completed parts. The actual flow time satisfies not only minimum inventory but also just-in-time arrival and delivery. An algorithm to solve the model is proposed and a numerical example is shown.

  6. Item-saving assessment of self-care performance in children with developmental disabilities: A prospective caregiver-report computerized adaptive test

    Science.gov (United States)

    Chen, Cheng-Te; Chen, Yu-Lan; Lin, Yu-Ching; Hsieh, Ching-Lin; Tzeng, Jeng-Yi

    2018-01-01

    Objective The purpose of this study was to construct a computerized adaptive test (CAT) for measuring self-care performance (the CAT-SC) in children with developmental disabilities (DD) aged from 6 months to 12 years in a content-inclusive, precise, and efficient fashion. Methods The study was divided into 3 phases: (1) item bank development, (2) item testing, and (3) a simulation study to determine the stopping rules for the administration of the CAT-SC. A total of 215 caregivers of children with DD were interviewed with the 73-item CAT-SC item bank. An item response theory model was adopted for examining the construct validity to estimate item parameters after investigation of the unidimensionality, equality of slope parameters, item fitness, and differential item functioning (DIF). In the last phase, the reliability and concurrent validity of the CAT-SC were evaluated. Results The final CAT-SC item bank contained 56 items. The stopping rules suggested were (a) reliability coefficient greater than 0.9 or (b) 14 items administered. The results of simulation also showed that 85% of the estimated self-care performance scores would reach a reliability higher than 0.9 with a mean test length of 8.5 items, and the mean reliability for the rest was 0.86. Administering the CAT-SC could reduce the number of items administered by 75% to 84%. In addition, self-care performances estimated by the CAT-SC and the full item bank were very similar to each other (Pearson r = 0.98). Conclusion The newly developed CAT-SC can efficiently measure self-care performance in children with DD whose performances are comparable to those of TD children aged from 6 months to 12 years as precisely as the whole item bank. The item bank of the CAT-SC has good reliability and a unidimensional self-care construct, and the CAT can estimate self-care performance with less than 25% of the items in the item bank. 
Therefore, the CAT-SC could be useful for measuring self-care performance in children with
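
    The pair of stopping rules described in this abstract (stop once reliability exceeds 0.9, or after 14 items) can be sketched as a toy adaptive test. Everything beyond the two thresholds is an assumption made for illustration: dichotomous Rasch items, evenly spaced difficulties, a standard normal prior, grid-based posterior-mean scoring, and reliability taken as 1 minus the squared standard error. It is not the CAT-SC algorithm.

```python
import math
import random

GRID = [-4 + 0.1 * i for i in range(81)]  # theta grid for posterior scoring

def prob_correct(theta, difficulty):
    """Rasch model probability of a positive response."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def eap_estimate(responses):
    """Posterior mean and SD of theta on the grid, standard normal prior."""
    posterior = []
    for theta in GRID:
        weight = math.exp(-0.5 * theta * theta)
        for difficulty, score in responses:
            p = prob_correct(theta, difficulty)
            weight *= p if score else 1.0 - p
        posterior.append(weight)
    total = sum(posterior)
    mean = sum(t * w for t, w in zip(GRID, posterior)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, posterior)) / total
    return mean, math.sqrt(var)

def run_cat(bank, true_theta, rng, target_reliability=0.9, max_items=14):
    """Administer items until the precision target or the length cap is hit."""
    se_target = math.sqrt(1.0 - target_reliability)
    remaining = list(bank)
    responses = []
    theta, se = 0.0, 1.0  # prior mean and SD before any item is given
    while remaining and len(responses) < max_items and se > se_target:
        # Under the Rasch model the most informative item is the one whose
        # difficulty lies closest to the current estimate.
        item = min(remaining, key=lambda b: abs(b - theta))
        remaining.remove(item)
        score = rng.random() < prob_correct(true_theta, item)
        responses.append((item, score))
        theta, se = eap_estimate(responses)
    return theta, se, len(responses)

# Hypothetical 56-item bank with difficulties spread from -3 to 3.
bank = [-3 + 6 * i / 55 for i in range(56)]
theta_hat, se_hat, n_items = run_cat(bank, true_theta=0.5, rng=random.Random(0))
print(theta_hat, se_hat, n_items)
```

    With dichotomous items each response carries limited information, so this toy version tends to stop on the length cap; banks with polytomous rating-scale items reach a precision target with fewer items.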

  7. Linguistic Simplification of Mathematics Items: Effects for Language Minority Students in Germany

    Science.gov (United States)

    Haag, Nicole; Heppt, Birgit; Roppelt, Alexander; Stanat, Petra

    2015-01-01

    In large-scale assessment studies, language minority students typically obtain lower test scores in mathematics than native speakers. Although this performance difference was related to the linguistic complexity of test items in some studies, other studies did not find linguistically demanding math items to be disproportionally more difficult for…

  8. Effects of Using Modified Items to Test Students with Persistent Academic Difficulties

    Science.gov (United States)

    Elliott, Stephen N.; Kettler, Ryan J.; Beddow, Peter A.; Kurz, Alexander; Compton, Elizabeth; McGrath, Dawn; Bruen, Charles; Hinton, Kent; Palmer, Porter; Rodriguez, Michael C.; Bolt, Daniel; Roach, Andrew T.

    2010-01-01

    This study investigated the effects of using modified items in achievement tests to enhance accessibility. An experiment determined whether tests composed of modified items would reduce the performance gap between students eligible for an alternate assessment based on modified achievement standards (AA-MAS) and students not eligible, and the…

  9. Measuring stigma after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Stigma item bank and short form.

    Science.gov (United States)

    Kisala, Pamela A; Tulsky, David S; Pace, Natalie; Victorson, David; Choi, Seung W; Heinemann, Allen W

    2015-05-01

    To develop a calibrated item bank and computer adaptive test (CAT) to assess the effects of stigma on health-related quality of life in individuals with spinal cord injury (SCI). Grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, and item response theory (IRT)-based psychometric analyses. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. SCI-QOL Stigma Item Bank A sample of 611 individuals with traumatic SCI completed 30 items assessing SCI-related stigma. After 7 items were iteratively removed, factor analyses confirmed a unidimensional pool of items. Graded Response Model IRT analyses were used to estimate slopes and thresholds for the final 23 items. The SCI-QOL Stigma item bank is unique not only in the assessment of SCI-related stigma but also in the inclusion of individuals with SCI in all phases of its development. Use of confirmatory factor analytic and IRT methods provide flexibility and precision of measurement. The item bank may be administered as a CAT or as a 10-item fixed-length short form and can be used for research and clinical applications.

  10. Test-retest reliability of selected items of Health Behaviour in School-aged Children (HBSC) survey questionnaire in Beijing, China

    Directory of Open Access Journals (Sweden)

    Liu Yang

    2010-08-01

    Background: Children's health and health behaviour are essential for their development and it is important to obtain abundant and accurate information to understand young people's health and health behaviour. The Health Behaviour in School-aged Children (HBSC) study is among the first large-scale international surveys on adolescent health through self-report questionnaires. So far, more than 40 countries in Europe and North America have been involved in the HBSC study. The purpose of this study is to assess the test-retest reliability of selected items in the Chinese version of the HBSC survey questionnaire in a sample of adolescents in Beijing, China. Methods: A sample of 95 male and female students aged 11 or 15 years old participated in a test and retest with a three-week interval. Student identity numbers of respondents were utilized to permit matching of test-retest questionnaires. 23 items concerning physical activity, sedentary behaviour, sleep and substance use were evaluated by using the percentage of response shifts and the single measure Intraclass Correlation Coefficients (ICC) with 95% confidence interval (CI) for all respondents and stratified by gender and age. Items on substance use were only evaluated for school children aged 15 years old. Results: The percentage of no response shift between test and retest varied from 32% for the item on computer use at weekends to 92% for the three items on smoking. Of all the 23 items evaluated, 6 items (26%) showed a moderate reliability, 12 items (52%) displayed a substantial reliability and 4 items (17%) indicated almost perfect reliability. No gender and age group difference of the test-retest reliability was found except for a few items on sedentary behaviour. Conclusions: The overall findings of this study suggest that most selected indicators in the HBSC survey questionnaire have satisfactory test-retest reliability for the students in Beijing. Further test-retest studies in a large

  11. Protein single-model quality assessment by feature-based probability density functions.

    Science.gov (United States)

    Cao, Renzhi; Cheng, Jianlin

    2016-04-04

    Protein quality assessment (QA) has played an important role in protein structure prediction. We developed a novel single-model quality assessment method-Qprob. Qprob calculates the absolute error for each protein feature value against the true quality scores (i.e. GDT-TS scores) of protein structural models, and uses them to estimate its probability density distribution for quality assessment. Qprob has been blindly tested on the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as MULTICOM-NOVEL server. The official CASP result shows that Qprob ranks as one of the top single-model QA methods. In addition, Qprob makes contributions to our protein tertiary structure predictor MULTICOM, which is officially ranked 3rd out of 143 predictors. The good performance shows that Qprob is good at assessing the quality of models of hard targets. These results demonstrate that this new probability density distribution based method is effective for protein single-model quality assessment and is useful for protein structure prediction. The webserver of Qprob is available at: http://calla.rnet.missouri.edu/qprob/. The software is now freely available in the web server of Qprob.

  12. Seeking missing pieces in science concept assessments: Reevaluating the Brief Electricity and Magnetism Assessment through Rasch analysis

    Directory of Open Access Journals (Sweden)

    Lin Ding

    2014-02-01

    Discipline-based science concept assessments are powerful tools to measure learners’ disciplinary core ideas. Among many such assessments, the Brief Electricity and Magnetism Assessment (BEMA) has been broadly used to gauge student conceptions of key electricity and magnetism (E&M) topics in college-level introductory physics courses. Differing from typical concept inventories that focus only on one topic of a subject area, BEMA covers a broad range of topics in the electromagnetism domain. In spite of this fact, prior studies exclusively used a single aggregate score to represent individual students’ overall understanding of E&M without explicating the construct of this assessment. Additionally, BEMA has been used to compare traditional physics courses with a reformed course entitled Matter and Interactions (M&I). While prior findings were in favor of M&I, no empirical evidence was sought to rule out possible differential functioning of BEMA that may have inadvertently advantaged M&I students. In this study, we used Rasch analysis to seek two missing pieces regarding the construct and differential functioning of BEMA. Results suggest that although BEMA items generally can function together to measure the same construct of application and analysis of E&M concepts, several items may need further revision. Additionally, items that demonstrate differential functioning for the two courses are detected. Issues such as item contextual features and student familiarity with question settings may underlie these findings. This study highlights often overlooked threats in science concept assessments and provides an exemplar for using evidence-based reasoning to make valid inferences and arguments.

  13. Seeking missing pieces in science concept assessments: Reevaluating the Brief Electricity and Magnetism Assessment through Rasch analysis

    Science.gov (United States)

    Ding, Lin

    2014-02-01

    Discipline-based science concept assessments are powerful tools to measure learners' disciplinary core ideas. Among many such assessments, the Brief Electricity and Magnetism Assessment (BEMA) has been broadly used to gauge student conceptions of key electricity and magnetism (E&M) topics in college-level introductory physics courses. Differing from typical concept inventories that focus only on one topic of a subject area, BEMA covers a broad range of topics in the electromagnetism domain. In spite of this fact, prior studies exclusively used a single aggregate score to represent individual students' overall understanding of E&M without explicating the construct of this assessment. Additionally, BEMA has been used to compare traditional physics courses with a reformed course entitled Matter and Interactions (M&I). While prior findings were in favor of M&I, no empirical evidence was sought to rule out possible differential functioning of BEMA that may have inadvertently advantaged M&I students. In this study, we used Rasch analysis to seek two missing pieces regarding the construct and differential functioning of BEMA. Results suggest that although BEMA items generally can function together to measure the same construct of application and analysis of E&M concepts, several items may need further revision. Additionally, items that demonstrate differential functioning for the two courses are detected. Issues such as item contextual features and student familiarity with question settings may underlie these findings. This study highlights often overlooked threats in science concept assessments and provides an exemplar for using evidence-based reasoning to make valid inferences and arguments.
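    For readers unfamiliar with the Rasch analysis mentioned in this abstract, the sketch below shows the dichotomous Rasch model and what differential functioning means in that model: the same item behaves as if it had a different difficulty in two groups. The function names, ability value, and difficulty values are hypothetical and do not come from the BEMA study.

```python
import math

def rasch_prob(theta, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def log_odds(p):
    """Convert a probability to the logit scale."""
    return math.log(p / (1.0 - p))

# Hypothetical student with ability 0.5 logits answering one item.
theta = 0.5
# Hypothetical difficulties of the same item estimated separately in two courses.
difficulty_course_a = 0.2
difficulty_course_b = 0.9

p_a = rasch_prob(theta, difficulty_course_a)
p_b = rasch_prob(theta, difficulty_course_b)

# On the logit scale the gap between groups equals the difficulty gap,
# regardless of the student's ability.
dif_contrast = log_odds(p_a) - log_odds(p_b)
print(round(p_a, 3), round(p_b, 3), round(dif_contrast, 3))
```

    Because the contrast does not depend on ability, equally able students from the two courses would have different chances of success on such an item, which is the kind of threat the abstract describes.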

  14. What Does a Verbal Test Measure? A New Approach to Understanding Sources of Item Difficulty.

    Science.gov (United States)

    Berk, Eric J. Vanden; Lohman, David F.; Cassata, Jennifer Coyne

    Assessing the construct relevance of mental test results continues to present many challenges, and it has proven to be particularly difficult to assess the construct relevance of verbal items. This study was conducted to gain a better understanding of the conceptual sources of verbal item difficulty using a unique approach that integrates…

  15. Memory-based attention capture when multiple items are maintained in visual working memory.

    Science.gov (United States)

    Hollingworth, Andrew; Beck, Valerie M

    2016-07-01

    Efficient visual search requires that attention is guided strategically to relevant objects, and most theories of visual search implement this function by means of a target template maintained in visual working memory (VWM). However, there is currently debate over the architecture of VWM-based attentional guidance. We contrasted a single-item-template hypothesis with a multiple-item-template hypothesis, which differ in their claims about structural limits on the interaction between VWM representations and perceptual selection. Recent evidence from van Moorselaar, Theeuwes, and Olivers (2014) indicated that memory-based capture during search, an index of VWM guidance, is not observed when memory set size is increased beyond a single item, suggesting that multiple items in VWM do not guide attention. In the present study, we maximized the overlap between multiple colors held in VWM and the colors of distractors in a search array. Reliable capture was observed when 2 colors were held in VWM and both colors were present as distractors, using both the original van Moorselaar et al. singleton-shape search task and a search task that required focal attention to array elements (gap location in outline square stimuli). In the latter task, memory-based capture was consistent with the simultaneous guidance of attention by multiple VWM representations. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  16. Random Item Generation Is Affected by Age

    Science.gov (United States)

    Multani, Namita; Rudzicz, Frank; Wong, Wing Yiu Stephanie; Namasivayam, Aravind Kumar; van Lieshout, Pascal

    2016-01-01

    Purpose: Random item generation (RIG) involves central executive functioning. Measuring aspects of random sequences can therefore provide a simple method to complement other tools for cognitive assessment. We examine the extent to which RIG relates to specific measures of cognitive function, and whether those measures can be estimated using RIG…

  17. The Dutch-Flemish PROMIS Physical Function item bank exhibited strong psychometric properties in patients with chronic pain.

    Science.gov (United States)

    Crins, Martine H P; Terwee, Caroline B; Klausch, Thomas; Smits, Niels; de Vet, Henrica C W; Westhovens, Rene; Cella, David; Cook, Karon F; Revicki, Dennis A; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Roorda, Leo D

    2017-07-01

    The objective of this study was to assess the psychometric properties of the Dutch-Flemish Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank in Dutch patients with chronic pain. A bank of 121 items was administered to 1,247 Dutch patients with chronic pain. Unidimensionality was assessed by fitting a one-factor confirmatory factor analysis and evaluating resulting fit statistics. Items were calibrated with the graded response model and its fit was evaluated. Cross-cultural validity was assessed by testing items for differential item functioning (DIF) based on language (Dutch vs. English). Construct validity was evaluated by calculating correlations between scores on the Dutch-Flemish PROMIS Physical Function measure and scores on generic and disease-specific measures. Results supported the Dutch-Flemish PROMIS Physical Function item bank's unidimensionality (Comparative Fit Index = 0.976, Tucker Lewis Index = 0.976) and model fit. Item thresholds targeted a wide range of the physical function construct (threshold-parameters range: -4.2 to 5.6). Cross-cultural validity was good, as only four items showed DIF for language and their impact on item scores was minimal. Physical Function scores were strongly associated with scores on all other measures (all correlations ≤ -0.60 as expected). The Dutch-Flemish PROMIS Physical Function item bank exhibited good psychometric properties. Development of a computer adaptive test based on the large bank is warranted. Copyright © 2017 Elsevier Inc. All rights reserved.

  18. Numerosity estimates for attended and unattended items in visual search.

    Science.gov (United States)

    Kelley, Troy D; Cassenti, Daniel N; Marusich, Laura R; Ghirardelli, Thomas G

    2017-07-01

    The goal of this research was to examine memories created for the number of items during a visual search task. Participants performed a visual search task for a target defined by a single feature (Experiment 1A), by a conjunction of features (Experiment 1B), or by a specific spatial configuration of features (Experiment 1C). On some trials following the search task, subjects were asked to recall the total number of items in the previous display. In all search types, participants underestimated the total number of items, but the severity of the underestimation varied depending on the efficiency of the search. In three follow-up studies (Experiments 2A, 2B, and 2C) using the same visual stimuli, the participants' only task was to estimate the number of items on each screen. Participants still underestimated the numerosity of the items, although the degree of underestimation was smaller than in the search tasks and did not depend on the type of visual stimuli. In Experiment 3, participants were asked to recall the number of items in a display only once. Subjects still displayed a tendency to underestimate, indicating that the underestimation effects seen in Experiments 1A-1C were not attributable to knowledge of the estimation task. The degree of underestimation depends on the efficiency of the search task, with more severe underestimation in efficient search tasks. This suggests that the lower attentional demands of very efficient searches lead to less encoding of numerosity of the distractor set.

  19. The Role of Content and Context in PISA Interest Scales: A study of the embedded interest items in the PISA 2006 science assessment

    Science.gov (United States)

    Drechsel, Barbara; Carstensen, Claus; Prenzel, Manfred

    2011-01-01

    This paper focuses on interest in science as one of the attitudinal aspects of scientific literacy. Large-scale data from the Programme for International Student Assessment (PISA) 2006 are analysed in order to describe student interest more precisely. So far the analyses have provided a general indicator of interest, aggregated over all contexts and contents in the science test. With its innovative approach PISA embeds interest items within the cognitive test unit and its contents and contexts. The main difference from conventional interest measures is that in most questionnaires, a relatively small number of interest items cover broad fields of contents and contexts. The science units represent a number of systematically differentiated scientific contexts and contents. The units' stimulus texts allow for concrete descriptions of relevant content aspects, applications, and contexts. In the analyses, multidimensional item response models are applied in order to disentangle student interest. The results indicate that multidimensional models fit the data. A two-dimensional model separating interest into two different knowledge of science dimensions described in the PISA science framework is further analysed with respect to gender, performance differences, and country. The findings give a comprehensive description of students' interest in science. The paper deals with methodological problems and describes requirements of the test construction for further assessments. The results are discussed with regard to their significance for science education.

  20. The importance of rating scale design in the measurement of patient-reported outcomes using questionnaires or item banks.

    Science.gov (United States)

    Khadka, Jyoti; McAlinden, Colm; Gothwal, Vijaya K; Lamoureux, Ecosse L; Pesudovs, Konrad

    2012-06-26

    To investigate the effect of rating scale designs (question formats and response categories) on item difficulty calibrations and assess the impact that rating scale differences have on overall vision-related activity limitation (VRAL) scores. Sixteen existing patient-reported outcome instruments (PROs) suitable for cataract assessment, with different rating scales, were self-administered by patients on a cataract surgery waiting list. A total of 226 VRAL items from these PROs in their native rating scales were included in an item bank and calibrated using Rasch analysis. Fifteen item/content areas (e.g., reading newspapers) appearing in at least three different PROs were identified. Within each content area, item calibrations were compared and their range calculated. Similarly, five PROs having at least three items in common with the Visual Function (VF-14) were compared in terms of average item measures. A total of 614 patients (mean age ± SD, 74.1 ± 9.4 years) participated. Items with the same content varied in their calibration by as much as two logits; "reading the small print" had the largest range (1.99 logits) followed by "watching TV" (1.60). Compared with the VF-14 (0.00 logits), the rating scale of the Visual Disability Assessment (1.13 logits) produced the most difficult items and the Cataract Symptom Scale (0.24 logits) produced the least difficult items. The VRAL item bank was suboptimally targeted to the ability level of the participants (2.00 logits). Rating scale designs have a significant effect on item calibrations. Therefore, constructing item banks from existing items in their native formats carries risks to face validity and transmission of problems inherent in existing instruments, such as poor targeting.

  1. Self-assessed health-related quality of life (HRQOL) in men currently being treated for prostate cancer (PC) with radiotherapy

    International Nuclear Information System (INIS)

    Dale, William; Ignacio, Lani; Vijayakumar, Srinivasan

    1996-01-01

    Purpose/Objective: A questionnaire was designed to assess three dimensions of HRQOL symptoms known to be important for PC patients from clinical evidence and the literature: bowel function (BF), urinary function (UF), and sexual function (SF). This questionnaire was tested for reliability and validity for patients currently receiving radiotherapy for PC. There has been some suggestion that patients can suffer along several dimensions of HRQOL, yet not feel that their lives are adversely affected by these apparent impairments. Each of the HRQOL dimensions was related to a question asking directly how bothersome the reported symptoms were perceived to be. Materials and Methods: A six-page questionnaire was given to patients during treatment visits for radiation therapy for PC. The questionnaire design is based on clinical experience and a literature review to assess three HRQOL dimensions using Likert-type questions: BF (12 items), UF (11 items), and SF (9 items), as well as a single question for each asking about how bothersome the reported symptoms are to the patient. Items in each section were analyzed with principal components factor analysis to identify meaningful sub-scales. Items found to have high factor loadings were grouped together to form scales, and the reliability and validity of the created scales were assessed. The scale scores were used to assess whether increased symptoms resulted in an increase in the perceived 'bothersomeness' to patients from the symptoms. Results: For the 62 cases, sub-scales were identified in each dimension from the factor analysis. For BF, sub-scales were identified for an 'urgency' scale (4 items), a 'daily living' scale (3 items), and a single 'blood' item; for UF, sub-scales were identified for an 'urgency' scale (4 items), a 'weakness of urinary stream' scale (3 items), and a single 'blood' item; for SF, sub-scales were identified for an 'interest/satisfaction' scale (5 items) and for an 'impotence' scale (3 items). Reliability

  2. Item response theory

    Directory of Open Access Journals (Sweden)

    Eutalia Aparecida Candido de Araujo

    2009-12-01

    Concern with measuring psychological traits is long-standing, and many studies and methods have been developed to that end. Among these proposals, Item Response Theory (IRT) stands out; it was initially introduced to address limitations of Classical Test Theory, which is still widely used to measure psychological traits. The main point of IRT is that it considers each item individually rather than relying on total scores; the conclusions therefore depend not only on the test or questionnaire but on each item that composes it. This article presents this theory, which revolutionized measurement theory.

  3. A note on monotonicity of item response functions for ordered polytomous item response theory models.

    Science.gov (United States)

    Kang, Hyeon-Ah; Su, Ya-Hui; Chang, Hua-Hua

    2018-03-08

    A monotone relationship between a true score (τ) and a latent trait level (θ) has been a key assumption for many psychometric applications. The monotonicity property in dichotomous response models is evident as a result of a transformation via a test characteristic curve. Monotonicity in polytomous models, in contrast, is not immediately obvious because item response functions are determined by a set of response category curves, which are conceivably non-monotonic in θ. The purpose of the present note is to demonstrate strict monotonicity in ordered polytomous item response models. Five models that are widely used in operational assessments are considered for proof: the generalized partial credit model (Muraki, 1992, Applied Psychological Measurement, 16, 159), the nominal model (Bock, 1972, Psychometrika, 37, 29), the partial credit model (Masters, 1982, Psychometrika, 47, 147), the rating scale model (Andrich, 1978, Psychometrika, 43, 561), and the graded response model (Samejima, 1972, A general model for free-response data (Psychometric Monograph no. 18). Psychometric Society, Richmond). The study asserts that the item response functions in these models strictly increase in θ and thus there exists strict monotonicity between τ and θ under certain specified conditions. This conclusion validates the practice of customarily using τ in place of θ in applied settings and provides theoretical grounds for one-to-one transformations between the two scales. © 2018 The British Psychological Society.
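    The monotonicity claim in this abstract can be illustrated numerically for one of the listed models. The sketch below evaluates the expected item score under a generalized partial credit model on a grid of θ values and checks that it increases. This is a numerical illustration rather than the note's proof, it reads "item response function" as the expected item score (an assumption), and the slope and step parameters are hypothetical.

```python
import math

def gpcm_probs(theta, slope, steps):
    """Category probabilities under a generalized partial credit model."""
    # Category 0 has exponent 0; category k adds slope * (theta - step_k).
    exponents = [0.0]
    for b in steps:
        exponents.append(exponents[-1] + slope * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(exponents)
    weights = [math.exp(e - m) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(theta, slope, steps):
    """Expected item score: sum over categories of k * P(X = k | theta)."""
    return sum(k * p for k, p in enumerate(gpcm_probs(theta, slope, steps)))

# Hypothetical item with a positive slope and deliberately unordered steps.
grid = [x / 10.0 for x in range(-40, 41)]
scores = [expected_score(t, 1.2, [0.8, -0.3, 1.5]) for t in grid]
is_increasing = all(a < b for a, b in zip(scores, scores[1:]))
print(is_increasing)
```

    The individual category curves in such a model rise and fall with θ, yet the expected score still increases; for this model its derivative equals the slope times the conditional variance of the item score, which is positive when the slope is positive.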

  4. Using item response theory to address vulnerabilities in FFQ.

    Science.gov (United States)

    Kazman, Josh B; Scott, Jonathan M; Deuster, Patricia A

    2017-09-01

    The limitations for self-reporting of dietary patterns are widely recognised as a major vulnerability of FFQ and the dietary screeners/scales derived from FFQ. Such instruments can yield inconsistent results to produce questionable interpretations. The present article discusses the value of psychometric approaches and standards in addressing these drawbacks for instruments used to estimate dietary habits and nutrient intake. We argue that a FFQ or screener that treats diet as a 'latent construct' can be optimised for both internal consistency and the value of the research results. Latent constructs, a foundation for item response theory (IRT)-based scales (e.g. Patient Reported Outcomes Measurement Information System) are typically introduced in the design stage of an instrument to elicit critical factors that cannot be observed or measured directly. We propose an iterative approach that uses such modelling to refine FFQ and similar instruments. To that end, we illustrate the benefits of psychometric modelling by using items and data from a sample of 12 370 Soldiers who completed the 2012 US Army Global Assessment Tool (GAT). We used factor analysis to build the scale incorporating five out of eleven survey items. An IRT-driven assessment of response category properties indicates likely problems in the ordering or wording of several response categories. Group comparisons, examined with differential item functioning (DIF), provided evidence of scale validity across each Army sub-population (sex, service component and officer status). Such an approach holds promise for future FFQ.

  5. Evaluating the quality of medical multiple-choice items created with automated processes.

    Science.gov (United States)

    Gierl, Mark J; Lai, Hollis

    2013-07-01

    Computerised assessment raises formidable challenges because it requires large numbers of test items. Automatic item generation (AIG) can help address this test development problem because it yields large numbers of new items both quickly and efficiently. To date, however, the quality of the items produced using a generative approach has not been evaluated. The purpose of this study was to determine whether automatic processes yield items that meet standards of quality that are appropriate for medical testing. Quality was evaluated firstly by subjecting items created using both AIG and traditional processes to rating by a four-member expert medical panel using indicators of multiple-choice item quality, and secondly by asking the panellists to identify which items were developed using AIG in a blind review. Fifteen items from the domain of therapeutics were created in three different experimental test development conditions. The first 15 items were created by content specialists using traditional test development methods (Group 1 Traditional). The second 15 items were created by the same content specialists using AIG methods (Group 1 AIG). The third 15 items were created by a new group of content specialists using traditional methods (Group 2 Traditional). These 45 items were then evaluated for quality by a four-member panel of medical experts and were subsequently categorised as either Traditional or AIG items. Three outcomes were reported: (i) the items produced using traditional and AIG processes were comparable on seven of eight indicators of multiple-choice item quality; (ii) AIG items can be differentiated from Traditional items by the quality of their distractors, and (iii) the overall predictive accuracy of the four expert medical panellists was 42%. Items generated by AIG methods are, for the most part, equivalent to traditionally developed items from the perspective of expert medical reviewers. While the AIG method produced comparatively fewer plausible

  6. Assessing cross-cultural item bias in questionnaires: Acculturation and the Measurement of Social Support and Family Cohesion for Adolescents

    OpenAIRE

    Hemert, Dianne A. van; Baerveldt, Chris; Vermande, Marjolijn

    2001-01-01

    A method is presented for evaluating the presence and size of cross-cultural item biases. The examined items concern parental support and family cohesion in a Likert-type questionnaire for adolescents in The Netherlands. Each evaluated item has two versions, a collectivist and an individualistic one, that measure the same theoretical construct. The standardized difference between the score means of the item versions, called the ?e score, gives an indication of the cultural bias of the item. As...

  7. The Piper Fatigue Scale-12 (PFS-12): psychometric findings and item reduction in a cohort of breast cancer survivors.

    Science.gov (United States)

    Reeve, Bryce B; Stover, Angela M; Alfano, Catherine M; Smith, Ashley Wilder; Ballard-Barbash, Rachel; Bernstein, Leslie; McTiernan, Anne; Baumgartner, Kathy B; Piper, Barbara F

    2012-11-01

    Brief, valid measures of fatigue, a prevalent and distressing cancer symptom, are needed for use in research. This study's primary aim was to create a shortened version of the revised Piper Fatigue Scale (PFS-R) based on data from a diverse cohort of breast cancer survivors. A secondary aim was to determine whether the PFS captured multiple distinct aspects of fatigue (a multidimensional model) or a single overall fatigue factor (a unidimensional model). Breast cancer survivors (n = 799; stages in situ through IIIa; ages 29-86 years) were recruited through three SEER registries (New Mexico, Western Washington, and Los Angeles, CA) as part of the Health, Eating, Activity, and Lifestyle (HEAL) study. Fatigue was measured approximately 3 years post-diagnosis using the 22-item PFS-R that has four subscales (Behavior, Affect, Sensory, and Cognition). Confirmatory factor analysis was used to compare unidimensional and multidimensional models. Six criteria were used to make item selections to shorten the PFS-R: scale's content validity, items' relationship with fatigue, content redundancy, differential item functioning by race and/or education, scale reliability, and literacy demand. Factor analyses supported the original 4-factor structure. There was also evidence from the bi-factor model for a dominant underlying fatigue factor. Six items tested positive for differential item functioning between African-American and Caucasian survivors. Four additional items either showed poor association, local dependence, or content validity concerns. After removing these 10 items, the reliability of the PFS-12 subscales ranged from 0.87 to 0.89, compared to 0.90-0.94 prior to item removal. The newly developed PFS-12 can be used to assess fatigue in African-American and Caucasian breast cancer survivors and reduces response burden without compromising reliability or validity. This is the first study to determine PFS literacy demand and to compare PFS-R responses in African

  8. Computerized Adaptive Test (CAT) Applications and Item Response Theory Models for Polytomous Items

    Science.gov (United States)

    Aybek, Eren Can; Demirtasli, R. Nukhet

    2017-01-01

    This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. Besides that, it aims to introduce the simulation and live CAT software to the related researchers. Computerized adaptive test algorithm, assumptions of item response theory models, nominal response…

  9. The relationship between early changes in the HAMD-17 anxiety/somatization factor items and treatment outcome among depressed outpatients.

    Science.gov (United States)

    Farabaugh, Amy; Mischoulon, David; Fava, Maurizio; Wu, Shirley L; Mascarini, Alessandra; Tossani, Eliana; Alpert, Jonathan E

    2005-03-01

    The 17-item Hamilton Rating Scale for Depression (HAMD-17) Anxiety/Somatization factor includes six items: Anxiety (psychic), Anxiety (somatic), Somatic Symptoms (gastrointestinal), Somatic Symptoms (general), Hypochondriasis and Insight. This study examines the relationship between early changes (defined as those observed between baseline and week 1) in these HAMD-17 Anxiety/Somatization Factor items and treatment outcome among major depressive disorder (MDD) patients who participated in a study comparing the antidepressant efficacy of a standardized extract of hypericum with both placebo and fluoxetine. Following a 1-week, single-blind washout, patients with MDD diagnosed by the Structured Clinical Interview for DSM-IV (SCID) were randomized to 12 weeks of double-blind treatment with hypericum extract (900 mg/day), fluoxetine (20 mg/day) or placebo. The relationship between early changes in HAMD-17 anxiety/somatization factor items and treatment outcome was assessed separately for patients who received study treatment (hypericum or fluoxetine) versus placebo with a logistic regression method. One hundred and thirty-five patients (female 57%; mean age = 37.3 ± 11.0 years; mean baseline HAMD-17 = 19.7 ± 3.2) were randomized to double-blind treatment and were included in the intent-to-treat (ITT) analyses. After adjusting for baseline HAMD-17 scores and for multiple comparisons with the Bonferroni correction, patients who remitted (HAMD-17 score Somatic Symptoms (General) scores than non-remitters. No other significant differences in early changes were noted for the remaining items between remitters versus non-remitters who received active treatment. For patients treated with placebo, early change was not predictive of remission for any of the items after Bonferroni correction. In conclusion, the presence of early improvement on the HAMD-17 item concerning fatigue and general somatic symptoms is significantly predictive of achieving remission at endpoint with

  10. Selecting Items for Criterion-Referenced Tests.

    Science.gov (United States)

    Mellenbergh, Gideon J.; van der Linden, Wim J.

    1982-01-01

    Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed. (CM)

  11. Asymptotic Standard Errors for Item Response Theory True Score Equating of Polytomous Items

    Science.gov (United States)

    Cher Wong, Cheow

    2015-01-01

    Building on previous works by Lord and Ogasawara for dichotomous items, this article proposes an approach to derive the asymptotic standard errors of item response theory true score equating involving polytomous items, for equivalent and nonequivalent groups of examinees. This analytical approach could be used in place of empirical methods like…

  12. Detection and validation of unscalable item score patterns using Item Response Theory: An illustration with Harter's Self-Perception Profile for Children

    NARCIS (Netherlands)

    Meijer, R.R.; Egberink, I.J.L.; Emons, Wilco H.M.; Sijtsma, Klaas

    2008-01-01

    We illustrate the usefulness of person-fit methodology for personality assessment. For this purpose, we use person-fit methods from item response theory. First, we give a nontechnical introduction to existing person-fit statistics. Second, we analyze data from Harter's (1985) Self-Perception Profile

  13. Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory

    Science.gov (United States)

    Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi

    2016-01-01

    High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied the Item Response Theory principles to analyse Northwest University Kano Post-UTME Economics test items. The developed fifty (50) economics test items was…

  14. Human resource management in the delivery of postal items

    Directory of Open Access Journals (Sweden)

    Kujačić Momčilo D.

    2015-01-01

    Delivery of postal items is the last phase in the postal conveyance process. This phase accounts for up to 57% of the total costs of postal item conveyance. In order to reduce the costs of the delivery phase, postal organizations apply different methods and techniques. Legal and technological regulations, and various restrictions regarding the selection and deployment of employees, influence the choice of appropriate methods. Also, the principle of availability of the universal postal service is an essential factor in defining the optimal model. In this paper, a model for assessing and planning the number of employees in the delivery service of the observed postal operator is proposed, with respect to the principles of productivity and the accessibility constraints of the universal postal service. The paper analyzes the impact of daily fluctuations in the number of full-time employees and the possibility of hiring part-time workers on days with increased traffic volume in the delivery of items, when items from large customers are usually delivered.

  15. Using Differential Item Functioning Procedures to Explore Sources of Item Difficulty and Group Performance Characteristics.

    Science.gov (United States)

    Scheuneman, Janice Dowd; Gerritz, Kalle

    1990-01-01

    Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)

  16. Lawton IADL scale in dementia: can item response theory make it more informative?

    Science.gov (United States)

    McGrory, Sarah; Shenkin, Susan D; Austin, Elizabeth J; Starr, John M

    2014-07-01

    Impairment of functional abilities represents a crucial component of dementia diagnosis. Current functional measures rely on the traditional aggregate method of summing raw scores. While this summary score provides a quick representation of a person's ability, it disregards useful information on the item level. This study aimed to use item response theory (IRT) methods to increase the interpretive power of the Lawton Instrumental Activities of Daily Living (IADL) scale by establishing a hierarchy of item 'difficulty' and 'discrimination'. This cross-sectional study applied IRT methods to the analysis of IADL outcomes. Participants were 202 members of the Scottish Dementia Research Interest Register (mean age = 76.39, range = 56-93, SD = 7.89 years) with complete itemised data available. A Mokken scale with good reliability (Molenaar Sijtsma statistic 0.79) was obtained, satisfying the IRT assumption that the items comprise a single unidimensional scale. The eight items in the scale could be placed on a hierarchy of 'difficulty' (H coefficient = 0.55), with 'Shopping' being the most 'difficult' item and 'Telephone use' being the least 'difficult' item. 'Shopping' was the most discriminatory item, differentiating well between patients of different levels of ability. IRT methods are capable of providing more information about functional impairment than a summed score. 'Shopping' and 'Telephone use' were identified as items that reveal key information about a patient's level of ability, and could be useful screening questions for clinicians.

  17. Psychometric aspects of item mapping for criterion-referenced interpretation and bookmark standard setting.

    Science.gov (United States)

    Huynh, Huynh

    2010-01-01

    Locating an item on an achievement continuum (item mapping) is well-established in technical work for educational/psychological assessment. Applications of item mapping may be found in criterion-referenced (CR) testing (or scale anchoring, Beaton and Allen, 1992; Huynh, 1994, 1998a, 2000a, 2000b, 2006), computer-assisted testing, test form assembly, and in standard setting methods based on ordered test booklets. These methods include the bookmark standard setting originally used for the CTB/TerraNova tests (Lewis, Mitzel, Green, and Patz, 1999), the item descriptor process (Ferrara, Perie, and Johnson, 2002) and a similar process described by Wang (2003) for multiple-choice licensure and certification examinations. While item response theory (IRT) models such as the Rasch and two-parameter logistic (2PL) models traditionally place a binary item at its location, Huynh has argued in the cited papers that such mapping may not be appropriate in selecting items for CR interpretation and scale anchoring.
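    The idea of placing a binary item on the ability scale can be made concrete with a small sketch. It assumes the standard two-parameter logistic (2PL) form, in which an item's location `b` is the ability at which the probability of success is 0.5. The response-probability criterion `rp` and the parameter values are illustrative assumptions and are not taken from the abstract.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (a = 1 gives the Rasch form)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def mapped_location(a, b, rp=0.5):
    """Ability at which the probability of success equals rp.

    With rp = 0.5 this is simply the item location b; a stricter
    criterion (rp > 0.5) maps the item higher on the scale.
    """
    return b + math.log(rp / (1.0 - rp)) / a


# Hypothetical item parameters, chosen only for illustration.
a, b = 1.2, 0.3
loc_50 = mapped_location(a, b)           # equals b
loc_67 = mapped_location(a, b, rp=0.67)  # lies above b
```

    Changing `rp` moves every item's mapped position, which is one reason the choice of mapping criterion matters for criterion-referenced interpretation.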

  18. Using an FSDS-R Item to Screen for Sexually Related Distress: A MsFLASH Analysis

    Science.gov (United States)

    Carpenter, Janet S; Reed, Susan D; Guthrie, Katherine A; Larson, Joseph C; Newton, Katherine M; Lau, R Jane; Learman, Lee A; Shifren, Jan L

    2015-01-01

    Introduction: The Female Sexual Distress Scale-Revised (FSDS-R) was created and validated to assess distress associated with impaired sexual function, but it is lengthy for use in clinical practice and research when assessing sexual function is not a primary objective. Aim: The study aims to evaluate whether a single item from the FSDS-R could be identified to use to screen midlife women for bothersome diminution in sexual function based on three criteria: (i) highly correlated with total scores; (ii) correlated with commonly assessed domains of female sexual functioning; and (iii) able to differentiate between women reporting high and low sexual concerns during the prior month. Methods: Data from 93 midlife women were collected by the Menopause Strategies Finding Lasting Answers to Symptoms and Health (MsFLASH) research network. Main Outcome Measures: Women completed the FSDS-R, Female Sexual Function Index (FSFI), and Menopausal Quality of Life Scale (MENQOL). Those who reported a change in the past month on the MENQOL sexual were categorized into a high sexual concerns group, while all others were categorized into a low sexual concerns group. Results: Women were an average of 54.6 years old (SD 3.1) and mostly Caucasian (77.4%), college educated (60.2%), married/living as married (64.5%), and postmenopausal (79.6%). The FSDS-R item number 1 “Distressed about sex life” was: (i) highly correlated with FSDS-R total scores (r = 0.90); (ii) moderately correlated with FSFI total scores (r = −0.38) and FSFI desire (r = −0.37) and satisfaction domains (r = −0.40); and (iii) showed one of the largest mean differences between high and low sexual concerns groups (P Guthrie KA, Larson JC, Newton KM, Lau RJ, Learman LA, and Shifren JL. Using an FSDS-R item to screen for sexually related distress: A MsFLASH analysis. Sex Med 2015;3:7–13. PMID:25844170

  19. Item Banking with Embedded Standards

    Science.gov (United States)

    MacCann, Robert G.; Stanley, Gordon

    2009-01-01

    An item banking method that does not use Item Response Theory (IRT) is described. This method provides a comparable grading system across schools that would be suitable for low-stakes testing. It uses the Angoff standard-setting method to obtain item ratings that are stored with each item. An example of such a grading system is given, showing how…

  20. Measuring organizational effectiveness in information and communication technology companies using item response theory.

    Science.gov (United States)

    Trierweiller, Andréa Cristina; Peixe, Blênio César Severo; Tezza, Rafael; Pereira, Vera Lúcia Duarte do Valle; Pacheco, Waldemar; Bornia, Antonio Cezar; de Andrade, Dalton Francisco

    2012-01-01

    The aim of this paper is to measure the effectiveness of Information and Communication Technology (ICT) organizations from the point of view of the manager, using Item Response Theory (IRT). There is a need to verify the effectiveness of these organizations, which are normally associated with complex, dynamic, and competitive environments. In the academic literature, there is disagreement surrounding the concept of organizational effectiveness and its measurement. A construct was elaborated based on dimensions of effectiveness to guide the construction of the questionnaire items, which were submitted to specialists for evaluation. The construct proved viable for measuring the organizational effectiveness of ICT companies from a manager's point of view using the Two-Parameter Logistic Model (2PLM) of IRT. This modeling allows the quality and properties of each item to be evaluated, with items and respondents placed on a single scale, which is not possible with other similar tools.

  1. Prevalence of item level negative symptoms in first episode psychosis diagnoses.

    LENUS (Irish Health Repository)

    Lyne, John

    2012-03-01

    The relevance of negative symptoms across the diagnostic spectrum of the psychoses remains uncertain. The purpose of this study was to report on prevalence of item and subscale level negative symptoms across the first episode psychosis (FEP) diagnostic spectrum in an epidemiological sample, and to ascertain whether items and subscales were more prevalent in a schizophrenia spectrum diagnoses group compared to an 'all other psychotic diagnoses' group. We measured negative symptoms in 330 patients presenting with FEP using the Scale for Assessment of Negative Symptoms (SANS), and ascertained diagnosis using the Structured Clinical Interview for DSM IV. Prevalence of SANS items and subscales were tabulated across all psychotic diagnoses, and logistic regression analysis determined which items and subscales were predictive of schizophrenia spectrum diagnoses. SANS items were most prevalent in schizophrenia spectrum conditions but frequently presented in other FEP diagnoses, particularly substance induced psychotic disorder and Major Depressive Disorder. Brief psychotic disorder and bipolar disorders had low levels of negative symptoms. SANS items and subscales which significantly predicted schizophrenia spectrum diagnoses were also frequently present in some of the other psychotic diagnoses. Conclusions: SANS items have high prevalence in FEP, and while commonest in schizophrenia spectrum conditions are not restricted to this diagnostic subgroup.

  2. EOQ Model for Delayed Deteriorating Items with Shortages and Trade Credit Policy

    Directory of Open Access Journals (Sweden)

    R Sundararajan

    2015-08-01

    This paper deals with a deterministic inventory model for a single deteriorating item under permissible delay in payments, with a constant demand rate that differs before and after the onset of deterioration (and is in that sense a function of time). Shortages are allowed and completely backlogged, with the backlog a function of time. Under these assumptions, the paper develops a retailer's model for obtaining the optimal cycle length and ordering quantity for deteriorating items. Thus, our objective is to solve the retailer's cost minimization problem to find an optimal replenishment policy under various parameters. The convexity of the objective function is derived and numerical examples are provided to support the proposed model. Sensitivity analysis of the optimal solution with respect to major parameters of the model is included and the implications are discussed.

  3. P2-19: The Effect of item Repetition on Item-Context Association Depends on the Prior Exposure of Items

    Directory of Open Access Journals (Sweden)

    Hongmi Lee

    2012-10-01

    Previous studies have reported conflicting findings on whether item repetition has beneficial or detrimental effects on source memory. To reconcile such contradictions, we investigated whether the degree of pre-exposure of items can be a potential modulating factor. The experimental procedures spanned two consecutive days. On Day 1, participants were exposed to a set of unfamiliar faces. On Day 2, the same faces presented on the previous day were used again for half of the participants, whereas novel faces were used for the other half. Day 2 procedures consisted of three successive phases: item repetition, source association, and source memory test. In the item repetition phase, half of the face stimuli were repeatedly presented while participants were making male/female judgments. During the source association phase, both the repeated and the unrepeated faces appeared in one of the four locations on the screen. Finally, participants were tested on the location in which a given face was presented during the previous phase and reported the confidence of their memory. Source memory accuracy was measured as the percentage of correct non-guess trials. We found a significant interaction between prior exposure and repetition. Repetition impaired source memory when the items had been pre-exposed on Day 1, while it led to greater accuracy for novel ones. These results show that pre-experimental exposure can modulate the effects of repetition on associative binding between an item and its contextual information, suggesting that pre-existing representation and novelty signal interact to form new episodic memory.

  4. A hierarchy of distress and invariant item ordering in the General Health Questionnaire-12.

    Science.gov (United States)

    Doyle, F; Watson, R; Morgan, K; McBride, O

    2012-06-01

    Invariant item ordering (IIO) is defined as the extent to which items have the same ordering (in terms of item difficulty/severity - i.e. demonstrating whether items are difficult [rare] or less difficult [common]) for each respondent who completes a scale. IIO is therefore crucial for establishing a scale hierarchy that is replicable across samples, but no research has demonstrated IIO in scales of psychological distress. We aimed to determine if a hierarchy of distress with IIO exists in a large general population sample who completed a scale measuring distress. Data from 4107 participants who completed the 12-item General Health Questionnaire (GHQ-12) from the Northern Ireland Health and Social Wellbeing Survey 2005-6 were analysed. Mokken scaling was used to determine the dimensionality and hierarchy of the GHQ-12, and items were investigated for IIO. All items of the GHQ-12 formed a single, strong unidimensional scale (H=0.58). IIO was found for six of the 12 items (H-trans=0.55), and these symptoms reflected the following hierarchy: anhedonia, concentration, participation, coping, decision-making and worthlessness. The cross-sectional analysis needs replication. The GHQ-12 showed a hierarchy of distress, but IIO is only demonstrated for six of the items, and the scale could therefore be shortened. Adopting brief, hierarchical scales with IIO may be beneficial in both clinical and research contexts.
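    The scalability coefficient H reported in Mokken scaling can be illustrated with a minimal sketch. It assumes dichotomous (0/1) item scores and uses Loevinger's definition: the sum of observed inter-item covariances divided by the sum of the maximum covariances attainable given the item marginals. The function name and the toy data are hypothetical, and the sketch is not the procedure used in the study, whose items may have been scored polytomously.

```python
from itertools import combinations


def loevinger_h(data):
    """Scale-level H for a list of respondents' 0/1 item-score lists."""
    n = len(data)
    k = len(data[0])
    # Proportion endorsing each item.
    p = [sum(row[j] for row in data) / n for j in range(k)]
    observed = 0.0
    maximum = 0.0
    for i, j in combinations(range(k), 2):
        p_both = sum(row[i] * row[j] for row in data) / n
        observed += p_both - p[i] * p[j]           # observed covariance
        maximum += min(p[i], p[j]) - p[i] * p[j]   # largest possible covariance
    # Undefined (division by zero) if no item pair can covary at all.
    return observed / maximum


# Toy data: a perfect Guttman pattern, then the same data plus one
# respondent who endorses only the rarest item.
guttman = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
with_violation = guttman + [[0, 0, 1]]

h_perfect = loevinger_h(guttman)
h_violated = loevinger_h(with_violation)
```

    A perfect Guttman pattern yields H = 1, and responses that violate the item ordering pull H down.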

  5. Subjective assessment of acute mountain sickness: investigating the relationship between the Lake Louise Self-Report, a visual analogue scale and psychological well-being scales.

    Science.gov (United States)

    Frühauf, Anika; Burtscher, Martin; Pocecco, Elena; Faulhaber, Martin; Kopp, Martin

    2016-01-01

    There is an ongoing discussion about how to assess acute mountain sickness (AMS) in real-life conditions. Besides multi-item scales with a cut-off, such as the Lake Louise Self-Report (LLS), some authors have suggested using visual analog scales (VAS) to assess AMS. This study sought to contribute to this question using the VAS items of the Subjective Ratings of Drug Effects, together with an additional single item for AMS. Furthermore, we investigated whether instruments developed to assess psychological well-being might predict AMS assessed via LLS or VAS. Thirty-two adults (19 female) with known AMS susceptibility filled in questionnaires (Feeling Scale, Felt Arousal Scale, Activation Deactivation Check List, LLS, VAS) at an altitude of 3650 m above sea level. Correlation and regression analyses suggest a moderate to high relationship between the LLS score and the VAS items, including one VAS item asking about the severity of AMS, as well as psychological well-being. In conclusion, using VAS items to assess AMS can be a more precise alternative to questionnaires like the LLS for people knowledgeable about AMS. Furthermore, researchers should be aware that psychological well-being might be an important parameter influencing the assessment of AMS.

  6. Using an FSDS-R Item to Screen for Sexually Related Distress: A MsFLASH Analysis

    Directory of Open Access Journals (Sweden)

    Janet S. Carpenter, PhD, RN, FAAN

    2015-03-01

    Conclusions: A single FSDS-R item may be a useful screening tool to quickly identify midlife women with sexually related distress when it is not feasible to administer the entire scale, though further validation is warranted. Carpenter JS, Reed SD, Guthrie KA, Larson JC, Newton KM, Lau RJ, Learman LA, and Shifren JL. Using an FSDS-R item to screen for sexually related distress: A MsFLASH analysis. Sex Med 2015;3:7–13.

  7. Development and psychometric evaluation of the PROMIS Pediatric Life Satisfaction item banks, child-report, and parent-proxy editions.

    Science.gov (United States)

    Forrest, Christopher B; Devine, Janine; Bevans, Katherine B; Becker, Brandon D; Carle, Adam C; Teneralli, Rachel E; Moon, JeanHee; Tucker, Carole A; Ravens-Sieberer, Ulrike

    2018-01-01

    To describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Life Satisfaction item banks, child-report, and parent-proxy editions. A pool of 55 life satisfaction items was administered to 1992 children 8-17 years old and 964 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and assessment of construct validity. Thirteen items were deleted because of poor psychometric performance. An 8-item short form was administered to a national sample of 996 children 8-17 years old, and 1294 parents of children 5-17 years old. The combined sample (2988 children and 2258 parents) was used in item response theory (IRT) calibration analyses. The final item banks were unidimensional, the items were locally independent, and the items were free from impactful differential item functioning. The 8-item and 4-item short form scales showed excellent reliability, convergent validity, and discriminant validity. Life satisfaction decreased with declining socio-economic status, presence of a special health care need, and increasing age for girls, but not boys. After IRT calibration, we found that 4- and 8-item short forms had a high degree of precision (reliability) across a wide range (>4 SD units) of the latent variable. The PROMIS Pediatric Life Satisfaction item banks and their short forms provide efficient, precise, and valid assessments of life satisfaction in children and youth.

  8. Identification of metallic items that caused nickel dermatitis in Danish patients.

    Science.gov (United States)

    Thyssen, Jacob P; Menné, Torkil; Johansen, Jeanne D

    2010-09-01

    Nickel allergy is prevalent as assessed by epidemiological studies. In an attempt to further identify and characterize sources that may result in nickel allergy and dermatitis, we analysed items identified by nickel-allergic dermatitis patients as causative of nickel dermatitis by using the dimethylglyoxime (DMG) test. Dermatitis patients with nickel allergy of current relevance were identified over a 2-year period in a tertiary referral patch test centre. When possible, their work tools and personal items were examined with the DMG test. Among 95 nickel-allergic dermatitis patients, 70 (73.7%) had metallic items investigated for nickel release. A total of 151 items were investigated, and 66 (43.7%) gave positive DMG test reactions. Objects were nearly all purchased or acquired after the introduction of the EU Nickel Directive. Only one object had been inherited, and only two objects had been purchased outside of Denmark. DMG testing is valuable as a screening test for nickel release and should be used to identify relevant exposures in nickel-allergic patients. Mainly consumer items, but also work tools used in an occupational setting, released nickel in dermatitis patients. This study confirmed 'risk items' from previous studies, including mobile phones.

  9. Investigating the Impact of Item Parameter Drift for Item Response Theory Models with Mixture Distributions

    Directory of Open Access Journals (Sweden)

    Yoon Soo Park

    2016-02-01

    This study investigates the impact of item parameter drift (IPD) on parameter and ability estimation when the underlying measurement model fits a mixture distribution, thereby violating the item invariance property of unidimensional item response theory (IRT) models. An empirical study was conducted to demonstrate the occurrence of both IPD and an underlying mixture distribution using real-world data. Twenty-one trended anchor items from the 1999, 2003, and 2007 administrations of Trends in International Mathematics and Science Study (TIMSS) were analyzed using unidimensional and mixture IRT models. TIMSS treats trended anchor items as invariant over testing administrations and uses pre-calibrated item parameters based on unidimensional IRT. However, empirical results showed evidence of two latent subgroups with IPD. Results showed changes in the distribution of examinee ability between latent classes over the three administrations. A simulation study was conducted to examine the impact of IPD on the estimation of ability and item parameters, when data have underlying mixture distributions. Simulations used data generated from a mixture IRT model and estimated using unidimensional IRT. Results showed that data reflecting IPD using mixture IRT model led to IPD in the unidimensional IRT model. Changes in the distribution of examinee ability also affected item parameters. Moreover, drift with respect to item discrimination and distribution of examinee ability affected estimates of examinee ability. These findings demonstrate the need to caution and evaluate IPD using a mixture IRT framework to understand its effect on item parameters and examinee ability.

  10. Investigating the Impact of Item Parameter Drift for Item Response Theory Models with Mixture Distributions.

    Science.gov (United States)

    Park, Yoon Soo; Lee, Young-Sun; Xing, Kuan

    2016-01-01

    This study investigates the impact of item parameter drift (IPD) on parameter and ability estimation when the underlying measurement model fits a mixture distribution, thereby violating the item invariance property of unidimensional item response theory (IRT) models. An empirical study was conducted to demonstrate the occurrence of both IPD and an underlying mixture distribution using real-world data. Twenty-one trended anchor items from the 1999, 2003, and 2007 administrations of Trends in International Mathematics and Science Study (TIMSS) were analyzed using unidimensional and mixture IRT models. TIMSS treats trended anchor items as invariant over testing administrations and uses pre-calibrated item parameters based on unidimensional IRT. However, empirical results showed evidence of two latent subgroups with IPD. Results also showed changes in the distribution of examinee ability between latent classes over the three administrations. A simulation study was conducted to examine the impact of IPD on the estimation of ability and item parameters, when data have underlying mixture distributions. Simulations used data generated from a mixture IRT model and estimated using unidimensional IRT. Results showed that data reflecting IPD using mixture IRT model led to IPD in the unidimensional IRT model. Changes in the distribution of examinee ability also affected item parameters. Moreover, drift with respect to item discrimination and distribution of examinee ability affected estimates of examinee ability. These findings demonstrate the need to caution and evaluate IPD using a mixture IRT framework to understand its effects on item parameters and examinee ability.

  11. Optimizing the Use of Response Times for Item Selection in Computerized Adaptive Testing

    Science.gov (United States)

    Choe, Edison M.; Kern, Justin L.; Chang, Hua-Hua

    2018-01-01

    Despite common operationalization, measurement efficiency of computerized adaptive testing should not only be assessed in terms of the number of items administered but also the time it takes to complete the test. To this end, a recent study introduced a novel item selection criterion that maximizes Fisher information per unit of expected response…
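    The criterion of maximizing Fisher information per unit of expected response time can be sketched as follows. The sketch assumes 2PL items and treats each item's expected response time as a known input. The pool, the field names, and the parameter values are hypothetical, and the abstract does not specify how expected times are modelled.

```python
import math


def fisher_info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def select_item(theta, pool):
    """Pick the item with the most information per expected second."""
    return max(
        pool,
        key=lambda item: fisher_info_2pl(theta, item["a"], item["b"])
        / item["expected_seconds"],
    )


# Hypothetical two-item pool: B is more informative, but much slower.
pool = [
    {"id": "A", "a": 1.0, "b": 0.0, "expected_seconds": 60.0},
    {"id": "B", "a": 1.5, "b": 0.0, "expected_seconds": 180.0},
]
chosen = select_item(0.0, pool)
```

    In this toy pool, selecting on information alone would favour item B, whereas the time-adjusted criterion picks item A, which illustrates the trade-off between test length and testing time that the abstract raises.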

  12. Stepwise Analysis of Differential Item Functioning Based on Multiple-Group Partial Credit Model.

    Science.gov (United States)

    Muraki, Eiji

    1999-01-01

    Extended an Item Response Theory (IRT) method for detection of differential item functioning to the partial credit model and applied the method to simulated data using a stepwise procedure. Then applied the stepwise DIF analysis based on the multiple-group partial credit model to writing trend data from the National Assessment of Educational…

  13. Semi-Parametric Item Response Functions in the Context of Guessing. CRESST Report 844

    Science.gov (United States)

    Falk, Carl F.; Cai, Li

    2015-01-01

    We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…

  14. Assessing Impact, DIF, and DFF in Accommodated Item Scores: A Comparison of Multilevel Measurement Model Parameterizations

    Science.gov (United States)

    Beretvas, S. Natasha; Cawthon, Stephanie W.; Lockhart, L. Leland; Kaye, Alyssa D.

    2012-01-01

    This pedagogical article is intended to explain the similarities and differences between the parameterizations of two multilevel measurement model (MMM) frameworks. The conventional two-level MMM that includes item indicators and models item scores (Level 1) clustered within examinees (Level 2) and the two-level cross-classified MMM (in which item…

  15. A MATHEMATICAL MODEL OF THE MILITARY TRANSPORT AIRCRAFT MOVEMENT AT CARGO ITEM DROP

    Directory of Open Access Journals (Sweden)

    2016-01-01

    The controllability of a military transport aircraft deteriorates during the airdrop of a heavy single cargo item. To address this problem, a specific pre-emptive piloting technique and automation tools are being developed. Preliminary study of the piloting technique and the automatic control algorithm demands a reliable mathematical model of aircraft dynamics at cargo item drop. Such a model should take into account the significant change in the position of the aircraft's center of mass and in the aircraft's inertia tensor. Simplified models were based on modeling the movement of the center of mass and the rotation of the aircraft around its center of mass. Such models do not take into account the inertial forces and moments produced by the moving cargo item, which prevents them from yielding reliable simulation results. The article presents a complete mathematical model of the movement of a military transport aircraft during the drop of a cargo item. It examines the complex material system of solid bodies and describes the properties of its components in detail. The equations of motion treat the aircraft as a system of a carrier body (the aircraft without the cargo item) and a carried body (the moving cargo item), reflecting the changes in the inertia tensor. The functioning of the power plant, steering actuators, flight control system, extraction chute, and primary information sensors is taken into account. The equations of motion for the system of bodies are written in projections on the aircraft reference plane. This approach takes into account changes in the inertia tensor and in the position of the main central axes of inertia during the drop of a cargo item. It allows the condition of the aircraft to be simulated at all pitch rates, normal load factors, and masses and placements of the single cargo item, as evidenced by the close agreement of the modeling results with data from flight tests.

  16. A single bout of resistance exercise can enhance episodic memory performance.

    Science.gov (United States)

    Weinberg, Lisa; Hasni, Anita; Shinohara, Minoru; Duarte, Audrey

    2014-11-01

    Acute aerobic exercise can be beneficial to episodic memory. This benefit may occur because exercise produces a similar physiological response as physical stressors. When administered during consolidation, acute stress, both physical and psychological, consistently enhances episodic memory, particularly memory for emotional materials. Here we investigated whether a single bout of resistance exercise performed during consolidation can produce episodic memory benefits 48 h later. We used a one-leg knee extension/flexion task for the resistance exercise. To assess the physiological response to the exercise, we measured salivary alpha amylase (a biomarker of central norepinephrine), heart rate, and blood pressure. To test emotional episodic memory, we used a remember-know recognition memory paradigm with equal numbers of positive, negative, and neutral IAPS images as stimuli. The group that performed the exercise, the active group, had higher overall recognition accuracy than the group that did not exercise, the passive group. We found a robust effect of valence across groups, with better performance on emotional items as compared to neutral items and no difference between positive and negative items. This effect changed based on the physiological response to the exercise. Within the active group, participants with a high physiological response to the exercise were impaired for neutral items as compared to participants with a low physiological response to the exercise. Our results demonstrate that a single bout of resistance exercise performed during consolidation can enhance episodic memory and that the effect of valence on memory depends on the physiological response to the exercise.

  17. Applying automatic item generation to create cohesive physics testlets

    Science.gov (United States)

    Mindyarto, B. N.; Nugroho, S. E.; Linuwih, S.

    2018-03-01

    Computer-based testing has created the demand for large numbers of items. This paper discusses the production of cohesive physics testlets using automatic item generation concepts and procedures. The testlets were composed by restructuring physics problems to reveal deeper understanding of the underlying physical concepts by inserting a qualitative question and its scientific reasoning question. A template-based testlet generator was used to generate the testlet variants. Using this methodology, 1248 testlet variants were effectively generated from 25 testlet templates. Some issues related to the effective application of the generated physics testlets in practical assessments were discussed.

  18. Using Automated Processes to Generate Test Items And Their Associated Solutions and Rationales to Support Formative Feedback

    Directory of Open Access Journals (Sweden)

    Mark Gierl

    2015-08-01

    Automatic item generation is the process of using item models to produce assessment tasks using computer technology. An item model is similar to a template that highlights the elements in the task that must be manipulated to produce new items. The purpose of our study is to describe an innovative method for generating large numbers of diverse and heterogeneous items along with their solutions and associated rationales to support formative feedback. We demonstrate the method by generating items in two diverse content areas, mathematics and nonverbal reasoning
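
    The template idea described in this record can be illustrated with a minimal sketch. It is not the authors' generator: the item model, its element names (`distance`, `hours`), and the `generate_items` helper are hypothetical, chosen only to show how one template with manipulable elements yields several items together with their solutions and rationales.

```python
from itertools import product

# Hypothetical item model: a stem template plus the values each element may take.
ITEM_MODEL = {
    "stem": "A train travels {distance} km in {hours} hours. "
            "What is its average speed in km/h?",
    "elements": {
        "distance": [120, 180, 240],
        "hours": [2, 3],
    },
}


def generate_items(model):
    """Return one item (stem, solution, rationale) per combination of element values."""
    names = list(model["elements"])
    items = []
    for values in product(*(model["elements"][name] for name in names)):
        binding = dict(zip(names, values))
        speed = binding["distance"] / binding["hours"]
        items.append({
            "stem": model["stem"].format(**binding),
            "solution": speed,
            "rationale": (
                f"Average speed = {binding['distance']} km / "
                f"{binding['hours']} h = {speed:g} km/h."
            ),
        })
    return items


items = generate_items(ITEM_MODEL)
```

    With three values for one element and two for the other, this single template produces six items; larger pools follow from adding elements or values.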

  19. Quick assessment of hopelessness: a cross-sectional study

    Directory of Open Access Journals (Sweden)

    Cheung Yin

    2006-03-01

    Background: Lengthy questionnaires reduce data quality and impose a burden on respondents. Previous researchers proposed that a single item ("My future seems dark to me") and a 4-item component of the Beck's Hopelessness Scale (BHS) can summarise most of the information the BHS provides. There is no clear indication of what BHS cutoff values are useful in identifying people with suicide tendency. Methods: In a population-based study of Chinese people aged between 15 and 59 in Hong Kong, the Chinese version of the BHS and the Centre for Epidemiologic Studies – Depression scale were administered by trained interviewers and suicidal ideation and suicidal attempts were self-reported. Receiver operating characteristics curve analysis and regression analysis were used to compare the performance of the BHS and its components in identifying people with suicidality and depression. Smoothed level of suicidal tendency was assessed in relation to scores on the BHS and its component to identify thresholds. Results: The 4-item component and, to a lesser extent, the single item of the BHS perform in ways similar to the BHS. There are non-linear relationships between suicidality and scores on the BHS and the 4-item component; cutoff values identified accordingly have sensitivity and specificity of about 65%. Conclusion: The 4-item component is a useful alternative to the BHS. Shortening of psycho-social measurement scales should be considered in order to reduce burden on patients or respondents and to improve response rate.
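
    For readers unfamiliar with how a cutoff value is evaluated, the following sketch computes sensitivity and specificity for one cutoff. The function name, the "score at or above the cutoff counts as positive" rule, and the toy data are assumptions for illustration only; they are not the study's data or procedure.

```python
def sensitivity_specificity(scores, has_condition, cutoff):
    """Classify score >= cutoff as screen-positive and compare with the true status."""
    pairs = list(zip(scores, has_condition))
    true_pos = sum(1 for s, y in pairs if y and s >= cutoff)
    false_neg = sum(1 for s, y in pairs if y and s < cutoff)
    true_neg = sum(1 for s, y in pairs if not y and s < cutoff)
    false_pos = sum(1 for s, y in pairs if not y and s >= cutoff)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Made-up scale scores and true status, purely to exercise the function.
toy_scores = [2, 5, 7, 9, 3, 8]
toy_status = [False, False, True, True, True, False]
sens, spec = sensitivity_specificity(toy_scores, toy_status, cutoff=6)
```

    Repeating the calculation over every candidate cutoff gives the points of a receiver operating characteristic curve.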

  20. Psychometric properties of the PROMIS Physical Function item bank in patients receiving physical therapy.

    Directory of Open Access Journals (Sweden)

    Martine H P Crins

    The Patient-Reported Outcomes Measurement Information System (PROMIS) is a universally applicable set of instruments, including item banks, short forms and computer adaptive tests (CATs), measuring patient-reported health across different patient populations. PROMIS CATs are highly efficient and the use in practice is considered feasible with little administration time, offering standardized and routine patient monitoring. Before an item bank can be used as CAT, the psychometric properties of the item bank have to be examined. Therefore, the objective was to assess the psychometric properties of the Dutch-Flemish PROMIS Physical Function item bank (DF-PROMIS-PF) in Dutch patients receiving physical therapy. Cross-sectional study. 805 patients >18 years, who received any kind of physical therapy in primary care in the past year, completed the full DF-PROMIS-PF (121 items). Unidimensionality was examined by Confirmatory Factor Analysis and local dependence and monotonicity were evaluated. A Graded Response Model was fitted. Construct validity was examined with correlations between DF-PROMIS-PF T-scores and scores on two legacy instruments (SF-36 Health Survey Physical Functioning scale [SF36-PF10] and the Health Assessment Questionnaire Disability-Index [HAQ-DI]). Reliability (standard errors of theta) was assessed. The results for unidimensionality were mixed (scaled CFI = 0.924, TLI = 0.923, RMSEA = 0.045, 1st factor explained 61.5% of variance). Some local dependence was found (8.2% of item pairs). The item bank showed a broad coverage of the physical function construct (threshold-parameters range: -4.28 to 2.33) and good construct validity (correlation with SF36-PF10 = 0.84 and HAQ-DI = -0.85). Furthermore, the DF-PROMIS-PF showed greater reliability over a broader score-range than the SF36-PF10 and HAQ-DI. The psychometric properties of the DF-PROMIS-PF item bank are sufficient. The DF-PROMIS-PF can now be used as short forms or CAT to measure the level of

  1. Developing and testing items for the South African Personality Inventory (SAPI)

    Directory of Open Access Journals (Sweden)

    Carin Hill

    2013-11-01

    Research purpose: This article reports on the process of identifying items for, and provides a quantitative evaluation of, the South African Personality Inventory (SAPI) items. Motivation for the study: The study intended to develop an indigenous and psychometrically sound personality instrument that adheres to the requirements of South African legislation and excludes cultural bias. Research design, approach and method: The authors used a cross-sectional design. They measured the nine SAPI clusters identified in the qualitative stage of the SAPI project in 11 separate quantitative studies. Convenience sampling yielded 6735 participants. Statistical analysis focused on the construct validity and reliability of items. The authors eliminated items that showed poor performance, based on common psychometric criteria, and selected the best performing items to form part of the final version of the SAPI. Main findings: The authors developed 2573 items from the nine SAPI clusters. Of these, 2268 items were valid and reliable representations of the SAPI facets. Practical/managerial implications: The authors developed a large item pool. It measures personality in South Africa. Researchers can refine it for the SAPI. Furthermore, the project illustrates an approach that researchers can use in projects that aim to develop culturally-informed psychological measures. Contribution/value-add: Personality assessment is important for recruiting, selecting and developing employees. This study contributes to the current knowledge about the early processes researchers follow when they develop a personality instrument that measures personality fairly in different cultural groups, as the SAPI does.

  2. Ethical imperatives against item restriction in the Supplemental Nutrition Assistance Program.

    Science.gov (United States)

    Chrisinger, Benjamin W

    2017-07-01

    The Supplemental Nutrition Assistance Program (SNAP, formerly known as food stamps) is the federal government's largest form of food assistance, and a frequent focus of political and scholarly debate. Previous discourse in the public health community and recent proposals in state legislatures have suggested limiting the use of SNAP benefits on unhealthy food items, such as sugar-sweetened beverages (SSBs). This paper identifies two possible underlying motivations for item restriction, health and morals, and analyzes the level of empirical support for claims about the current state of the program, as well as expectations about how item restriction would change participant outcomes. It also assesses how item restriction would reduce individual agency of low-income individuals, and identifies mechanisms by which this may adversely affect program participants. Finally, this paper offers alternative policies to promote healthier purchasing and eating among SNAP participants that can be pursued without reducing individual agency. Health advocates and officials must more fully weigh the attendant risks of implementing SNAP item restrictions, including the reduction of individual agency of a vulnerable population. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. Improving the Reliability of Student Scores from Speeded Assessments: An Illustration of Conditional Item Response Theory Using a Computer-Administered Measure of Vocabulary

    Science.gov (United States)

    Petscher, Yaacov; Mitchell, Alison M.; Foorman, Barbara R.

    2015-01-01

    A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is…

  4. A single-system model predicts recognition memory and repetition priming in amnesia.

    Science.gov (United States)

    Berry, Christopher J; Kessels, Roy P C; Wester, Arie J; Shanks, David R

    2014-08-13

    We challenge the claim that there are distinct neural systems for explicit and implicit memory by demonstrating that a formal single-system model predicts the pattern of recognition memory (explicit) and repetition priming (implicit) in amnesia. In the current investigation, human participants with amnesia categorized pictures of objects at study and then, at test, identified fragmented versions of studied (old) and nonstudied (new) objects (providing a measure of priming), and made a recognition memory judgment (old vs new) for each object. Numerous results in the amnesic patients were predicted in advance by the single-system model, as follows: (1) deficits in recognition memory and priming were evident relative to a control group; (2) items judged as old were identified at greater levels of fragmentation than items judged new, regardless of whether the items were actually old or new; and (3) the magnitude of the priming effect (the identification advantage for old vs new items) overall was greater than that of items judged new. Model evidence measures also favored the single-system model over two formal multiple-systems models. The findings support the single-system model, which explains the pattern of recognition and priming in amnesia primarily as a reduction in the strength of a single dimension of memory strength, rather than a selective explicit memory system deficit. Copyright © 2014 the authors 0270-6474/14/3410963-12$15.00/0.

  5. Racial differences in hypertension knowledge: effects of differential item functioning.

    Science.gov (United States)

    Ayotte, Brian J; Trivedi, Ranak; Bosworth, Hayden B

    2009-01-01

    Health-related knowledge is an important component in the self-management of chronic illnesses. The objective of this study was to more accurately assess racial differences in hypertension knowledge by using a latent variable modeling approach that controlled for sociodemographic factors and accounted for measurement issues in the assessment of hypertension knowledge. Cross-sectional data from 1,177 participants (45% African American; 35% female) were analyzed using a multiple indicator multiple causes (MIMIC) modeling approach. Available sociodemographic data included race, education, sex, financial status, and age. All participants completed six items on a hypertension knowledge questionnaire. Overall, the final model suggested that females, Whites, and patients with at least a high school diploma had higher latent knowledge scores than males, African Americans, and patients with less than a high school diploma, respectively. The model also detected differential item functioning (DIF) based on race for two of the items. Specifically, the error rate for African Americans was lower than would be expected given the lower level of latent knowledge on the items, on the questions related to: (a) the association between high blood pressure and kidney disease, and (b) the increased risk African Americans have for developing hypertension. Not accounting for DIF resulted in the difference between Whites and African Americans being underestimated. These results are discussed in the context of the need for careful measurement of health-related constructs, and how measurement-related issues can result in an inaccurate estimation of racial differences in hypertension knowledge.

  6. Development of six PROMIS pediatrics proxy-report item banks.

    Science.gov (United States)

    Irwin, Debra E; Gross, Heather E; Stucky, Brian D; Thissen, David; DeWitt, Esi Morgan; Lai, Jin Shei; Amtmann, Dagmar; Khastou, Leyla; Varni, James W; DeWalt, Darren A

    2012-02-22

    Pediatric self-report should be considered the standard for measuring patient reported outcomes (PRO) among children. However, circumstances exist when the child is too young, cognitively impaired, or too ill to complete a PRO instrument and a proxy-report is needed. This paper describes the development process including the proxy cognitive interviews and large-field-test survey methods and sample characteristics employed to produce item parameters for the Patient Reported Outcomes Measurement Information System (PROMIS) pediatric proxy-report item banks. The PROMIS pediatric self-report items were converted into proxy-report items before undergoing cognitive interviews. These items covered six domains (physical function, emotional distress, social peer relationships, fatigue, pain interference, and asthma impact). Caregivers (n = 25) of children between the ages of 5 and 17 years provided qualitative feedback on proxy-report items to assess any major issues with these items. From May 2008 to March 2009, the large-scale survey enrolled children ages 8-17 years to complete the self-report version and caregivers to complete the proxy-report version of the survey (n = 1548 dyads). Caregivers of children ages 5 to 7 years completed the proxy report survey (n = 432). In addition, caregivers completed other proxy instruments, PedsQL™ 4.0 Generic Core Scales Parent Proxy-Report version, PedsQL™ Asthma Module Parent Proxy-Report version, and KIDSCREEN Parent-Proxy-52. Item content was well understood by proxies and did not require item revisions but some proxies clearly noted that determining an answer on behalf of their child was difficult for some items. Dyads and caregivers of children ages 5-17 years old were enrolled in the large-scale testing. The majority were female (85%), married (70%), Caucasian (64%) and had at least a high school education (94%). Approximately 50% had children with a chronic health condition, primarily asthma, which was diagnosed or treated within 6

  7. Bad Questions: An Essay Involving Item Response Theory

    Science.gov (United States)

    Thissen, David

    2016-01-01

    David Thissen, a professor in the Department of Psychology and Neuroscience, Quantitative Program at the University of North Carolina, has consulted and served on technical advisory committees for assessment programs that use item response theory (IRT) over the past couple decades. He has come to the conclusion that there are usually two purposes…

  8. Longitudinal investigation of source memory reveals different developmental trajectories for item memory and binding

    OpenAIRE

    Riggins, Tracy

    2013-01-01

    The present study used a cohort-sequential design to examine developmental changes in children's ability to bind items in memory during early and middle childhood. Three cohorts of children (aged 4, 6, or 8 years) were followed longitudinally for three years. Each year, children completed a source memory paradigm assessing memory for items and binding. Results suggest linear increases in memory for individual items (facts or sources) between 4 and 10 years of age, but that memory for correct ...

  9. Effects of Learning Experience on Forgetting Rates of Item and Associative Memories

    Science.gov (United States)

    Yang, Jiongjiong; Zhan, Lexia; Wang, Yingying; Du, Xiaoya; Zhou, Wenxi; Ning, Xueling; Sun, Qing; Moscovitch, Morris

    2016-01-01

    Are associative memories forgotten more quickly than item memories, and does the level of original learning differentially influence forgetting rates? In this study, we addressed these questions by having participants learn single words and word pairs once (Experiment 1), three times (Experiment 2), and six times (Experiment 3) in a massed…

  10. Assessing cross-cultural item bias in questionnaires : Acculturation and the Measurement of Social Support and Family Cohesion for Adolescents

    NARCIS (Netherlands)

    Hemert, Dianne A. van; Baerveldt, Chris; Vermande, Marjolijn

    2001-01-01

    A method is presented for evaluating the presence and size of cross-cultural item biases. The examined items concern parental support and family cohesion in a Likert-type questionnaire for adolescents in The Netherlands. Each evaluated item has two versions, a collectivist and an individualistic one,

  11. The Technical Quality of Test Items Generated Using a Systematic Approach to Item Writing.

    Science.gov (United States)

    Siskind, Theresa G.; Anderson, Lorin W.

    The study was designed to examine the similarity of response options generated by different item writers using a systematic approach to item writing. The similarity of response options to student responses for the same item stems presented in an open-ended format was also examined. A non-systematic (subject matter expertise) approach and a…

  12. Validity and Reliability of the 8-Item Work Limitations Questionnaire.

    Science.gov (United States)

    Walker, Timothy J; Tullar, Jessica M; Diamond, Pamela M; Kohl, Harold W; Amick, Benjamin C

    2017-12-01

    Purpose: To evaluate factorial validity, scale reliability, test-retest reliability, convergent validity, and discriminant validity of the 8-item Work Limitations Questionnaire (WLQ) among employees from a public university system. Methods: A secondary analysis using de-identified data from employees who completed an annual Health Assessment between the years 2009-2015 tested research aims. Confirmatory factor analysis (CFA) (n = 10,165) tested the latent structure of the 8-item WLQ. Scale reliability was determined using a CFA-based approach while test-retest reliability was determined using the intraclass correlation coefficient. Convergent/discriminant validity was tested by evaluating relations between the 8-item WLQ with health/performance variables for convergent validity (health-related work performance, number of chronic conditions, and general health) and demographic variables for discriminant validity (gender and institution type). Results: A 1-factor model with three correlated residuals demonstrated excellent model fit (CFI = 0.99, TLI = 0.99, RMSEA = 0.03, and SRMR = 0.01). The scale reliability was acceptable (0.69, 95% CI 0.68-0.70) and the test-retest reliability was very good (ICC = 0.78). Low-to-moderate associations were observed between the 8-item WLQ and the health/performance variables while weak associations were observed between the demographic variables. Conclusions: The 8-item WLQ demonstrated sufficient reliability and validity among employees from a public university system. Results suggest the 8-item WLQ is a usable alternative for studies when the more comprehensive 25-item WLQ is not available.

  13. The Influence of Task Demands, Verbal Ability and Executive Functions on Item and Source Memory in Autism Spectrum Disorder

    Science.gov (United States)

    Semino, Sara; Ring, Melanie; Bowler, Dermot M.; Gaigg, Sebastian B.

    2018-01-01

    Autism Spectrum Disorder (ASD) is generally associated with difficulties in contextual source memory but not single item memory. There are surprising inconsistencies in the literature, however, that the current study seeks to address by examining item and source memory in age and ability matched groups of 22 ASD and 21 comparison adults. Results…

  14. The PROMIS Physical Function item bank was calibrated to a standardized metric and shown to improve measurement efficiency.

    Science.gov (United States)

    Rose, Matthias; Bjorner, Jakob B; Gandek, Barbara; Bruce, Bonnie; Fries, James F; Ware, John E

    2014-05-01

    To document the development and psychometric evaluation of the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function (PF) item bank and static instruments. The items were evaluated using qualitative and quantitative methods. A total of 16,065 adults answered item subsets (n>2,200/item) on the Internet, with oversampling of the chronically ill. Classical test and item response theory methods were used to evaluate 149 PROMIS PF items plus 10 Short Form-36 and 20 Health Assessment Questionnaire-Disability Index items. A graded response model was used to estimate item parameters, which were normed to a mean of 50 (standard deviation [SD]=10) in a US general population sample. The final bank consists of 124 PROMIS items covering upper, central, and lower extremity functions and instrumental activities of daily living. In simulations, a 10-item computerized adaptive test (CAT) eliminated floor and decreased ceiling effects, achieving higher measurement precision than any comparable length static tool across four SDs of the measurement range. Improved psychometric properties were transferred to the CAT's superior ability to identify differences between age and disease groups. The item bank provides a common metric and can improve the measurement of PF by facilitating the standardization of patient-reported outcome measures and implementation of CATs for more efficient PF assessments over a larger range. Copyright © 2014. Published by Elsevier Inc.
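
    The standardized metric mentioned in this record (mean of 50, SD of 10 in a reference sample) is the usual T-score scale. The helper below is a minimal sketch of that linear rescaling; the function name and the assumption that the latent estimate (theta) is expressed relative to a reference-sample mean and SD are illustrative and not taken from the paper.

```python
def theta_to_t_score(theta, ref_mean=0.0, ref_sd=1.0):
    """Rescale a latent trait estimate so the reference sample has mean 50 and SD 10."""
    return 50.0 + 10.0 * (theta - ref_mean) / ref_sd


# A respondent 1.5 reference-sample SDs below the reference mean.
example_t = theta_to_t_score(-1.5)
```

    Under these assumptions a score of 35 sits 1.5 SDs below the reference mean, which is what makes scores from different instruments on the same metric directly comparable.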

  15. Conditioning factors of test-taking engagement in PIAAC: an exploratory IRT modelling approach considering person and item characteristics

    Directory of Open Access Journals (Sweden)

    Frank Goldhammer

    2017-11-01

    Background: A potential problem of low-stakes large-scale assessments such as the Programme for the International Assessment of Adult Competencies (PIAAC) is low test-taking engagement. The present study pursued two goals in order to better understand conditioning factors of test-taking disengagement: First, a model-based approach was used to investigate whether item indicators of disengagement constitute a continuous latent person variable by domain. Second, the effects of person and item characteristics were jointly tested using explanatory item response models. Methods: Analyses were based on the Canadian sample of Round 1 of the PIAAC, with N = 26,683 participants completing test items in the domains of literacy, numeracy, and problem solving. Binary item disengagement indicators were created by means of item response time thresholds. Results: The results showed that disengagement indicators define a latent dimension by domain. Disengagement increased with lower educational attainment, lower cognitive skills, and when the test language was not the participant’s native language. Gender did not exert any effect on disengagement, while age had a positive effect for problem solving only. An item’s location in the second of two assessment modules was positively related to disengagement, as was item difficulty. The latter effect was negatively moderated by cognitive skill, suggesting that poor test-takers are especially likely to disengage with more difficult items. Conclusions: The negative effect of cognitive skill, the positive effect of item difficulty, and their negative interaction effect support the assumption that disengagement is the outcome of individual expectations about success (informed disengagement).
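
    The step of turning response times into binary disengagement indicators can be sketched as follows. The item identifiers, the threshold values, and the rule "a response faster than the item's threshold is flagged as disengaged" are assumptions made for illustration; the record does not state how the study derived its thresholds.

```python
def disengagement_indicators(response_times, thresholds):
    """Flag each item response as disengaged (1) or engaged (0).

    A response is flagged when its time in seconds is below the item's threshold.
    Items the person did not reach (time is None) stay None.
    """
    indicators = {}
    for item_id, seconds in response_times.items():
        if seconds is None:
            indicators[item_id] = None
        else:
            indicators[item_id] = 1 if seconds < thresholds[item_id] else 0
    return indicators


# Hypothetical per-item thresholds and one person's response times (seconds).
item_thresholds = {"lit_01": 5.0, "lit_02": 8.0, "num_01": 6.0}
person_times = {"lit_01": 2.1, "lit_02": 30.4, "num_01": None}
flags = disengagement_indicators(person_times, item_thresholds)
```

    The resulting 0/1 indicators are the kind of binary data that an item response model could then treat as observed responses.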

  16. Development and psychometric analysis of the Brief DSM-5 Alcohol Use Disorder Diagnostic Assessment: Towards effective diagnosis in college students.

    Science.gov (United States)

    Hagman, Brett T

    2017-11-01

    The Diagnostic and Statistical Manual of Mental Disorders (5th edition) Alcohol Use Disorder (DSM-5 AUD) criteria have been modified to reflect a single, continuous disorder. It is critical that we develop brief assessment measures that can accurately assess for DSM-5 AUD criteria in college students to assist in screening, referral, and brief intervention services implemented on college campuses. The present study sought to develop and assess for the psychometric properties of a brief 13-item measure designed to capture the full spectrum of the DSM-5 AUD criteria in a sample of college students. Participants were past-year drinkers (N = 923) between the ages of 18 to 30 enrolled at 3 universities. Respondents completed a 30-min anonymous battery of questionnaires online. The Brief DSM-5 AUD Assessment consisted of 13 items designed to reflect the DSM-5 AUD criteria. Results indicated a high degree of internal consistency reliability with high item-to-scale correlations. Confirmatory factor analyses indicated that a dominant single factor emerged with good model fit. The Item Response Theory (IRT) analyses indicated that the difficulty parameters for each criterion were intermixed along the upper portion of the underlying AUD severity continuum, and the discrimination parameters were all high. Additional analysis indicated that those with a DSM-5 AUD had greater levels of alcohol and other drug use and problem severity in comparison to those without a DSM-5 AUD. Study findings provide empirical support for the reliability and validity of the Brief 13-item DSM-5 Assessment. It should be routinely included into research and clinical practice efforts. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  17. Qualitative Development and Content Validation of the PROMIS Pediatric Sleep Health Items.

    Science.gov (United States)

    Bevans, Katherine B; Meltzer, Lisa J; De La Motte, Anna; Kratchman, Amy; Viél, Dominique; Forrest, Christopher B

    2018-04-25

    To develop the Patient Reported Outcome Measurement Information System (PROMIS) Pediatric Sleep Health item pool and evaluate its content validity. Participants included 8 expert sleep clinician-researchers, 64 children ages 8-17 years, and 54 parents of children ages 5-17 years. We started with item concepts and expressions from the PROMIS Sleep Disturbance and Sleep Related Impairment adult measures. Additional pediatric sleep health concepts were generated by expert (n = 8), child (n = 28), and parent (n = 33) concept elicitation interviews and a systematic review of existing pediatric sleep health questionnaires. Content validity of the item pool was evaluated with item translatability review, readability analysis, and child (n = 36) and parent (n = 21) cognitive interviews. The final pediatric Sleep Health item pool includes 43 items that assess sleep disturbance (children's capacity to fall and stay asleep, sleep quality, dreams, and parasomnias) and sleep-related impairments (daytime sleepiness, low energy, difficulty waking up, and the impact of sleep and sleepiness on cognition, affect, behavior, and daily activities). Items are translatable and relevant and well understood by children ages 8-17 and parents of children ages 5-17. Rigorous qualitative procedures were used to develop and evaluate the content validity of the PROMIS Pediatric Sleep Health item pool. Once the item pool's psychometric properties are established, the scales will be useful for measuring children's subjective experiences of sleep.

  18. Psychometric properties of the Triarchic Psychopathy Measure: An item response theory approach.

    Science.gov (United States)

    Shou, Yiyun; Sellbom, Martin; Xu, Jing

    2018-05-01

    There is cumulative evidence for the cross-cultural validity of the Triarchic Psychopathy Measure (TriPM; Patrick, 2010) among non-Western populations. Recent studies using correlational and regression analyses show promising construct validity of the TriPM in Chinese samples. However, little is known about the efficiency of items in TriPM in assessing the proposed latent traits. The current study evaluated the psychometric properties of the Chinese TriPM at the item level using item response theory analyses. It also examined the measurement invariance of the TriPM between the Chinese and the U.S. student samples by applying differential item functioning analyses under the item response theory framework. The results supported the unidimensional nature of the Disinhibition and Meanness scales. Both scales had a greater level of precision in the respective underlying constructs at the positive ends. The two scales, however, had several items that were weakly associated with their respective latent traits in the Chinese student sample. Boldness, on the other hand, was found to be multidimensional, and reflected a more normally distributed range of variation. The examination of measurement bias via differential item functioning analyses revealed that a number of items of the TriPM were not equivalent across the Chinese and the U.S. samples. Some modification and adaptation of items might be considered for improving the precision of the TriPM for Chinese participants. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  19. Item Response Theory analysis of Fagerström Test for Cigarette Dependence.

    Science.gov (United States)

    Svicher, Andrea; Cosci, Fiammetta; Giannini, Marco; Pistelli, Francesco; Fagerström, Karl

    2018-02-01

    The Fagerström Test for Cigarette Dependence (FTCD) and the Heaviness of Smoking Index (HSI) are the gold standard measures to assess cigarette dependence. However, FTCD reliability and factor structure have been questioned and HSI psychometric properties are in need of further investigations. The present study examined the psychometric properties of the FTCD and the HSI via the Item Response Theory. The study was a secondary analysis of data collected in 862 Italian daily smokers. Confirmatory factor analysis was run to evaluate the dimensionality of FTCD. A Graded Response Model was applied to FTCD and HSI to verify the fit to the data. Both item and test functioning were analyzed and item statistics, Test Information Function, and scale reliabilities were calculated. Mokken Scale Analysis was applied to estimate homogeneity and Loevinger's coefficients were calculated. The FTCD showed unidimensionality and homogeneity for most of the items and for the total score. It also showed high sensitivity and good reliability from medium to high levels of cigarette dependence, although problems related to some items (i.e., items 3 and 5) were evident. HSI had good homogeneity, adequate item functioning, and high reliability from medium to high levels of cigarette dependence. Significant Differential Item Functioning was found for items 1, 4, 5 of the FTCD and for both items of HSI. HSI seems highly recommended in clinical settings addressed to heavy smokers while FTCD would be better used in smokers with a level of cigarette dependence ranging between low and high. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Psychometric evaluation of Persian Nomophobia Questionnaire: Differential item functioning and measurement invariance across gender.

    Science.gov (United States)

    Lin, Chung-Ying; Griffiths, Mark D; Pakpour, Amir H

    2018-03-01

    Background and aims: Research examining problematic mobile phone use has increased markedly over the past 5 years and has been related to "no mobile phone phobia" (so-called nomophobia). The 20-item Nomophobia Questionnaire (NMP-Q) is the only instrument that assesses nomophobia with an underlying theoretical structure and robust psychometric testing. This study aimed to confirm the construct validity of the Persian NMP-Q using Rasch and confirmatory factor analysis (CFA) models. Methods: After ensuring the linguistic validity, Rasch models were used to examine the unidimensionality of each Persian NMP-Q factor among 3,216 Iranian adolescents and CFAs were used to confirm its four-factor structure. Differential item functioning (DIF) and multigroup CFA were used to examine whether males and females interpreted the NMP-Q similarly, including item content and NMP-Q structure. Results: Each factor was unidimensional according to the Rasch findings, and the four-factor structure was supported by CFA. Two items did not quite fit the Rasch models (Item 14: "I would be nervous because I could not know if someone had tried to get a hold of me;" Item 9: "If I could not check my smartphone for a while, I would feel a desire to check it"). No DIF items were found across gender and measurement invariance was supported in multigroup CFA across gender. Conclusions: Due to the satisfactory psychometric properties, it is concluded that the Persian NMP-Q can be used to assess nomophobia among adolescents. Moreover, NMP-Q users may compare its scores between genders in the knowledge that there are no score differences contributed by different understandings of NMP-Q items.

  1. Sharing the cost of redundant items

    DEFF Research Database (Denmark)

    Hougaard, Jens Leth; Moulin, Hervé

    2014-01-01

    We ask how to share the cost of finitely many public goods (items) among users with different needs: some smaller subsets of items are enough to serve the needs of each user, yet the cost of all items must be covered, even if this entails inefficiently paying for redundant items. Typical examples...... are network connectivity problems when an existing (possibly inefficient) network must be maintained. We axiomatize a family cost ratios based on simple liability indices, one for each agent and for each item, measuring the relative worth of this item across agents, and generating cost allocation rules...... additive in costs....

  2. Assessing the specificity of posttraumatic stress disorder's dysphoric items within the dysphoria model.

    Science.gov (United States)

    Armour, Cherie; Shevlin, Mark

    2013-10-01

    The factor structure of posttraumatic stress disorder (PTSD) currently used by the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV), has received limited support. A four-factor dysphoria model is widely supported. However, the dysphoria factor of this model has been hailed as a nonspecific factor of PTSD. The present study investigated the specificity of the dysphoria factor within the dysphoria model by conducting a confirmatory factor analysis while statistically controlling for the variance attributable to depression. The sample consisted of 429 individuals who met the diagnostic criteria for PTSD in the National Comorbidity Survey. The results concluded that there was no significant attenuation in any of the PTSD items. This finding is pertinent given several proposals for the removal of dysphoric items from the diagnostic criteria set of PTSD in the upcoming DSM-5.

  3. Instemmingsgeneigdheid en verskillende item- en responsformate in 'n gesommeerde selfbeoordelingskaal

    Directory of Open Access Journals (Sweden)

    Nadene Hanekom

    1998-06-01

    This study examines the degree of acquiescence present when the item and response formats of a summated rating scale are varied. It is often recommended that acquiescence response bias in rating scales may be controlled by using both positively and negatively worded items. Such items are generally worded in the Likert-type format of statements. The purpose of the study was to establish whether items in question format would result in a smaller degree of acquiescence than items worded as statements. The response format was also varied (five- and seven-point options) to determine whether this would influence the reliability and degree of acquiescence in the scales. A twenty-item Locus of Control (LC) questionnaire was used, but each item was complemented by its opposite, resulting in 40 items. The subjects, divided randomly into two groups, were second-year students who had to complete four versions of the questionnaire, plus a shortened version of Bass's scale for measuring acquiescence. The LC versions were questions or statements, each combined with a five- or seven-point response format. Partial counterbalancing was introduced by testing on two separate occasions, presenting the tests to the two groups in the opposite order. The degree of acquiescence was assessed by correlating the items with their opposites, and by correlating scores on each version with scores on the acquiescence questionnaire. No major differences were found between the various item and response formats in relation to acquiescence. Summary (Opsomming): This investigation was carried out to determine whether the degree of acquiescence is influenced by the item and response format of a summated self-rating scale. It is often recommended that the use of both positively and negatively worded items in a questionnaire limits acquiescence. Such items are usually formulated as statements in the traditional Likert format. The purpose of the investigation was to determine whether items

  4. Addressing challenges in single species assessments via a simple state-space assessment model

    DEFF Research Database (Denmark)

    Nielsen, Anders

    Single-species and age-structured fish stock assessments still remain the main tool for managing fish stocks. A simple state-space assessment model is presented as an alternative to (semi) deterministic procedures and the full parametric statistical catch at age models. It offers a solution...... to some of the key challenges of these models. Compared to the deterministic procedures it solves a list of problems originating from falsely assuming that age classified catches are known without errors and allows quantification of uncertainties of estimated quantities of interest. Compared to full...

  5. Re-Examining Test Item Issues in the TIMSS Mathematics and Science Assessments

    Science.gov (United States)

    Wang, Jianjun

    2011-01-01

    As the largest international study ever undertaken, the Trends in International Mathematics and Science Study (TIMSS) has been held as a benchmark to measure U.S. student performance in the global context. In-depth analyses of the TIMSS project are conducted in this study to examine key issues of the comparative investigation: (1) item flaws in mathematics…

  6. A Non-Parametric Item Response Theory Evaluation of the CAGE Instrument Among Older Adults.

    Science.gov (United States)

    Abdin, Edimansyah; Sagayadevan, Vathsala; Vaingankar, Janhavi Ajit; Picco, Louisa; Chong, Siow Ann; Subramaniam, Mythily

    2018-02-23

    The validity of the CAGE using item response theory (IRT) has not yet been examined in the older adult population. This study aims to investigate the psychometric properties of the CAGE using both non-parametric and parametric IRT models, assess whether there is any differential item functioning (DIF) by age, gender and ethnicity, and examine the measurement precision at the cut-off scores. We used data from the Well-being of the Singapore Elderly study to conduct Mokken scaling analysis (MSA), dichotomous Rasch and 2-parameter logistic IRT models. The measurement precision at the cut-off scores was evaluated using classification accuracy (CA) and classification consistency (CC). The MSA showed the overall scalability H index was 0.459, indicating a medium performing instrument. All items were found to be homogeneous, measuring the same construct and able to discriminate well between respondents with high levels of the construct and those with lower levels. The item discrimination ranged from 1.07 to 6.73 while the item difficulty ranged from 0.33 to 2.80. Significant DIF was found for two items across ethnic groups. More than 90% (CC and CA ranged from 92.5% to 94.3%) of the respondents were consistently and accurately classified by the CAGE cut-off scores of 2 and 3. The current study provides new evidence on the validity of the CAGE from the IRT perspective. This study provides valuable information on each item in the assessment of the overall severity of alcohol problems and on the precision of the cut-off scores in the older adult population.
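
    The 2-parameter logistic model mentioned above can be illustrated with a minimal sketch. The function names are hypothetical, and pairing the lowest reported discrimination (1.07) with the lowest reported difficulty (0.33) is an assumption made only for illustration; the abstract reports ranges, not per-item pairs.

```python
import math

def two_pl_probability(theta, discrimination, difficulty):
    """Probability of endorsing a dichotomous item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

def two_pl_information(theta, discrimination, difficulty):
    """Fisher information of a 2PL item at trait level theta: a^2 * P * (1 - P)."""
    p = two_pl_probability(theta, discrimination, difficulty)
    return discrimination ** 2 * p * (1.0 - p)

# Illustrative pairing of values taken from the reported ranges (an assumption).
a, b = 1.07, 0.33
for theta in (-1.0, 0.33, 2.0):
    print(theta, round(two_pl_probability(theta, a, b), 3),
          round(two_pl_information(theta, a, b), 3))
```

    Under this model an item is most informative where the trait level equals its difficulty, which is one way to read why precision is usually examined near the cut-off scores.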

  7. The randomly renewed general item and the randomly inspected item with exponential life distribution

    International Nuclear Information System (INIS)

    Schneeweiss, W.G.

    1979-01-01

    For a randomly renewed item the probability distributions of the time to failure and of the duration of down time and the expectations of these random variables are determined. Moreover, it is shown that the same theory applies to randomly checked items with exponential probability distribution of life such as electronic items. The case of periodic renewals is treated as an example. (orig.)

  8. Reliability and construct validity of the Spanish version of the 6-item CTS symptoms scale for outcomes assessment in carpal tunnel syndrome.

    Science.gov (United States)

    Rosales, Roberto S; Martin-Hidalgo, Yolanda; Reboso-Morales, Luis; Atroshi, Isam

    2016-03-03

    The purpose of this study was to assess the reliability and construct validity of the Spanish version of the 6-item carpal tunnel syndrome (CTS) symptoms scale (CTS-6). In this cross-sectional study, 40 patients diagnosed with CTS based on clinical and neurophysiologic criteria completed the standard Spanish versions of the CTS-6 and the disabilities of the arm, shoulder and hand (QuickDASH) scales on two occasions with a 1-week interval. Internal-consistency reliability was assessed with the Cronbach alpha coefficient and test-retest reliability with the intraclass correlation coefficient, two-way random effects model and absolute agreement definition (ICC2,1). Cross-sectional precision was analyzed with the Standard Error of the Measurement (SEM). Longitudinal precision for the test-retest reliability coefficient was assessed with the Standard Error of the Measurement difference (SEMdiff) and the Minimal Detectable Change at 95% confidence level (MDC95). For assessing construct validity it was hypothesized that the CTS-6 would have a strong positive correlation with the QuickDASH, analyzed with the Pearson correlation coefficient (r). The standard Spanish version of the CTS-6 presented a Cronbach alpha of 0.81 with a SEM of 0.3. Test-retest reliability showed an ICC of 0.85 with a SEMdiff of 0.36 and a MDC95 of 0.7. The correlation between the CTS-6 and the QuickDASH was concordant with the a priori formulated construct hypothesis (r = 0.69). Conclusions: The standard Spanish version of the 6-item CTS symptoms scale showed good internal consistency, test-retest reliability and construct validity for outcomes assessment in CTS. The CTS-6 will be useful to clinicians and researchers in Spanish speaking parts of the world. The use of standardized outcome measures across countries will also facilitate comparison of research results in carpal tunnel syndrome.
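
    The precision statistics named above can be related to one another with a short sketch. The formulas used here (SEM = SD * sqrt(1 - reliability), SEMdiff = sqrt(2) * SEM, MDC95 = 1.96 * SEMdiff) are the conventional definitions and are an assumption; the abstract does not state which formulas the authors applied. The function names are hypothetical.

```python
import math

def sem_from_reliability(sd, reliability):
    """Standard error of measurement from a score SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - reliability)

def sem_difference(sem):
    """Standard error of the difference between two measurements."""
    return math.sqrt(2.0) * sem

def mdc95(sem_diff):
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * sem_diff

# Only the reported SEMdiff (0.36) is taken from the abstract.
print(round(mdc95(0.36), 2))
```

    With the reported SEMdiff of 0.36, this convention gives a value of about 0.71, which agrees with the reported MDC95 of 0.7 after rounding.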

  9. Why sample selection matters in exploratory factor analysis: implications for the 12-item World Health Organization Disability Assessment Schedule 2.0.

    Science.gov (United States)

    Gaskin, Cadeyrn J; Lambert, Sylvie D; Bowe, Steven J; Orellana, Liliana

    2017-03-11

    Sample selection can substantially affect the solutions generated using exploratory factor analysis. Validation studies of the 12-item World Health Organization (WHO) Disability Assessment Schedule 2.0 (WHODAS 2.0) have generally involved samples in which substantial proportions of people had no, or minimal, disability. With the WHODAS 2.0 oriented towards measuring disability across six life domains (cognition, mobility, self-care, getting along, life activities, and participation in society), performing factor analysis with samples of people with disability may be more appropriate. We determined the influence of the sampling strategy on (a) the number of factors extracted and (b) the factor structure of the WHODAS 2.0. Using data from adults aged 50+ from the six countries in Wave 1 of the WHO's longitudinal Study on global AGEing and adult health (SAGE), we repeatedly selected samples (n = 750) using two strategies: (1) simple random sampling that reproduced nationally representative distributions of WHODAS 2.0 summary scores for each country (i.e., positively skewed distributions with many zero scores indicating the absence of disability), and (2) stratified random sampling with weights designed to obtain approximately symmetric distributions of summary scores for each country (i.e., predominantly including people with varying degrees of disability). Samples with skewed distributions typically produced one-factor solutions, except for the two countries with the lowest percentages of zero scores, in which the majority of samples produced two factors. Samples with approximately symmetric distributions generally produced two- or three-factor solutions. In the two-factor solutions, the getting along domain items loaded on one factor (commonly with a cognition domain item), with remaining items loading on a second factor. In the three-factor solutions, the getting along and self-care domain items loaded separately on two factors and three other domains

  10. Recommendations to improve the positive and negative syndrome scale (PANSS) based on item response theory.

    Science.gov (United States)

    Levine, Stephen Z; Rabinowitz, Jonathan; Rizopoulos, Dimitris

    2011-08-15

    The adequacy of the Positive and Negative Syndrome Scale (PANSS) items in measuring symptom severity in schizophrenia was examined using Item Response Theory (IRT). Baseline PANSS assessments were analyzed from two multi-center clinical trials of antipsychotic medication in chronic schizophrenia (n=1872). Generally, the results showed that the PANSS (a) item ratings discriminated symptom severity best for the negative symptoms; (b) has an excess of "Severe" and "Extremely severe" rating options; and (c) assessments are more reliable at medium than very low or high levels of symptom severity. Analysis also showed that the detection of statistically and non-statistically significant differences in treatment were highly similar for the original and IRT-modified PANSS. In clinical trials of chronic schizophrenia, the PANSS appears to require the following modifications: fewer rating options, adjustment of 'Lack of judgment and insight', and improved severe symptom assessment. 2011 Elsevier Ltd. All rights reserved.

  11. Validation of the MOS Social Support Survey 6-item (MOS-SSS-6) measure with two large population-based samples of Australian women.

    Science.gov (United States)

    Holden, Libby; Lee, Christina; Hockey, Richard; Ware, Robert S; Dobson, Annette J

    2014-12-01

    This study aimed to validate a 6-item 1-factor global measure of social support developed from the Medical Outcomes Study Social Support Survey (MOS-SSS) for use in large epidemiological studies. Data were obtained from two large population-based samples of participants in the Australian Longitudinal Study on Women's Health. The two cohorts were aged 53-58 and 28-33 years at data collection (N = 10,616 and 8,977, respectively). Items selected for the 6-item 1-factor measure were derived from the factor structure obtained from unpublished work using an earlier wave of data from one of these cohorts. Descriptive statistics, including polychoric correlations, were used to describe the abbreviated scale. Cronbach's alpha was used to assess internal consistency and confirmatory factor analysis to assess scale validity. Concurrent validity was assessed using correlations between the new 6-item version and established 19-item version, and other concurrent variables. In both cohorts, the new 6-item 1-factor measure showed strong internal consistency and scale reliability. It had excellent goodness-of-fit indices, similar to those of the established 19-item measure. Both versions correlated similarly with concurrent measures. The 6-item 1-factor MOS-SSS measures global functional social support with fewer items than the established 19-item measure.
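
    Internal consistency of the kind reported above is usually summarized with Cronbach's alpha. The following is a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the function name and the toy response matrix are hypothetical and are not data from the study.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)

# Hypothetical toy data: 5 respondents answering a 3-item scale.
toy = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
print(round(cronbach_alpha(toy), 3))
```

    Alpha rises with both the number of items and the average inter-item correlation, which is why a shortened scale has to retain strongly related items to keep comparable reliability.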

  12. Longitudinal Investigation of Source Memory Reveals Different Developmental Trajectories for Item Memory and Binding

    Science.gov (United States)

    Riggins, Tracy

    2014-01-01

    The present study used a cohort-sequential design to examine developmental changes in children's ability to bind items in memory during early and middle childhood. Three cohorts of children (aged 4, 6, or 8 years) were followed longitudinally for 3 years. Each year, children completed a source memory paradigm assessing memory for items and…

  13. Development of six PROMIS pediatrics proxy-report item banks

    Directory of Open Access Journals (Sweden)

    Irwin Debra E

    2012-02-01

    Background: Pediatric self-report should be considered the standard for measuring patient reported outcomes (PRO) among children. However, circumstances exist when the child is too young, cognitively impaired, or too ill to complete a PRO instrument and a proxy-report is needed. This paper describes the development process including the proxy cognitive interviews and large-field-test survey methods and sample characteristics employed to produce item parameters for the Patient Reported Outcomes Measurement Information System (PROMIS) pediatric proxy-report item banks. Methods: The PROMIS pediatric self-report items were converted into proxy-report items before undergoing cognitive interviews. These items covered six domains (physical function, emotional distress, social peer relationships, fatigue, pain interference, and asthma impact). Caregivers (n = 25) of children between the ages of 5 and 17 years provided qualitative feedback on proxy-report items to assess any major issues with these items. From May 2008 to March 2009, the large-scale survey enrolled children ages 8-17 years to complete the self-report version and caregivers to complete the proxy-report version of the survey (n = 1548 dyads). Caregivers of children ages 5 to 7 years completed the proxy report survey (n = 432). In addition, caregivers completed other proxy instruments: PedsQL™ 4.0 Generic Core Scales Parent Proxy-Report version, PedsQL™ Asthma Module Parent Proxy-Report version, and KIDSCREEN Parent-Proxy-52. Results: Item content was well understood by proxies and did not require item revisions, but some proxies clearly noted that determining an answer on behalf of their child was difficult for some items. Dyads and caregivers of children ages 5-17 years old were enrolled in the large-scale testing. The majority were female (85%), married (70%), Caucasian (64%) and had at least a high school education (94%). Approximately 50% had children with a chronic health condition, primarily

  14. A Multiple-Item Scale for Assessing E-Government Service Quality

    Science.gov (United States)

    Papadomichelaki, Xenia; Mentzas, Gregoris

    A critical element in the evolution of e-governmental services is the development of sites that better serve the citizens’ needs. To deliver superior service quality, we must first understand how citizens perceive and evaluate online citizen service. This involves defining what e-government service quality is, identifying its underlying dimensions, and determining how it can be conceptualized and measured. In this article we conceptualise an e-government service quality model (e-GovQual) and then we develop, refine, validate, confirm and test a multiple-item scale for measuring e-government service quality for public administration sites where citizens seek either information or services.

  15. Validation of a 4-item Negative Symptom Assessment (NSA-4): a short, practical clinical tool for the assessment of negative symptoms in schizophrenia.

    Science.gov (United States)

    Alphs, Larry; Morlock, Robert; Coon, Cheryl; Cazorla, Pilar; Szegedi, Armin; Panagides, John

    2011-06-01

    The 16-item Negative Symptom Assessment (NSA-16) scale is a validated tool for evaluating negative symptoms of schizophrenia. The psychometric properties and predictive power of a four-item version (NSA-4) were compared with the NSA-16. Baseline data from 561 patients with predominant negative symptoms of schizophrenia who participated in two identically designed clinical trials were evaluated. Ordered logistic regression analysis of ratings using NSA-4 and NSA-16 were compared with ratings using several other standard tools to determine predictive validity and construct validity. Internal consistency and test-retest reliability were also analyzed. NSA-16 and NSA-4 scores were both predictive of scores on the NSA global rating (odds ratio = 0.83-0.86) and the Clinical Global Impressions-Severity scale (odds ratio = 0.91-0.93). NSA-16 and NSA-4 showed high correlation with each other (Pearson r = 0.85), similar high correlation with other measures of negative symptoms (demonstrating convergent validity), and lesser correlations with measures of other forms of psychopathology (demonstrating divergent validity). NSA-16 and NSA-4 both showed acceptable internal consistency (Cronbach α, 0.85 and 0.64, respectively) and test-retest reliability (intraclass correlation coefficient, 0.87 and 0.82). This study demonstrates that NSA-4 offers accuracy comparable to the NSA-16 in rating negative symptoms in patients with schizophrenia. Copyright © 2011 John Wiley & Sons, Ltd.

  16. 17 CFR 260.7a-16 - Inclusion of items, differentiation between items and answers, omission of instructions.

    Science.gov (United States)

    2010-04-01

    ... 17 Commodity and Securities Exchanges 3 2010-04-01 2010-04-01 false Inclusion of items, differentiation between items and answers, omission of instructions. 260.7a-16 Section 260.7a-16 Commodity and... INDENTURE ACT OF 1939 Formal Requirements § 260.7a-16 Inclusion of items, differentiation between items and...

  17. Information and processes underlying semantic and episodic memory across tasks, items, and individuals.

    Science.gov (United States)

    Cox, Gregory E; Hemmer, Pernille; Aue, William R; Criss, Amy H

    2018-04-01

    The development of memory theory has been constrained by a focus on isolated tasks rather than the processes and information that are common to situations in which memory is engaged. We present results from a study in which 453 participants took part in five different memory tasks: single-item recognition, associative recognition, cued recall, free recall, and lexical decision. Using hierarchical Bayesian techniques, we jointly analyzed the correlations between tasks within individuals (reflecting the degree to which tasks rely on shared cognitive processes) and within items (reflecting the degree to which tasks rely on the same information conveyed by the item). Among other things, we find that (a) the processes involved in lexical access and episodic memory are largely separate and rely on different kinds of information, (b) access to lexical memory is driven primarily by perceptual aspects of a word, (c) all episodic memory tasks rely to an extent on a set of shared processes which make use of semantic features to encode both single words and associations between words, and (d) recall involves additional processes likely related to contextual cuing and response production. These results provide a large-scale picture of memory across different tasks which can serve to drive the development of comprehensive theories of memory. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  18. Affect, Behavior, Cognition, and Desire in the Big Five: An Analysis of Item Content and Structure

    Science.gov (United States)

    Wilt, Joshua; Revelle, William

    2015-01-01

    Personality psychology is concerned with affect (A), behavior (B), cognition (C) and desire (D), and personality traits have been defined conceptually as abstractions used to either explain or summarize coherent ABC (and sometimes D) patterns over time and space. However, this conceptual definition of traits has not been reflected in their operationalization, possibly resulting in theoretical and practical limitations to current trait inventories. Thus, the goal of this project was to determine the affective, behavioral, cognitive and desire (ABCD) components of Big-Five personality traits. The first study assessed the ABCD content of items measuring Big-Five traits in order to determine the ABCD composition of traits and identify items measuring relatively high amounts of only one ABCD content. The second study examined the correlational structure of scales constructed from items assessing ABCD content via a large, web-based study. An assessment of Big-Five traits that delineates ABCD components of each trait is presented, and the discussion focuses on how this assessment builds upon current approaches of assessing personality. PMID:26279606

  19. Developing a short version of the Toronto Structured Interview for Alexithymia using item response theory.

    Science.gov (United States)

    Sekely, Angela; Taylor, Graeme J; Bagby, R Michael

    2018-03-17

    The Toronto Structured Interview for Alexithymia (TSIA) was developed to provide a structured interview method for assessing alexithymia. One drawback of this instrument is the amount of time it takes to administer and score. The current study used item response theory (IRT) methods to analyze data from a large heterogeneous multi-language sample (N = 842) to investigate whether a subset of items could be selected to create a short version of the instrument. Samejima's (1969) graded response model was used to fit the item responses. Items providing maximum information were retained in the short model, resulting in the elimination of 12 items from the original 24 items. Despite the 50% reduction in the number of items, 65.22% of the information was retained. Further studies are needed to validate the short version. A short version of the TSIA is potentially of practical value to clinicians and researchers with time constraints. Copyright © 2018. Published by Elsevier B.V.
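
    As a rough illustration of the graded response model named above, the sketch below computes category probabilities for one polytomous item as differences of adjacent cumulative logistic curves. The function names, the discrimination value and the thresholds are hypothetical; they are not TSIA item parameters.

```python
import math

def _logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def grm_category_probabilities(theta, discrimination, thresholds):
    """Category probabilities for one item under the graded response model.

    thresholds must be in ascending order; m - 1 thresholds give m categories.
    """
    # Cumulative probabilities P(X >= k), bracketed by 1 (lowest) and 0 (beyond highest).
    cumulative = [1.0]
    cumulative += [_logistic(discrimination * (theta - b)) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]

# Hypothetical item with four response categories.
probs = grm_category_probabilities(theta=0.0, discrimination=1.5,
                                   thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

    Item information in this model is derived from the same category curves, so ranking items by the information they provide, as the study describes, starts from probabilities of this form.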

  20. Validation of the Spanish versions of the long (26 items) and short (12 items) forms of the Self-Compassion Scale (SCS).

    Science.gov (United States)

    Garcia-Campayo, Javier; Navarro-Gil, Mayte; Andrés, Eva; Montero-Marin, Jesús; López-Artal, Lorena; Demarzo, Marcelo Marcos Piva

    2014-01-10

    Self-compassion is a key psychological construct for assessing clinical outcomes in mindfulness-based interventions. The aim of this study was to validate the Spanish versions of the long (26-item) and short (12-item) forms of the Self-Compassion Scale (SCS). The translated Spanish versions of both subscales were administered to two independent samples: Sample 1 was comprised of university students (n = 268) who were recruited to validate the long form, and Sample 2 was comprised of Aragon Health Service workers (n = 271) who were recruited to validate the short form. In addition to SCS, the Mindful Attention Awareness Scale (MAAS), the State-Trait Anxiety Inventory-Trait (STAI-T), the Beck Depression Inventory (BDI) and the Perceived Stress Questionnaire (PSQ) were administered. Construct validity, internal consistency, test-retest reliability and convergent validity were tested. The Confirmatory Factor Analysis (CFA) of the long and short forms of the SCS confirmed the original six-factor model in both scales, showing goodness of fit. Cronbach's α for the 26-item SCS was 0.87 (95% CI = 0.85-0.90) and ranged between 0.72 and 0.79 for the 6 subscales. Cronbach's α for the 12-item SCS was 0.85 (95% CI = 0.81-0.88) and ranged between 0.71 and 0.77 for the 6 subscales. The long (26-item) form of the SCS showed a test-retest coefficient of 0.92 (95% CI = 0.89-0.94). The Intraclass Correlation (ICC) for the 6 subscales ranged from 0.84 to 0.93. The short (12-item) form of the SCS showed a test-retest coefficient of 0.89 (95% CI = 0.87-0.93). The ICC for the 6 subscales ranged from 0.79 to 0.91. The long and short forms of the SCS exhibited a significant negative correlation with the BDI, the STAI and the PSQ, and a significant positive correlation with the MAAS. The correlation between the total score of the long and short SCS form was r = 0.92. The Spanish versions of the long (26-item) and short (12-item) forms of the SCS are valid and

  1. The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory

    Science.gov (United States)

    Sahin, Alper; Anil, Duygu

    2017-01-01

    This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…

  2. Development of the Quantitative Reasoning Items on the National Survey of Student Engagement

    Directory of Open Access Journals (Sweden)

    Amber D. Dumford

    2015-01-01

    As society’s needs for quantitative skills become more prevalent, college graduates require quantitative skills regardless of their career choices. Therefore, it is important that institutions assess students’ engagement in quantitative activities during college. This study chronicles the process taken by the National Survey of Student Engagement (NSSE) to develop items that measure students’ participation in quantitative reasoning (QR) activities. On the whole, findings across the quantitative and qualitative analyses suggest good overall properties for the developed QR items. The items show great promise to explore and evaluate the frequency with which college students participate in QR-related activities. Each year, hundreds of institutions across the United States and Canada participate in NSSE, and, with the addition of these new items on the core survey, every participating institution will have information on this topic. Our hope is that these items will spur conversations on campuses about students’ use of quantitative reasoning activities.

  3. Approximation Preserving Reductions among Item Pricing Problems

    Science.gov (United States)

    Hamane, Ryoso; Itoh, Toshiya; Tomita, Kouhei

    When a store sells items to customers, the store wishes to determine the prices of the items to maximize its profit. Intuitively, if the store sells the items at low (resp. high) prices, the customers buy more (resp. fewer) items, which provides less profit to the store. So it would be hard for the store to decide the prices of items. Assume that the store has a set V of n items and there is a set E of m customers who wish to buy those items, and also assume that each item i ∈ V has the production cost d_i and each customer e_j ∈ E has the valuation v_j on the bundle e_j ⊆ V of items. When the store sells an item i ∈ V at the price r_i, the profit for the item i is p_i = r_i - d_i. The goal of the store is to decide the price of each item to maximize its total profit. We refer to this maximization problem as the item pricing problem. In most of the previous works, the item pricing problem was considered under the assumption that p_i ≥ 0 for each i ∈ V; however, Balcan et al. [In Proc. of WINE, LNCS 4858, 2007] introduced the notion of “loss-leader,” and showed that the seller can get more total profit in the case that p_i < 0 is allowed than in the case that p_i < 0 is not allowed. In this paper, we derive approximation preserving reductions among several item pricing problems and show that all of them have algorithms with good approximation ratio.
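
    The profit definition above can be made concrete with a small sketch. It assumes a single-minded purchase rule, in which a customer buys the whole bundle exactly when its total price does not exceed the customer's valuation; the abstract does not spell out the purchase rule, so this is an assumption. The function name and the toy instance are hypothetical.

```python
def total_profit(prices, costs, customers):
    """Total profit of a price vector.

    prices, costs: dicts mapping item -> r_i and item -> d_i.
    customers: list of (bundle, valuation) pairs, bundle being a set of items.
    Assumed rule: a customer buys the bundle iff its total price <= valuation.
    """
    profit = 0
    for bundle, valuation in customers:
        if sum(prices[i] for i in bundle) <= valuation:
            # Per-item profit is p_i = r_i - d_i, summed over the bundle sold.
            profit += sum(prices[i] - costs[i] for i in bundle)
    return profit

# Hypothetical instance: two items with unit production cost, two customers.
costs = {"a": 1, "b": 1}
customers = [({"a"}, 10), ({"a", "b"}, 9)]

no_loss = {"a": 8, "b": 1}      # every p_i >= 0
loss_leader = {"a": 9, "b": 0}  # item b is sold below cost (p_b = -1)

print(total_profit(no_loss, costs, customers))      # 14
print(total_profit(loss_leader, costs, customers))  # 15
```

    In this toy instance, selling item b below cost lets the store raise the price of item a for both customers, so the loss-leader prices earn more than the non-negative-profit prices shown.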

  4. Dual representation of item positions in verbal short-term memory: Evidence for two access modes.

    Science.gov (United States)

    Lange, Elke B; Verhaeghen, Paul; Cerella, John

    Memory sets of N = 1~5 digits were exposed sequentially from left-to-right across the screen, followed by N recognition probes. Probes had to be compared to memory list items on identity only (Sternberg task) or conditional on list position. Positions were probed randomly or in left-to-right order. Search functions related probe response times to set size. Random probing led to ramped, "Sternbergian" functions whose intercepts were elevated by the location requirement. Sequential probing led to flat search functions, that is, fast responses unaffected by set size. These results suggested that items in STM could be accessed either by a slow search-on-identity followed by recovery of an associated location tag, or in a single step by following item-to-item links in study order. It is argued that this dual coding of location information occurs spontaneously at study, and that either code can be utilised at retrieval depending on test demands.

  5. An emotional functioning item bank of 24 items for computerized adaptive testing (CAT) was established

    DEFF Research Database (Denmark)

    Petersen, Morten Aa.; Gamper, Eva-Maria; Costantini, Anna

    2016-01-01

    of the widely used EORTC Quality of Life questionnaire (QLQ-C30). STUDY DESIGN AND SETTING: On the basis of literature search and evaluations by international samples of experts and cancer patients, 38 candidate items were developed. The psychometric properties of the items were evaluated in a large...... international sample of cancer patients. This included evaluations of dimensionality, item response theory (IRT) model fit, differential item functioning (DIF), and of measurement precision/statistical power. RESULTS: Responses were obtained from 1,023 cancer patients from four countries. The evaluations showed...... that 24 items could be included in a unidimensional IRT model. DIF did not seem to have any significant impact on the estimation of EF. Evaluations indicated that the CAT measure may reduce sample size requirements by up to 50% compared to the QLQ-C30 EF scale without reducing power. CONCLUSION...

  6. Factoring handedness data: I. Item analysis.

    Science.gov (United States)

    Messinger, H B; Messinger, M I

    1995-12-01

    Recently in this journal Peters and Murphy challenged the validity of factor analyses done on bimodal handedness data, suggesting instead that right- and left-handers be studied separately. But bimodality may be avoidable if attention is paid to Oldfield's questionnaire format and instructions for the subjects. Two characteristics appear crucial: a two-column LEFT-RIGHT format for the body of the instrument and what we call Oldfield's Admonition: not to indicate strong preference for a handedness item, such as write, unless "... the preference is so strong that you would never try to use the other hand unless absolutely forced to...". Attaining unimodality of an item distribution would seem to overcome the objections of Peters and Murphy. In a 1984 survey in Boston we used Oldfield's ten-item questionnaire exactly as published. This produced unimodal item distributions. With reflection of the five-point item scale and a logarithmic transformation, we achieved a degree of normalization for the items. Two surveys elsewhere, based on Oldfield's 20-item list but with changes in the questionnaire format and the instructions, yielded markedly different item distributions with peaks at each extreme and sometimes in the middle as well.

  7. Translation Fidelity of Psychological Scales: An Item Response Theory Analysis of an Individualism-Collectivism Scale.

    Science.gov (United States)

    Bontempo, Robert

    1993-01-01

    Describes a method for assessing the quality of translations based on item response theory (IRT). Results from the IRT technique with French and Chinese versions of a scale measuring individualism-collectivism for samples of 250 U.S., 357 French, and 290 Chinese undergraduates show how several biased items are detected. (SLD)

  8. The medial temporal lobes distinguish between within-item and item-context relations during autobiographical memory retrieval.

    Science.gov (United States)

    Sheldon, Signy; Levine, Brian

    2015-12-01

    During autobiographical memory retrieval, the medial temporal lobes (MTL) relate together multiple event elements, including object (within-item relations) and context (item-context relations) information, to create a cohesive memory. There is consistent support for a functional specialization within the MTL according to these relational processes, much of which comes from recognition memory experiments. In this study, we compared brain activation patterns associated with retrieving within-item relations (i.e., associating conceptual and sensory-perceptual object features) and item-context relations (i.e., spatial relations among objects) with respect to naturalistic autobiographical retrieval. We developed a novel paradigm that cued participants to retrieve information about past autobiographical events, non-episodic within-item relations, and non-episodic item-context relations with the perceptuomotor aspects of retrieval equated across these conditions. We used multivariate analysis techniques to extract common and distinct patterns of activity among these conditions within the MTL and across the whole brain, both in terms of spatial and temporal patterns of activity. The anterior MTL (perirhinal cortex and anterior hippocampus) was preferentially recruited for generating within-item relations later in retrieval whereas the posterior MTL (posterior parahippocampal cortex and posterior hippocampus) was preferentially recruited for generating item-context relations across the retrieval phase. These findings provide novel evidence for functional specialization within the MTL with respect to naturalistic memory retrieval. © 2015 Wiley Periodicals, Inc.

  9. Differential item functioning of the patient-reported outcomes information system (PROMIS®) pain interference item bank by language (Spanish versus English).

    Science.gov (United States)

    Paz, Sylvia H; Spritzer, Karen L; Reise, Steven P; Hays, Ron D

    2017-06-01

    About 70% of Latinos, 5 years old or older, in the United States speak Spanish at home. Measurement equivalence of the PROMIS® pain interference (PI) item bank by language of administration (English versus Spanish) has not been evaluated. A sample of 527 adult Spanish-speaking Latinos completed the Spanish version of the 41-item PROMIS® pain interference item bank. We evaluate dimensionality, monotonicity and local independence of the Spanish-language items. Then we evaluate differential item functioning (DIF) using ordinal logistic regression with item response theory scores estimated from DIF-free "anchor" items. One of the 41 items in the Spanish version of the PROMIS® PI item bank was identified as having significant uniform DIF. English- and Spanish-speaking subjects with the same level of pain interference responded differently to 1 of the 41 items in the PROMIS® PI item bank. This item was not retained due to proprietary issues. The original English language item parameters can be used when estimating PROMIS® PI scores.

  10. Dynamic Testing of Analogical Reasoning in 5- to 6-Year-Olds: Multiple-Choice versus Constructed-Response Training Items

    Science.gov (United States)

    Stevenson, Claire E.; Heiser, Willem J.; Resing, Wilma C. M.

    2016-01-01

    Multiple-choice (MC) analogy items are often used in cognitive assessment. However, in dynamic testing, where the aim is to provide insight into potential for learning and the learning process, constructed-response (CR) items may be of benefit. This study investigated whether training with CR or MC items leads to differences in the strategy…

  11. A strategy for optimizing item-pool management

    NARCIS (Netherlands)

    Ariel, A.; van der Linden, Willem J.; Veldkamp, Bernard P.

    2006-01-01

    Item-pool management requires a balancing act between the input of new items into the pool and the output of tests assembled from it. A strategy for optimizing item-pool management is presented that is based on the idea of a periodic update of an optimal blueprint for the item pool to tune item

  12. Empirical weighting of Standardised Mini Mental State Examination items among nursing home residents

    DEFF Research Database (Denmark)

    Uhrskov Sørensen, Lisbeth; Foldspang, Anders; Gulmann, Nils Christian

    2001-01-01

    psychiatrist. The two assessments were mutually blinded. Multiple conditional forward logistic regression was used to select the items that most strongly predicted organic disorder as assessed by the psychiatrist. The weighted score had significantly better validity parameters, performed better on a receiver...

  13. Differential items functioning to assess aggressiveness in college students / Funcionamento diferencial de itens para avaliar a agressividade de universitários

    Directory of Open Access Journals (Sweden)

    Fermino Fernandes Sisto

    2008-01-01

    This study sought evidence of construct validity by analyzing differential item functioning in items related to aggressiveness. The participants were 445 college students of both genders enrolled in Engineering, Computing and Psychology courses. An 81-item aggressiveness scale was administered collectively, in the classroom, to the students who consented to participate in the study. The items of the instrument were analyzed by means of the Rasch model. Twenty-eight items showed differential item functioning: 15 were characterized as typical for females and 13 for males. The reliability coefficients were 0.99 for the items and 0.86 for the persons. It was concluded that aggressiveness can be measured separately on the basis of gender.

  14. Is a single item stress measure independently associated with subsequent severe injury: a prospective cohort study of 16,385 forest industry employees.

    Science.gov (United States)

    Salminen, Simo; Kouvonen, Anne; Koskinen, Aki; Joensuu, Matti; Väänänen, Ari

    2014-06-02

    A previous review showed that high stress increases the risk of occupational injury by three- to five-fold. However, most of the prior studies have relied on short follow-ups. In this prospective cohort study we examined the effect of stress on recorded hospitalised injuries in an 8-year follow-up. A total of 16,385 employees of a Finnish forest company responded to the questionnaire. Perceived stress was measured with a validated single-item measure, and analysed in relation to recorded hospitalised injuries from 1986 to 2008. We used Cox proportional hazard regression models to examine the prospective associations between work stress, injuries and confounding factors. Highly stressed participants were approximately 40% more likely to be hospitalised due to injury over the follow-up period than participants with low stress. This association remained significant after adjustment for age, gender, marital status, occupational status, educational level, and physical work environment. High stress is associated with an increased risk of severe injury.

  15. The assessment of nonverbal behavior in schizophrenia through the Formal Psychological Assessment.

    Science.gov (United States)

    Granziol, Umberto; Spoto, Andrea; Vidotto, Giulio

    2018-03-01

    The nonverbal behavior (NVB) of people diagnosed with schizophrenia consistently interacts with their symptoms during the assessment. Previous studies frequently observed such an interaction when a prevalence of negative symptoms occurred. Nonetheless, a list of NVBs linked to negative symptoms needs to be defined. Furthermore, a list of items that can exhaustively assess such NVBs is still needed. The present study aims to introduce both lists by using the Formal Psychological Assessment. A deep analysis was performed on both the scientific literature and the DSM-5 for constructing the set of nonverbal behaviors; similarly, an initial list of 138 items investigating the behaviors was obtained from instruments used to assess schizophrenia. The Formal Psychological Assessment was then applied to reduce the preliminary list. A final list of 23 items necessary and sufficient to investigate the NVBs emerged. The list also allowed us to analyze specific relations among items. The present study shows how it is possible to deepen a patient's negative symptomatology, starting with the relations between items and the NVBs they investigate. Finally, this study examines the advantages and clinical implications of defining an assessment tool based on the found list of items. Copyright © 2017 John Wiley & Sons, Ltd.

  16. The practical impact of differential item functioning analyses in a health-related quality of life instrument

    DEFF Research Database (Denmark)

    Scott, Neil W; Fayers, Peter M; Aaronson, Neil K

    2009-01-01

    Differential item functioning (DIF) analyses are commonly used to evaluate health-related quality of life (HRQoL) instruments. There is, however, a lack of consensus as to how to assess the practical impact of statistically significant DIF results.

  17. Trends in Sodium Content of Menu Items in Large Chain Restaurants in the U.S.

    Science.gov (United States)

    Wolfson, Julia A; Moran, Alyssa J; Jarlenski, Marian P; Bleich, Sara N

    2018-01-01

    Consuming too much sodium is associated with increased risk for cardiovascular disease, and restaurant foods are a primary source of sodium. This study assessed recent trends in sodium content of menu items in U.S. chain restaurants. Data from 21,557 menu items in 66 top-earning chain restaurants available from 2012 to 2016 were obtained from the MenuStat project and analyzed in 2017. Generalized linear models were used to examine changes in calorie-adjusted, per-item sodium content of menu items offered in all years (2012-2016) and items offered in 2012 only compared with items newly introduced in 2013, 2014, 2015, and 2016. Overall, calorie-adjusted sodium content in newly introduced menu items declined by 104 mg from 2012 to 2016 (p […] restaurant type; sodium content, particularly for main course items, was high. Sodium declined by 83 mg in fast food restaurants, 19 mg in fast casual restaurants, and 163 mg in full service restaurants. Sodium in appetizer and side items newly introduced in 2016 increased by 266 mg compared with items on the menu in 2012 only (p […] restaurants. However, sodium content of core and new menu items remains high, and reductions are inconsistent across menu categories and restaurant types. Copyright © 2018 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.

  18. Developing an item bank to measure the coping strategies of people with hereditary retinal diseases.

    Science.gov (United States)

    Prem Senthil, Mallika; Khadka, Jyoti; De Roach, John; Lamey, Tina; McLaren, Terri; Campbell, Isabella; Fenwick, Eva K; Lamoureux, Ecosse L; Pesudovs, Konrad

    2018-05-05

    Our understanding of the coping strategies used by people with visual impairment to manage stress related to visual loss is limited. This study aims to develop a sophisticated coping instrument in the form of an item bank implemented via computerised adaptive testing (CAT) for hereditary retinal diseases. Items on coping were extracted from qualitative interviews with patients which were supplemented by items from a literature review. A systematic multi-stage process of item refinement was carried out followed by expert panel discussion and cognitive interviews. The final coping item bank had 30 items. Rasch analysis was used to assess the psychometric properties. A CAT simulation was carried out to estimate an average number of items required to gain precise measurement of hereditary retinal disease-related coping. One hundred eighty-nine participants answered the coping item bank (median age = 58 years). The coping scale demonstrated good precision and targeting. The standardised residual loadings for items revealed six items grouped together. Removal of the six items reduced the precision of the main coping scale and worsened the variance explained by the measure. Therefore, the six items were retained within the main scale. Our CAT simulation indicated that, on average, fewer than 10 items are required to gain a precise measurement of coping. This is the first study to develop a psychometrically robust coping instrument for hereditary retinal diseases. CAT simulation indicated that, on average, only four and nine items were required to gain measurement at moderate and high precision, respectively.

  19. Investigating Separate and Concurrent Approaches for Item Parameter Drift in 3PL Item Response Theory Equating

    Science.gov (United States)

    Arce-Ferrer, Alvaro J.; Bulut, Okan

    2017-01-01

    This study examines separate and concurrent approaches to combine the detection of item parameter drift (IPD) and the estimation of scale transformation coefficients in the context of the common item nonequivalent groups design with the three-parameter item response theory equating. The study uses real and synthetic data sets to compare the two…
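    For readers unfamiliar with the model named in this record, the following Python sketch shows the standard three-parameter logistic (3PL) item response function and how item parameters change under a linear rescaling of the ability scale, which is what scale transformation coefficients describe. The function names and the symbols `A` and `B` for the slope and intercept are illustrative assumptions; the abstract does not state which estimation method the authors used, and none is implemented here.

```python
import math


def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model.

    theta: examinee ability, a: discrimination, b: difficulty,
    c: lower asymptote (pseudo-guessing parameter).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def rescale_item(a: float, b: float, c: float, A: float, B: float):
    """Item parameters after the linear scale change theta* = A * theta + B.

    Discrimination is divided by A, difficulty is mapped like theta,
    and the lower asymptote is unchanged.
    """
    return a / A, A * b + B, c


# Hypothetical item and transformation coefficients.
a, b, c = 1.2, 0.5, 0.2
A, B = 1.1, -0.3
theta = 0.8

p_old = p_correct_3pl(theta, a, b, c)
a_new, b_new, c_new = rescale_item(a, b, c, A, B)
p_new = p_correct_3pl(A * theta + B, a_new, b_new, c_new)
```

    In this sketch `p_old` and `p_new` agree up to floating-point error, which is why item parameters estimated on different scales can be compared after a linear transformation.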

  20. Item Response Theory Analyses of the Cambridge Face Memory Test (CFMT)

    Science.gov (United States)

    Cho, Sun-Joo; Wilmer, Jeremy; Herzmann, Grit; McGugin, Rankin; Fiset, Daniel; Van Gulick, Ana E.; Ryan, Katie; Gauthier, Isabel

    2014-01-01

    We evaluated the psychometric properties of the Cambridge face memory test (CFMT; Duchaine & Nakayama, 2006). First, we assessed the dimensionality of the test with a bi-factor exploratory factor analysis (EFA). This EFA analysis revealed a general factor and three specific factors clustered by targets of CFMT. However, the three specific factors appeared to be minor factors that can be ignored. Second, we fit a unidimensional item response model. This item response model showed that the CFMT items could discriminate individuals at different ability levels and covered a wide range of the ability continuum. We found the CFMT to be particularly precise for a wide range of ability levels. Third, we implemented item response theory (IRT) differential item functioning (DIF) analyses for each gender group and two age groups (Age ≤ 20 versus Age > 21). This DIF analysis suggested little evidence of consequential differential functioning on the CFMT for these groups, supporting the use of the test to compare older to younger, or male to female, individuals. Fourth, we tested for a gender difference on the latent facial recognition ability with an explanatory item response model. We found a significant but small gender difference on the latent ability for face recognition, which was higher for women than men by 0.184, at age mean 23.2, controlling for linear and quadratic age effects. Finally, we discuss the practical considerations of the use of total scores versus IRT scale scores in applications of the CFMT. PMID:25642930

  1. Development and Standardization of the Diagnostic Adaptive Behavior Scale: Application of Item Response Theory to the Assessment of Adaptive Behavior

    Science.gov (United States)

    Tassé, Marc J.; Schalock, Robert L.; Thissen, David; Balboni, Giulia; Bersani, Henry, Jr.; Borthwick-Duffy, Sharon A.; Spreat, Scott; Widaman, Keith F.; Zhang, Dalun; Navas, Patricia

    2016-01-01

    The Diagnostic Adaptive Behavior Scale (DABS) was developed using item response theory (IRT) methods and was constructed to provide the most precise and valid adaptive behavior information at or near the cutoff point of making a decision regarding a diagnosis of intellectual disability. The DABS initial item pool consisted of 260 items. Using IRT…

  2. Analysis of differential item functioning in the depression item bank from the Patient Reported Outcome Measurement Information System (PROMIS): An item response theory approach

    Directory of Open Access Journals (Sweden)

    JOSEPH P. EIMICKE

    2009-06-01

    The aims of this paper are to present findings related to differential item functioning (DIF) in the Patient Reported Outcome Measurement Information System (PROMIS) depression item bank, and to discuss potential threats to the validity of results from studies of DIF. The 32 depression items studied were modified from several widely used instruments. DIF analyses of gender, age and education were performed using a sample of 735 individuals recruited by a survey polling firm. DIF hypotheses were generated by asking content experts to indicate whether or not they expected DIF to be present, and the direction of the DIF with respect to the studied comparison groups. Primary analyses were conducted using the graded item response model (for polytomous, ordered response category data) with likelihood ratio tests of DIF, accompanied by magnitude measures. Sensitivity analyses were performed using other item response models and approaches to DIF detection. Despite some caveats, the items that are recommended for exclusion or for separate calibration were "I felt like crying" and "I had trouble enjoying things that I used to enjoy." The item, "I felt I had no energy," was also flagged as evidencing DIF, and recommended for additional review. On the one hand, false DIF detection (Type 1 error) was controlled to the extent possible by ensuring model fit and purification. On the other hand, power for DIF detection might have been compromised by several factors, including sparse data and small sample sizes. Nonetheless, practical and not just statistical significance should be considered. In this case the overall magnitude and impact of DIF was small for the groups studied, although impact was relatively large for some individuals.

  3. Single mothers' self-assessment of health: a systematic exploration of the literature.

    Science.gov (United States)

    Rousou, E; Kouta, C; Middleton, N; Karanikola, M

    2013-12-01

    This study aimed to explore single mothers' self-assessed level of health status compared to partnered mothers and the relevant factors associated with it. The number of single-mother families is increasing worldwide. A large body of international research reveals that single mothers experience poorer physical and mental health than their married counterparts. An important contributory factor for this health disparity appears to be socio-economic disadvantage. A systematic search of the literature was conducted using the keywords 'lone' or 'single' and 'mother*' or 'parent*' or 'family structure' in combination with 'health'. EMBASE, CINAHL, COCHRANE and PUBMED databases were searched for quantitative research studies published in the past decade. Eleven quantitative research articles with self-assessment of health status in single mothers were identified. Single mothers report lower levels of health status compared to partnered mothers. These inequalities appear to be associated with financial hardship and lack of social support. Both these factors increase single mothers' susceptibility to stress and illness. Despite the study limitations (e.g. results based mainly on secondary data from household surveys), it provides evidence that single motherhood places women in an adverse social position that is associated with prolonged stress mainly due to unemployment, economic hardship and social exclusion, which affects negatively their health status. These findings can be seen as a challenge for health professionals, especially those working in the community sector and policy makers too, to establish supportive measures for this vulnerable group focused on socio-economic factors. © 2013 International Council of Nurses.

  4. Item analysis of ADAS-Cog: effect of baseline cognitive impairment in a clinical AD trial.

    Science.gov (United States)

    Sevigny, Jeffrey J; Peng, Yahong; Liu, Lian; Lines, Christopher R

    2010-03-01

    We explored the association of Alzheimer's disease (AD) Assessment Scale (ADAS-Cog) item scores with AD severity using cross-sectional and longitudinal data from the same study. Post hoc analyses were performed using placebo data from a 12-month trial of patients with mild-to-moderate AD (N = 281 randomized, N = 209 completed). Baseline distributions of ADAS-Cog item scores by Mini-Mental State Examination (MMSE) score and Clinical Dementia Rating (CDR) sum of boxes score (measures of dementia severity) were estimated using local and nonparametric regressions. Mixed-effect models were used to characterize ADAS-Cog item score changes over time by dementia severity (MMSE: mild = 21-26, moderate = 14-20; global CDR: mild = 0.5-1, moderate = 2). In the cross-sectional analysis of baseline ADAS-Cog item scores, orientation was the most sensitive item to differentiate patients across levels of cognitive impairment. Several items showed a ceiling effect, particularly in milder AD. In the longitudinal analysis of change scores over 12 months, orientation was the only item with noticeable decline (8%-10%) in mild AD. Most items showed modest declines (5%-20%) in moderate AD.

  5. 76 FR 60474 - Commercial Item Handbook

    Science.gov (United States)

    2011-09-29

    ... DEPARTMENT OF DEFENSE Defense Acquisition Regulations System Commercial Item Handbook AGENCY.... SUMMARY: DoD has updated its Commercial Item Handbook. The purpose of the Handbook is to help acquisition personnel develop sound business strategies for procuring commercial items. DoD is seeking industry input on...

  6. The emotion dysregulation inventory: Psychometric properties and item response theory calibration in an autism spectrum disorder sample.

    Science.gov (United States)

    Mazefsky, Carla A; Yu, Lan; White, Susan W; Siegel, Matthew; Pilkonis, Paul A

    2018-04-06

    Individuals with autism spectrum disorder (ASD) often present with prominent emotion dysregulation that requires treatment but can be difficult to measure. The Emotion Dysregulation Inventory (EDI) was created using methods developed by the Patient-Reported Outcomes Measurement Information System (PROMIS®) to capture observable indicators of poor emotion regulation. Caregivers of 1,755 youth with ASD completed 66 candidate EDI items, and the final 30 items were selected based on classical test theory and item response theory (IRT) analyses. The analyses identified two factors: (a) Reactivity, characterized by intense, rapidly escalating, sustained, and poorly regulated negative emotional reactions, and (b) Dysphoria, characterized by anhedonia, sadness, and nervousness. The final items did not show differential item functioning (DIF) based on gender, age, intellectual ability, or verbal ability. Because the final items were calibrated using IRT, even a small number of items offers high precision, minimizing respondent burden. IRT co-calibration of the EDI with related measures demonstrated its superiority in assessing the severity of emotion dysregulation with as few as seven items. Validity of the EDI was supported by expert review, its association with related constructs (e.g., anxiety and depression symptoms, aggression), higher scores in psychiatric inpatients with ASD compared to a community ASD sample, and demonstration of test-retest stability and sensitivity to change. In sum, the EDI provides an efficient and sensitive method to measure emotion dysregulation for clinical assessment, monitoring, and research in youth with ASD of any level of cognitive or verbal ability. Autism Res 2018. © 2018 International Society for Autism Research, Wiley Periodicals, Inc. This paper describes a new measure of poor emotional control called the Emotion Dysregulation Inventory (EDI).
Caregivers of 1,755 youth with ASD completed candidate items, and advanced statistical

  7. Dependability and Treatment Sensitivity of Multi-Item Direct Behavior Rating Scales for Interpersonal Peer Conflict

    Science.gov (United States)

    Daniels, Brian; Volpe, Robert J.; Briesch, Amy M.; Gadow, Kenneth D.

    2017-01-01

    Direct behavior rating (DBR) represents a feasible method for monitoring student behavior in the classroom; however, limited work to date has focused on the use of multi-item scales. The purposes of the study were to examine the (a) dependability of data obtained from a multi-item DBR designed to assess peer conflict and (b) treatment sensitivity…

  8. Single-instruction multiple-data execution

    CERN Document Server

    Hughes, Christopher J

    2015-01-01

    Having hit power limitations to even more aggressive out-of-order execution in processor cores, many architects in the past decade have turned to single-instruction-multiple-data (SIMD) execution to increase single-threaded performance. SIMD execution, or having a single instruction drive execution of an identical operation on multiple data items, was already well established as a technique to efficiently exploit data parallelism. Furthermore, support for it was already included in many commodity processors. However, in the past decade, SIMD execution has seen a dramatic increase in the set of

  9. Short Scales for the Assessment of Personality Traits: Development and Validation of the Portuguese Ten-Item Personality Inventory (TIPI)

    Science.gov (United States)

    Nunes, Andreia; Limpo, Teresa; Lima, César F.; Castro, São Luís

    2018-01-01

    The importance of quickly assessing personality traits in many studies prompted the development of brief scales such as the Ten-Item Personality Inventory (TIPI), a measure of five personality traits (extraversion, agreeableness, conscientiousness, emotional stability, and openness). In the current study, we present the Portuguese version of TIPI and examine its psychometric properties, based on a sample of 333 Portuguese adults aged 18 to 65 years. The results revealed reliability coefficients similar to the original version (α = 0.39–0.72), very good 4-week test–retest reliability (n = 81, rs > 0.71), expected factorial structure, high convergent validity with the Big-Five Inventory (rs > 0.60), and correlations with self-esteem, affect, and aggressiveness similar to those found with standard measures of personality traits. Overall, our findings suggest that the Portuguese TIPI is a reliable and valid alternative to longer measures: it offers a promising tool for research contexts in which the available time for personality assessment is highly limited. PMID:29674989

  10. Short Scales for the Assessment of Personality Traits: Development and Validation of the Portuguese Ten-Item Personality Inventory (TIPI).

    Science.gov (United States)

    Nunes, Andreia; Limpo, Teresa; Lima, César F; Castro, São Luís

    2018-01-01

    The importance of quickly assessing personality traits in many studies prompted the development of brief scales such as the Ten-Item Personality Inventory (TIPI), a measure of five personality traits (extraversion, agreeableness, conscientiousness, emotional stability, and openness). In the current study, we present the Portuguese version of TIPI and examine its psychometric properties, based on a sample of 333 Portuguese adults aged 18 to 65 years. The results revealed reliability coefficients similar to the original version (α = 0.39-0.72), very good 4-week test-retest reliability (n = 81, rs > 0.71), expected factorial structure, high convergent validity with the Big-Five Inventory (rs > 0.60), and correlations with self-esteem, affect, and aggressiveness similar to those found with standard measures of personality traits. Overall, our findings suggest that the Portuguese TIPI is a reliable and valid alternative to longer measures: it offers a promising tool for research contexts in which the available time for personality assessment is highly limited.
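    The reliability coefficients reported in this record are denoted α. Assuming this refers to Cronbach's alpha (the abstract does not name the coefficient), the Python sketch below shows how such a value is computed from item scores. The function name and the toy data are hypothetical and are not taken from the study.

```python
from statistics import pvariance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals),
    where k is the number of items. Population variances are used for
    both terms, so the ratio does not depend on the variance convention.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars: List[float] = [
        pvariance([row[j] for row in rows]) for j in range(k)
    ]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical two-item subscale (the TIPI uses two items per trait).
perfectly_consistent = [[1, 1], [2, 2], [3, 3]]
less_consistent = [[1, 2], [2, 1], [3, 3], [4, 2]]

alpha_high = cronbach_alpha(perfectly_consistent)
alpha_low = cronbach_alpha(less_consistent)
```

    With only two items per trait, alpha depends entirely on the correlation between the two items, which is one reason short scales often report modest values.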

  11. Calibration of communication skills items in OSCE checklists according to the MAAS-Global.

    Science.gov (United States)

    Setyonugroho, Winny; Kropmans, Thomas; Kennedy, Kieran M; Stewart, Brian; van Dalen, Jan

    2016-01-01

    Communication skills (CS) are commonly assessed using 'communication items' in Objective Structured Clinical Examination (OSCE) station checklists. Our aim is to calibrate the communication component of OSCE station checklists according to the MAAS-Global which is a valid and reliable standard to assess CS in undergraduate medical education. Three raters independently compared 280 checklists from 4 disciplines contributing to the undergraduate year 4 OSCE against the 17 items of the MAAS-Global standard. G-theory was used to analyze the reliability of this calibration procedure. G-Kappa was 0.8. For two raters G-Kappa is 0.72 and it fell to 0.57 for one rater. 46% of the checklist items corresponded to section three of the MAAS-Global (i.e. medical content of the consultation), whilst 12% corresponded to section two (i.e. general CS), and 8.2% to section one (i.e. CS for each separate phase of the consultation). 34% of the items were not considered to be CS. A G-Kappa of 0.8 confirms a reliable and valid procedure for calibrating OSCE CS checklist items using the MAAS-Global. We strongly suggest that such a procedure is more widely employed to arrive at a stable (valid and reliable) judgment of the communication component in existing checklists for medical students' communication behaviours. It is possible to measure the 'true' caliber of CS in OSCE stations. Students' results are thereby comparable between and across stations, students and institutions. A reliable calibration procedure requires only two raters. Copyright © 2015. Published by Elsevier Ireland Ltd.

  12. Item Analysis in Introductory Economics Testing.

    Science.gov (United States)

    Tinari, Frank D.

    1979-01-01

    Computerized analysis of multiple choice test items is explained. Examples of item analysis applications in the introductory economics course are discussed with respect to three objectives: to evaluate learning; to improve test items; and to help improve classroom instruction. Problems, costs and benefits of the procedures are identified. (JMD)
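
    As an illustration of the kind of computerized item analysis this record describes, the sketch below computes two classical statistics for each multiple-choice item: difficulty (proportion correct) and discrimination (correlation between the item score and the rest-of-test score). The abstract does not say which statistics the author used, so the function name `item_analysis`, the 0/1 scoring, and the choice of rest-score correlation are assumptions made for this example only.

```python
from statistics import mean, pstdev

def item_analysis(responses):
    """Classical item analysis for dichotomously scored items.

    responses: list of examinees, each a list of 0/1 item scores.
    Returns one (difficulty, discrimination) tuple per item.
    Hypothetical helper for illustration; not the cited author's procedure.
    """
    n_items = len(responses[0])
    totals = [sum(r) for r in responses]
    results = []
    for j in range(n_items):
        scores = [r[j] for r in responses]
        difficulty = mean(scores)  # proportion of examinees answering correctly
        # Rest score: total score with item j removed, to avoid self-correlation.
        rest = [t - s for t, s in zip(totals, scores)]
        sd_s, sd_r = pstdev(scores), pstdev(rest)
        if sd_s == 0 or sd_r == 0:
            discrimination = None  # undefined when either variable is constant
        else:
            cov = mean(s * r for s, r in zip(scores, rest)) - mean(scores) * mean(rest)
            discrimination = cov / (sd_s * sd_r)
        results.append((difficulty, discrimination))
    return results

# Made-up data: four examinees, three items.
example = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
stats = item_analysis(example)
```

    Under these assumptions, a very low difficulty value or a near-zero or negative discrimination value would mark an item as a candidate for revision.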

  13. A review of the effects on IRT item parameter estimates with a focus on misbehaving common items in test equating

    Directory of Open Access Journals (Sweden)

    Michalis P Michaelides

    2010-10-01

    Many studies have investigated the topic of change or drift in item parameter estimates in the context of Item Response Theory. Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items.

  14. A Review of the Effects on IRT Item Parameter Estimates with a Focus on Misbehaving Common Items in Test Equating.

    Science.gov (United States)

    Michaelides, Michalis P

    2010-01-01

    Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items.
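
    To make the idea of item parameter drift concrete, the sketch below evaluates a two-parameter logistic (2PL) item response function before and after a shift in the difficulty parameter. The review does not commit to a particular IRT model, so the 2PL form, the function name `p_correct`, and the parameter values are assumptions chosen purely for illustration.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct response.

    theta: examinee ability, a: item discrimination, b: item difficulty.
    Assumed model for illustration; the cited review is not tied to it.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical common item whose difficulty drifts between administrations.
theta = 0.0
p_before = p_correct(theta, a=1.0, b=0.0)  # first administration
p_after = p_correct(theta, a=1.0, b=0.5)   # same item after drift in b
drift_effect = p_before - p_after
```

    In this toy case the same examinee has a lower success probability on the drifted item, which is the sort of shift that could distort an equating transformation if the item were kept in the common-item set.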

  15. Connecting single-stock assessment models through correlated survival

    DEFF Research Database (Denmark)

    Albertsen, Christoffer Moesgaard; Nielsen, Anders; Thygesen, Uffe Høgsbro

    2017-01-01

    times. We propose a simple alternative. In three case studies each with two stocks, we improve the single-stock models, as measured by Akaike information criterion, by adding correlation in the cohort survival. To limit the number of parameters, the correlations are parameterized through...... the corresponding partial correlations. We consider six models where the partial correlation matrix between stocks follows a band structure ranging from independent assessments to complex correlation structures. Further, a simulation study illustrates the importance of handling correlated data sufficiently...... by investigating the coverage of confidence intervals for estimated fishing mortality. The results presented will allow managers to evaluate stock statuses based on a more accurate evaluation of model output uncertainty. The methods are directly implementable for stocks with an analytical assessment and do...

  16. Barriers and benefits to desired behaviors for single use plastic items in northeast Ohio's Lake Erie basin.

    Science.gov (United States)

    Bartolotta, Jill F; Hardy, Scott D

    2018-02-01

    Given the growing saliency of plastic marine debris, and the impact of plastics on beaches and aquatic environments in the Laurentian Great Lakes, applied research is needed to support municipal and nongovernmental campaigns to prevent debris from reaching the water's edge. This study addresses this need by examining the barriers and benefits to positive behavior for two plastic debris items in northeast Ohio's Lake Erie basin: plastic bags and plastic water bottles. An online survey is employed to gather data on the use and disposal of these plastic items and to solicit recommendations on how to positively change behavior to reduce improper disposal. Results support a ban on plastic bags and plastic water bottles, with more enthusiasm for a bag ban. Financial incentives are also seen as an effective way to influence behavior change, as are location-specific solutions focused on education and outreach. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. A Comparison of the 27-Item and 12-Item Intolerance of Uncertainty Scales

    Science.gov (United States)

    Khawaja, Nigar G.; Yu, Lai Ngo Heidi

    2010-01-01

    The 27-item Intolerance of Uncertainty Scale (IUS) has become one of the most frequently used measures of Intolerance of Uncertainty. More recently, an abridged, 12-item version of the IUS has been developed. The current research used clinical (n = 50) and non-clinical (n = 56) samples to examine and compare the psychometric properties of both…

  18. The Effects of Item Format and Cognitive Domain on Students' Science Performance in TIMSS 2011

    Science.gov (United States)

    Liou, Pey-Yan; Bulut, Okan

    2017-12-01

    The purpose of this study was to examine eighth-grade students' science performance in terms of two test design components, item format, and cognitive domain. The portion of Taiwanese data came from the 2011 administration of the Trends in International Mathematics and Science Study (TIMSS), one of the major international large-scale assessments in science. The item difficulty analysis was initially applied to show the proportion of correct items. A regression-based cumulative link mixed modeling (CLMM) approach was further utilized to estimate the impact of item format, cognitive domain, and their interaction on the students' science scores. The results of the proportion-correct statistics showed that constructed-response items were more difficult than multiple-choice items, and that the reasoning cognitive domain items were more difficult compared to the items in the applying and knowing domains. In terms of the CLMM results, students tended to obtain higher scores when answering constructed-response items as well as items in the applying cognitive domain. When the two predictors and the interaction term were included together, the directions and magnitudes of the predictors on student science performance changed substantially. Plausible explanations for the complex nature of the effects of the two test-design predictors on student science performance are discussed. The results provide practical, empirical-based evidence for test developers, teachers, and stakeholders to be aware of the differential function of item format, cognitive domain, and their interaction in students' science performance.

  19. More is not Always Better: The Relation between Item Response and Item Response Time in Raven’s Matrices

    Directory of Open Access Journals (Sweden)

    Frank Goldhammer

    2015-03-01

    The role of response time in completing an item can have very different interpretations. Responding more slowly could be positively related to success as the item is answered more carefully. However, the association may be negative if working faster indicates higher ability. The objective of this study was to clarify the validity of each assumption for reasoning items considering the mode of processing. A total of 230 persons completed a computerized version of Raven’s Advanced Progressive Matrices test. Results revealed that response time overall had a negative effect. However, this effect was moderated by items and persons. For easy items and able persons the effect was strongly negative, for difficult items and less able persons it was less negative or even positive. The number of rules involved in a matrix problem proved to explain item difficulty significantly. Most importantly, a positive interaction effect between the number of rules and item response time indicated that the response time effect became less negative with an increasing number of rules. Moreover, exploratory analyses suggested that the error type influenced the response time effect.

  20. The continuity between DSM-5 obsessive-compulsive personality disorder traits and obsessive-compulsive symptoms in adolescence: an item response theory study.

    Science.gov (United States)

    De Caluwé, Elien; Rettew, David C; De Clercq, Barbara

    2014-11-01

    Various studies have shown that obsessive-compulsive symptoms exist as part of not only obsessive-compulsive disorder (OCD) but also obsessive-compulsive personality disorder (OCPD). Despite these shared characteristics, there is an ongoing debate on the inclusion of OCPD into the recently developed DSM-5 obsessive-compulsive and related disorders (OCRDs) category. The current study aims to clarify whether this inclusion can be justified from an item response theory approach. The validity of the continuity model for understanding the association between OCD and OCPD was explored in 787 Dutch community and referred adolescents (70% female, 12-20 years old, mean = 16.16, SD = 1.40) studied between July 2011 and January 2013, relying on item response theory (IRT) analyses of self-reported OCD symptoms (Youth Obsessive-Compulsive Symptoms Scale [YOCSS]) and OCPD traits (Personality Inventory for DSM-5 [PID-5]). The results support the continuity hypothesis, indicating that both OCD and OCPD can be represented along a single underlying spectrum. OCD, and especially the obsessive symptom domain, can be considered as the extreme end of OCPD traits. The current study empirically supports the classification of OCD and OCPD along a single dimension. This integrative perspective in OC-related pathology addresses the dimensional nature of traits and psychopathology and may improve the transparency and validity of assessment procedures. © Copyright 2014 Physicians Postgraduate Press, Inc.

  1. Negative effects of item repetition on source memory.

    Science.gov (United States)

    Kim, Kyungmi; Yi, Do-Joon; Raye, Carol L; Johnson, Marcia K

    2012-08-01

    In the present study, we explored how item repetition affects source memory for new item-feature associations (picture-location or picture-color). We presented line drawings varying numbers of times in Phase 1. In Phase 2, each drawing was presented once with a critical new feature. In Phase 3, we tested memory for the new source feature of each item from Phase 2. Experiments 1 and 2 demonstrated and replicated the negative effects of item repetition on incidental source memory. Prior item repetition also had a negative effect on source memory when different source dimensions were used in Phases 1 and 2 (Experiment 3) and when participants were explicitly instructed to learn source information in Phase 2 (Experiments 4 and 5). Importantly, when the order between Phases 1 and 2 was reversed, such that item repetition occurred after the encoding of critical item-source combinations, item repetition no longer affected source memory (Experiment 6). Overall, our findings did not support predictions based on item predifferentiation, within-dimension source interference, or general interference from multiple traces of an item. Rather, the findings were consistent with the idea that prior item repetition reduces attention to subsequent presentations of the item, decreasing the likelihood that critical item-source associations will be encoded.

  2. How French subjects describe well-being from food and eating habits? Development, item reduction and scoring definition of the Well-Being related to Food Questionnaire (Well-BFQ©).

    Science.gov (United States)

    Guillemin, I; Marrel, A; Arnould, B; Capuron, L; Dupuy, A; Ginon, E; Layé, S; Lecerf, J-M; Prost, M; Rogeaux, M; Urdapilleta, I; Allaert, F-A

    2016-01-01

    Providing well-being and maintaining good health are main objectives subjects seek from diet. This manuscript describes the development and preliminary validation of an instrument assessing well-being associated with food and eating habits in a general healthy population. Qualitative data from 12 groups of discussion (102 subjects) conducted with healthy subjects were used to develop the core of the Well-being related to Food Questionnaire (Well-BFQ). Twelve other groups of discussion with subjects with joint (n = 34), digestive (n = 32) or repetitive infection complaints (n = 30) were performed to develop items specific to these complaints. Five main themes emerged from the discussions and formed the modular backbone of the questionnaire: "Grocery shopping", "Cooking", "Dining places", "Commensality", "Eating and drinking". Each module has a common structure: items about subject's food behavior and items about immediate and short-term benefits. An additional theme - "Eating habits and health" - assesses subjects' beliefs about expected benefits of food and eating habits on health, disease prevention and protection, and quality of ageing. A preliminary validation was conducted with 444 subjects with balanced diet; non-balanced diet; and standard diet. The structure of the questionnaire was further determined using principal component analyses and exploratory factor analyses, with confirmation of the sub-sections food behaviors, immediate benefits (pleasure, security, relaxation), direct short-term benefits (digestion and satiety, energy and psychology), and deferred long-term benefits (eating habits and health). Thirty-three subscales and 14 single items were further defined. Confirmatory analyses confirmed the structure, with overall moderate to excellent convergent and divergent validity and internal consistency reliability. The Well-BFQ is a unique, modular tool that comprehensively assesses the full picture of well-being related to food and eating habits in

  3. Psychometric Consequences of Subpopulation Item Parameter Drift

    Science.gov (United States)

    Huggins-Manley, Anne Corinne

    2017-01-01

    This study defines subpopulation item parameter drift (SIPD) as a change in item parameters over time that is dependent on subpopulations of examinees, and hypothesizes that the presence of SIPD in anchor items is associated with bias and/or lack of invariance in three psychometric outcomes. Results show that SIPD in anchor items is associated…

  4. Few items in the thyroid-related quality of life instrument ThyPRO exhibited differential item functioning.

    Science.gov (United States)

    Watt, Torquil; Groenvold, Mogens; Hegedüs, Laszlo; Bonnema, Steen Joop; Rasmussen, Åse Krogh; Feldt-Rasmussen, Ulla; Bjorner, Jakob Bue

    2014-02-01

    To evaluate the extent of differential item functioning (DIF) within the thyroid-specific quality of life patient-reported outcome measure, ThyPRO, according to sex, age, education and thyroid diagnosis. A total of 838 patients with benign thyroid diseases completed the ThyPRO questionnaire (84 five-point items, 13 scales). Uniform and nonuniform DIF were investigated using ordinal logistic regression, testing for both statistical significance and magnitude (ΔR² > 0.02). Scale level was estimated by the sum score, after purification. Twenty instances of DIF in 17 of the 84 items were found. Eight according to diagnosis, where the goiter scale was the one most affected, possibly due to differing perceptions in patients with auto-immune thyroid diseases compared to patients with simple goiter. Eight DIFs according to age were found, of which 5 were in positively worded items, which younger patients were more likely to endorse; one according to gender: women were more likely to report crying, and three according to educational level. The vast majority of DIF had only minor influence on the scale scores (0.1-2.3 points on the 0-100 scales), but two DIF corresponded to a difference of 4.6 and 9.8, respectively. Ordinal logistic regression identified DIF in 17 of 84 items. The potential impact of this on the present scales was low, but items displaying DIF could be avoided when developing abbreviated scales, where the potential impact of DIF (due to fewer items) will be larger.

  5. Loglinear multidimensional IRT models for polytomously scored Items

    NARCIS (Netherlands)

    Kelderman, Henk

    1988-01-01

    A loglinear item response theory (IRT) model is proposed that relates polytomously scored item responses to a multidimensional latent space. Each item may have a different response function where each item response may be explained by one or more latent traits. Item response functions may follow a

  6. 48 CFR 852.214-72 - Alternate item(s).

    Science.gov (United States)

    2010-10-01

    ... AND FORMS SOLICITATION PROVISIONS AND CONTRACT CLAUSES Texts of Provisions and Clauses 852.214-72... 2008) Bids on []* will be given equal consideration along with bids on []** and any such bids received... [].** * Contracting officer will insert an alternate item that is considered acceptable. ** Contracting officer will...

  7. Modeling and Stability Assessment of Single-Phase Grid Synchronization Techniques

    DEFF Research Database (Denmark)

    Golestan, Saeed; Guerrero, Josep M.; Vasquez, Juan

    2018-01-01

    (GSTs) is of vital importance. This task is most often based on obtaining a linear time-invariant (LTI) model for the GST and applying standard stability tests to it. Another option is modeling and dynamics/stability assessment of GSTs in the linear time-periodic (LTP) framework, which has received...... a very little attention. In this letter, the procedure of deriving the LTP model for single-phase GSTs is first demonstrated. The accuracy of the LTP model in predicting the GST dynamic behavior and stability is then evaluated and compared with that of the LTI one. Two well-known single-phase GSTs, i...

  8. Development and Evaluation of the PROMIS® Pediatric Positive Affect Item Bank, Child-Report and Parent-Proxy Editions.

    Science.gov (United States)

    Forrest, Christopher B; Ravens-Sieberer, Ulrike; Devine, Janine; Becker, Brandon D; Teneralli, Rachel; Moon, JeanHee; Carle, Adam; Tucker, Carole A; Bevans, Katherine B

    2018-03-01

    The purpose of this study is to describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Positive Affect item bank, child-report and parent-proxy editions. The initial item pool comprising 53 items, previously developed using qualitative methods, was administered to 1,874 children 8-17 years old and 909 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and construct validity. A total of 14 items were deleted, because of poor psychometric performance, and an 8-item short form constructed from the remaining 39 items was administered to a national sample of 1,004 children 8-17 years old, and 1,306 parents of children 5-17 years old. The combined sample was used in item response theory (IRT) calibration analyses. The final item bank appeared unidimensional, the items appeared locally independent, and the items were free from differential item functioning. The scales showed excellent reliability and convergent and discriminant validity. Positive affect decreased with children's age and was lower for those with a special health care need. After IRT calibration, we found that 4 and 8 item short forms had a high degree of precision (reliability) across a wide range of the latent trait (>4 SD units). The PROMIS Pediatric Positive Affect item bank and its short forms provide an efficient, precise, and valid assessment of positive affect in children and youth.

  9. Development of a Comprehensive Assessment of Food Parenting Practices: The Home Self-Administered Tool for Environmental Assessment of Activity and Diet Family Food Practices Survey.

    Science.gov (United States)

    Vaughn, Amber E; Dearth-Wesley, Tracy; Tabak, Rachel G; Bryant, Maria; Ward, Dianne S

    2017-02-01

    Parents' food parenting practices influence children's dietary intake and risk for obesity and chronic disease. Understanding the influence and interactions between parents' practices and children's behavior is limited by a lack of development and psychometric testing and/or limited scope of current measures. The Home Self-Administered Tool for Environmental Assessment of Activity and Diet (HomeSTEAD) was created to address this gap. This article describes development and psychometric testing of the HomeSTEAD family food practices survey. Between August 2010 and May 2011, a convenience sample of 129 parents of children aged 3 to 12 years were recruited from central North Carolina and completed the self-administered HomeSTEAD survey on three occasions during a 12- to 18-day window. Demographic characteristics and child diet were assessed at Time 1. Child height and weight were measured during the in-home observations (following Time 1 survey). Exploratory factor analysis with Time 1 data was used to identify potential scales. Scales with more than three items were examined for scale reduction. Following this, mean scores were calculated at each time point. Construct validity was assessed by examining Spearman rank correlations between mean scores (Time 1) and children's diet (fruits and vegetables, sugar-sweetened beverages, snacks, sweets) and body mass index (BMI) z scores. Repeated measures analysis of variance was used to examine differences in mean scores between time points, and single-measure intraclass correlations were calculated to examine test-retest reliability between time points. Exploratory factor analysis identified 24 factors and retained 124 items; however, scale reduction narrowed items to 86. The final instrument captures five coercive control practices (16 items), seven autonomy support practices (24 items), and 12 structure practices (46 items). All scales demonstrated good internal reliability (α>.62), 18 factors demonstrated construct

  10. The Protective Behavioral Strategies for Marijuana Scale: Further examination using item response theory.

    Science.gov (United States)

    Pedersen, Eric R; Huang, Wenjing; Dvorak, Robert D; Prince, Mark A; Hummer, Justin F

    2017-08-01

    Given recent state legislation legalizing marijuana for recreational purposes and majority popular opinion favoring these laws, we developed the Protective Behavioral Strategies for Marijuana scale (PBSM) to identify strategies that may mitigate the harms related to marijuana use among those young people who choose to use the drug. In the current study, we expand on the initial exploratory study of the PBSM to further validate the measure with a large and geographically diverse sample (N = 2,117; 60% women, 30% non-White) of college students from 11 different universities across the United States. We sought to develop a psychometrically sound item bank for the PBSM and to create a short assessment form that minimizes respondent burden and time. Quantitative item analyses, including exploratory and confirmatory factor analyses with item response theory (IRT) and evaluation of differential item functioning (DIF), revealed an item bank of 36 items that was examined for unidimensionality and good content coverage, as well as a short form of 17 items that is free of bias in terms of gender (men vs. women), race (White vs. non-White), ethnicity (Hispanic vs. non-Hispanic), and recreational marijuana use legal status (state recreational marijuana was legal for 25.5% of participants). We also provide a scoring table for easy transformation from sum scores to IRT scale scores. The PBSM item bank and short form associated strongly and negatively with past month marijuana use and consequences. The measure may be useful to researchers and clinicians conducting intervention and prevention programs with young adults. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  11. An Extended Validity Argument for Assessing Feedback Culture.

    Science.gov (United States)

    Rougas, Steven; Clyne, Brian; Cianciolo, Anna T; Chan, Teresa M; Sherbino, Jonathan; Yarris, Lalena M

    2015-01-01

    NEGEA 2015 CONFERENCE ABSTRACT (EDITED): Measuring an Organization's Culture of Feedback: Can It Be Done? Steven Rougas and Brian Clyne. CONSTRUCT: This study sought to develop a construct for measuring formative feedback culture in an academic emergency medicine department. Four archetypes (Market, Adhocracy, Clan, Hierarchy) reflecting an organization's values with respect to focus (internal vs. external) and process (flexibility vs. stability and control) were used to characterize one department's receptiveness to formative feedback. The prevalence of residents' identification with certain archetypes served as an indicator of the department's organizational feedback culture. New regulations have forced academic institutions to implement wide-ranging changes to accommodate competency-based milestones and their assessment. These changes challenge residencies that use formative feedback from faculty as a major source of data for determining training advancement. Though various approaches have been taken to improve formative feedback to residents, there currently exists no tool to objectively measure the organizational culture that surrounds this process. Assessing organizational culture, commonly used in the business sector to represent organizational health, may help residency directors gauge their program's success in fostering formative feedback. The Organizational Culture Assessment Instrument (OCAI) is widely used, extensively validated, applicable to survey research, and theoretically based and may be modifiable to assess formative feedback culture in the emergency department. Using a modified Delphi technique and several iterations of focus groups amongst educators at one institution, four of the original six OCAI domains (which each contain 4 possible responses) were modified to create a 16-item Formative Feedback Culture Tool (FFCT) that was administered to 26 residents (response rate = 55%) at a single academic emergency medicine department. The mean

  12. Discriminant content validity: a quantitative methodology for assessing content of theory-based measures, with illustrative applications.

    Science.gov (United States)

    Johnston, Marie; Dixon, Diane; Hart, Jo; Glidewell, Liz; Schröder, Carin; Pollard, Beth

    2014-05-01

    In studies involving theoretical constructs, it is important that measures have good content validity and that there is not contamination of measures by content from other constructs. While reliability and construct validity are routinely reported, to date, there has not been a satisfactory, transparent, and systematic method of assessing and reporting content validity. In this paper, we describe a methodology of discriminant content validity (DCV) and illustrate its application in three studies. Discriminant content validity involves six steps: construct definition, item selection, judge identification, judgement format, single-sample test of content validity, and assessment of discriminant items. In three studies, these steps were applied to a measure of illness perceptions (IPQ-R) and control cognitions. The IPQ-R performed well with most items being purely related to their target construct, although timeline and consequences had small problems. By contrast, the study of control cognitions identified problems in measuring constructs independently. In the final study, direct estimation response formats for theory of planned behaviour constructs were found to have as good DCV as Likert format. The DCV method allowed quantitative assessment of each item and can therefore inform the content validity of the measures assessed. The methods can be applied to assess content validity before or after collecting data to select the appropriate items to measure theoretical constructs. Further, the data reported for each item in Appendix S1 can be used in item or measure selection. Statement of contribution What is already known on this subject? There are agreed methods of assessing and reporting construct validity of measures of theoretical constructs, but not their content validity. Content validity is rarely reported in a systematic and transparent manner. What does this study add? The paper proposes discriminant content validity (DCV), a systematic and transparent method

  13. 78 FR 31399 - Recoupment of Nonrecurring Costs (NCs) on Sales of U.S. Items

    Science.gov (United States)

    2013-05-24

    ... updates policy, responsibilities, and procedures for calculating and assessing NC recoupment charges on... Control Act, as amended'') for calculating and assessing NC recoupment charges on sales of items developed... adversely affect in a material way the economy; a section of the economy; productivity; competition; jobs...

  14. Weighting and Aggregation in Life Cycle Assessment: Do Present Aggregated Single Scores Provide Correct Decision Support?

    DEFF Research Database (Denmark)

    Kalbar, Pradip; Birkved, Morten; Nygaard, Simon Elsborg

    2016-01-01

    This study investigates the prevailing practice of obtaining single scores in life cycle assessment (LCA) and identifies potential lacunas in impact assessment methodology related to the results of aggregation into endpoints and single scores. In order to conduct this investigation, a detailed...... approach was adopted to facilitate identification of three main problems related to the single-score calculation approach. The prevailing ReCiPe single-score calculation method does not account for either the effect of so-called dominating alternatives (i.e., alternatives having high values across all...

  15. Reliability and validity of the 12-item WHODAS 2.0 in patients with Kashin-Beck disease.

    Science.gov (United States)

    Younus, Mohammad Imran; Wang, Di-Miao; Yu, Fang-Fang; Fang, Hua; Guo, Xiong

    2017-09-01

    The purpose of this study was to check the reliability and validity of the 12-item Chinese version of the World Health Organization Disability Assessment Schedule 2.0 (WHODAS 2.0) for the assessment of disability in patients with Kashin-Beck disease (KBD). We recruited 219 patients with KBD from the high-risk KBD area in the Shaanxi province, using stratified multistage random sampling. We assessed each patient using the Chinese version of the 12-item WHODAS 2.0 and the Western Ontario and McMaster Universities Index of Osteoarthritis (WOMAC). Statistical evaluations of the instruments consisted of Cronbach's alpha, intraclass correlation coefficient (ICC), confirmatory factor analysis (CFA), and Pearson's correlation coefficient. Cronbach's alpha and ICC for the six domains ranged from 0.704 to 0.906 and 0.690 to 0.852, respectively. A six-factor structure fits the data well (CFI = 0.967, TLI = 0.944, RMSEA = 0.08). Regarding convergent validity, the four domains of the 12-item WHODAS 2.0 (getting around, self-care, life activity, and participation) showed moderate-to-strong correlation for all three domains of the WOMAC (0.428 < |r| < 0.804). Regarding divergent validity, the two domains of the 12-item WHODAS 2.0 (understanding and communication, and getting along with people) showed weak correlation for the three domains of WOMAC (0.182 < |r| < 0.295). The Chinese version of 12-item WHODAS 2.0 questionnaire is a reliable and valid instrument when administered to KBD patients.
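    The abstract above reports Cronbach's alpha as its internal-consistency statistic. As a minimal sketch of how that coefficient is computed from raw item scores, assuming a complete respondents-by-items table (the function name and the toy data are illustrative and are not taken from the study):

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    columns = list(zip(*scores))                      # one tuple per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Toy data (hypothetical): two items that move together perfectly.
consistent = [[1, 1], [2, 2], [3, 3]]
# Toy data (hypothetical): two items with zero covariance.
unrelated = [[1, 1], [1, 2], [2, 1], [2, 2]]

print(cronbach_alpha(consistent))
print(cronbach_alpha(unrelated))
```

    Sample variances are used for both the items and the totals; because the same denominator appears in both, using population variances instead would give the same ratio.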

  16. Avoiding and Correcting Bias in Score-Based Latent Variable Regression with Discrete Manifest Items

    Science.gov (United States)

    Lu, Irene R. R.; Thomas, D. Roland

    2008-01-01

    This article considers models involving a single structural equation with latent explanatory and/or latent dependent variables where discrete items are used to measure the latent variables. Our primary focus is the use of scores as proxies for the latent variables and carrying out ordinary least squares (OLS) regression on such scores to estimate…

  17. Macrostructural Treatment of Multi-word Lexical Items

    Directory of Open Access Journals (Sweden)

    Alenka Vrbinc

    2011-05-01

    The paper discusses the macrostructural treatment of multi-word lexical items in mono- and bilingual dictionaries. First, the classification of multi-word lexical items is presented, and special attention is paid to the discussion of compounds – a specific group of multi-word lexical items that is most commonly afforded headword status but whose inclusion in the headword list may also depend on spelling. Then the inclusion of multi-word lexical items in monolingual dictionaries is dealt with in greater detail, while the results of a short survey on the inclusion of five randomly chosen multi-word lexical items in seven English monolingual dictionaries are presented. The proposals as to how to treat these five multi-word lexical items in bilingual dictionaries are presented in the section about the inclusion of multi-word lexical items in bilingual dictionaries. The conclusion is that it is most important to take the users’ needs into consideration and to make any dictionary as user friendly as possible.

  18. Losing Items in the Psychogeriatric Nursing Home

    Directory of Open Access Journals (Sweden)

    J. van Hoof PhD

    2016-09-01

    Introduction: Losing items is a time-consuming occurrence in nursing homes that is ill described. An explorative study was conducted to investigate which items got lost by nursing home residents, and how this affects the residents and family caregivers. Method: Semi-structured interviews and card sorting tasks were conducted with 12 residents with early-stage dementia and 12 family caregivers. Thematic analysis was applied to the outcomes of the sessions. Results: The participants stated that numerous personal items and assistive devices get lost in the nursing home environment, which had various emotional, practical, and financial implications. Significant amounts of time are spent on trying to find items, varying from 1 hr up to a couple of weeks. Numerous potential solutions were identified by the interviewees. Discussion: Losing items often goes together with limitations to the participation of residents. Many family caregivers are reluctant to replace lost items, as these items may get lost again.

  19. The Music Therapy Session Assessment Scale (MT-SAS): Validation of a new tool for music therapy process evaluation.

    Science.gov (United States)

    Raglio, Alfredo; Gnesi, Marco; Monti, Maria Cristina; Oasi, Osmano; Gianotti, Marta; Attardo, Lapo; Gontero, Giulia; Morotti, Lara; Boffelli, Sara; Imbriani, Chiara; Montomoli, Cristina; Imbriani, Marcello

    2017-11-01

    Music therapy (MT) interventions are aimed at creating and developing a relationship between patient and therapist. However, there is a lack of validated observational instruments to consistently evaluate the MT process. The purpose of this study was the validation of Music Therapy Session Assessment Scale (MT-SAS), designed to assess the relationship between therapist and patient during active MT sessions. Videotapes of a single 30-min session per patient were considered. A pilot study on the videotapes of 10 patients was carried out to help refine the items, define the scoring system and improve inter-rater reliability among the five raters. Then, a validation study on 100 patients with different clinical conditions was carried out. The Italian MT-SAS was used throughout the process, although we also provide an English translation. The final scale consisted of 7 binary items accounting for eye contact, countenance, and nonverbal and sound-music communication. In the pilot study, raters were found to share an acceptable level of agreement in their assessments. Explorative factorial analysis disclosed a single homogeneous factor including 6 items (thus supporting an ordinal total score), with only the item about eye contact being unrelated to the others. Moreover, the existence of 2 different archetypal profiles of attuned and disattuned behaviours was highlighted through multiple correspondence analysis. As suggested by the consistent results of 2 different analyses, MT-SAS is a reliable tool that globally evaluates sonorous-musical and nonverbal behaviours related to emotional attunement and empathetic relationship between patient and therapist during active MT sessions. Copyright © 2017 John Wiley & Sons, Ltd.

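  20. TINGKAT PERSEDIAAN SPARE PART FORKLIFT MEREK KOMATSU DENGAN PENDEKATAN MODEL PERSEDIAAN SINGLE ITEM [Inventory Levels of Komatsu-Brand Forklift Spare Parts Using a Single-Item Inventory Model Approach]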

    Directory of Open Access Journals (Sweden)

    Wahid Ahmad Jauhari

    2006-04-01

    The control and maintenance of inventories is a problem common to all enterprises in any sector of a given economy. Two fundamental questions that must be answered in controlling inventory are when to replenish the inventory and how much to order for replenishment. The (Q,r) inventory models attempt to answer these two questions under a variety of circumstances. Studies have shown (1) that a company that ignores lead-time demand variability may suffer great financial damage, (2) that the gamma distribution most commonly provides the best fit to lead-time demand for a variety of inventory items, and (3) that a fixed lead-time demand assumption or a normal approximation to it will often yield significant errors (Namit and Chen, 1998). This research applied an efficient and accurate algorithm for solving the (Q,r) inventory model with gamma lead-time demand.
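    To make the two (Q,r) questions concrete, the sketch below answers "how much" with the classical economic order quantity and "when" with a reorder point set at a quantile of gamma-distributed lead-time demand. This is a simplified, decoupled heuristic written for illustration; it is not the algorithm of Namit and Chen (1998) or of the paper, and all parameter values and function names are hypothetical.

```python
import math


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series of the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-16 * total:           # stop once new terms are negligible
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape))


def eoq(annual_demand, order_cost, holding_cost):
    """Order quantity Q from the classical EOQ formula sqrt(2 K D / h)."""
    return math.sqrt(2.0 * order_cost * annual_demand / holding_cost)


def reorder_point(service_level, shape, scale):
    """Smallest r with P(lead-time demand <= r) >= service_level, by bisection."""
    lo, hi = 0.0, shape * scale           # start the bracket at the mean demand
    while gamma_cdf(hi, shape, scale) < service_level:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < service_level:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical inputs: demand 1200 units/year, 50 per order, 3 per unit-year,
# lead-time demand ~ Gamma(shape=4, scale=25), 95% cycle service level.
Q = eoq(1200, 50, 3)
r = reorder_point(0.95, 4.0, 25.0)
print(Q, r)
```

    A joint optimization of Q and r, as in the full (Q,r) model, would iterate between the two quantities rather than fixing Q first; the sketch only shows where the gamma lead-time assumption enters.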

  1. Using peers to assess handoffs: a pilot study.

    Science.gov (United States)

    Dine, C Jessica; Wingate, Nicholas; Rosen, Ilene M; Myers, Jennifer S; Lapin, Jennifer; Kogan, Jennifer R; Shea, Judy A

    2013-08-01

    Handoffs among post-graduate year 1 (PGY1) trainees occur with high frequency. Peer assessment of handoff competence would add a new perspective on how well the handoff information helped them to provide optimal patient care. The goals of this study were to test the feasibility of an instrument for peer assessment of handoffs by meeting the criteria of being able to use technology to capture evaluations in real time, exhibiting strong psychometric properties, and having high PGY1 satisfaction scores. An iPad® application was built for a seven-item handoff instrument. Over a two-month period, post-call PGY1s completed assessments of three co-PGY1s from whom they received handoffs the prior evening. Participants were Internal Medicine PGY1s at the University of Pennsylvania. ANOVA was used to explore interperson score differences (validity). Generalizability analyses provided estimates of score precision (reproducibility). PGY1s completed satisfaction surveys about the process. Sixty-two PGY1s (100%) participated in the study; 59% of the targeted evaluations were completed. The major limitations were network connectivity and inability to find the post-call trainee. PGY1 scores on the single item of "overall competency" ranged from 4 to 9 with a mean of 7.31 (SD 1.09). Generalizability coefficients approached 0.60 for 10 evaluations per PGY1 for a single rotation and 12 evaluations per PGY1 across multiple rotations. The majority of PGY1s believed that they could adequately assess handoff competence and that the peer assessment process was valuable (70% and 77%, respectively). Psychometric properties of an instrument for peer assessment of handoffs are encouraging. Obtaining 10 or 12 evaluations per PGY1 allowed for reliable assessment of handoff skills. Peer evaluations of handoffs using mobile technology were feasible, and were well received by PGY1s.

  2. A Method for Generating Educational Test Items That Are Aligned to the Common Core State Standards

    Science.gov (United States)

    Gierl, Mark J.; Lai, Hollis; Hogan, James B.; Matovinovic, Donna

    2015-01-01

    The demand for test items far outstrips the current supply. This increased demand can be attributed, in part, to the transition to computerized testing, but, it is also linked to dramatic changes in how 21st century educational assessments are designed and administered. One way to address this growing demand is with automatic item generation.…

  3. ‘Forget me (not)?’ – Remembering forget-items versus un-cued items in directed forgetting

    Directory of Open Access Journals (Sweden)

    Bastian Zwissler

    2015-11-01

    Humans need to be able to selectively control their memories. Here, we investigate the underlying processes in item-method directed forgetting and compare the classic active memory cues in this paradigm with a passive instruction. Typically, individual items are presented and each is followed by either a forget- or remember-instruction. On a surprise test of all items, memory is then worse for to-be-forgotten items (TBF) compared to to-be-remembered items (TBR). This is thought to result from selective rehearsal of TBR, or from active inhibition of TBF, or from both. However, evidence suggests that if a forget instruction initiates active processing, paradoxical effects may also arise. To investigate the underlying mechanisms, four experiments were conducted where un-cued items (UI) were introduced and recognition performance was compared between TBR, TBF and UI stimuli. Accuracy was encouraged via a performance-dependent monetary bonus. Across all experiments, including perceptually fully matched variants, memory accuracy for TBF was reduced compared to TBR, but better than for UI. Moreover, participants used a more conservative response criterion when responding to TBF stimuli. Thus, ironically, the F cue results in active processing, but this does not have inhibitory effects that would impair recognition memory beyond an un-cued baseline condition. This casts doubts on inhibitory accounts of item-method directed forgetting and is also difficult to reconcile with pure selective rehearsal of TBR. While the F-cue does induce active processing, this does not result in particularly successful forgetting. The pattern seems most consistent with the notion of ironic processing.

  4. TEDS-M 2008 User Guide for the International Database. Supplement 4: TEDS-M Released Mathematics and Mathematics Pedagogy Knowledge Assessment Items

    Science.gov (United States)

    Brese, Falk, Ed.

    2012-01-01

    The goal for selecting the released set of test items was to have approximately 25% of each of the full item sets for mathematics content knowledge (MCK) and mathematics pedagogical content knowledge (MPCK) that would represent the full range of difficulty, content, and item format used in the TEDS-M study. The initial step in the selection was to…

  5. Passive ultra high frequency radio frequency identification systems for single-item identification in food supply chains

    Directory of Open Access Journals (Sweden)

    Paolo Barge

    2017-02-01

    In the food industry, the composition, size, and shape of items are much less regular than in other commodity sectors. In addition, a wide variety of packaging, composed of different materials, is employed. As the material, size, and shape of the items to which the tag is attached strongly influence the minimum power required for tag functioning, performance improvements can be achieved only by selecting suitable radio frequency (RF) identifiers for the specific combination of food product and packaging. When dealing with logistics units, the dynamic reading of a vast number of tags could give rise to simultaneous broadcasting of signals (tag-to-tag collisions) that could affect reading rates and the overall reliability of the identification procedure. This paper reports the results of an analysis of the reading performance of ultra high frequency radio frequency identification systems for multiple static and dynamic electronic identification of packed food products in controlled conditions. Products were considered when arranged on a logistics pallet. The effects on reading rate of different factors, including the product type, the gate configuration, the field polarisation, the power output of the RF reader, the interrogation protocol configuration, the transit speed, the number of tags, and their interactions, were statistically analysed and compared.

  6. Assessing Health Status in Inflammatory Bowel Disease using a Novel Single-Item Numeric Rating Scale

    Science.gov (United States)

    Surti, Bijal; Spiegel, Brennan; Ippoliti, Andrew; Vasiliauskas, Eric; Simpson, Peter; Shih, David; Targan, Stephan; McGovern, Dermot; Melmed, Gil Y.

    2014-01-01

    Background: Current instruments used to measure disease activity and health-related quality of life (HRQOL) in patients with Crohn’s disease (CD) and ulcerative colitis (UC) are often cumbersome, time-consuming, and expensive; although used in clinical trials, they are not convenient for clinical practice. A numeric rating scale (NRS) is a quick, inexpensive, and convenient patient-reported outcome (PRO) that can capture the patient’s overall perception of health. Aims: To assess the validity, reliability, and responsiveness of an NRS and evaluate its use in clinical practice in patients with CD and UC. Methods: We prospectively evaluated patient-reported NRS scores and measured correlations between NRS and a range of severity measures, including physician-reported NRS, Crohn’s disease activity index (CDAI), Harvey-Bradshaw index (HBI), inflammatory bowel disease questionnaire (IBDQ), and C-reactive protein (CRP) in patients with CD. Subsequently, we evaluated the correlation between the NRS and standard measures of health status (HBI or simple colitis clinical activity index [SCCAI]) and laboratory tests (sedimentation rate [ESR], CRP, and fecal calprotectin) in patients with CD and UC. Results: The patient-reported NRS showed excellent correlation with CDAI (R² = 0.59, p<0.0001), IBDQ (R² = 0.66, p<0.0001), and HBI (R² = 0.32, p<0.0001) in patients with CD. The NRS showed poor, but statistically significant correlation with SCCAI (R² = 0.25, p<0.0001) in patients with UC. The NRS did not correlate with CRP, ESR, or calprotectin. The NRS was reliable and responsive to change. Conclusions: The NRS is a valid, reliable, and responsive measure that may be useful to evaluate patients with CD and possibly UC. PMID: 23250673

  7. Assessing the measurement of aerosol single scattering albedo by Cavity Attenuated Phase-Shift Single Scattering Monitor (CAPS PMssa)

    Science.gov (United States)

    Perim de Faria, Julia; Bundke, Ulrich; Onasch, Timothy B.; Freedman, Andrew; Petzold, Andreas

    2016-04-01

    The necessity to quantify the direct impact of aerosol particles on climate forcing is already well known; assessing this impact requires continuous and systematic measurements of the aerosol optical properties. Two of the main parameters that need to be accurately measured are the aerosol optical depth and single scattering albedo (SSA, defined as the ratio of particulate scattering to extinction). The measurement of single scattering albedo commonly involves the measurement of two optical parameters, the scattering and the absorption coefficients. Although there are well established technologies to measure both of these parameters, the use of two separate instruments with different principles and uncertainties represents potential sources of significant errors and biases. Based on the recently developed cavity attenuated phase shift particle extinction monitor (CAPS PMex) instrument, the CAPS PMssa instrument combines the CAPS technology to measure particle extinction with an integrating sphere capable of simultaneously measuring the scattering coefficient of the same sample. The scattering channel is calibrated to the extinction channel, such that the accuracy of the single scattering albedo measurement is only a function of the accuracy of the extinction measurement and the nephelometer truncation losses. This gives the instrument an accurate and direct measurement of the single scattering albedo. In this study, we assess the measurements of both the extinction and scattering channels of the CAPS PMssa through intercomparisons with Mie theory, as a fundamental comparison, and with proven technologies, such as integrating nephelometers and filter-based absorption monitors. For comparison, we use two nephelometers, a TSI 3563 and an Aurora 4000, and two measurements of the absorption coefficient, using a Particulate Soot Absorption Photometer (PSAP) and a Multi Angle Absorption Photometer (MAAP). We also assess the indirect absorption coefficient
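    The definition of SSA quoted in the abstract (scattering divided by extinction) can be written out directly. The sketch below also derives absorption as extinction minus scattering, which is the standard relation between the three coefficients and presumably what the truncated phrase about an indirect absorption coefficient refers to; the function names and the example values in inverse megametres are assumptions made for illustration.

```python
def single_scattering_albedo(scattering, extinction):
    """SSA = scattering coefficient / extinction coefficient (same units)."""
    if extinction <= 0:
        raise ValueError("extinction coefficient must be positive")
    return scattering / extinction


def absorption_by_difference(scattering, extinction):
    """Absorption coefficient inferred indirectly as extinction - scattering."""
    return extinction - scattering


# Hypothetical readings in Mm^-1 from the scattering and extinction channels.
sca, ext = 90.0, 100.0
print(single_scattering_albedo(sca, ext))
print(absorption_by_difference(sca, ext))
```

    Because the absorption term is a small difference of two larger numbers when SSA is close to 1, its relative uncertainty grows quickly, which is one reason intercomparison with direct absorption instruments matters.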

  8. Detecting Differential Item Functioning and Differential Step Functioning due to Differences that "Should" Matter

    Directory of Open Access Journals (Sweden)

    Tess Miller

    2010-07-01

    This study illustrates the use of differential item functioning (DIF) and differential step functioning (DSF) analyses to detect differences in item difficulty that are related to experiences of examinees, such as their teachers' instructional practices, that are relevant to the knowledge, skill, or ability the test is intended to measure. This analysis is in contrast to the typical use of DIF or DSF to detect differences related to characteristics of examinees, such as gender, language, or cultural knowledge, that should be irrelevant. Using data from two forms of Ontario's Grade 9 Assessment of Mathematics, analyses were performed comparing groups of students defined by their teachers' instructional practices. All constructed-response items were tested for DIF using the Mantel Chi-Square, standardized Liu Agresti cumulative common log-odds ratio, and standardized Cox's noncentrality parameter. Items exhibiting moderate to large DIF were subsequently tested for DSF. In contrast to typical DIF or DSF analyses, which inform item development, these analyses have the potential to inform instructional practice.

  9. Prediction of true test scores from observed item scores and ancillary data.

    Science.gov (United States)

    Haberman, Shelby J; Yao, Lili; Sinharay, Sandip

    2015-05-01

    In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE® General Analytical Writing and until 2009 in the case of TOEFL® iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater®. In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability. © 2015 The British Psychological Society.

  10. APOLLO: a quality assessment service for single and multiple protein models.

    Science.gov (United States)

    Wang, Zheng; Eickholt, Jesse; Cheng, Jianlin

    2011-06-15

    We built a web server named APOLLO, which can evaluate the absolute global and local qualities of a single protein model using machine learning methods or the global and local qualities of a pool of models using a pair-wise comparison approach. Based on our evaluations on 107 CASP9 (Critical Assessment of Techniques for Protein Structure Prediction) targets, the predicted quality scores generated from our machine learning and pair-wise methods have an average per-target correlation of 0.671 and 0.917, respectively, with the true model quality scores. Based on our test on 92 CASP9 targets, our predicted absolute local qualities have an average difference of 2.60 Å with the actual distances to native structure. http://sysbio.rnet.missouri.edu/apollo/. Single and pair-wise global quality assessment software is also available at the site.

  11. A comparison of three methods of assessing differential item functioning (DIF) in the Hospital Anxiety Depression Scale: ordinal logistic regression, Rasch analysis and the Mantel chi-square procedure.

    Science.gov (United States)

    Cameron, Isobel M; Scott, Neil W; Adler, Mats; Reid, Ian C

    2014-12-01

    It is important for clinical practice and research that measurement scales of well-being and quality of life exhibit only minimal differential item functioning (DIF). DIF occurs where different groups of people endorse items in a scale to different extents after being matched by the intended scale attribute. We investigate the equivalence or otherwise of common methods of assessing DIF. Three methods of measuring age- and sex-related DIF (ordinal logistic regression, Rasch analysis and Mantel χ² procedure) were applied to Hospital Anxiety Depression Scale (HADS) data pertaining to a sample of 1,068 patients consulting primary care practitioners. Three items were flagged by all three approaches as having either age- or sex-related DIF with a consistent direction of effect; a further three items identified did not meet stricter criteria for important DIF using at least one method. When applying strict criteria for significant DIF, ordinal logistic regression was slightly less sensitive. Ordinal logistic regression, Rasch analysis and contingency table methods yielded consistent results when identifying DIF in the HADS depression and HADS anxiety scales. Regardless of methods applied, investigators should use a combination of statistical significance, magnitude of the DIF effect and investigator judgement when interpreting the results.
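    The contingency-table approach to DIF matches respondents on a total score and then compares groups within each score stratum. As a rough illustration of that idea, the sketch below computes the Mantel-Haenszel common odds ratio for a dichotomous item, which is a simpler relative of the Mantel chi-square procedure for ordinal items used in the study; it is not the authors' analysis, and the counts and function names are made up.

```python
import math


def mh_common_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across matched score strata.

    Each stratum is (a, b, c, d):
      a = reference group endorsing the item, b = reference group not endorsing,
      c = focal group endorsing the item,     d = focal group not endorsing.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue                      # empty strata carry no information
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


def mh_delta(odds_ratio):
    """ETS delta-scale effect size: -2.35 * ln(common odds ratio)."""
    return -2.35 * math.log(odds_ratio)


# Hypothetical counts for two score strata with identical odds in both groups.
no_dif = [(10, 10, 5, 5), (20, 10, 10, 5)]
# Hypothetical single stratum where the reference group endorses more often.
with_dif = [(30, 10, 20, 20)]

print(mh_common_odds_ratio(no_dif), mh_delta(mh_common_odds_ratio(no_dif)))
print(mh_common_odds_ratio(with_dif), mh_delta(mh_common_odds_ratio(with_dif)))
```

    An odds ratio near 1 (delta near 0) indicates no DIF after matching; as the abstract advises, the size of the effect should be read alongside its statistical significance and the investigator's judgement.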

  12. Negative affectivity in cardiovascular disease: Evaluating Type D personality assessment using item response theory

    NARCIS (Netherlands)

    Emons, Wilco H.M.; Meijer, R.R.; Denollet, Johan

    2007-01-01

    Objective: Individuals with increased levels of both negative affectivity (NA) and social inhibition (SI)—referred to as type-D personality—are at increased risk of adverse cardiac events. We used item response theory (IRT) to evaluate NA, SI, and type-D personality as measured by the DS14. The

  13. Psychometric evaluation of an item bank for computerized adaptive testing of the EORTC QLQ-C30 cognitive functioning dimension in cancer patients.

    Science.gov (United States)

    Dirven, Linda; Groenvold, Mogens; Taphoorn, Martin J B; Conroy, Thierry; Tomaszewski, Krzysztof A; Young, Teresa; Petersen, Morten Aa

    2017-11-01

    The European Organisation of Research and Treatment of Cancer (EORTC) Quality of Life Group is developing computerized adaptive testing (CAT) versions of all EORTC Quality of Life Questionnaire (QLQ-C30) scales with the aim to enhance measurement precision. Here we present the results on the field-testing and psychometric evaluation of the item bank for cognitive functioning (CF). In previous phases (I-III), 44 candidate items were developed measuring CF in cancer patients. In phase IV, these items were psychometrically evaluated in a large sample of international cancer patients. This evaluation included an assessment of dimensionality, fit to the item response theory (IRT) model, differential item functioning (DIF), and measurement properties. A total of 1030 cancer patients completed the 44 candidate items on CF. Of these, 34 items could be included in a unidimensional IRT model, showing an acceptable fit. Although several items showed DIF, these had a negligible impact on CF estimation. Measurement precision of the item bank was much higher than the two original QLQ-C30 CF items alone, across the whole continuum. Moreover, CAT measurement may on average reduce study sample sizes with about 35-40% compared to the original QLQ-C30 CF scale, without loss of power. A CF item bank for CAT measurement consisting of 34 items was established, applicable to various cancer patients across countries. This CAT measurement system will facilitate precise and efficient assessment of HRQOL of cancer patients, without loss of comparability of results.

  14. Measuring anxiety after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Anxiety item bank and linkage with GAD-7.

    Science.gov (United States)

    Kisala, Pamela A; Tulsky, David S; Kalpakjian, Claire Z; Heinemann, Allen W; Pohlig, Ryan T; Carle, Adam; Choi, Seung W

    2015-05-01

    To develop a calibrated item bank and computer adaptive test to assess anxiety symptoms in individuals with spinal cord injury (SCI), transform scores to the Patient Reported Outcomes Measurement Information System (PROMIS) metric, and create a statistical linkage with the Generalized Anxiety Disorder (GAD)-7, a widely used anxiety measure. Grounded-theory based qualitative item development methods; large-scale item calibration field testing; confirmatory factor analysis; graded response model item response theory analyses; statistical linking techniques to transform scores to a PROMIS metric; and linkage with the GAD-7. Setting: Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants: Adults with traumatic SCI. Spinal Cord Injury-Quality of Life (SCI-QOL) Anxiety Item Bank. Seven hundred sixteen individuals with traumatic SCI completed 38 items assessing anxiety, 17 of which were PROMIS items. After 13 items (including 2 PROMIS items) were removed, factor analyses confirmed unidimensionality. Item response theory analyses were used to estimate slopes and thresholds for the final 25 items (15 from PROMIS). The observed Pearson correlation between the SCI-QOL Anxiety and GAD-7 scores was 0.67. The SCI-QOL Anxiety item bank demonstrates excellent psychometric properties and is available as a computer adaptive test or short form for research and clinical applications. SCI-QOL Anxiety scores have been transformed to the PROMIS metric and we provide a method to link SCI-QOL Anxiety scores with those of the GAD-7.
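    The slopes and thresholds mentioned above are the parameters of the graded response model. A minimal sketch of how one item's slope and ordered thresholds turn a trait level into response-category probabilities is given below; the parameter values are invented for illustration and do not come from the SCI-QOL calibration.

```python
import math


def grm_category_probs(theta, slope, thresholds):
    """Category probabilities under Samejima's graded response model.

    thresholds must be in ascending order; an item with m thresholds has
    m + 1 response categories. P(X >= k) is a logistic curve in theta, and
    each category probability is the difference of adjacent cumulative curves.
    """
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-slope * (theta - b))))
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical 4-category item: slope 1.8, thresholds at -1.0, 0.2 and 1.5.
probs = grm_category_probs(theta=0.5, slope=1.8, thresholds=[-1.0, 0.2, 1.5])
print(probs, sum(probs))
```

    A computer adaptive test built on such a bank selects, at each step, the item whose curves are most informative near the current estimate of theta.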

  15. The 10-item Remembered Relationship with Parents (RRP10) scale

    DEFF Research Database (Denmark)

    Denollet, Johan; Smolderen, Kim G E; van den Broek, Krista C

    2007-01-01

    Dysfunctional parenting styles are associated with poor mental and physical health. The 10-item Remembered Relationship with Parents (RRP10) scale retrospectively assesses Alienation (dysfunctional communication and intimacy) and Control (overprotection by parents), with an emphasis...... on deficiencies in empathic parenting. We examined the 2-factor structure of the RRP10 and its relationship with adult depression....

  16. A Comparison of Item Fit Statistics for Mixed IRT Models

    Science.gov (United States)

    Chon, Kyong Hee; Lee, Won-Chan; Dunbar, Stephen B.

    2010-01-01

    In this study we examined procedures for assessing model-data fit of item response theory (IRT) models for mixed format data. The model fit indices used in this study include PARSCALE's G², Orlando and Thissen's S-X² and S-G², and Stone's χ²* and G²*. To investigate the…

  17. Detection and validation of unscalable item score patterns using item response theory: an illustration with Harter's Self-Perception Profile for Children.

    Science.gov (United States)

    Meijer, Rob R; Egberink, Iris J L; Emons, Wilco H M; Sijtsma, Klaas

    2008-05-01

    We illustrate the usefulness of person-fit methodology for personality assessment. For this purpose, we use person-fit methods from item response theory. First, we give a nontechnical introduction to existing person-fit statistics. Second, we analyze data from Harter's (1985) Self-Perception Profile for Children in a sample of children ranging from 8 to 12 years of age (N = 611) and argue that for some children, the scale scores should be interpreted with caution. Combined information from person-fit indexes and from observation, interviews, and self-concept theory showed that similar score profiles may have a different interpretation. For some children in the sample, item scores did not adequately reflect their trait level. Based on teacher interviews, this was found to be due most likely to a less developed self-concept and/or problems understanding the meaning of the questions. We recommend investigating the scalability of score patterns when using self-report inventories to help the researcher interpret respondents' behavior correctly.

  18. 76 FR 68376 - Recoupment of Nonrecurring Costs (NCs) on Sales of U.S. Items

    Science.gov (United States)

    2011-11-04

    ... amended, and section 9701 of title 31, United States Code (U.S.C.), for calculating and assessing NC... section of the economy; productivity; competition; jobs; the environment; public health or safety; or....) for calculating and assessing NC recoupment charges on sales of items developed for or by the...

  19. Dynamic Testing of Analogical Reasoning in 5- to 6-Year-Olds : Multiple-Choice Versus Constructed-Response Training Items

    NARCIS (Netherlands)

    Stevenson, C.E.; Heiser, W.J.; Resing, W.C.M.

    2016-01-01

    Multiple-choice (MC) analogy items are often used in cognitive assessment. However, in dynamic testing, where the aim is to provide insight into potential for learning and the learning process, constructed-response (CR) items may be of benefit. This study investigated whether training with CR or MC

  20. Measuring everyday functional competence using the Rasch assessment of everyday activity limitations (REAL) item bank

    NARCIS (Netherlands)

    Oude Voshaar, Martijn A.H.; Ten Klooster, Peter M.; Vonkeman, Harald E.; van de Laar, Mart A.F.J.

    2017-01-01

    Objective: Traditional patient-reported physical function instruments often poorly differentiate patients with mild-to-moderate disability. We describe the development and psychometric evaluation of a generic item bank for measuring everyday activity limitations in outpatient populations. Study

  1. Evaluating the validity of the Work Role Functioning Questionnaire (Canadian French version) using classical test theory and item response theory.

    Science.gov (United States)

    Hong, Quan Nha; Coutu, Marie-France; Berbiche, Djamal

    2017-01-01

    The Work Role Functioning Questionnaire (WRFQ) was developed to assess workers' perceived ability to perform job demands and is used to monitor presenteeism. Still, few studies on its validity can be found in the literature. The purpose of this study was to assess the items and factorial composition of the Canadian French version of the WRFQ (WRFQ-CF). Two measurement approaches were used to test the WRFQ-CF: Classical Test Theory (CTT) and non-parametric Item Response Theory (IRT). A total of 352 completed questionnaires were analyzed. A four-factor model and a three-factor model were tested and showed good fit with 14 items (Root Mean Square Error of Approximation (RMSEA) = 0.06, Standardized Root Mean Square Residual (SRMR) = 0.04, Bentler Comparative Fit Index (CFI) = 0.98) and with 17 items (RMSEA = 0.059, SRMR = 0.048, CFI = 0.98), respectively. Using IRT, 13 problematic items were identified, of which 9 were common with CTT. This study tested different models, with fewer problematic items found in a three-factor model. Using non-parametric IRT and CTT for item purification gave complementary results. IRT is still scarcely used and can be an interesting alternative method to enhance the quality of a measurement instrument. More studies are needed on the WRFQ-CF to refine its items and factorial composition.

  2. Using Procedure Based on Item Response Theory to Evaluate Classification Consistency Indices in the Practice of Large-Scale Assessment

    Directory of Open Access Journals (Sweden)

    Shanshan Zhang

    2017-09-01

    In spite of the growing interest in methods for evaluating classification consistency (CC) indices, few studies have applied these methods in the practice of large-scale educational assessment. In addition, few studies have considered the influence of practical factors, such as the examinee ability distribution, the cut score location, and the score scale, on the performance of CC indices. Using Lee's newly developed procedure based on item response theory (IRT), the main purpose of this study is to investigate the performance of CC indices when practical factors are taken into consideration. A simulation study and an empirical study were conducted under comprehensive conditions. Results suggested that with a negatively skewed distribution, the CC indices were larger than with other distributions. Interactions occurred among ability distribution, cut score location, and score scale. Consequently, Lee's IRT procedure is reliable for use in large-scale educational assessment, but the reported indices should be treated with caution because testing conditions may vary considerably.

  3. Summarizing activity limitations in children with chronic illnesses living in the community: a measurement study of scales using supplemented interRAI items

    Directory of Open Access Journals (Sweden)

    Phillips Charles D

    2012-01-01

    Background: To test the validity and reliability of scales intended to measure activity limitations faced by children with chronic illnesses living in the community. The scales were based on information provided by caregivers to service program personnel almost exclusively trained as social workers. The items used to measure activity limitations were interRAI items supplemented so that they were more applicable to activity limitations in children with chronic illnesses. In addition, these analyses may shed light on the possibility of gathering functional information that can span the life course as well as spanning different care settings. Methods: Analyses included testing the internal consistency, predictive, concurrent, discriminant and construct validity of two activity limitation scales. The scales were developed using assessment data gathered in the United States of America (USA) from over 2,700 assessments of children aged 4 to 20 receiving Medicaid Early and Periodic Screening, Diagnostic and Treatment (EPSDT) services, specifically Personal Care Services to assist children in overcoming activity limitations. The Medicaid program in the USA pays for health care services provided to children in low-income households. Data were collected in a single, large state in the southwestern USA in late 2008 and early 2009. A similar sample of children was assessed in 2010, and the analyses were replicated using this sample. Results: The two scales exhibited excellent internal consistency. Evidence on the concurrent, predictive, discriminant, and construct validity of the proposed scales was strong. Quite importantly, scale scores were not correlated with (confounded with) a child's developmental stage or age. The results for these scales and items were consistent across the two independent samples. Conclusions: Unpaid caregivers, usually parents, can provide assessors lacking either medical or nursing training with reliable and valid information

  4. Summarizing activity limitations in children with chronic illnesses living in the community: a measurement study of scales using supplemented interRAI items.

    Science.gov (United States)

    Phillips, Charles D; Patnaik, Ashweeta; Moudouni, Darcy K; Naiser, Emily; Dyer, James A; Hawes, Catherine; Fournier, Constance J; Miller, Thomas R; Elliott, Timothy R

    2012-01-23

    To test the validity and reliability of scales intended to measure activity limitations faced by children with chronic illnesses living in the community. The scales were based on information provided by caregivers to service program personnel almost exclusively trained as social workers. The items used to measure activity limitations were interRAI items supplemented so that they were more applicable to activity limitations in children with chronic illnesses. In addition, these analyses may shed light on the possibility of gathering functional information that can span the life course as well as spanning different care settings. Analyses included testing the internal consistency, predictive, concurrent, discriminant and construct validity of two activity limitation scales. The scales were developed using assessment data gathered in the United States of America (USA) from over 2,700 assessments of children aged 4 to 20 receiving Medicaid Early and Periodic Screening, Diagnostic and Treatment (EPSDT) services, specifically Personal Care Services to assist children in overcoming activity limitations. The Medicaid program in the USA pays for health care services provided to children in low-income households. Data were collected in a single, large state in the southwestern USA in late 2008 and early 2009. A similar sample of children was assessed in 2010, and the analyses were replicated using this sample. The two scales exhibited excellent internal consistency. Evidence on the concurrent, predictive, discriminant, and construct validity of the proposed scales was strong. Quite importantly, scale scores were not correlated with (confounded with) a child's developmental stage or age. The results for these scales and items were consistent across the two independent samples. Unpaid caregivers, usually parents, can provide assessors lacking either medical or nursing training with reliable and valid information on the activity limitations of children. 
One can summarize these

  5. Gender Differences in Scientific Literacy of HKPISA 2006: A Multidimensional Differential Item Functioning and Multilevel Mediation Study

    Science.gov (United States)

    Wong, Kwan Yin

    The aim of this study is to investigate the effect of gender differences of 15-year-old students on scientific literacy and their impacts on students’ motivation to pursue science education and careers (Future-oriented Science Motivation) in Hong Kong. The data for this study were collected from the Program for International Student Assessment in Hong Kong (HKPISA), which was carried out in 2006. A total of 4,645 students were randomly selected from 146 secondary schools, including government, aided and private schools, by a two-stage stratified sampling method for the assessment. HKPISA 2006, like most other large-scale international assessments, presents its assessment frameworks in multidimensional subscales. To fulfill the requirements of this multidimensional assessment framework, this study deployed new approaches to model and investigate gender differences in cognitive and affective latent traits of scientific literacy by using multidimensional differential item functioning (MDIF) and multilevel mediation (MLM). Compared with a mean score difference t-test, MDIF improves the precision of each subscale’s measure at the item level, so that gender differences in science performance can be accurately estimated. In the light of Eccles et al.’s (1983) Expectancy-value Model of Achievement-related Choices (Eccles’ Model), MLM examines the pattern of gender effects on Future-oriented Science Motivation mediated through cognitive and affective factors. For the MLM investigation, Single-Group Confirmatory Factor Analysis (Single-Group CFA) was used to confirm the applicability and validity of six affective factors that were originally prepared by the OECD. These six factors are Science Self-concept, Personal Value of Science, Interest in Science Learning, Enjoyment of Science Learning, Instrumental Motivation to Learn Science and Future-oriented Science Motivation. Then, Multiple Group CFA was used to verify measurement invariance of these factors across gender groups. The results of

  6. Application of Group-Level Item Response Models in the Evaluation of Consumer Reports about Health Plan Quality

    Science.gov (United States)

    Reise, Steven P.; Meijer, Rob R.; Ainsworth, Andrew T.; Morales, Leo S.; Hays, Ron D.

    2006-01-01

    Group-level parametric and non-parametric item response theory models were applied to the Consumer Assessment of Healthcare Providers and Systems (CAHPS®) 2.0 core items in a sample of 35,572 Medicaid recipients nested within 131 health plans. Results indicated that CAHPS responses are dominated by within health plan variation, and only weakly…

  7. Psychometric validation of the Persian nine-item Internet Gaming Disorder Scale - Short Form: Does gender and hours spent online gaming affect the interpretations of item descriptions?

    Science.gov (United States)

    Wu, Tzu-Yi; Lin, Chung-Ying; Årestedt, Kristofer; Griffiths, Mark D; Broström, Anders; Pakpour, Amir H

    2017-06-01

    Background and aims: The nine-item Internet Gaming Disorder Scale - Short Form (IGDS-SF9) is brief and effective to evaluate Internet Gaming Disorder (IGD) severity. Although its scores show promising psychometric properties, less is known about whether different groups of gamers interpret the items similarly. This study aimed to verify the construct validity of the Persian IGDS-SF9 and examine the scores in relation to gender and hours spent online gaming among 2,363 Iranian adolescents. Methods: Confirmatory factor analysis (CFA) and Rasch analysis were used to examine the construct validity of the IGDS-SF9. The effects of gender and time spent online gaming per week were investigated by multigroup CFA and Rasch differential item functioning (DIF). Results: The unidimensionality of the IGDS-SF9 was supported in both CFA and Rasch. However, Item 4 (fail to control or cease gaming activities) displayed DIF (DIF contrast = 0.55) slightly over the recommended cutoff in Rasch but was invariant in multigroup CFA across gender. Items 4 (DIF contrast = -0.67) and 9 (jeopardize or lose an important thing because of gaming activity; DIF contrast = 0.61) displayed DIF in Rasch and were non-invariant in multigroup CFA across time spent online gaming. Conclusions: Given the Persian IGDS-SF9 was unidimensional, it is concluded that the instrument can be used to assess IGD severity. However, users of the instrument are cautioned concerning the comparisons of the sum scores of the IGDS-SF9 across gender and across adolescents spending different amounts of time online gaming.

  8. Examination of a clinical teaching effectiveness instrument used for summative faculty assessment.

    Science.gov (United States)

    Bierer, S Beth; Hull, Alan L

    2007-12-01

    This study explores whether a clinical teaching effectiveness (CTE) instrument provides valid scores for summative faculty assessment. The sample included all CTE instruments (n = 10,087) that learners (N = 1,194) completed to assess clinical teachers (N = 872) during 1 academic year. The authors investigated response processes (e.g., missing data, straight-line responses, level of learner), internal structure (e.g., confirmatory and exploratory factor analysis), teaching ratings by learner group (medical student or resident), and relation to other variables (e.g., correlation with global rating). Response processes identified a high prevalence of straight-line responses (same rating across all items) and differential patterns of missing data by learner group. Medical students rated their teachers higher than residents, and CTE scores had different factor structures depending on learner group. High correlation coefficients of CTE items with a single rating of overall teaching performance suggest that learners consider global performance when assessing clinical teaching performance.
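
    The straight-line check described in this abstract (the same rating across all items) can be sketched as below. This is only an illustration: the function names are hypothetical, missing answers are assumed to be coded as None, and the rule of requiring at least two answered items is an assumption rather than the authors' stated criterion.

```python
def is_straight_line(ratings):
    """True when every answered item on one form carries the same rating."""
    answered = [r for r in ratings if r is not None]
    # Assumption: a form with fewer than two answered items cannot show a pattern.
    return len(answered) >= 2 and len(set(answered)) == 1

def straight_line_rate(forms):
    """Share of completed forms flagged as straight-line responses."""
    if not forms:
        return 0.0
    return sum(is_straight_line(f) for f in forms) / len(forms)

if __name__ == "__main__":
    # Hypothetical rating forms on a 1-5 scale; None marks a skipped item.
    forms = [[5, 5, 5, 5], [4, 5, 3, 4], [3, None, 3, 3], [2]]
    print(straight_line_rate(forms))
```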

  9. Bayesian modeling of measurement error in predictor variables using item response theory

    NARCIS (Netherlands)

    Fox, Gerardus J.A.; Glas, Cornelis A.W.

    2000-01-01

    This paper focuses on handling measurement error in predictor variables using item response theory (IRT). Measurement error is of great importance in the assessment of theoretical constructs, such as intelligence or the school climate. Measurement error is modeled by treating the predictors as unobserved

  10. What Do You Think You Are Measuring? A Mixed-Methods Procedure for Assessing the Content Validity of Test Items and Theory-Based Scaling

    Science.gov (United States)

    Koller, Ingrid; Levenson, Michael R.; Glück, Judith

    2017-01-01

    The valid measurement of latent constructs is crucial for psychological research. Here, we present a mixed-methods procedure for improving the precision of construct definitions, determining the content validity of items, evaluating the representativeness of items for the target construct, generating test items, and analyzing items on a theoretical basis. To illustrate the mixed-methods content-scaling-structure (CSS) procedure, we analyze the Adult Self-Transcendence Inventory, a self-report measure of wisdom (ASTI, Levenson et al., 2005). A content-validity analysis of the ASTI items was used as the basis of psychometric analyses using multidimensional item response models (N = 1215). We found that the new procedure produced important suggestions concerning five subdimensions of the ASTI that were not identifiable using exploratory methods. The study shows that the application of the suggested procedure leads to a deeper understanding of latent constructs. It also demonstrates the advantages of theory-based item analysis. PMID:28270777

  11. Reliability of a single objective measure in assessing sleepiness.

    Science.gov (United States)

    Sunwoo, Bernie Y; Jackson, Nicholas; Maislin, Greg; Gurubhagavatula, Indira; George, Charles F; Pack, Allan I

    2012-01-01

    To evaluate reliability of single objective tests in assessing sleepiness. Subjects who completed polysomnography underwent a 4-nap multiple sleep latency test (MSLT) the following day. Prior to each nap opportunity on MSLT, subjects performed the psychomotor vigilance test (PVT) and divided attention driving task (DADT). Results of single versus multiple test administrations were compared using the intraclass correlation coefficient (ICC) and adjusted for test administration order effects to explore time of day effects. Measures were explored as continuous and binary (i.e., impaired or not impaired). Community-based sample evaluated at a tertiary, university-based sleep center. 372 adult commercial vehicle operators oversampled for increased obstructive sleep apnea risk. N/A. As continuous measures, ICC were as follows: MSLT 0.45, PVT median response time 0.69, PVT number of lapses 0.51, 10-min DADT tracking error 0.87, 20-min DADT tracking error 0.90. Based on binary outcomes, ICC were: MSLT 0.63, PVT number of lapses 0.85, 10-min DADT 0.95, 20-min DADT 0.96. Statistically significant time of day effects were seen in both the MSLT and PVT but not the DADT. Correlation between ESS and different objective tests was strongest for MSLT, range [-0.270 to -0.195] and persisted across all time points. Single DADT and PVT administrations are reliable measures of sleepiness. A single MSLT administration can reasonably discriminate individuals with MSL < 8 minutes. These results support the use of a single administration of some objective tests of sleepiness when performed under controlled conditions in routine clinical care.
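
    To make the single-versus-multiple-administration comparison concrete, the sketch below computes a one-way random-effects ICC from repeated measurements per subject. The abstract does not say which ICC form the authors used, so the one-way form, the function name, and the toy data are all assumptions made for illustration.

```python
from statistics import mean

def icc_oneway(scores):
    """One-way random-effects ICC for n subjects with k repeated tests each.

    Returns (single-administration ICC, ICC of the mean of k administrations).
    """
    n = len(scores)
    k = len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(scores, row_means) for x in row) / (n * (k - 1))
    icc_single = (msb - msw) / (msb + (k - 1) * msw)
    icc_average = (msb - msw) / msb
    return icc_single, icc_average

if __name__ == "__main__":
    # Hypothetical data: three subjects, two administrations each.
    print(icc_oneway([[1, 2], [3, 4], [5, 6]]))
```

    The single-administration value is the one that bears on whether one test session can stand in for several; the averaged value is always at least as large when the between-subject variance dominates.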

  12. Reducing the item number to obtain the same-length self-assessment scales: a systematic approach using result of graphical loglinear rasch models

    DEFF Research Database (Denmark)

    Nielsen, Tine; Kreiner, Svend

    2011-01-01

    The Revised Danish Learning Styles Inventory (R-D-LSI) (Nielsen 2005), which is an adaptation of Sternberg-Wagner Thinking Styles Inventory (Sternberg, 1997), comprises 14 subscales, each measuring a separate learning style. Of these 14 subscales, 9 are eight items long and 5 are seven items long ... Inventory (D-SA-LSI) comprising 14 subscales each with an item length of seven. The systematic approach to item reduction based on results of GLLRM will be presented and exemplified by its application to the R-D-LSI....

  13. Software Note: Using BILOG for Fixed-Anchor Item Calibration

    Science.gov (United States)

    DeMars, Christine E.; Jurich, Daniel P.

    2012-01-01

    The nonequivalent groups anchor test (NEAT) design is often used to scale item parameters from two different test forms. A subset of items, called the anchor items or common items, are administered as part of both test forms. These items are used to adjust the item calibrations for any differences in the ability distributions of the groups taking…

  14. Inventions on presenting textual items in Graphical User Interface

    OpenAIRE

    Mishra, Umakant

    2014-01-01

    Although a GUI largely replaces textual descriptions by graphical icons, the textual items are not completely removed. The textual items are inevitably used in window titles, message boxes, help items, menu items and popup items. Textual items are necessary for communicating messages that are beyond the limitation of graphical messages. However, it is necessary to harness the textual items on the graphical interface in such a way that they complement each other to produce the best effect. One...

  15. Memory for Items and Relationships among Items Embedded in Realistic Scenes: Disproportionate Relational Memory Impairments in Amnesia

    Science.gov (United States)

    Hannula, Deborah E.; Tranel, Daniel; Allen, John S.; Kirchhoff, Brenda A.; Nickel, Allison E.; Cohen, Neal J.

    2014-01-01

    Objective: The objective of this study was to examine the dependence of item memory and relational memory on medial temporal lobe (MTL) structures. Patients with amnesia, who either had extensive MTL damage or damage that was relatively restricted to the hippocampus, were tested, as was a matched comparison group. Disproportionate relational memory impairments were predicted for both patient groups, and those with extensive MTL damage were also expected to have impaired item memory. Method: Participants studied scenes, and were tested with interleaved two-alternative forced-choice probe trials. Probe trials were either presented immediately after the corresponding study trial (lag 1), five trials later (lag 5), or nine trials later (lag 9) and consisted of the studied scene along with a manipulated version of that scene in which one item was replaced with a different exemplar (item memory test) or was moved to a new location (relational memory test). Participants were to identify the exact match of the studied scene. Results: As predicted, patients were disproportionately impaired on the test of relational memory. Item memory performance was marginally poorer among patients with extensive MTL damage, but both groups were impaired relative to matched comparison participants. Impaired performance was evident at all lags, including the shortest possible lag (lag 1). Conclusions: The results are consistent with the proposed role of the hippocampus in relational memory binding and representation, even at short delays, and suggest that the hippocampus may also contribute to successful item memory when items are embedded in complex scenes. PMID:25068665

  16. Applying Hierarchical Model Calibration to Automatically Generated Items.

    Science.gov (United States)

    Williamson, David M.; Johnson, Matthew S.; Sinharay, Sandip; Bejar, Isaac I.

    This study explored the application of hierarchical model calibration as a means of reducing, if not eliminating, the need for pretesting of automatically generated items from a common item model prior to operational use. Ultimately the successful development of automatic item generation (AIG) systems capable of producing items with highly similar…

  17. 41 CFR 101-27.404 - Review of items.

    Science.gov (United States)

    2010-07-01

    ... 41 Public Contracts and Property Management 2 2010-07-01 2010-07-01 true Review of items. 101-27.404 Section 101-27.404 Public Contracts and Property Management Federal Property Management...-Elimination of Items From Inventory § 101-27.404 Review of items. Except for standby or reserve stocks, items...

  18. Towards an authoring system for item construction

    NARCIS (Netherlands)

    Rikers, Jos H.A.N.

    1988-01-01

    The process of writing test items is analyzed, and a blueprint is presented for an authoring system for test item writing to reduce invalidity and to structure the process of item writing. The developmental methodology is introduced, and the first steps in the process are reported. A historical

  19. Modeling Local Item Dependence in Cloze and Reading Comprehension Test Items Using Testlet Response Theory

    Science.gov (United States)

    Baghaei, Purya; Ravand, Hamdollah

    2016-01-01

    In this study the magnitudes of local dependence generated by cloze test items and reading comprehension items were compared and their impact on parameter estimates and test precision was investigated. An advanced English as a foreign language reading comprehension test containing three reading passages and a cloze test was analyzed with a…

  20. Measuring depression after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Depression item bank and linkage with PHQ-9.

    Science.gov (United States)

    Tulsky, David S; Kisala, Pamela A; Kalpakjian, Claire Z; Bombardier, Charles H; Pohlig, Ryan T; Heinemann, Allen W; Carle, Adam; Choi, Seung W

    2015-05-01

    To develop a calibrated spinal cord injury-quality of life (SCI-QOL) item bank, computer adaptive test (CAT), and short form to assess depressive symptoms experienced by individuals with SCI, transform scores to the Patient Reported Outcomes Measurement Information System (PROMIS) metric, and create a crosswalk to the Patient Health Questionnaire (PHQ)-9. We used grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, item response theory (IRT) analyses, and statistical linking techniques to transform scores to a PROMIS metric and to provide a crosswalk with the PHQ-9. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. Spinal Cord Injury--Quality of Life (SCI-QOL) Depression Item Bank. Individuals with SCI were involved in all phases of SCI-QOL development. A sample of 716 individuals with traumatic SCI completed 35 items assessing depression, 18 of which were PROMIS items. After removing 7 non-PROMIS items, factor analyses confirmed a unidimensional pool of items. We used a graded response IRT model to estimate slopes and thresholds for the 28 retained items. The SCI-QOL Depression measure correlated 0.76 with the PHQ-9. The SCI-QOL Depression item bank provides a reliable and sensitive measure of depressive symptoms with scores reported in terms of general population norms. We provide a crosswalk to the PHQ-9 to facilitate comparisons between measures. The item bank may be administered as a CAT or as a short form and is suitable for research and clinical applications.
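
    For readers unfamiliar with the graded response model mentioned in this abstract, the sketch below shows how one item's slope and ordered thresholds translate into response-category probabilities at a given trait level. The function name and the parameter values are hypothetical and are not taken from the SCI-QOL calibration; the logistic form without a scaling constant is also an assumption.

```python
import math

def grm_category_probs(theta, slope, thresholds):
    """Category probabilities for one item under a graded response model.

    theta: latent trait level; slope: item discrimination;
    thresholds: ordered category boundaries (m thresholds give m + 1 categories).
    """
    # Cumulative probabilities P(X >= k), bracketed by 1 (lowest) and 0 (beyond highest).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds]
    cum.append(0.0)
    # Each category probability is the difference of adjacent cumulative values.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

if __name__ == "__main__":
    # Hypothetical four-category item with symmetric thresholds.
    print(grm_category_probs(0.0, 1.5, [-1.0, 0.0, 1.0]))
```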