Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina
This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism. © The Author(s) 2014.
G. K. Huysamen
Full Text Available This article briefly describes the main procedures for performing differential item functioning (DIF analyses and points out some of the statistical and extra-statistical implications of these methods. Research findings on the sources of DIF, including those associated with translated tests, are reviewed. As DIF analyses are oblivious of correlations between a test and relevant criteria, the elimination of differentially functioning items does not necessarily improve predictive validity or reduce any predictive bias. The implications of the results of past DIF research for test development in the multilingual and multi-cultural South African society are considered. Opsomming Hierdie artikel beskryf kortliks die hoofprosedures vir die ontleding van differensiële itemfunksionering (DIF en verwys na sommige van die statistiese en buite-statistiese implikasies van hierdie metodes. ’n Oorsig word verskaf van navorsingsbevindings oor die bronne van DIF, insluitend dié by vertaalde toetse. Omdat DIF-ontledings nie die korrelasies tussen ’n toets en relevante kriteria in ag neem nie, sal die verwydering van differensieel-funksionerende items nie noodwendig voorspellingsgeldigheid verbeter of voorspellingsydigheid verminder nie. Die implikasies van vorige DIF-navorsingsbevindings vir toetsontwikkeling in die veeltalige en multikulturele Suid-Afrikaanse gemeenskap word oorweeg.
Scott, Neil W; Fayers, Peter M; Aaronson, Neil K
Differential item functioning (DIF) analyses are commonly used to evaluate health-related quality of life (HRQoL) instruments. There is, however, a lack of consensus as to how to assess the practical impact of statistically significant DIF results.......Differential item functioning (DIF) analyses are commonly used to evaluate health-related quality of life (HRQoL) instruments. There is, however, a lack of consensus as to how to assess the practical impact of statistically significant DIF results....
Spencer, Mercedes; Cho, Sun-Joo; Cutting, Laurie E
In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Functioning System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.
Full Text Available This study illustrates the use of differential item functioning (DIF and differential step functioning (DSF analyses to detect differences in item difficulty that are related to experiences of examinees, such as their teachers' instructional practices, that are relevant to the knowledge, skill, or ability the test is intended to measure. This analysis is in contrast to the typical use of DIF or DSF to detect differences related to characteristics of examinees, such as gender, language, or cultural knowledge, that should be irrelevant. Using data from two forms of Ontario's Grade 9 Assessment of Mathematics, analyses were performed comparing groups of students defined by their teachers' instructional practices. All constructed-response items were tested for DIF using the Mantel Chi-Square, standardized Liu Agresti cumulative common log-odds ratio, and standardized Cox's noncentrality parameter. Items exhibiting moderate to large DIF were subsequently tested for DSF. In contrast to typical DIF or DSF analyses, which inform item development, these analyses have the potential to inform instructional practice.
Full Text Available University students in Croatia, Slovenia, and Sweden (N = 1129 were examined by means of the Emotional Skills and Competence Questionnaire (Takšić, 1998. Results showed a significant effect for the sex factor only on the total-score scale, women scoring higher than men, but significant effects were obtained for country, as well as for sex, on the Express and Label (EL and Perceive and Understand (PU subscales. Sweden showed higher scores than Croatia and Slovenia on the EL scale, and Slovenia showed higher scores than Croatia and Sweden on the PU scale. In subsequent analyses of differential item functioning (DIF, comparisons were carried out for pairs of countries. The analyses revealed that a large proportion of the items in the total-score scale were potentially biased, most so for the Croatian-Swedish comparison, less for the Slovenian-Swedish comparison, and least for the Croatian-Slovenian comparison. These findings give doubts about the validity of mean score differences in comparisons of countries. However, DIF analyses of sex differences within each country show very few DIF items, indicating that the ESCQ instrument works well within each cultural/linguistic setting. Possible explanations of the findings are discussed, and improvements for future studies are suggested.
Bruce, Bonnie; Fries, James F; Ambrosini, Debbie; Lingala, Bharathi; Gandek, Barbara; Rose, Matthias; Ware, John E
Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models
Akkermans, Wies; Muraki, Eiji
For trinary partial credit items the shape of the item information and the item discrimination function is examined in relation to the item parameters. In particular, it is shown that these functions are unimodal if δ2 – δ1 < 4 ln 2 and bimodal otherwise. The locations and values of the maxima are
Scott, Neil W.; Fayers, Peter M.; Aaronson, Neil K.
Differential item functioning (DIF) methods can be used to determine whether different subgroups respond differently to particular items within a health-related quality of life (HRQoL) subscale, after allowing for overall subgroup differences in that scale. This article reviews issues that arise...
Knol Dirk L
Full Text Available Abstract Background For the Low Vision Quality Of Life questionnaire (LVQOL it is unknown whether the psychometric properties are satisfactory when an item response theory (IRT perspective is considered. This study evaluates some essential psychometric properties of the LVQOL questionnaire in an IRT model, and investigates differential item functioning (DIF. Methods Cross-sectional data were used from an observational study among visually-impaired patients (n = 296. Calibration was performed for every dimension of the LVQOL in the graded response model. Item goodness-of-fit was assessed with the S-X2-test. DIF was assessed on relevant background variables (i.e. age, gender, visual acuity, eye condition, rehabilitation type and administration type with likelihood-ratio tests for DIF. The magnitude of DIF was interpreted by assessing the largest difference in expected scores between subgroups. Measurement precision was assessed by presenting test information curves; reliability with the index of subject separation. Results All items of the LVQOL dimensions fitted the model. There was significant DIF on several items. For two items the maximum difference between expected scores exceeded one point, and DIF was found on multiple relevant background variables. Item 1 'Vision in general' from the "Adjustment" dimension and item 24 'Using tools' from the "Reading and fine work" dimension were removed. Test information was highest for the "Reading and fine work" dimension. Indices for subject separation ranged from 0.83 to 0.94. Conclusions The items of the LVQOL showed satisfactory item fit to the graded response model; however, two items were removed because of DIF. The adapted LVQOL with 21 items is DIF-free and therefore seems highly appropriate for use in heterogeneous populations of visually impaired patients.
Cho, Sun-Joo; Wilmer, Jeremy; Herzmann, Grit; McGugin, Rankin; Fiset, Daniel; Van Gulick, Ana E.; Ryan, Katie; Gauthier, Isabel
We evaluated the psychometric properties of the Cambridge face memory test (CFMT; Duchaine & Nakayama, 2006). First, we assessed the dimensionality of the test with a bi-factor exploratory factor analysis (EFA). This EFA analysis revealed a general factor and three specific factors clustered by targets of CFMT. However, the three specific factors appeared to be minor factors that can be ignored. Second, we fit a unidimensional item response model. This item response model showed that the CFMT items could discriminate individuals at different ability levels and covered a wide range of the ability continuum. We found the CFMT to be particularly precise for a wide range of ability levels. Third, we implemented item response theory (IRT) differential item functioning (DIF) analyses for each gender group and two age groups (Age ≤ 20 versus Age > 21). This DIF analysis suggested little evidence of consequential differential functioning on the CFMT for these groups, supporting the use of the test to compare older to younger, or male to female, individuals. Fourth, we tested for a gender difference on the latent facial recognition ability with an explanatory item response model. We found a significant but small gender difference on the latent ability for face recognition, which was higher for women than men by 0.184, at age mean 23.2, controlling for linear and quadratic age effects. Finally, we discuss the practical considerations of the use of total scores versus IRT scale scores in applications of the CFMT. PMID:25642930
Scott, Neil W; Fayers, Peter M; Aaronson, Neil K
Differential item functioning (DIF) methods can be used to determine whether different subgroups respond differently to particular items within a health-related quality of life (HRQoL) subscale, after allowing for overall subgroup differences in that scale. This article reviews issues that arise ...... when testing for DIF in HRQoL instruments. We focus on logistic regression methods, which are often used because of their efficiency, simplicity and ease of application....
Tutz, Gerhard; Berger, Moritz
A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.
JOSEPH P. EIMICKE
Full Text Available The aims of this paper are to present findings related to differential item functioning (DIF in the Patient Reported Outcome Measurement Information System (PROMIS depression item bank, and to discuss potential threats to the validity of results from studies of DIF. The 32 depression items studied were modified from several widely used instruments. DIF analyses of gender, age and education were performed using a sample of 735 individuals recruited by a survey polling firm. DIF hypotheses were generated by asking content experts to indicate whether or not they expected DIF to be present, and the direction of the DIF with respect to the studied comparison groups. Primary analyses were conducted using the graded item response model (for polytomous, ordered response category data with likelihood ratio tests of DIF, accompanied by magnitude measures. Sensitivity analyses were performed using other item response models and approaches to DIF detection. Despite some caveats, the items that are recommended for exclusion or for separate calibration were "I felt like crying" and "I had trouble enjoying things that I used to enjoy." The item, "I felt I had no energy," was also flagged as evidencing DIF, and recommended for additional review. On the one hand, false DIF detection (Type 1 error was controlled to the extent possible by ensuring model fit and purification. On the other hand, power for DIF detection might have been compromised by several factors, including sparse data and small sample sizes. Nonetheless, practical and not just statistical significance should be considered. In this case the overall magnitude and impact of DIF was small for the groups studied, although impact was relatively large for some individuals.
McDonough, Christine M; Jette, Alan M; Ni, Pengsheng; Bogusz, Kara; Marfeo, Elizabeth E; Brandt, Diane E; Chan, Leighton; Meterko, Mark; Haley, Stephen M; Rasch, Elizabeth K
To build a comprehensive item pool representing work-relevant physical functioning and to test the factor structure of the item pool. These developmental steps represent initial outcomes of a broader project to develop instruments for the assessment of function within the context of Social Security Administration (SSA) disability programs. Comprehensive literature review; gap analysis; item generation with expert panel input; stakeholder interviews; cognitive interviews; cross-sectional survey administration; and exploratory and confirmatory factor analyses to assess item pool structure. In-person and semistructured interviews and Internet and telephone surveys. Sample of SSA claimants (n=1017) and a normative sample of adults from the U.S. general population (n=999). Not applicable. Model fit statistics. The final item pool consisted of 139 items. Within the claimant sample, 58.7% were white; 31.8% were black; 46.6% were women; and the mean age was 49.7 years. Initial factor analyses revealed a 4-factor solution, which included more items and allowed separate characterization of: (1) changing and maintaining body position, (2) whole body mobility, (3) upper body function, and (4) upper extremity fine motor. The final 4-factor model included 91 items. Confirmatory factor analyses for the 4-factor models for the claimant and the normative samples demonstrated very good fit. Fit statistics for claimant and normative samples, respectively, were: Comparative Fit Index=.93 and .98; Tucker-Lewis Index=.92 and .98; and root mean square error approximation=.05 and .04. The factor structure of the physical function item pool closely resembled the hypothesized content model. The 4 scales relevant to work activities offer promise for providing reliable information about claimant physical functioning relevant to work disability. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Mueller, Anne E; Segal, Daniel L; Gavett, Brandon; Marty, Meghan A; Yochim, Brian; June, Andrea; Coolidge, Frederick L
The Geriatric Anxiety Scale (GAS; Segal et al. (Segal, D. L., June, A., Payne, M., Coolidge, F. L. and Yochim, B. (2010). Journal of Anxiety Disorders, 24, 709-714. doi:10.1016/j.janxdis.2010.05.002) is a self-report measure of anxiety that was designed to address unique issues associated with anxiety assessment in older adults. This study is the first to use item response theory (IRT) to examine the psychometric properties of a measure of anxiety in older adults. A large sample of older adults (n = 581; mean age = 72.32 years, SD = 7.64 years, range = 60 to 96 years; 64% women; 88% European American) completed the GAS. IRT properties were examined. The presence of differential item functioning (DIF) or measurement bias by age and sex was assessed, and a ten-item short form of the GAS (called the GAS-10) was created. All GAS items had discrimination parameters of 1.07 or greater. Items from the somatic subscale tended to have lower discrimination parameters than items on the cognitive or affective subscales. Two items were flagged for DIF, but the impact of the DIF was negligible. Women scored significantly higher than men on the GAS and its subscales. Participants in the young-old group (60 to 79 years old) scored significantly higher on the cognitive subscale than participants in the old-old group (80 years old and older). Results from the IRT analyses indicated that the GAS and GAS-10 have strong psychometric properties among older adults. We conclude by discussing implications and future research directions.
Fukuhara, Hirotaka; Kamata, Akihito
A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into…
Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman
Four-factor structure of the two 8-item short forms of Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, the sum scores are typically reported in practice as a proxy of Oral health-related Quality of Life (OHRQoL), which implied a unidimensional structure. This study first assessed the unidimensionality of 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach of validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principle component analysis (PCA) and local dependency (LD) statistic. Graded response model was fitted to the data. Contribution of each item to the scale was assessed by item information function (IIF). Reliability of the scale was assessed by test information function (TIF). Differential item functioning (DIF) across gender was identified by Wald test and expected score functions. Both CPQ11-14 RSF:8 and ISF:8 did not deviate much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principle component explained >30 % of the total variation with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistic items suggesting little contribution of information to the scale and item removal caused little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to oral symptoms items, the item "Concerned with what other people think" demonstrated a uniform DIF (p Items related to oral symptoms were not informative to OHRQoL and deletion of these items is suggested. The impact of DIF across gender on the overall score was minimal. CPQ11-14 RSF:8 performed slightly better than ISF:8 in measurement precision. The 6-item short forms
Wang, Wen-Chung; Shih, Ching-Lin
Three multiple indicators-multiple causes (MIMIC) methods, namely, the standard MIMIC method (M-ST), the MIMIC method with scale purification (M-SP), and the MIMIC method with a pure anchor (M-PA), were developed to assess differential item functioning (DIF) in polytomous items. In a series of simulations, it appeared that all three methods…
Demonstrates how the mean and covariance structure analysis model of D. Sorbom (1974) can be used to detect uniform and nonuniform differential item functioning (DIF) on polytomous ordered response items assumed to approximate a continuous scale. Uses results from 773 civil service employees administered the Kirton Adaption-Innovation Inventory…
Full Text Available Abstract Background The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC is a widely used patient reported outcome in osteoarthritis. An important, but frequently overlooked, aspect of validating health outcome measures is to establish if items exhibit differential item functioning (DIF. That is, if respondents have the same underlying level of an attribute, does the item give the same score in different subgroups or is it biased towards one subgroup or another. The aim of the study was to explore DIF in the Likert format WOMAC for the first time in a UK osteoarthritis population with respect to demographic, social, clinical and psychological factors. Methods The sample comprised a community sample of 763 people with osteoarthritis who participated in the Somerset and Avon Survey of Health. The WOMAC was explored for DIF by gender, age, social deprivation, social class, employment status, distress, body mass index and clinical factors. Ordinal regression models were used to identify DIF items. Results After adjusting for age, two items were identified for the physical functioning subscale as having DIF with age identified as the DIF factor for 2 items, gender for 1 item and body mass index for 1 item. For the WOMAC pain subscale, for people with hip osteoarthritis one item was identified with age-related DIF. The impact of the DIF items rarely had a significant effect on the conclusions of group comparisons. Conclusions Overall, the WOMAC performed well with only a small number of DIF items identified. However, as DIF items were identified in for the WOMAC physical functioning subscale it would be advisable to analyse data taking into account the possible impact of the DIF items when weight, gender or especially age effects, are the focus of interest in UK-based osteoarthritis studies. Similarly for the WOMAC pain subscale in people with hip osteoarthritis it would be worthwhile to analyse data taking into account the
This study investigated test item bias and Differential Item Functioning (DIF) of West African ... items in chemistry function differentially with respect to gender and location. In Aba education zone of Abia, 50 secondary schools were purposively ...
Kleinman, Marjorie; Teresi, Jeanne A
Measures of magnitude and impact of differential item functioning (DIF) at the item and scale level, respectively are presented and reviewed in this paper. Most measures are based on item response theory models. Magnitude refers to item level effect sizes, whereas impact refers to differences between groups at the scale score level. Reviewed are magnitude measures based on group differences in the expected item scores and impact measures based on differences in the expected scale scores. The similarities among these indices are demonstrated. Various software packages are described that provide magnitude and impact measures, and new software presented that computes all of the available statistics conveniently in one program with explanations of their relationships to one another.
Scheuneman, Janice Dowd; Gerritz, Kalle
Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)
Full Text Available Abstract Background Previous studies have analyzed the psychometric properties of the World Health Organization Disability Assessment Schedule II (WHO-DAS II using classical omnibus measures of scale quality. These analyses are sample dependent and do not model item responses as a function of the underlying trait level. The main objective of this study was to examine the effectiveness of the WHO-DAS II items and their options in discriminating between changes in the underlying disability level by means of item response analyses. We also explored differential item functioning (DIF in men and women. Methods The participants were 3615 adult general practice patients from 17 regions of Spain, with a first diagnosed major depressive episode. The 12-item WHO-DAS II was administered by the general practitioners during the consultation. We used a non-parametric item response method (Kernel-Smoothing implemented with the TestGraf software to examine the effectiveness of each item (item characteristic curves and their options (option characteristic curves in discriminating between changes in the underliying disability level. We examined composite DIF to know whether women had a higher probability than men of endorsing each item. Results Item response analyses indicated that the twelve items forming the WHO-DAS II perform very well. All items were determined to provide good discrimination across varying standardized levels of the trait. The items also had option characteristic curves that showed good discrimination, given that each increasing option became more likely than the previous as a function of increasing trait level. No gender-related DIF was found on any of the items. Conclusions All WHO-DAS II items were very good at assessing overall disability. Our results supported the appropriateness of the weights assigned to response option categories and showed an absence of gender differences in item functioning.
David Magis; Bruno Facon
Angoff's delta plot is a straightforward and not computationally intensive method to identify differential item functioning (DIF) among dichotomously scored items. This approach was recently improved by proposing an optimal threshold selection and by considering several item purification processes. Moreover, to support practical DIF analyses with the delta plot and these improvements, the R package deltaPlotR was also developed. The purpose of this paper is twofold: to outline the delta plot ...
Kabasakal, Kübra Atalay; Kelecioglu, Hülya
This study examines the effect of differential item functioning (DIF) items on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performances of three different equating models were investigated under 24 different simulation conditions, and the variables whose effects were examined included sample size, test…
Vaughn, Brandon K.; Wang, Qiu
A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…
Scott, Neil W.; Fayers, Peter M.; Aaronson, Neil K.
Differential item functioning (DIF) analyses can be used to explore translation, cultural, gender or other differences in the performance of quality of life (QoL) instruments. These analyses are commonly performed using "baseline" or pretreatment data. We previously reported DIF analyses to examine...
Wright, Keith D.; Oshima, T. C.
This study established an effect size measure for differential functioning for items and tests' noncompensatory differential item functioning (NCDIF). The Mantel-Haenszel parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. The effect size of…
Watt, Torquil; Grønvold, Mogens; Hegedüs, Laszlo
To evaluate the extent of differential item functioning (DIF) within the thyroid-specific quality of life patient-reported outcome measure, ThyPRO, according to sex, age, education and thyroid diagnosis.......To evaluate the extent of differential item functioning (DIF) within the thyroid-specific quality of life patient-reported outcome measure, ThyPRO, according to sex, age, education and thyroid diagnosis....
Fischer, H Felix; Wahl, Inka; Nolte, Sandra; Liegl, Gregor; Brähler, Elmar; Löwe, Bernd; Rose, Matthias
To investigate differential item functioning (DIF) of PROMIS Depression items between US and German samples we compared data from the US PROMIS calibration sample (n = 780), a German general population survey (n = 2,500) and a German clinical sample (n = 621). DIF was assessed in an ordinal logistic regression framework, with 0.02 as criterion for R 2 -change and 0.096 for Raju's non-compensatory DIF. Item parameters were initially fixed to the PROMIS Depression metric; we used plausible values to account for uncertainty in depression estimates. Only four items showed DIF. Accounting for DIF led to negligible effects for the full item bank as well as a post hoc simulated computer-adaptive test (German general population sample was considerably lower compared to the US reference value of 50. Overall, we found little evidence for language DIF between US and German samples, which could be addressed by either replacing the DIF items by items not showing DIF or by scoring the short form in German samples with the corrected item parameters reported. Copyright © 2016 John Wiley & Sons, Ltd.
Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.
In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…
Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald
We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
Scott, Neil W.; Fayers, Peter M.; Bottomley, Andrew
Differential item functioning (DIF) analyses are increasingly used to evaluate health-related quality of life (HRQoL) instruments, which often include relatively short subscales. Computer simulations were used to explore how various factors including scale length affect analysis of DIF by ordinal...... logistic regression....
Stewart, Lesley A; Clarke, Mike; Rovers, Maroeska; Riley, Richard D; Simmonds, Mark; Stewart, Gavin; Tierney, Jayne F
Systematic reviews and meta-analyses of individual participant data (IPD) aim to collect, check, and reanalyze individual-level data from all studies addressing a particular research question and are therefore considered a gold standard approach to evidence synthesis. They are likely to be used with increasing frequency as current initiatives to share clinical trial data gain momentum and may be particularly important in reviewing controversial therapeutic areas. To develop PRISMA-IPD as a stand-alone extension to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) Statement, tailored to the specific requirements of reporting systematic reviews and meta-analyses of IPD. Although developed primarily for reviews of randomized trials, many items will apply in other contexts, including reviews of diagnosis and prognosis. Development of PRISMA-IPD followed the EQUATOR Network framework guidance and used the existing standard PRISMA Statement as a starting point to draft additional relevant material. A web-based survey informed discussion at an international workshop that included researchers, clinicians, methodologists experienced in conducting systematic reviews and meta-analyses of IPD, and journal editors. The statement was drafted and iterative refinements were made by the project, advisory, and development groups. The PRISMA-IPD Development Group reached agreement on the PRISMA-IPD checklist and flow diagram by consensus. Compared with standard PRISMA, the PRISMA-IPD checklist includes 3 new items that address (1) methods of checking the integrity of the IPD (such as pattern of randomization, data consistency, baseline imbalance, and missing data), (2) reporting any important issues that emerge, and (3) exploring variation (such as whether certain types of individual benefit more from the intervention than others). A further additional item was created by reorganization of standard PRISMA items relating to interpreting results. Wording
Avlund, K; Era, P; Davidsen, M
to geographical locality and gender. Information about self-reported functional ability was gathered from surveys on 75-year-old men and women in Glostrup (Denmark), Göteborg (Sweden) and Jyväskylä (Finland). The data were collected by structured home interviews about mobility and Physical activities of daily......The purpose of this article is to analyse item bias in a measure of self-reported functional ability among 75-year-old people in three Nordic localities. The present item bias analysis examines whether the construction of a functional ability index from several variables results in bias in relation...... living (PADL) in relation to tiredness, reduced speed and dependency and combined into three tiredness-scales, three reduced speed-scales and two dependency-scales. The analysis revealed item bias regarding geographical locality in seven out of eight of the functional ability scales, but nearly no bias...
Obbarius, Nina; Fischer, Felix; Obbarius, Alexander; Nolte, Sandra; Liegl, Gregor; Rose, Matthias
To develop the first item bank to measure Stress Resilience (SR) in clinical populations. Qualitative item development resulted in an initial pool of 131 items covering a broad theoretical SR concept. These items were tested in n=521 patients at a psychosomatic outpatient clinic. Exploratory and Confirmatory Factor Analysis (CFA), as well as other state-of-the-art item analyses and IRT were used for item evaluation and calibration of the final item bank. Out of the initial item pool of 131 items, we excluded 64 items (54 factor loading .3, 2 non-discriminative Item Response Curves, 4 Differential Item Functioning). The final set of 67 items indicated sufficient model fit in CFA and IRT analyses. Additionally, a 10-item short form with high measurement precision (SE≤.32 in a theta range between -1.8 and +1.5) was derived. Both the SR item bank and the SR short form were highly correlated with an existing static legacy tool (Connor-Davidson Resilience Scale). The final SR item bank and 10-item short form showed good psychometric properties. When further validated, they will be ready to be used within a framework of Computer-Adaptive Tests for a comprehensive assessment of the Stress-Construct. Copyright © 2018. Published by Elsevier Inc.
Our objective was to evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We al...
Petersen, Morten Aa.; Gamper, Eva-Maria; Costantini, Anna
of the widely used EORTC Quality of Life questionnaire (QLQ-C30). STUDY DESIGN AND SETTING: On the basis of literature search and evaluations by international samples of experts and cancer patients, 38 candidate items were developed. The psychometric properties of the items were evaluated in a large...... international sample of cancer patients. This included evaluations of dimensionality, item response theory (IRT) model fit, differential item functioning (DIF), and of measurement precision/statistical power. RESULTS: Responses were obtained from 1,023 cancer patients from four countries. The evaluations showed...... that 24 items could be included in a unidimensional IRT model. DIF did not seem to have any significant impact on the estimation of EF. Evaluations indicated that the CAT measure may reduce sample size requirements by up to 50% compared to the QLQ-C30 EF scale without reducing power. CONCLUSION...
Khalid, Muhammad Naveed; Glas, Cornelis A.W.
Item bias or differential item functioning (DIF) has an important impact on the fairness of psychological and educational testing. In this paper, DIF is seen as a lack of fit to an item response (IRT) model. Inferences about the presence and importance of DIF require a process of so-called test
In international large-scale assessments of educational outcomes, student achievement is often represented by unidimensional constructs. This approach allows for drawing general conclusions about country rankings with respect to the given achievement measure, but it typically does not provide specific diagnostic information which is necessary for systematic comparisons and improvements of educational systems. Useful information could be obtained by exploring the differences in national profiles of student achievement between low-achieving and high-achieving countries. In this study, we aimed to identify the relative weaknesses and strengths of eighth graders' physics achievement in Bosnia and Herzegovina in comparison to the achievement of their peers from Slovenia. For this purpose, we ran a secondary analysis of Trends in International Mathematics and Science Study (TIMSS) 2007 data. The student sample consisted of 4,220 students from Bosnia and Herzegovina and 4,043 students from Slovenia. After analysing the cognitive demands of TIMSS 2007 physics items, the correspondent differential item functioning (DIF)/differential group functioning contrasts were estimated. Approximately 40% of items exhibited large DIF contrasts, indicating significant differences between cultures of physics education in Bosnia and Herzegovina and Slovenia. The relative strength of students from Bosnia and Herzegovina showed to be mainly associated with the topic area 'Electricity and magnetism'. Classes of items which required the knowledge of experimental method, counterintuitive thinking, proportional reasoning and/or the use of complex knowledge structures proved to be differentially easier for students from Slovenia. In the light of the presented results, the common practice of ranking countries with respect to universally established cognitive categories seems to be potentially misleading.
Glas, Cornelis A.W.
In this paper it is shown that differential item functioning can be evaluated using the Lagrange multiplier test or C. R. Rao's efficient score test. The test is presented in the framework of a number of item response theory (IRT) models such as the Rasch model, the one-parameter logistic model, the
Falk, Carl F.; Cai, Li
We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood-based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…
Berger, Moritz; Tutz, Gerhard
Detection of differential item functioning (DIF) by use of the logistic modeling approach has a long tradition. One big advantage of the approach is that it can be used to investigate nonuniform (NUDIF) as well as uniform DIF (UDIF). The classical approach allows one to detect DIF by distinguishing between multiple groups. We propose an…
Heo, Moonseong; Kim, Namhee; Faith, Myles S
In countless number of clinical trials, measurements of outcomes rely on instrument questionnaire items which however often suffer measurement error problems which in turn affect statistical power of study designs. The Cronbach alpha or coefficient alpha, here denoted by C(α), can be used as a measure of internal consistency of parallel instrument items that are developed to measure a target unidimensional outcome construct. Scale score for the target construct is often represented by the sum of the item scores. However, power functions based on C(α) have been lacking for various study designs. We formulate a statistical model for parallel items to derive power functions as a function of C(α) under several study designs. To this end, we assume fixed true score variance assumption as opposed to usual fixed total variance assumption. That assumption is critical and practically relevant to show that smaller measurement errors are inversely associated with higher inter-item correlations, and thus that greater C(α) is associated with greater statistical power. We compare the derived theoretical statistical power with empirical power obtained through Monte Carlo simulations for the following comparisons: one-sample comparison of pre- and post-treatment mean differences, two-sample comparison of pre-post mean differences between groups, and two-sample comparison of mean differences between groups. It is shown that C(α) is the same as a test-retest correlation of the scale scores of parallel items, which enables testing significance of C(α). Closed-form power functions and samples size determination formulas are derived in terms of C(α), for all of the aforementioned comparisons. Power functions are shown to be an increasing function of C(α), regardless of comparison of interest. The derived power functions are well validated by simulation studies that show that the magnitudes of theoretical power are virtually identical to those of the empirical power. Regardless
Bilir, Mustafa Kuzey
This study uses a new psychometric model (mixture item response theory-MIMIC model) that simultaneously estimates differential item functioning (DIF) across manifest groups and latent classes. Current DIF detection methods investigate DIF from only one side, either across manifest groups (e.g., gender, ethnicity, etc.), or across latent classes…
LeBouthillier, Daniel M; Thibodeau, Michel A; Alberts, Nicole M; Hadjistavropoulos, Heather D; Asmundson, Gordon J G
Individuals with medical conditions are likely to have elevated health anxiety; however, research has not demonstrated how medical status impacts response patterns on health anxiety measures. Measurement bias can undermine the validity of a questionnaire by overestimating or underestimating scores in groups of individuals. We investigated whether the Short Health Anxiety Inventory (SHAI), a widely-used measure of health anxiety, exhibits medical condition-based bias on item and subscale levels, and whether the SHAI subscales adequately assess the health anxiety continuum. Data were from 963 individuals with diabetes, breast cancer, or multiple sclerosis, and 372 healthy individuals. Mantel-Haenszel tests and item characteristic curves were used to classify the severity of item-level differential item functioning in all three medical groups compared to the healthy group. Test characteristic curves were used to assess scale-level differential item functioning and whether the SHAI subscales adequately assess the health anxiety continuum. Nine out of 14 items exhibited differential item functioning. Two items exhibited differential item functioning in all medical groups compared to the healthy group. In both Thought Intrusion and Fear of Illness subscales, differential item functioning was associated with mildly deflated scores in medical groups with very high levels of the latent traits. Fear of Illness items poorly discriminated between individuals with low and very low levels of the latent trait. While individuals with medical conditions may respond differentially to some items, clinicians and researchers can confidently use the SHAI with a variety of medical populations without concern of significant bias. Copyright © 2015 Elsevier Inc. All rights reserved.
Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W; Van Gulick, Ana Beth; Gauthier, Isabel
The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge.
Paz, Sylvia H; Spritzer, Karen L; Reise, Steven P; Hays, Ron D
About 70% of Latinos, 5 years old or older, in the United States speak Spanish at home. Measurement equivalence of the PROMIS ® pain interference (PI) item bank by language of administration (English versus Spanish) has not been evaluated. A sample of 527 adult Spanish-speaking Latinos completed the Spanish version of the 41-item PROMIS ® pain interference item bank. We evaluate dimensionality, monotonicity and local independence of the Spanish-language items. Then we evaluate differential item functioning (DIF) using ordinal logistic regression with item response theory scores estimated from DIF-free "anchor" items. One of the 41 items in the Spanish version of the PROMIS ® PI item bank was identified as having significant uniform DIF. English- and Spanish-speaking subjects with the same level of pain interference responded differently to 1 of the 41 items in the PROMIS ® PI item bank. This item was not retained due to proprietary issues. The original English language item parameters can be used when estimating PROMIS ® PI scores.
Oude Voshaar, Martijn A H; Ten Klooster, Peter M; Vonkeman, Harald E; van de Laar, Mart A F J
Traditional patient-reported physical function instruments often poorly differentiate patients with mild-to-moderate disability. We describe the development and psychometric evaluation of a generic item bank for measuring everyday activity limitations in outpatient populations. Seventy-two items generated from patient interviews and mapped to the International Classification of Functioning, Disability and Health (ICF) domestic life chapter were administered to 1128 adults representative of the Dutch population. The partial credit model was fitted to the item responses and evaluated with respect to its assumptions, model fit, and differential item functioning (DIF). Measurement performance of a computerized adaptive testing (CAT) algorithm was compared with the SF-36 physical functioning scale (PF-10). A final bank of 41 items was developed. All items demonstrated acceptable fit to the partial credit model and measurement invariance across age, sex, and educational level. Five- and ten-item CAT simulations were shown to have high measurement precision, which exceeded that of SF-36 physical functioning scale across the physical function continuum. Floor effects were absent for a 10-item empirical CAT simulation, and ceiling effects were low (13.5%) compared with SF-36 physical functioning (38.1%). CAT also discriminated better than SF-36 physical functioning between age groups, number of chronic conditions, and respondents with or without rheumatic conditions. The Rasch assessment of everyday activity limitations (REAL) item bank will hopefully prove a useful instrument for assessing everyday activity limitations. T-scores obtained using derived measures can be used to benchmark physical function outcomes against the general Dutch adult population.
Lou, Meei-Fang; Dai, Yu-Tzu; Huang, Guey-Shiun; Yu, Po-Jui
The purpose of the study was to identify the most efficient items from the Mini-Mental State Examination for assessment of cognitive function. The Mini-Mental State Examination is the most frequently used cognitive screening instrument. However, the Mini-Mental State Examination has been criticized for insensitivity to mild cognitive dysfunction, limited memory assessment and variability in level of difficulty of the individual items. This study used secondary data analysis. Item response theory two-parameter model was used to analyse the data from the admission assessment of mental status by the Mini-Mental State Examination for 801 patients. By using item response analysis, 16 items were selected from the original 30-item Mini-Mental State Examination. The 16 items included mainly the measures of orientation, recall and attention and calculation. The internal consistency of the 16-item Mini-Mental State Examination was 0.84. The proposed new cut-off point for the 16-item Mini-Mental State Examination was 11. The correct classification rate was 0.94, the sensitivity was 100% and the specificity was 97.4%, when compared with the original 30-item Mini-Mental State Examination from the cut-off point of 24. This new cut-off point was determined for the purpose of over-identifying patients at risk so as to ensure early detection of and prevention from the onset of cognitive disturbance. Only a few items are needed to describe the subject's cognitive status. Using item response theory analysis, the study found that the Mini-Mental State Examination could be simplified. Deleting the items with less variation makes this assessment tool not only shorter, easier to administer and less strenuous for respondents, but also enables one to maintain validity as a cognitive function test for clinical setting.
Martijn A H Oude Voshaar
Full Text Available OBJECTIVE: To calibrate the Dutch-Flemish version of the PROMIS physical function (PF item bank in patients with rheumatoid arthritis (RA and to evaluate cross-cultural measurement equivalence with US general population and RA data. METHODS: Data were collected from RA patients enrolled in the Dutch DREAM registry. An incomplete longitudinal anchored design was used where patients completed all 121 items of the item bank over the course of three waves of data collection. Item responses were fit to a generalized partial credit model adapted for longitudinal data and the item parameters were examined for differential item functioning (DIF across country, age, and sex. RESULTS: In total, 690 patients participated in the study at time point 1 (T2, N = 489; T3, N = 311. The item bank could be successfully fitted to a generalized partial credit model, with the number of misfitting items falling within acceptable limits. Seven items demonstrated DIF for sex, while 5 items showed DIF for age in the Dutch RA sample. Twenty-five (20% items were flagged for cross-cultural DIF compared to the US general population. However, the impact of observed DIF on total physical function estimates was negligible. DISCUSSION: The results of this study showed that the PROMIS PF item bank adequately fit a unidimensional IRT model which provides support for applications that require invariant estimates of physical function, such as computer adaptive testing and targeted short forms. More studies are needed to further investigate the cross-cultural applicability of the US-based PROMIS calibration and standardized metric.
Shou, Yiyun; Sellbom, Martin; Xu, Jing
There is cumulative evidence for the cross-cultural validity of the Triarchic Psychopathy Measure (TriPM; Patrick, 2010) among non-Western populations. Recent studies using correlational and regression analyses show promising construct validity of the TriPM in Chinese samples. However, little is known about the efficiency of items in TriPM in assessing the proposed latent traits. The current study evaluated the psychometric properties of the Chinese TriPM at the item level using item response theory analyses. It also examined the measurement invariance of the TriPM between the Chinese and the U.S. student samples by applying differential item functioning analyses under the item response theory framework. The results supported the unidimensional nature of the Disinhibition and Meanness scales. Both scales had a greater level of precision in the respective underlying constructs at the positive ends. The two scales, however, had several items that were weakly associated with their respective latent traits in the Chinese student sample. Boldness, on the other hand, was found to be multidimensional, and reflected a more normally distributed range of variation. The examination of measurement bias via differential item functioning analyses revealed that a number of items of the TriPM were not equivalent across the Chinese and the U.S. Some modification and adaptation of items might be considered for improving the precision of the TriPM for Chinese participants. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Anatchkova, Milena D; Rose, Matthias; Ware, John E
Role functioning (RF) is a key component of health and well-being and an important outcome in health research. The aim of this study was to develop an item bank to measure impact of health on role functioning.......Role functioning (RF) is a key component of health and well-being and an important outcome in health research. The aim of this study was to develop an item bank to measure impact of health on role functioning....
Bjorner, J. B.; Petersen, M. Aa; Groenvold, M.; Aaronson, N.; Ahlner-Elmqvist, M.; Arraras, J. I.; Brédart, A.; Fayers, P.; Jordhoy, M.; Sprangers, M.; Watson, M.; Young, T.
Background: As part of a larger study whose objective is to develop an abbreviated version of the EORTC QLQ-C30 suitable for research in palliative care, analyses were conducted to determine the feasibility of generating a shorter version of the 4-item emotional functioning (EF) scale that could be
Watt, Torquil; Groenvold, Mogens; Hegedüs, Laszlo; Bonnema, Steen Joop; Rasmussen, Åse Krogh; Feldt-Rasmussen, Ulla; Bjorner, Jakob Bue
To evaluate the extent of differential item functioning (DIF) within the thyroid-specific quality of life patient-reported outcome measure, ThyPRO, according to sex, age, education and thyroid diagnosis. A total of 838 patients with benign thyroid diseases completed the ThyPRO questionnaire (84 five-point items, 13 scales). Uniform and nonuniform DIF were investigated using ordinal logistic regression, testing for both statistical significance and magnitude (∆R(2) > 0.02). Scale level was estimated by the sum score, after purification. Twenty instances of DIF in 17 of the 84 items were found. Eight according to diagnosis, where the goiter scale was the one most affected, possibly due to differing perceptions in patients with auto-immune thyroid diseases compared to patients with simple goiter. Eight DIFs according to age were found, of which 5 were in positively worded items, which younger patients were more likely to endorse; one according to gender: women were more likely to report crying, and three according to educational level. The vast majority of DIF had only minor influence on the scale scores (0.1-2.3 points on the 0-100 scales), but two DIF corresponded to a difference of 4.6 and 9.8, respectively. Ordinal logistic regression identified DIF in 17 of 84 items. The potential impact of this on the present scales was low, but items displaying DIF could be avoided when developing abbreviated scales, where the potential impact of DIF (due to fewer items) will be larger.
Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee
Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
Gao, Yong; Mack, Mick G.; Ragan, Moira A.; Ragan, Brian
In this study the authors used differential item functioning analysis to examine if there were items in the Mental, Emotional, and Bodily Toughness Inventory functioning differently across gender and athletic membership. A total of 444 male (56.3%) and female (43.7%) participants (30.9% athletes and 69.1% non-athletes) responded to the Mental,…
Research purpose: This study assesses the Differential Item Functioning (DIF of the Utrecht Work Engagement Scale (UWES-17 for different South African cultural groups in a South African company. Motivation for the study: Organisations are using the UWES-17 more and more in South Africa to assess work engagement. Therefore, research evidence from psychologists or assessment practitioners on its DIF across different cultural groups is necessary. Research design, approach and method: The researchers conducted a Secondary Data Analysis (SDA on the UWES-17 sample (n = 2429 that they obtained from a cross-sectional survey undertaken in a South African Information and Communication Technology (ICT sector company (n = 24 134. Quantitative item data on the UWES-17 scale enabled the authors to address the research question. Main findings: The researchers found uniform and/or non-uniform DIF on five of the vigour items, four of the dedication items and two of the absorption items. This also showed possible Differential Test Functioning (DTF on the vigour and dedication dimensions. Practical/managerial implications: Based on the DIF, the researchers suggested that organisations should not use the UWES-17 comparatively for different cultural groups or employment decisions in South Africa. Contribution/value add: The study provides evidence on DIF and possible DTF for the UWES-17. However, it also raises questions about possible interaction effects that need further investigation.
Fries, James F; Witter, James; Rose, Matthias; Cella, David; Khanna, Dinesh; Morgan-DeWitt, Esi
Patient-reported outcome (PRO) questionnaires record health information directly from research participants because observers may not accurately represent the patient perspective. Patient-reported Outcomes Measurement Information System (PROMIS) is a US National Institutes of Health cooperative group charged with bringing PRO to a new level of precision and standardization across diseases by item development and use of item response theory (IRT). With IRT methods, improved items are calibrated on an underlying concept to form an item bank for a "domain" such as physical function (PF). The most informative items can be combined to construct efficient "instruments" such as 10-item or 20-item PF static forms. Each item is calibrated on the basis of the probability that a given person will respond at a given level, and the ability of the item to discriminate people from one another. Tailored forms may cover any desired level of the domain being measured. Computerized adaptive testing (CAT) selects the best items to sharpen the estimate of a person's functional ability, based on prior responses to earlier questions. PROMIS item banks have been improved with experience from several thousand items, and are calibrated on over 21,000 respondents. In areas tested to date, PROMIS PF instruments are superior or equal to Health Assessment Questionnaire and Medical Outcome Study Short Form-36 Survey legacy instruments in clarity, translatability, patient importance, reliability, and sensitivity to change. Precise measures, such as PROMIS, efficiently incorporate patient self-report of health into research, potentially reducing research cost by lowering sample size requirements. The advent of routine IRT applications has the potential to transform PRO measurement.
Individuals with knee impairments identify items in need of clarification in the Patient Reported Outcomes Measurement Information System (PROMIS®) pain interference and physical function item banks - a qualitative study.
Lynch, Andrew D; Dodds, Nathan E; Yu, Lan; Pilkonis, Paul A; Irrgang, James J
The content and wording of the Patient Reported Outcome Measurement Information System (PROMIS) Physical Function and Pain Interference item banks have not been qualitatively assessed by individuals with knee joint impairments. The purpose of this investigation was to identify items in the PROMIS Physical Function and Pain Interference Item Banks that are irrelevant, unclear, or otherwise difficult to respond to for individuals with impairment of the knee and to suggest modifications based on cognitive interviews. Twenty-nine individuals with knee joint impairments qualitatively assessed items in the Pain Interference and Physical Function Item Banks in a mixed-methods cognitive interview. Field notes were analyzed to identify themes and frequency counts were calculated to identify items not relevant to individuals with knee joint impairments. Issues with clarity were identified in 23 items in the Physical Function Item Bank, resulting in the creation of 43 new or modified items, typically changing words within the item to be clearer. Interpretation issues included whether or not the knee joint played a significant role in overall health and age/gender differences in items. One quarter of the original items (31 of 124) in the Physical Function Item Bank were identified as irrelevant to the knee joint. All 41 items in the Pain Interference Item Bank were identified as clear, although individuals without significant pain substituted other symptoms which interfered with their life. The Physical Function Item Bank would benefit from additional items that are relevant to individuals with knee joint impairments and, by extension, to other lower extremity impairments. Several issues in clarity were identified that are likely to be present in other patient cohorts as well.
Kang, Hyeon-Ah; Su, Ya-Hui; Chang, Hua-Hua
A monotone relationship between a true score (τ) and a latent trait level (θ) has been a key assumption for many psychometric applications. The monotonicity property in dichotomous response models is evident as a result of a transformation via a test characteristic curve. Monotonicity in polytomous models, in contrast, is not immediately obvious because item response functions are determined by a set of response category curves, which are conceivably non-monotonic in θ. The purpose of the present note is to demonstrate strict monotonicity in ordered polytomous item response models. Five models that are widely used in operational assessments are considered for proof: the generalized partial credit model (Muraki, 1992, Applied Psychological Measurement, 16, 159), the nominal model (Bock, 1972, Psychometrika, 37, 29), the partial credit model (Masters, 1982, Psychometrika, 47, 147), the rating scale model (Andrich, 1978, Psychometrika, 43, 561), and the graded response model (Samejima, 1972, A general model for free-response data (Psychometric Monograph no. 18). Psychometric Society, Richmond). The study asserts that the item response functions in these models strictly increase in θ and thus there exists strict monotonicity between τ and θ under certain specified conditions. This conclusion validates the practice of customarily using τ in place of θ in applied settings and provides theoretical grounds for one-to-one transformations between the two scales. © 2018 The British Psychological Society.
Kalibatseva, Z; Leong, F T L; Ham, E H
Theoretical and clinical publications suggest the existence of cultural differences in the expression and experience of depression. Measurement non-equivalence remains a potential methodological explanation for the lower prevalence of depression among Asian Americans compared to European Americans. This study compared DSM-IV depressive symptoms among Asian Americans and European Americans using secondary data analysis of the Collaborative Psychiatric Epidemiology Surveys (CPES). The Composite International Diagnostic Interview (CIDI) was used for the assessment of depressive symptoms. Of the entire sample, 310 Asian Americans and 1974 European Americans reported depressive symptoms and were included in the analyses. Measurement variance was examined with an item response theory differential item functioning (IRT DIF) analysis. χ2 analyses indicated that, compared to Asian Americans, European American participants more frequently endorsed affective symptoms such as 'feeling depressed', 'feeling discouraged' and 'cried more often'. The IRT analysis detected DIF for four out of the 15 depression symptom items. At equal levels of depression, Asian Americans endorsed feeling worthless and appetite changes more easily than European Americans, and European Americans endorsed feeling nervous and crying more often than Asian Americans. Asian Americans did not seem to over-report somatic symptoms; however, European Americans seemed to report more affective symptoms than Asian Americans. The results suggest that there was measurement variance in a few of the depression items.
Paap, Muirne C S; Braeken, Johan; Pedersen, Geir; Urnes, Øyvind; Karterud, Sigmund; Wilberg, Theresa; Hummelen, Benjamin
This study aims at evaluating the psychometric properties of the antisocial personality disorder (ASPD) criteria in a large sample of patients, most of whom had one or more personality disorders (PD). PD diagnoses were assessed by experienced clinicians using the Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, 4th edition, Axis II PDs. Analyses were performed within an item response theory framework. Results of the analyses indicated that ASPD is a unidimensional construct that can be measured reliably at the upper range of the latent trait scale. Differential item functioning across gender was restricted to two criteria and had little impact on the latent ASPD trait level. Patients fulfilling both the adult ASPD criteria and the conduct disorder criteria had similar latent trait distributions as patients fulfilling only the adult ASPD criteria. Overall, the ASPD items fit the purpose of a diagnostic instrument well, that is, distinguishing patients with moderate from those with high antisocial personality scores.
Gao, Yong; Zhu, Weimo
The purpose of this study was to identify subgroup-sensitive physical activities (PA) using differential item functioning (DIF) analysis. A sub-unweighted sample of 1857 (men=923 and women=934) from the 2003-2004 National Health and Nutrition Examination Survey PA questionnaire data was used for the analyses. Using the Mantel-Haenszel, the simultaneous item bias test, and the ANOVA DIF methods, 33 specific leisure-time moderate and/or vigorous PA (MVPA) items were analyzed for DIF across race/ethnicity, gender, education, income, and age groups. Many leisure-time MVPA items were identified as large DIF items. When participating in the same amount of leisure-time MVPA, non-Hispanic blacks were more likely to participate in basketball and dance activities than non-Hispanic whites (NHW); NHW were more likely to participated in golf and hiking than non-Hispanic blacks; Hispanics were more likely to participate in dancing, hiking, and soccer than NHW, whereas NHW were more likely to engage in bicycling, golf, swimming, and walking than Hispanics; women were more likely to participate in aerobics, dancing, stretching, and walking than men, whereas men were more likely to engage in basketball, fishing, golf, running, soccer, weightlifting, and hunting than women; educated persons were more likely to participate in jogging and treadmill exercise than less educated persons; persons with higher incomes were more likely to engage in golf than those with lower incomes; and adults (20-59 yr) were more likely to participate in basketball, dancing, jogging, running, and weightlifting than older adults (60+ yr), whereas older adults were more likely to participate in walking and golf than younger adults. DIF methods are able to identify subgroup-sensitive PA and thus provide useful information to help design group-sensitive, targeted interventions for disadvantaged PA subgroups. © 2011 by the American College of Sports Medicine
Magis, David; Raiche, Gilles; Beland, Sebastien; Gerard, Paul
We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence…
Crins, Martine H P; Terwee, Caroline B; Klausch, Thomas; Smits, Niels; de Vet, Henrica C W; Westhovens, Rene; Cella, David; Cook, Karon F; Revicki, Dennis A; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Roorda, Leo D
The objective of this study was to assess the psychometric properties of the Dutch-Flemish Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank in Dutch patients with chronic pain. A bank of 121 items was administered to 1,247 Dutch patients with chronic pain. Unidimensionality was assessed by fitting a one-factor confirmatory factor analysis and evaluating resulting fit statistics. Items were calibrated with the graded response model and its fit was evaluated. Cross-cultural validity was assessed by testing items for differential item functioning (DIF) based on language (Dutch vs. English). Construct validity was evaluated by calculation correlations between scores on the Dutch-Flemish PROMIS Physical Function measure and scores on generic and disease-specific measures. Results supported the Dutch-Flemish PROMIS Physical Function item bank's unidimensionality (Comparative Fit Index = 0.976, Tucker Lewis Index = 0.976) and model fit. Item thresholds targeted a wide range of physical function construct (threshold-parameters range: -4.2 to 5.6). Cross-cultural validity was good as four items only showed DIF for language and their impact on item scores was minimal. Physical Function scores were strongly associated with scores on all other measures (all correlations ≤ -0.60 as expected). The Dutch-Flemish PROMIS Physical Function item bank exhibited good psychometric properties. Development of a computer adaptive test based on the large bank is warranted. Copyright © 2017 Elsevier Inc. All rights reserved.
Thibodeau, Michel A; Leonard, Rachel C; Abramowitz, Jonathan S; Riemann, Bradley C
The Dimensional Obsessive-Compulsive Scale (DOCS) is a promising measure of obsessive-compulsive disorder (OCD) symptoms but has received minimal psychometric attention. We evaluated the utility and reliability of DOCS scores. The study included 832 students and 300 patients with OCD. Confirmatory factor analysis supported the originally proposed four-factor structure. DOCS total and subscale scores exhibited good to excellent internal consistency in both samples (α = .82 to α = .96). Patient DOCS total scores reduced substantially during treatment (t = 16.01, d = 1.02). DOCS total scores discriminated between students and patients (sensitivity = 0.76, 1 - specificity = 0.23). The measure did not exhibit gender-based differential item functioning as tested by Mantel-Haenszel chi-square tests. Expected response options for each item were plotted as a function of item response theory and demonstrated that DOCS scores incrementally discriminate OCD symptoms ranging from low to extremely high severity. Incremental differences in DOCS scores appear to represent unbiased and reliable differences in true OCD symptom severity. © The Author(s) 2014.
Donati, Maria Anna; Chiesi, Francesca; Izzo, Viola A; Primi, Caterina
As there is a lack of evidence attesting the equivalent item functioning across genders for the most employed instruments used to measure pathological gambling in adolescence, the present study was aimed to test the gender invariance of the Gambling Behavior Scale for Adolescents (GBS-A), a new measurement tool to assess the severity of Gambling Disorder (GD) in adolescents. The equivalence of the items across genders was assessed by analyzing Differential Item Functioning within an Item Response Theory framework. The GBS-A was administered to 1,723 adolescents, and the graded response model was employed. The results attested the measurement equivalence of the GBS-A when administered to male and female adolescent gamblers. Overall, findings provided evidence that the GBS-A is an effective measurement tool of the severity of GD in male and female adolescents and that the scale was unbiased and able to relieve truly gender differences. As such, the GBS-A can be profitably used in educational interventions and clinical treatments with young people.
Panjaitan, R. L.; Irawati, R.; Sujana, A.; Hanifah, N.; Djuanda, D.
In several literatures about evaluation and test analysis, it is common to find that there are calculations of item validity as well as item discrimination index (D) with different formula for each. Meanwhile, other resources said that item discrimination index could be obtained by calculating the correlation between the testee’s score in a particular item and the testee’s score on the overall test, which is actually the same concept as item validity. Some research reports, especially undergraduate theses tend to include both item validity and item discrimination index in the instrument analysis. It seems that these concepts might overlap for both reflect the test quality on measuring the examinees’ ability. In this paper, examples of some results of data processing on item validity and item discrimination index were compared. It would be discussed whether item validity and item discrimination index can be represented by one of them only or it should be better to present both calculations for simple test analysis, especially in undergraduate theses where test analyses were included.
This article focuses on four-parameter logistic (4PL) model as an extension of the usual three-parameter logistic (3PL) model with an upper asymptote possibly different from 1. For a given item with fixed item parameters, Lord derived the value of the latent ability level that maximizes the item information function under the 3PL model. The…
Ip, Edward; Molenberghs, Geert; Chen, Shyh-Huei
The problem of fitting unidimensional item response models to potentially multidimensional data has been extensively studied. The focus of this article is on response data that have a strong dimension but also contain minor nuisance dimensions. Fitting a unidimensional model to such multidimensio......The problem of fitting unidimensional item response models to potentially multidimensional data has been extensively studied. The focus of this article is on response data that have a strong dimension but also contain minor nuisance dimensions. Fitting a unidimensional model...... to such multidimensional data is believed to result in ability estimates that represent a combination of the major and minor dimensions. We conjecture that the underlying dimension for the fitted unidimensional model, which we call the functional dimension, represents a nonlinear projection. In this article we investigate...... tool. An example regarding a construct of desire for physical competency is used to illustrate the functional unidimensional approach....
Cunningham, Timothy J.; Berkman, Lisa F.; Gortmaker, Steven L.; Kiefe, Catarina I.; Jacobs, David R.; Seeman, Teresa E.; Kawachi, Ichiro
The psychometric properties of instruments used to measure self-reported experiences of discrimination in epidemiologic studies are rarely assessed, especially regarding construct validity. The authors used 2000–2001 data from the Coronary Artery Risk Development in Young Adults (CARDIA) Study to examine differential item functioning (DIF) in 2 versions of the Experiences of Discrimination (EOD) Index, an index measuring self-reported experiences of racial/ethnic and gender discrimination. DIF may confound interpretation of subgroup differences. Large DIF was observed for 2 of 7 racial/ethnic discrimination items: White participants reported more racial/ethnic discrimination for the “at school” item, and black participants reported more racial/ethnic discrimination for the “getting housing” item. The large DIF by race/ethnicity in the index for racial/ethnic discrimination probably reflects item impact and is the result of valid group differences between blacks and whites regarding their respective experiences of discrimination. The authors also observed large DIF by race/ethnicity for 3 of 7 gender discrimination items. This is more likely to have been due to item bias. Users of the EOD Index must consider the advantages and disadvantages of DIF adjustment (omitting items, constructing separate measures, and retaining items). The EOD Index has substantial usefulness as an instrument that can assess self-reported experiences of discrimination. PMID:22038104
Bauer, Daniel J
The evaluation of measurement invariance is an important step in establishing the validity and comparability of measurements across individuals. Most commonly, measurement invariance has been examined using 1 of 2 primary latent variable modeling approaches: the multiple groups model or the multiple-indicator multiple-cause (MIMIC) model. Both approaches offer opportunities to detect differential item functioning within multi-item scales, and thereby to test measurement invariance, but both approaches also have significant limitations. The multiple groups model allows 1 to examine the invariance of all model parameters but only across levels of a single categorical individual difference variable (e.g., ethnicity). In contrast, the MIMIC model permits both categorical and continuous individual difference variables (e.g., sex and age) but permits only a subset of the model parameters to vary as a function of these characteristics. The current article argues that moderated nonlinear factor analysis (MNLFA) constitutes an alternative, more flexible model for evaluating measurement invariance and differential item functioning. We show that the MNLFA subsumes and combines the strengths of the multiple group and MIMIC models, allowing for a full and simultaneous assessment of measurement invariance and differential item functioning across multiple categorical and/or continuous individual difference variables. The relationships between the MNLFA model and the multiple groups and MIMIC models are shown mathematically and via an empirical demonstration. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
This study adapted an effect size measure used for studying differential item functioning (DIF) in unidimensional tests and extended the measure to multidimensional tests. Two effect size measures were considered in a multidimensional item response theory model: signed weighted P-difference and unsigned weighted P-difference. The performance of…
Full Text Available The test of relational reasoning (TORR is designed to assess the ability to identify complex patterns within visuospatial stimuli. The TORR is designed for use in school and university settings, and therefore, its measurement invariance across diverse groups is critical. In this investigation, a large sample, representative of a major university on key demographic variables, was collected, and the resulting data were analyzed using a multi-group, multidimensional item-response theory model-comparison procedure. No significant differential item functioning was found on any of the TORR items across any of the demographic groups of interest. This finding is interpreted as evidence of the cultural fairness of the TORR, and potential test-development choices that may have contributed to that cultural fairness are discussed.
Woods, Carol M.; Grimm, Kevin J.
In extant literature, multiple indicator multiple cause (MIMIC) models have been presented for identifying items that display uniform differential item functioning (DIF) only, not nonuniform DIF. This article addresses, for apparently the first time, the use of MIMIC models for testing both uniform and nonuniform DIF with categorical indicators. A…
Silva, Soraia Micaela; Corrêa, Fernanda Ishida; Faria, Christina Danielli Coelho de Morais; Pereira, Gabriela Santos; Attié, Edna Alves Dos Anjos; Corrêa, João Carlos Ferrari
To evaluate the reproducibility of the Stroke Specific Quality of Life (SS-QOL) items that address the participation component of the International Classification of Functioning, Disability and Health (ICF) and analyse the correlation between the subscore of these 26 items and the total SS-QOL score. Seventy-five stroke survivors participated in this study. Reproducibility was evaluated using the intraclass correlation coefficient (ICC2,1), standard error of measurement (SEM), minimum detectable change (MDC) and the Bland-Altman plot. The correlation between the subscore of the 26 items and the total SS-QOL score was analysed using Spearman's correlation coefficients (rho) and simple linear regression. An alpha risk ≤ 0.05 was considered for all analyses. The SS-QOL items that address the participation component of the ICF demonstrated excellent reliability (intra-rater ICC2,1 = 0.96; inter-rater ICC2,1 = 0.95). The SEM and MDC were adequate. The Bland-Altman plot demonstrated satisfactory agreement. A significant and strong correlation (rho = 0.83) was found between the 26 SS-QOL items that address participation and the total SS-QOL score. Moreover, the evaluation of participation was found to explain 73% of the evaluation of health-related quality of life. The 26 SS-QOL items that address the participation component of the ICF demonstrated adequate reproducibility. Thus, participation, which represents the social aspects of functionality, can be adequately evaluated with these items. Implications for Rehabilitation The 26 Stroke Specific Quality of Life items that address participation proved to be reproducible for the analysis of social participation following a stroke. The findings can lead to a better understanding of the social participation of individuals with chronic hemiparesis and assist in the establishment of adequate treatment for such individuals. The rehabilitation process can be directed towards more specific goals focused on the
Bjorner, Jakob Bue; Pejtersen, Jan Hyld
AIMS: To evaluate the construct validity of the Copenhagen Psychosocial Questionnaire II (COPSOQ II) by means of tests for differential item functioning (DIF) and differential item effect (DIE). METHODS: We used a Danish general population postal survey (n = 4,732 with 3,517 wage earners) with a ...
Full Text Available This report analyses the differential item functioning (DIF in the Programme for Indicators of Student Achievement PISA2000. The items studied are coming from the Reading Comprehension Test. We analyzed the released items from this year because we wanted to join the detection of DIF and its understanding. The reference group is the sample of United Kingdom and the focal group is the Spanish sample. The procedures of detection are Mantel-Haenszel, Logistic Regression and the standardized mean difference, and their extensions for polytomous items. Two items were flagged and the post-hoc analysis didn’t explain the causes of DIF entirely. Este trabajo analiza el funcionamiento diferencial del ítem (FDI de la prueba de comprensión lectora de la evaluación PISA2000 entre la muestras del Reino Unido y España. Se estudian los ítems liberados con el fin de aunar las fases de detección del FDI con la comprensión de sus causas. En la fase de detección se comparan los resultados de los procedimientos Mantel-Haenszel, Regresión Logística y Medias Estandarizadas en sus versiones para ítems dicotómicos y politómicos. Los resultados muestran que dos ítems presentan funcionamiento diferencial aunque el estudio post-hoc llevado a cabo sobre su contenido no ha podido precisar sus causas.
Beinicke, Andrea; Pässler, Katja; Hell, Benedikt
The study investigates consequences of eliminating items showing gender-specific differential item functioning (DIF) on the psychometric structure of a standard RIASEC interest inventory. Holland's hexagonal model was tested for structural invariance using a confirmatory methodological approach (confirmatory factor analysis and randomization…
Timmerby, Nina; Cosci, Fiammetta; Watson, Maggie; Csillag, Claudio; Schmitt, Florence; Steck, Barbara; Bech, Per; Thastum, Mikael
The Family Assessment Device (FAD) is a 60-item questionnaire widely used to evaluate self-reported family functioning. However, the factor structure as well as the number of items has been questioned. A shorter and more user-friendly version of the original FAD-scale, the 36-item FAD, has therefore previously been proposed, based on findings in a nonclinical population of adults. We aimed in this study to evaluate the brief 36-item version of the FAD in a clinical population. Data from a European multinational study, examining factors associated with levels of family functioning in adult cancer patients' families, were used. Both healthy and ill parents completed the 60-item version FAD. The psychometric analyses conducted were Principal Component Analysis and Mokken-analysis. A total of 564 participants were included. Based on the psychometric analysis we confirmed that the 36-item version of the FAD has robust psychometric properties and can be used in clinical populations. The present analysis confirmed that the 36-item version of the FAD (18 items assessing 'well-being' and 18 items assessing 'dysfunctional' family function) is a brief scale where the summed total score is a valid measure of the dimensions of family functioning. This shorter version of the FAD is, in accordance with the concept of 'measurement-based care', an easy to use scale that could be considered when the aim is to evaluate self-reported family functioning.
Glas, Cornelis A.W.
Abstract: In the present paper it is shown that differential item functioning can be evaluated using the Lagrange multiplier test or Rao’s efficient score test. The test is presented in the framework of a number of IRT models such as the Rasch model, the OPLM, the 2-parameter logistic model, the
Martine H P Crins
Full Text Available The Patient-Reported Outcomes Measurement Information System (PROMIS is a universally applicable set of instruments, including item banks, short forms and computer adaptive tests (CATs, measuring patient-reported health across different patient populations. PROMIS CATs are highly efficient and the use in practice is considered feasible with little administration time, offering standardized and routine patient monitoring. Before an item bank can be used as CAT, the psychometric properties of the item bank have to be examined. Therefore, the objective was to assess the psychometric properties of the Dutch-Flemish PROMIS Physical Function item bank (DF-PROMIS-PF in Dutch patients receiving physical therapy.Cross-sectional study.805 patients >18 years, who received any kind of physical therapy in primary care in the past year, completed the full DF-PROMIS-PF (121 items.Unidimensionality was examined by Confirmatory Factor Analysis and local dependence and monotonicity were evaluated. A Graded Response Model was fitted. Construct validity was examined with correlations between DF-PROMIS-PF T-scores and scores on two legacy instruments (SF-36 Health Survey Physical Functioning scale [SF36-PF10] and the Health Assessment Questionnaire Disability-Index [HAQ-DI]. Reliability (standard errors of theta was assessed.The results for unidimensionality were mixed (scaled CFI = 0.924, TLI = 0.923, RMSEA = 0.045, 1th factor explained 61.5% of variance. Some local dependence was found (8.2% of item pairs. The item bank showed a broad coverage of the physical function construct (threshold-parameters range: -4.28-2.33 and good construct validity (correlation with SF36-PF10 = 0.84 and HAQ-DI = -0.85. Furthermore, the DF-PROMIS-PF showed greater reliability over a broader score-range than the SF36-PF10 and HAQ-DI.The psychometric properties of the DF-PROMIS-PF item bank are sufficient. The DF-PROMIS-PF can now be used as short forms or CAT to measure the level of
Panouillères, M; Anota, A; Nguyen, T V; Brédart, A; Bosset, J F; Monnier, A; Mercier, M; Hardouin, J B
The present study investigates the properties of the French version of the OUT-PATSAT35 questionnaire, which evaluates the outpatients' satisfaction with care in oncology using classical analysis (CTT) and item response theory (IRT). This cross-sectional multicenter study includes 692 patients who completed the questionnaire at the end of their ambulatory treatment. CTT analyses tested the main psychometric properties (convergent and divergent validity, and internal consistency). IRT analyses were conducted separately for each OUT-PATSAT35 domain (the doctors, the nurses or the radiation therapists and the services/organization) by models from the Rasch family. We examined the fit of the data to the model expectations and tested whether the model assumptions of unidimensionality, monotonicity and local independence were respected. A total of 605 (87.4%) respondents were analyzed with a mean age of 64 years (range 29-88). Internal consistency for all scales separately and for the three main domains was good (Cronbach's α 0.74-0.98). IRT analyses were performed with the partial credit model. No disordered thresholds of polytomous items were found. Each domain showed high reliability but fitted poorly to the Rasch models. Three items in particular, the item about "promptness" in the doctors' domain and the items about "accessibility" and "environment" in the services/organization domain, presented the highest default of fit. A correct fit of the Rasch model can be obtained by dropping these items. Most of the local dependence concerned items about "information provided" in each domain. A major deviation of unidimensionality was found in the nurses' domain. CTT showed good psychometric properties of the OUT-PATSAT35. However, the Rasch analysis revealed some misfitting and redundant items. Taking the above problems into consideration, it could be interesting to refine the questionnaire in a future study.
Petersen, Morten Aa; Groenvold, Mogens; Bjorner, Jakob B.; Aaronson, Neil; Conroy, Thierry; Cull, Ann; Fayers, Peter; Hjermstad, Marianne; Sprangers, Mirjam; Sullivan, Marianne
In cross-national comparisons based on questionnaires, accurate translations are necessary to obtain valid results. Differential item functioning (DIF) analysis can be used to test whether translations of items in multi-item scales are equivalent to the original. In data from 10,815 respondents
Cabrero-García, Julio; Ramos-Pichardo, Juan Diego; Muñoz-Mendoza, Carmen Luz; Cabañero-Martínez, María José; González-Llopis, Lorena; Reig-Ferrer, Abilio
To develop and validate an item bank to measure mobility in older people in primary care and to analyse differential item functioning (DIF) and differential bundle functioning (DBF) by sex. A pool of 48 mobility items was administered by interview to 593 older people attending primary health care practices. The pool contained four domains based on the International Classification of Functioning: changing and maintaining body position, carrying, lifting and pushing, walking and going up and down stairs. The Late Life Mobility item bank consisted of 35 items, and measured with a reliability of 0.90 or more across the full spectrum of mobility, except at the higher end of better functioning. No evidence was found of non-uniform DIF but uniform DIF was observed, mainly for items in the changing and maintaining body position and carrying, lifting and pushing domains. The walking domain did not display DBF, but the other three domains did, principally the carrying, lifting and pushing items. During the design and validation of an item bank to measure mobility in older people, we found that strength (carrying, lifting and pushing) items formed a secondary dimension that produced DBF. More research is needed to determine how best to include strength items in a mobility measure, or whether it would be more appropriate to design separate measures for each construct.
Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando; Stucky, Brian D; Li, Zhen; Hansen, Mark; Cai, Li
The positive emotional and sensory expectancies of cigarette smoking include improved cognitive abilities, positive affective states, and pleasurable sensorimotor sensations. This paper describes development of Positive Emotional and Sensory Expectancies of Smoking item banks that will serve to standardize the assessment of this construct among daily and nondaily cigarette smokers. Data came from daily (N = 4,201) and nondaily (N =1,183) smokers who completed an online survey. To identify a unidimensional set of items, we conducted item factor analyses, item response theory analyses, and differential item functioning analyses. Additionally, we evaluated the performance of fixed-item short forms (SFs) and computer adaptive tests (CATs) to efficiently assess the construct. Eighteen items were included in the item banks (15 common across daily and nondaily smokers, 1 unique to daily, 2 unique to nondaily). The item banks are strongly unidimensional, highly reliable (reliability = 0.95 for both), and perform similarly across gender, age, and race/ethnicity groups. A SF common to daily and nondaily smokers consists of 6 items (reliability = 0.86). Results from simulated CATs indicated that, on average, less than 8 items are needed to assess the construct with adequate precision using the item banks. These analyses identified a new set of items that can assess the positive emotional and sensory expectancies of smoking in a reliable and standardized manner. Considerable efficiency in assessing this construct can be achieved by using the item bank SF, employing computer adaptive tests, or selecting subsets of items tailored to specific research or clinical purposes. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: email@example.com.
Extended an Item Response Theory (IRT) method for detection of differential item functioning to the partial credit model and applied the method to simulated data using a stepwise procedure. Then applied the stepwise DIF analysis based on the multiple-group partial credit model to writing trend data from the National Assessment of Educational…
Lee, Yi-Hsuan; Zhang, Jinming
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
Falk, Carl F.; Cai, Li
We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…
Drabinová, Adéla; Martinková, Patrícia
In this article we present a general approach not relying on item response theory models (non-IRT) to detect differential item functioning (DIF) in dichotomous items with presence of guessing. The proposed nonlinear regression (NLR) procedure for DIF detection is an extension of method based on logistic regression. As a non-IRT approach, NLR can…
Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi
High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied the Item Response Theory principles to analyse Northwest University Kano Post-UTME Economics test items. The developed fifty (50) economics test items was…
Full Text Available Abstract Background A previous study had identified 45 items assessing the impact of atopic dermatitis (AD on the whole family. From these it was intended to develop two separate scales, one assessing impact on carers and the other determining the effect on the child. Methods The 45 items were included in three clinical trials designed to test the efficacy of a new topical treatment (pimecrolimus, Elidel cream 1% in the treatment of AD in infants and children and in validation studies in the UK, US, Germany, France and the Netherlands. Rasch analyses were undertaken to determine whether an internationally valid, unidimensional scale could be developed that would inform on the direct impact of AD on the child. Results Rasch analyses applied to the data from the trials indicated that the draft measure consisted of two scales, one assessing the QoL of the carer and the other (consisting of 12 items measuring the impact of AD on the child. Three of the 12 potential items failed to fit the measurement model in Europe and five in the US. In addition, four items exhibiting differential item functioning (DIF by country were identified. After removing the misfitting items and controlling for DIF it was possible to derive a scale; The Childhood Impact of Atopic Dermatitis (CIAD with good item fit for each trial analysis. Analysis of the validation data from each of the different countries confirmed that the CIAD had adequate internal consistency, reproducibility and construct validity. The CIAD demonstrated the benefits of treatment with Elidel over placebo in the European trial. A similar (non-significant trend was found for the US trials. Conclusion The study represents a novel method of dealing with the problem of DIF associated with different cultures. Such problems are likely to arise in any multinational study involving patient-reported outcome measures, as items in the scales are likely to be valued differently in different cultures. However, where
Bacci, Elizabeth D; Staniewska, Dorota; Coyne, Karin S; Boyer, Stacey; White, Leigh Ann; Zach, Neta; Cedarbaum, Jesse M
Our objective was to examine dimensionality and item-level performance of the Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised (ALSFRS-R) across time using classical and modern test theory approaches. Confirmatory factor analysis (CFA) and Item Response Theory (IRT) analyses were conducted using data from patients with amyotrophic lateral sclerosis (ALS) Pooled Resources Open-Access ALS Clinical Trials (PRO-ACT) database with complete ALSFRS-R data (n = 888) at three time-points (Time 0, Time 1 (6-months), Time 2 (1-year)). Results demonstrated that in this population of 888 patients, mean age was 54.6 years, 64.4% were male, and 93.7% were Caucasian. The CFA supported a 4* individual-domain structure (bulbar, gross motor, fine motor, and respiratory domains). IRT analysis within each domain revealed misfitting items and overlapping item response category thresholds at all time-points, particularly in the gross motor and respiratory domain items. Results indicate that many of the items of the ALSFRS-R may sub-optimally distinguish among varying levels of disability assessed by each domain, particularly in patients with less severe disability. Measure performance improved across time as patient disability severity increased. In conclusion, modifications to select ALSFRS-R items may improve the instrument's specificity to disability level and sensitivity to treatment effects.
Li, Chih-Ying; Romero, Sergio; Bonilha, Heather S; Simpson, Kit N; Simpson, Annie N; Hong, Ickpyo; Velozo, Craig A
This study examined dimensionality and item-level psychometric properties of an item bank measuring activities of daily living (ADL) across inpatient rehabilitation facilities and community living centers. Common person equating method was used in the retrospective veterans data set. This study examined dimensionality, model fit, local independence, and monotonicity using factor analyses and fit statistics, principal component analysis (PCA), and differential item functioning (DIF) using Rasch analysis. Following the elimination of invalid data, 371 veterans who completed both the Functional Independence Measure (FIM) and minimum data set (MDS) within 6 days were retained. The FIM-MDS item bank demonstrated good internal consistency (Cronbach's α = .98) and met three rating scale diagnostic criteria and three of the four model fit statistics (comparative fit index/Tucker-Lewis index = 0.98, root mean square error of approximation = 0.14, and standardized root mean residual = 0.07). PCA of Rasch residuals showed the item bank explained 94.2% variance. The item bank covered the range of θ from -1.50 to 1.26 (item), -3.57 to 4.21 (person) with person strata of 6.3. The findings indicated the ADL physical function item bank constructed from FIM and MDS measured a single latent trait with overall acceptable item-level psychometric properties, suggesting that it is an appropriate source for developing efficient test forms such as short forms and computerized adaptive tests.
Balluerka, Nekane; Gorostiaga, Arantxa; Gómez-Benito, Juana; Hidalgo, María Dolores
Given that a key function of tests is to serve as evaluation instruments and for decision making in the fields of psychology and education, the possibility that some of their items may show differential behaviour is a major concern for psychometricians. In recent decades, important progress has been made as regards the efficacy of techniques designed to detect this differential item functioning (DIF). However, the findings are scant when it comes to explaining its causes. The present study addresses this problem from the perspective of multilevel analysis. Starting from a case study in the area of transcultural comparisons, multilevel logistic regression is used: 1) to identify the item characteristics associated with the presence of DIF; 2) to estimate the proportion of variation in the DIF coefficients that is explained by these characteristics; and 3) to evaluate alternative explanations of the DIF by comparing the explanatory power or fit of different sequential models. The comparison of these models confirmed one of the two alternatives (familiarity with the stimulus) and rejected the other (the topic area) as being a cause of differential functioning with respect to the compared groups.
Kaskowitz, Gary S.; De Ayala, R. J.
Studied the effect of item parameter estimation for computation of linking coefficients for the test response function (TRF) linking/equating method. Simulation results showed that linking was more accurate when there was less error in the parameter estimates, and that 15 or 25 common items provided better results than 5 common items under both…
Forrest, Christopher B; Ravens-Sieberer, Ulrike; Devine, Janine; Becker, Brandon D; Teneralli, Rachel; Moon, JeanHee; Carle, Adam; Tucker, Carole A; Bevans, Katherine B
The purpose of this study is to describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Positive Affect item bank, child-report and parent-proxy editions. The initial item pool comprising 53 items, previously developed using qualitative methods, was administered to 1,874 children 8-17 years old and 909 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and construct validity. A total of 14 items were deleted, because of poor psychometric performance, and an 8-item short form constructed from the remaining 39 items was administered to a national sample of 1,004 children 8-17 years old, and 1,306 parents of children 5-17 years old. The combined sample was used in item response theory (IRT) calibration analyses. The final item bank appeared unidimensional, the items appeared locally independent, and the items were free from differential item functioning. The scales showed excellent reliability and convergent and discriminant validity. Positive affect decreased with children's age and was lower for those with a special health care need. After IRT calibration, we found that 4 and 8 item short forms had a high degree of precision (reliability) across a wide range of the latent trait (>4 SD units). The PROMIS Pediatric Positive Affect item bank and its short forms provide an efficient, precise, and valid assessment of positive affect in children and youth.
Tsubakita, Takashi; Kawazoe, Nobuo; Kasano, Eri
Health literacy predicts health outcomes. Despite concerns surrounding the health of Japanese young adults, to date there has been no objective assessment of health literacy in this population. This study aimed to develop a Functional Health Literacy Scale for Young Adults (funHLS-YA) based on item response theory. Each item in the scale requires participants to choose the most relevant term from 3 choices in relation to a target item, thus assessing objective rather than perceived health literacy. The 20-item scale was administered to 1816 university students and 1751 responded. Cronbach's α coefficient was .73. Difficulty and discrimination parameters of each item were estimated, resulting in the exclusion of 1 item. Some items showed different difficulty parameters for male and female participants, reflecting that some aspects of health literacy may differ by gender. The current 19-item version of funHLS-YA can reliably assess the objective health literacy of Japanese young adults.
Rodriguez, Hector P; Crane, Paul K
Objective To evaluate psychometric properties of a widely used patient experience survey. Data Sources English-language responses to the Clinician & Group Consumer Assessment of Healthcare Providers and Systems (CG-CAHPS®) survey (n = 12,244) from a 2008 quality improvement initiative involving eight southern California medical groups. Methods We used an iterative hybrid ordinal logistic regression/item response theory differential item functioning (DIF) algorithm to identify items with DIF related to patient sociodemographic characteristics, duration of the physician–patient relationship, number of physician visits, and self-rated physical and mental health. We accounted for all sources of DIF and determined its cumulative impact. Principal Findings The upper end of the CG-CAHPS® performance range is measured with low precision. With sensitive settings, some items were found to have DIF. However, overall DIF impact was negligible, as 0.14 percent of participants had salient DIF impact. Latinos who spoke predominantly English at home had the highest prevalence of salient DIF impact at 0.26 percent. Conclusions The CG-CAHPS® functions similarly across commercially insured respondents from diverse backgrounds. Consequently, previously documented racial and ethnic group differences likely reflect true differences rather than measurement bias. The impact of low precision at the upper end of the scale should be clarified. PMID:22092021
Full Text Available The aim of this research is to compare the result of the differential item functioning (DIF determining with hierarchical generalized linear model (HGLM technique and the results of the DIF determining with logistic regression (LR and item response theory–likelihood ratio (IRT-LR techniques on the test items. For this reason, first in this research, it is determined whether the students encounter DIF with HGLM, LR, and IRT-LR techniques according to socioeconomic status (SES, in the Turkish, Social Sciences, and Science subtest items of the Secondary School Institutions Examination. When inspecting the correlations among the techniques in terms of determining the items having DIF, it was discovered that there was significant correlation between the results of IRT-LR and LR techniques in all subtests; merely in Science subtest, the results of the correlation between HGLM and IRT-LR techniques were found significant. DIF applications can be made on test items with other DIF analysis techniques that were not taken to the scope of this research. The analysis results, which were determined by using the DIF techniques in different sample sizes, can be compared.
Bingenheimer, Jeffrey B; Raudenbush, Stephen W; Leventhal, Tama; Brooks-Gunn, Jeanne
Several hypotheses in family psychology involve comparisons of sociocultural groups. Yet the potential for cross-cultural inequivalence in widely used psychological measurement instruments threatens the validity of inferences about group differences. Methods for dealing with these issues have been developed via the framework of item response theory. These methods deal with an important type of measurement inequivalence, called differential item functioning (DIF). The authors introduce DIF analytic methods, linking them to a well-established framework for conceptualizing cross-cultural measurement equivalence in psychology (C.H. Hui and H.C. Triandis, 1985). They illustrate the use of DIF methods using data from the Project on Human Development in Chicago Neighborhoods (PHDCN). Focusing on the Caregiver Warmth and Environmental Organization scales from the PHDCN's adaptation of the Home Observation for Measurement of the Environment Inventory, the authors obtain results that exemplify the range of outcomes that may result when these methods are applied to psychological measurement instruments. (c) 2005 APA, all rights reserved
Ayotte, Brian J; Trivedi, Ranak; Bosworth, Hayden B
Health-related knowledge is an important component in the self-management of chronic illnesses. The objective of this study was to more accurately assess racial differences in hypertension knowledge by using a latent variable modeling approach that controlled for sociodemographic factors and accounted for measurement issues in the assessment of hypertension knowledge. Cross-sectional data from 1,177 participants (45% African American; 35% female) were analyzed using a multiple indicator multiple causes (MIMIC) modeling approach. Available sociodemographic data included race, education, sex, financial status, and age. All participants completed six items on a hypertension knowledge questionnaire. Overall, the final model suggested that females, Whites, and patients with at least a high school diploma had higher latent knowledge scores than males, African Americans, and patients with less than a high school diploma, respectively. The model also detected differential item functioning (DIF) based on race for two of the items. Specifically, the error rate for African Americans was lower than would be expected given the lower level of latent knowledge on the items, on the questions related to: (a) the association between high blood pressure and kidney disease, and (b) the increased risk African Americans have for developing hypertension. Not accounting for DIF resulted in the difference between Whites and African Americans to be underestimated. These results are discussed in the context of the need for careful measurement of health-related constructs, and how measurement-related issues can result in an inaccurate estimation of racial differences in hypertension knowledge.
Owens, Sherry; Kristjansson, Alfgeir L; Hunte, Haslyn E R
We investigated whether individual items on the nine item William's Perceived Everyday Discrimination Scale (EDS) functioned differently by age (ethnic group. Overall, Asian and Hispanic respondents reported less discrimination than Whites; on the other hand, African Americans and Black Caribbeans reported more discrimination than Whites. Regardless of race/ethnicity, the younger respondents (aged ethnicity, the results were mixed for 19 out of 45 tests of DIF (40%). No differences in item function were observed among Black Caribbeans. "Being called names or insulted" and others acting as "if they are afraid" of the respondents were the only two items that did not exhibit differential item functioning by age across all racial/ethnic groups. Overall, our findings suggest that the EDS scale should be used with caution in multi-age multi-racial/ethnic samples.
Forrest, Christopher B; Devine, Janine; Bevans, Katherine B; Becker, Brandon D; Carle, Adam C; Teneralli, Rachel E; Moon, JeanHee; Tucker, Carole A; Ravens-Sieberer, Ulrike
To describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Life Satisfaction item banks, child-report, and parent-proxy editions. A pool of 55 life satisfaction items was administered to 1992 children 8-17 years old and 964 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and assessment of construct validity. Thirteen items were deleted because of poor psychometric performance. An 8-item short form was administered to a national sample of 996 children 8-17 years old, and 1294 parents of children 5-17 years old. The combined sample (2988 children and 2258 parents) was used in item response theory (IRT) calibration analyses. The final item banks were unidimensional, the items were locally independent, and the items were free from impactful differential item functioning. The 8-item and 4-item short form scales showed excellent reliability, convergent validity, and discriminant validity. Life satisfaction decreased with declining socio-economic status, presence of a special health care need, and increasing age for girls, but not boys. After IRT calibration, we found that 4- and 8-item short forms had a high degree of precision (reliability) across a wide range (>4 SD units) of the latent variable. The PROMIS Pediatric Life Satisfaction item banks and their short forms provide efficient, precise, and valid assessments of life satisfaction in children and youth.
Paz, Sylvia H; Jones, Loretta; Calderón, José L; Hays, Ron D
Depression and physical function are particularly important health domains for the elderly. The Geriatric Depression Scale (GDS) and the Patient-Reported Outcomes Measurement Information System (PROMIS ® ) physical function item bank are two surveys commonly used to measure these domains. It is unclear if these two instruments adequately measure these aspects of health in minority elderly. The aim of this study was to estimate the readability of the GDS and PROMIS ® physical function items and to assess their comprehensibility using a sample of African American and Latino elderly. Readability was estimated using the Flesch-Kincaid and Flesch Reading Ease (FRE) formulae for English versions, and a Spanish adaptation of the FRE formula for the Spanish versions. Comprehension of the GDS and PROMIS ® items by minority elderly was evaluated with 30 cognitive interviews. Readability estimates of a number of items in English and Spanish of the GDS and PROMIS ® physical functioning items exceed the U.S. recommended 5th-grade threshold for vulnerable populations, or were rated as 'fairly difficult', 'difficult', or 'very difficult' to read. Cognitive interviews revealed that many participants felt that more than the two (yes/no) GDS response options were needed to answer the questions. Wording of several PROMIS ® items was considered confusing, and interpreting responses was problematic because they were based on using physical aids. Problems with item wording and response options of the GDS and PROMIS ® physical function items may reduce reliability and validity of measurement when used with minority elderly.
Hannula, Deborah E; Libby, Laura A; Yonelinas, Andrew P; Ranganath, Charan
Several models have proposed that different regions of the medial temporal lobes contribute to different aspects of episodic memory. For instance, according to one view, the perirhinal cortex represents specific items, parahippocampal cortex represents information regarding the context in which these items were encountered, and the hippocampus represents item-context bindings. Here, we used event-related functional magnetic resonance imaging (fMRI) to test a specific prediction of this model-namely, that successful retrieval of items from context cues will elicit perirhinal recruitment and that successful retrieval of contexts from item cues will elicit parahippocampal cortex recruitment. Retrieval of the bound representation in either case was expected to elicit hippocampal engagement. To test these predictions, we had participants study several item-context pairs (i.e., pictures of objects and scenes, respectively), and then had them attempt to recall items from associated context cues and contexts from associated item cues during a scanned retrieval session. Results based on both univariate and multivariate analyses confirmed a role for hippocampus in content-general relational memory retrieval, and a role for parahippocampal cortex in successful retrieval of contexts from item cues. However, we also found that activity differences in perirhinal cortex were correlated with successful cued recall for both items and contexts. These findings provide partial support for the above predictions and are discussed with respect to several models of medial temporal lobe function. Copyright © 2013 Elsevier Ltd. All rights reserved.
Hannula, Deborah E.; Libby, Laura A.; Yonelinas, Andrew P.; Ranganath, Charan
Several models have proposed that different regions of the medial temporal lobes contribute to different aspects of episodic memory. For instance, according to one view, the perirhinal cortex represents specific items, parahippocampal cortex represents information regarding the context in which these items were encountered, and the hippocampus represents item-context bindings. Here, we used event-related functional magnetic resonance imaging (fMRI) to test a specific prediction of this model – namely, that successful retrieval of items from context cues will elicit perirhinal recruitment and that successful retrieval of contexts from item cues will elicit parahippocampal cortex recruitment. Retrieval of the bound representation in either case was expected to elicit hippocampal engagement. To test these predictions, we had participants study several item-context pairs (i.e., pictures of objects and scenes, respectively), and then had them attempt to recall items from associated context cues and contexts from associated item cues during a scanned retrieval session. Results based on both univariate and multivariate analyses confirmed a role for hippocampus in content-general relational memory retrieval, and a role for parahippocampal cortex in successful retrieval of contexts from item cues. However, we also found that activity differences in perirhinal cortex were correlated with successful cued recall for both items and contexts. These findings provide partial support for the above predictions and are discussed with respect to several models of medial temporal lobe function. PMID:23466350
Wang, Wen-Chung; Shih, Ching-Lin; Yang, Chih-Chien
This study implements a scale purification procedure onto the standard MIMIC method for differential item functioning (DIF) detection and assesses its performance through a series of simulations. It is found that the MIMIC method with scale purification (denoted as M-SP) outperforms the standard MIMIC method (denoted as M-ST) in controlling…
Drabinová, Adéla; Martinková, Patrícia
Roč. 54, č. 4 (2017), s. 498-517 ISSN 0022-0655 R&D Projects: GA ČR GJ15-15856Y Institutional support: RVO:67985807 Keywords : differential item functioning * non-linear regression * logistic regression * item response theory Subject RIV: AM - Education OBOR OECD: Statistics and probability Impact factor: 0.979, year: 2016
Pilcher, June J; Switzer, Fred S; Munc, Alec; Donnelly, Janet; Jellen, Julia C; Lamm, Claus
The purpose of this study is to examine the psychometric properties of the Epworth Sleepiness Scale (ESS) in two languages, German and English. Students from a university in Austria (N = 292; 55 males; mean age = 18.71 ± 1.71 years; 237 females; mean age = 18.24 ± 0.88 years) and a university in the US (N = 329; 128 males; mean age = 18.71 ± 0.88 years; 201 females; mean age = 21.59 ± 2.27 years) completed the ESS. An exploratory-factor analysis was completed to examine dimensionality of the ESS. Item response theory (IRT) analyses were used to provide information about the response rates on the items on the ESS and provide differential item functioning (DIF) analyses to examine whether the items were interpreted differently between the two languages. The factor analyses suggest that the ESS measures two distinct sleepiness constructs. These constructs indicate that the ESS is probing sleepiness in settings requiring active versus passive responding. The IRT analyses found that overall, the items on the ESS perform well as a measure of sleepiness. However, Item 8 and to a lesser extent Item 6 were being interpreted differently by respondents in comparison to the other items. In addition, the DIF analyses showed that the responses between German and English were very similar indicating that there are only minor measurement differences between the two language versions of the ESS. These findings suggest that the ESS provides a reliable measure of propensity to sleepiness; however, it does convey a two-factor approach to sleepiness. Researchers and clinicians can use the German and English versions of the ESS but may wish to exclude Item 8 when calculating a total sleepiness score.
Rose, Matthias; Bjorner, Jakob B; Gandek, Barbara; Bruce, Bonnie; Fries, James F; Ware, John E
To document the development and psychometric evaluation of the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function (PF) item bank and static instruments. The items were evaluated using qualitative and quantitative methods. A total of 16,065 adults answered item subsets (n>2,200/item) on the Internet, with oversampling of the chronically ill. Classical test and item response theory methods were used to evaluate 149 PROMIS PF items plus 10 Short Form-36 and 20 Health Assessment Questionnaire-Disability Index items. A graded response model was used to estimate item parameters, which were normed to a mean of 50 (standard deviation [SD]=10) in a US general population sample. The final bank consists of 124 PROMIS items covering upper, central, and lower extremity functions and instrumental activities of daily living. In simulations, a 10-item computerized adaptive test (CAT) eliminated floor and decreased ceiling effects, achieving higher measurement precision than any comparable length static tool across four SDs of the measurement range. Improved psychometric properties were transferred to the CAT's superior ability to identify differences between age and disease groups. The item bank provides a common metric and can improve the measurement of PF by facilitating the standardization of patient-reported outcome measures and implementation of CATs for more efficient PF assessments over a larger range. Copyright © 2014. Published by Elsevier Inc.
Zheng, Yinggan; Gierl, Mark J.; Cui, Ying
This study combined the kernel smoothing procedure and a nonparametric differential item functioning statistic--Cochran's Z--to statistically test the difference between the kernel-smoothed item response functions for reference and focal groups. Simulation studies were conducted to investigate the Type I error and power of the proposed…
Rose, Matthias; Bjørner, Jakob; Gandek, Barbara
OBJECTIVE: To document the development and psychometric evaluation of the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function (PF) item bank and static instruments. STUDY DESIGN AND SETTING: The items were evaluated using qualitative and quantitative methods. A total...... response model was used to estimate item parameters, which were normed to a mean of 50 (standard deviation [SD]=10) in a US general population sample. RESULTS: The final bank consists of 124 PROMIS items covering upper, central, and lower extremity functions and instrumental activities of daily living...... to identify differences between age and disease groups. CONCLUSION: The item bank provides a common metric and can improve the measurement of PF by facilitating the standardization of patient-reported outcome measures and implementation of CATs for more efficient PF assessments over a larger range....
Gideon P. De Bruin
Full Text Available The factor analysis of items often produces spurious results in the sense that unidimensional scales appear multidimensional. This may be ascribed to failure in meeting the assumptions of linearity and normality on which factor analysis is based. Item response theory is explicitly designed for the modelling of the non-linear relations between ordinal variables and provides a strong alternative to the factor analysis of items. Items may also be combined in parcels that are more likely to satisfy the assumptions of factor analysis than do the items. The use of the Rasch rating scale model and the factor analysis of parcels is illustrated with data obtained with the Locus of Control Inventory. The results of these analyses are compared with the results obtained through the factor analysis of items. It is shown that the Rasch rating scale model and the factoring of parcels produce superior results to the factor analysis of items. Recommendations for the analysis of scales are made. Opsomming Die faktorontleding van items lewer dikwels misleidende resultate op, veral in die opsig dat eendimensionele skale as meerdimensioneel voorkom. Hierdie resultate kan dikwels daaraan toegeskryf word dat daar nie aan die aannames van lineariteit en normaliteit waarop faktorontleding berus, voldoen word nie. Itemresponsteorie, wat eksplisiet vir die modellering van die nie-liniêre verbande tussen ordinale items ontwerp is, bied ’n aantreklike alternatief vir die faktorontleding van items. Items kan ook in pakkies gegroepeer word wat meer waarskynlik aan die aannames van faktorontleding voldoen as individuele items. Die gebruik van die Rasch beoordelingskaalmodel en die faktorontleding van pakkies word aan die hand van data wat met die Lokus van Beheervraelys verkry is, gedemonstreer. Die resultate van hierdie ontledings word vergelyk met die resultate wat deur ‘n faktorontleding van die individuele items verkry is. Die resultate dui daarop dat die Rasch
Liegl, Gregor; Gandek, Barbara; Fischer, H Felix; Bjorner, Jakob B; Ware, John E; Rose, Matthias; Fries, James F; Nolte, Sandra
Physical function (PF) is a core patient-reported outcome domain in clinical trials in rheumatic diseases. Frequently used PF measures have ceiling effects, leading to large sample size requirements and low sensitivity to change. In most of these instruments, the response category that indicates the highest PF level is the statement that one is able to perform a given physical activity without any limitations or difficulty. This study investigates whether using an item format with an extended response scale, allowing respondents to state that the performance of an activity is easy or very easy, increases the range of precise measurement of self-reported PF. Three five-item PF short forms were constructed from the Patient-Reported Outcomes Measurement Information System (PROMIS®) wave 1 data. All forms included the same physical activities but varied in item stem and response scale: format A ("Are you able to …"; "without any difficulty"/"unable to do"); format B ("Does your health now limit you …"; "not at all"/"cannot do"); format C ("How difficult is it for you to …"; "very easy"/"impossible"). Each short-form item was answered by 2217-2835 subjects. We evaluated unidimensionality and estimated a graded response model for the 15 short-form items and remaining 119 items of the PROMIS PF bank to compare item and test information for the short forms along the PF continuum. We then used simulated data for five groups with different PF levels to illustrate differences in scoring precision between the short forms using different item formats. Sufficient unidimensionality of all short-form items and the original PF item bank was supported. Compared to formats A and B, format C increased the range of reliable measurement by about 0.5 standard deviations on the positive side of the PF continuum of the sample, provided more item information, and was more useful in distinguishing known groups with above-average functioning. Using an item format with an extended
Deutscher, Daniel; Hart, Dennis L; Crane, Paul K; Dickstein, Ruth
Comparative effectiveness research across cultures requires unbiased measures that accurately detect clinical differences between patient groups. The purpose of this study was to assess the presence and impact of differential item functioning (DIF) in knee functional status (FS) items administered using computerized adaptive testing (CAT) as a possible cause for observed differences in outcomes between 2 cultural patient groups in a polyglot society. This study was a secondary analysis of prospectively collected data. We evaluated data from 9,134 patients with knee impairments from outpatient physical therapy clinics in Israel. Items were analyzed for DIF related to sex, age, symptom acuity, surgical history, exercise history, and language used to complete the functional survey (Hebrew versus Russian). Several items exhibited DIF, but unadjusted FS estimates and FS estimates that accounted for DIF were essentially equal (intraclass correlation coefficient [2,1]>.999). No individual patient had a difference between unadjusted and adjusted FS estimates as large as the median standard error of the unadjusted estimates. Differences between groups defined by any of the covariates considered were essentially unchanged when using adjusted instead of unadjusted FS estimates. The greatest group-level impact was <0.3% of 1 standard deviation of the unadjusted FS estimates. Complete data where patients answered all items in the scale would have been preferred for DIF analysis, but only CAT data were available. Differences in FS outcomes between groups of patients with knee impairments who answered the knee CAT in Hebrew or Russian in Israel most likely reflected true differences that may reflect societal disparities in this health outcome.
Eichenbaum, Alexander E; Marcus, David K; French, Brian F
This study examined item and scale functioning in the Psychopathic Personality Inventory-Revised (PPI-R) using an item response theory analysis. PPI-R protocols from 1,052 college student participants (348 male, 704 female) were analyzed. Analyses were conducted on the 131 self-report items comprising the PPI-R's eight content scales, using a graded response model. Scales collected a majority of their information about respondents possessing higher than average levels of the traits being measured. Each scale contained at least some items that evidenced limited ability to differentiate between respondents with differing levels of the trait being measured. Moreover, 80 items (61.1%) yielded significantly different responses between men and women presumably possessing similar levels of the trait being measured. Item performance was also influenced by the scoring format (directly scored vs. reverse-scored) of the items. Overall, the results suggest that the PPI-R, despite identifying psychopathic personality traits in individuals possessing high levels of those traits, may not identify these traits equally well for men and women, and scores are likely influenced by the scoring format of the individual item and scale.
Sturm, Alexandra; Kuhfeld, Megan; Kasari, Connie; McCracken, James T
Research and practice in autism spectrum disorder (ASD) rely on quantitative measures, such as the Social Responsiveness Scale (SRS), for characterization and diagnosis. Like many ASD diagnostic measures, SRS scores are influenced by factors unrelated to ASD core features. This study further interrogates the psychometric properties of the SRS using item response theory (IRT), and demonstrates a strategy to create a psychometrically sound short form by applying IRT results. Social Responsiveness Scale analyses were conducted on a large sample (N = 21,426) of youth from four ASD databases. Items were subjected to item factor analyses and evaluation of item bias by gender, age, expressive language level, behavior problems, and nonverbal IQ. Item selection based on item psychometric properties, DIF analyses, and substantive validity produced a reduced item SRS short form that was unidimensional in structure, highly reliable (α = .96), and free of gender, age, expressive language, behavior problems, and nonverbal IQ influence. The short form also showed strong relationships with established measures of autism symptom severity (ADOS, ADI-R, Vineland). Degree of association between all measures varied as a function of expressive language. Results identified specific SRS items that are more vulnerable to non-ASD-related traits. The resultant 16-item SRS short form may possess superior psychometric properties compared to the original scale and emerge as a more precise measure of ASD core symptom severity, facilitating research and practice. Future research using IRT is needed to further refine existing measures of autism symptomatology. © 2017 Association for Child and Adolescent Mental Health.
fed set ofvaluesof a, b, AI , B1 A2 2 . 2 A3 , and 13 , the f ’. g ’a. nd h’a in (7) are fied. Equation (7) must still hold for S - e19029e3,..* . Thus...for Item I Is -- b ?(a:1 , b1 ,O) (1 + ’)(I + e4 (22 where a and pi are arbitrary constants. These constants mst be the sam for all Items In a given...NETHERLIS I E3I1 Focility-Acquisitions 4133 Rugby Avnue 1 Lee Cronbach Bethesda, NO 20014 16 Laburnue Road Atherton, CA 94205 1 Dr. Benjamin A. Fairbank
Deville, Craig W.; Chalhoub-Deville, Micheline
A study demonstrated the utility of item analyses to investigate which items function well or poorly in a second language reading recall protocol instrument. Data were drawn from a larger study of 56 learners of German as a second language at various proficiency levels. Pausal units of scored recall protocols were analyzed using both classical…
Objective: Generalized partial credit model, which is based on item response theory (IRT), was used to test differential item functioning (DIF) for the "Diagnostic and Statistical Manual of Mental Disorders" (4th ed.), inattention (IA), and hyperactivity/impulsivity (HI) symptoms across boys and girls. Method: To accomplish this, parents completed…
Hung, Man; Voss, Maren W; Bounsanga, Jerry; Crum, Anthony B; Tyser, Andrew R
Clinical measurement. The psychometric properties of the PROMIS v1.2 UE item bank were tested on various samples prior to its release, but have not been fully evaluated among the orthopaedic population. This study assesses the performance of the UE item bank within the UE orthopaedic patient population. The UE item bank was administered to 1197 adult patients presenting to a tertiary orthopaedic clinic specializing in hand and UE conditions and was examined using traditional statistics and Rasch analysis. The UE item bank fits a unidimensional model (outfit MNSQ range from 0.64 to 1.70) and has adequate reliabilities (person = 0.84; item = 0.82) and local independence (item residual correlations range from -0.37 to 0.34). Only one item exhibits gender differential item functioning. Most items target low levels of function. The UE item bank is a useful clinical assessment tool. Additional items covering higher functions are needed to enhance validity. Supplemental testing is recommended for patients at higher levels of function until more high function UE items are developed. 2c. Copyright © 2016 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.
A loglinear item response theory (IRT) model is proposed that relates polytomously scored item responses to a multidimensional latent space. Each item may have a different response function where each item response may be explained by one or more latent traits. Item response functions may follow a
Oude Voshaar, Martijn Ah; Ten Klooster, Peter M; Taal, Erik; Krishnan, Eswar; van de Laar, Mart Afj
Patient-reported physical function is an established outcome domain in clinical studies in rheumatology. To overcome the limitations of the current generation of questionnaires, the Patient-Reported Outcomes Measurement Information System (PROMIS®) project in the USA has developed calibrated item banks for measuring several domains of health status in people with a wide range of chronic diseases. The aim of this study was to translate and cross-culturally adapt the PROMIS physical function item bank to the Dutch language and to pretest it in a sample of patients with arthritis. The items of the PROMIS physical function item bank were translated using rigorous forward-backward protocols and the translated version was subsequently cognitively pretested in a sample of Dutch patients with rheumatoid arthritis. Few issues were encountered in the forward-backward translation. Only 5 of the 124 items to be translated had to be rewritten because of culturally inappropriate content. Subsequent pretesting showed that overall, questions of the Dutch version were understood as they were intended, while only one item required rewriting. Results suggest that the translated version of the PROMIS physical function item bank is semantically and conceptually equivalent to the original. Future work will be directed at creating a Dutch-Flemish final version of the item bank to be used in research with Dutch speaking populations.
Kennedy Deborah M
Full Text Available Abstract Background Although the Western Ontario and McMaster University Osteoarthritis Index (WOMAC is considered the leading outcome measure for patients with osteoarthritis of the lower extremity, recent work has challenged its factorial validity and the physical function subscale's ability to detect valid change when pain and function display different profiles of change. This study examined the etiology of the WOMAC's physical function subscale's limited ability to detect change in the presence of discordant changes for pain and function. We hypothesized that the duplication of some items on the WOMAC's pain and function subscales contributed to this shortcoming. Methods Two eight-item physical function scales were abstracted from the WOMAC's 17-item physical function subscale: one contained activities and themes that were duplicated on the pain subscale (SIMILAR-8; the other version avoided overlapping activities (DISSIMILAR-8. Factorial validity of the shortened measures was assessed on 310 patients awaiting hip or knee arthroplasty. The shortened measures' abilities to detect change were examined on a sample of 104 patients following primary hip or knee arthroplasty. The WOMAC and three performance measures that included activity specific pain assessments – 40 m walk test, stair test, and timed-up-and-go test – were administered preoperatively, within 16 days of hip or knee arthroplasty, and at an interval of greater than 20 days following the first post-surgical assessment. Standardized response means were used to quantify change. Results The SIMILAR-8 did not demonstrate factorial validity; however, the factorial structure of the DISSIMILAR-8 was supported. The time to complete the performance measures more than doubled between the preoperative and first postoperative assessments supporting the theory that lower extremity functional status diminished over this interval. The DISSIMILAR-8 detected this deterioration in functional
Full Text Available In a study of the Emotional Skills and Competence Questionnaire instrument (ESCQ; Takšić, 1998 three samples of university students from Balkan countries (Croatia, Serbia, and Slovenia were contrasted with two samples of university students from Nordic countries (Finland and Sweden. In total, 1978 students participated. Effects of country and gender were obtained from the ESCQ total scores, as well as from the subscale scores. The subsequent analyses of item bias, that is, differential item functioning (DIF, revealed a number of DIF items in pair wise comparisons of the samples, thus creating doubts about the fairness in comparing mean scores. Further analyses of the DIF items showed, however, that most of the item curve functions were uniform, and that effect sizes were low. It was also shown that the number of DIF items depended on which countries were compared. Spearman correlations between measures of number of DIF items and cultural values as measured by World Value Survey data were very high. Implications of these findings for future cross-cultural studies of the ESCQ instrument are discussed.
Fermino Fernandes Sisto
Full Text Available In this research evidences of construct validity were searched analyzing the differential functioning items related to aggressiveness. The participants were 445 college students of both genders, attending the courses of Engineering, Computing and Psychology. The scale of aggressiveness composed by 81 items was collectively applied, in the classroom, to the students who consented to participate in the study. The items of the instrument were studied by means of the Rasch model. Twenty-eight items presented differential functioning item, 15 were characterized as typical for females and 13 for males. The reliability coefficients were 0.99 to the items and 0.86 to the persons. It was concluded that the aggressiveness can be measured separately on the basis of gender.
Lin, Chung-Ying; Griffiths, Mark D; Pakpour, Amir H
Background and aims Research examining problematic mobile phone use has increased markedly over the past 5 years and has been related to "no mobile phone phobia" (so-called nomophobia). The 20-item Nomophobia Questionnaire (NMP-Q) is the only instrument that assesses nomophobia with an underlying theoretical structure and robust psychometric testing. This study aimed to confirm the construct validity of the Persian NMP-Q using Rasch and confirmatory factor analysis (CFA) models. Methods After ensuring the linguistic validity, Rasch models were used to examine the unidimensionality of each Persian NMP-Q factor among 3,216 Iranian adolescents and CFAs were used to confirm its four-factor structure. Differential item functioning (DIF) and multigroup CFA were used to examine whether males and females interpreted the NMP-Q similarly, including item content and NMP-Q structure. Results Each factor was unidimensional according to the Rach findings, and the four-factor structure was supported by CFA. Two items did not quite fit the Rasch models (Item 14: "I would be nervous because I could not know if someone had tried to get a hold of me;" Item 9: "If I could not check my smartphone for a while, I would feel a desire to check it"). No DIF items were found across gender and measurement invariance was supported in multigroup CFA across gender. Conclusions Due to the satisfactory psychometric properties, it is concluded that the Persian NMP-Q can be used to assess nomophobia among adolescents. Moreover, NMP-Q users may compare its scores between genders in the knowledge that there are no score differences contributed by different understandings of NMP-Q items.
Despite the attention that has been given to gender and science, boys continue to outperform girls in science achievement, particularly by the end of secondary school. Because it is unclear whether gender differences have narrowed over time (Leder, 1992; Willingham & Cole, 1997), it is important to continue a line of inquiry into the nature of gender differences, specifically at the international level. The purpose of this study was to investigate gender differences in science achievement across two countries: United States and Spain. A secondary purpose was to demonstrate an alternative method for exploring gender differences based on the many-faceted Rasch model (1980). A secondary analysis of the data from the Third International Mathematics and Science Study (TIMSS) was used to examine the relationship between gender DIF (differential item functioning) and item characteristics (item type, content, and performance expectation) across both countries. Nationally representative samples of eighth grade students in the United States and Spain who participated in TIMSS were analyzed to answer the research questions in this study. In both countries, girls showed an advantage over boys on life science items and most extended response items, whereas boys, by and large, had an advantage on earth science, physics, and chemistry items. However, even within areas that favored boys, such as physics, there were items that were differentially easier for girls. In general, patterns in gender differences were similar across both countries although there were a few differences between the countries on individual items. It was concluded that simply looking at mean differences does not provide an adequate understanding of the nature of gender differences in science achievement.
...) from item response theory (IRT). DIF was found for the majority of the 40 items examined, although in many cases the DIF indicated improvements in the revised items. Implications for these scales and for the use of IRT with the MEOCS are discussed.
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
Lix, Lisa M; Wu, Xiuyun; Hopman, Wilma; Mayo, Nancy; Sajobi, Tolulope T; Liu, Juxin; Prior, Jerilynn C; Papaioannou, Alexandra; Josse, Robert G; Towheed, Tanveer E; Davison, K Shawn; Sawatzky, Richard
Self-reported health status measures, like the Short Form 36-item Health Survey (SF-36), can provide rich information about the overall health of a population and its components, such as physical, mental, and social health. However, differential item functioning (DIF), which arises when population sub-groups with the same underlying (i.e., latent) level of health have different measured item response probabilities, may compromise the comparability of these measures. The purpose of this study was to test for DIF on the SF-36 physical functioning (PF) and mental health (MH) sub-scale items in a Canadian population-based sample. Study data were from the prospective Canadian Multicentre Osteoporosis Study (CaMos), which collected baseline data in 1996-1997. DIF was tested using a multiple indicators multiple causes (MIMIC) method. Confirmatory factor analysis defined the latent variable measurement model for the item responses and latent variable regression with demographic and health status covariates (i.e., sex, age group, body weight, self-perceived general health) produced estimates of the magnitude of DIF effects. The CaMos cohort consisted of 9423 respondents; 69.4% were female and 51.7% were less than 65 years. Eight of 10 items on the PF sub-scale and four of five items on the MH sub-scale exhibited DIF. Large DIF effects were observed on PF sub-scale items about vigorous and moderate activities, lifting and carrying groceries, walking one block, and bathing or dressing. On the MH sub-scale items, all DIF effects were small or moderate in size. SF-36 PF and MH sub-scale scores were not comparable across population sub-groups defined by demographic and health status variables due to the effects of DIF, although the magnitude of this bias was not large for most items. We recommend testing and adjusting for DIF to ensure comparability of the SF-36 in population-based investigations.
Lisa M Lix
Full Text Available Self-reported health status measures, like the Short Form 36-item Health Survey (SF-36, can provide rich information about the overall health of a population and its components, such as physical, mental, and social health. However, differential item functioning (DIF, which arises when population sub-groups with the same underlying (i.e., latent level of health have different measured item response probabilities, may compromise the comparability of these measures. The purpose of this study was to test for DIF on the SF-36 physical functioning (PF and mental health (MH sub-scale items in a Canadian population-based sample.Study data were from the prospective Canadian Multicentre Osteoporosis Study (CaMos, which collected baseline data in 1996-1997. DIF was tested using a multiple indicators multiple causes (MIMIC method. Confirmatory factor analysis defined the latent variable measurement model for the item responses and latent variable regression with demographic and health status covariates (i.e., sex, age group, body weight, self-perceived general health produced estimates of the magnitude of DIF effects.The CaMos cohort consisted of 9423 respondents; 69.4% were female and 51.7% were less than 65 years. Eight of 10 items on the PF sub-scale and four of five items on the MH sub-scale exhibited DIF. Large DIF effects were observed on PF sub-scale items about vigorous and moderate activities, lifting and carrying groceries, walking one block, and bathing or dressing. On the MH sub-scale items, all DIF effects were small or moderate in size.SF-36 PF and MH sub-scale scores were not comparable across population sub-groups defined by demographic and health status variables due to the effects of DIF, although the magnitude of this bias was not large for most items. We recommend testing and adjusting for DIF to ensure comparability of the SF-36 in population-based investigations.
Pedersen, Eric R; Huang, Wenjing; Dvorak, Robert D; Prince, Mark A; Hummer, Justin F
Given recent state legislation legalizing marijuana for recreational purposes and majority popular opinion favoring these laws, we developed the Protective Behavioral Strategies for Marijuana scale (PBSM) to identify strategies that may mitigate the harms related to marijuana use among those young people who choose to use the drug. In the current study, we expand on the initial exploratory study of the PBSM to further validate the measure with a large and geographically diverse sample (N = 2,117; 60% women, 30% non-White) of college students from 11 different universities across the United States. We sought to develop a psychometrically sound item bank for the PBSM and to create a short assessment form that minimizes respondent burden and time. Quantitative item analyses, including exploratory and confirmatory factor analyses with item response theory (IRT) and evaluation of differential item functioning (DIF), revealed an item bank of 36 items that was examined for unidimensionality and good content coverage, as well as a short form of 17 items that is free of bias in terms of gender (men vs. women), race (White vs. non-White), ethnicity (Hispanic vs. non-Hispanic), and recreational marijuana use legal status (state recreational marijuana was legal for 25.5% of participants). We also provide a scoring table for easy transformation from sum scores to IRT scale scores. The PBSM item bank and short form associated strongly and negatively with past month marijuana use and consequences. The measure may be useful to researchers and clinicians conducting intervention and prevention programs with young adults. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Bunketorp Käll, Lina
This study explores the internal structure of the Self-Efficacy Scale (SES) using item response analysis. The SES was previously translated into Swedish and modified to encompass all types of pain, not exclusively back pain. Data on perceived self-efficacy in 47 patients with subacute whiplash-associated disorders were derived from a previously conducted randomized-controlled trial. The item-level factor analysis was carried out using a six-step procedure. To further study the item inter-relationships and to determine the underlying structure empirically, the 20 items of the SES were also subjected to principal component analysis with varimax rotation. The analyses showed two underlying factors, named 'social activities' and 'physical activities', with seven items loading on each factor. The remaining six items of the SES appeared to measure somewhat different constructs and need to be analysed further.
Ilich, Maria O.
Psychometricians and test developers evaluate standardized tests for potential bias against groups of test-takers by using differential item functioning (DIF). English language learners (ELLs) are a diverse group of students whose native language is not English. While they are still learning the English language, they must take their standardized tests for their school subjects, including science, in English. In this study, linguistic complexity was examined as a possible source of DIF that may result in test scores that confound science knowledge with a lack of English proficiency among ELLs. Two years of fifth-grade state science tests were analyzed for evidence of DIF using two DIF methods, Simultaneous Item Bias Test (SIBTest) and logistic regression. The tests presented a unique challenge in that the test items were grouped together into testlets---groups of items referring to a scientific scenario to measure knowledge of different science content or skills. Very large samples of 10, 256 students in 2006 and 13,571 students in 2007 were examined. Half of each sample was composed of Spanish-speaking ELLs; the balance was comprised of native English speakers. The two DIF methods were in agreement about the items that favored non-ELLs and the items that favored ELLs. Logistic regression effect sizes were all negligible, while SIBTest flagged items with low to high DIF. A decrease in socioeconomic status and Spanish-speaking ELL diversity may have led to inconsistent SIBTest effect sizes for items used in both testing years. The DIF results for the testlets suggested that ELLs lacked sufficient opportunity to learn science content. The DIF results further suggest that those constructed response test items requiring the student to draw a conclusion about a scientific investigation or to plan a new investigation tended to favor ELLs.
Oude Voshaar, M.A.H.; ten Klooster, P.M.; Glas, C.A.W.; Vonkeman, H.E.; Taal, E; Krishnan, E.; Moens, H.J.B.; Boers, M.; Terwee, C.B.; van Riel, P.L.C.M.; van de Laar, M.A.F.J.
Objective: To calibrate the Dutch-Flemish version of the PROMIS physical function (PF) item bank in patients with rheumatoid arthritis (RA) and to evaluate cross-cultural measurement equivalence with US general population and RA data. Methods: Data were collected from RA patients enrolled in the
Oude Voshaar, Martijn A.H.; Ten Klooster, Peter M.; Vonkeman, Harald E.; van de Laar, Mart A.F.J.
Objective: Traditional patient-reported physical function instruments often poorly differentiate patients with mild-to-moderate disability. We describe the development and psychometric evaluation of a generic item bank for measuring everyday activity limitations in outpatient populations. Study
Erhart, M; Hagquist, C; Auquier, P; Rajmil, L; Power, M; Ravens-Sieberer, U
This study compares item reduction analysis based on classical test theory (maximizing Cronbach's alpha - approach A), with analysis based on the Rasch Partial Credit Model item-fit (approach B), as applied to children and adolescents' health-related quality of life (HRQoL) items. The reliability and structural, cross-cultural and known-group validity of the measures were examined. Within the European KIDSCREEN project, 3019 children and adolescents (8-18 years) from seven European countries answered 19 HRQoL items of the Physical Well-being dimension of a preliminary KIDSCREEN instrument. The Cronbach's alpha and corrected item total correlation (approach A) were compared with infit mean squares and the Q-index item-fit derived according to a partial credit model (approach B). Cross-cultural differential item functioning (DIF ordinal logistic regression approach), structural validity (confirmatory factor analysis and residual correlation) and relative validity (RV) for socio-demographic and health-related factors were calculated for approaches (A) and (B). Approach (A) led to the retention of 13 items, compared with 11 items with approach (B). The item overlap was 69% for (A) and 78% for (B). The correlation coefficient of the summated ratings was 0.93. The Cronbach's alpha was similar for both versions [0.86 (A); 0.85 (B)]. Both approaches selected some items that are not strictly unidimensional and items displaying DIF. RV ratios favoured (A) with regard to socio-demographic aspects. Approach (B) was superior in RV with regard to health-related aspects. Both types of item reduction analysis should be accompanied by additional analyses. Neither of the two approaches was universally superior with regard to cultural, structural and known-group validity. However, the results support the usability of the Rasch method for developing new HRQoL measures for children and adolescents.
Demura, Shinichi; Yamada, Takayoshi; Uchiyama, Masanobu; Sugiura, Hiroki; Hamazaki, Hiroshi
This study aimed to examine useful items for screening the fall risk of community dwelling elderly from various perspectives, including fall experience, physical function level, and age level difference. 968 independently living elderly persons over the age of 60 (age: 70.0 ± 7.0) responded to 80 fall risk items representing 7 factors (physical function, fall history, using devices, fear of falling and inactivity, dosing, disease and disability, and environment) and an ADL questionnaire. The high fall risk response rate was calculated for each item and tested for statistical significance among age groups and those with and without fall experience. Cramer's V was calculated to examine the relationship between each item and the ADL. In addition, we selected items with significant differences in the high fall risk response rates between the faller and the non-faller groups, a significant relationship with ADL, and a significant difference among age groups. A total of 40 useful items were selected from each fall risk factor (decrease in physical function: 21 items, fall history: 2 items, device usage: 3 items, fear of falling and inactivity: 5 items, dosing: 0 items, disease and disability: 8 items, and environment: 1 item). Selected items can comprehensively and properly assess the fall risk of the healthy elderly as compared with existing questionnaires. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Ballert, Carolina S; Stucki, Gerold; Biering-Sørensen, Fin; Cieza, Alarcos
To determine whether the International Classification of Functioning, Disability and Health (ICF) categories relevant to spinal cord injury (SCI) can be integrated in clinical measures and to obtain insights to guide their future operationalization. Specific aims are to find out whether the ICF categories relevant to SCI fit a Rasch model taking into consideration the dimensionality found in previous investigations, local item dependencies, or differential item functioning. All second-level ICF categories collected in the Development of ICF Core Sets for SCI project in specialized centers within 15 countries from 2006 through 2008. Secondary data analysis. Adults (N=1048) with SCI from the early postacute and long-term living context. Not applicable. Two unidimensional Rasch analyses: one for the ICF categories from body functions and body structures components and another for the ICF categories from the activities and participation component. Results support good reliability and targeting of the ICF categories in both dimensions. In each dimension, few ICF categories were subject to misfit. Local item dependency was observed between ICF categories of the same chapters. Group effects for age and sex were observed only to a small extent. The validity of ICF categories to develop measures of functioning in SCI for clinical practice and research is to some extent supported. Model adjustments were suggested to further improve their operationalization and psychometrics. Copyright © 2014 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Grigg, Kaine; Manderson, Lenore
Racism and associated discrimination are pervasive and persistent challenges with multiple cumulative deleterious effects contributing to inequities in various health outcomes. Globally, research over the past decade has shown consistent associations between racism and negative health concerns. Such research confirms that race endures as one of the strongest predictors of poor health. Due to the lack of validated Australian measures of racist attitudes, RACES (Racism, Acceptance, and Cultural-Ethnocentrism Scale) was developed. Here, we examine RACES' psychometric properties, including the latent structure, utilising Item Response Theory (IRT). Unidimensional and Multidimensional Rating Scale Model (RSM) Rasch analyses were utilised with 296 Victorian primary school students and 182 adolescents and 220 adults from the Australian community. RACES was demonstrated to be a robust 24-item three-dimensional scale of Accepting Attitudes (12 items), Racist Attitudes (8 items), and Ethnocentric Attitudes (4 items). RSM Rasch analyses provide strong support for the instrument as a robust measure of racist attitudes in the Australian context, and for the overall factorial and construct validity of RACES across primary school children, adolescents, and adults. RACES provides a reliable and valid measure that can be utilised across the lifespan to evaluate attitudes towards all racial, ethnic, cultural, and religious groups. A core function of RACES is to assess the effectiveness of interventions to reduce community levels of racism and in turn inequities in health outcomes within Australia.
Marfeo, Elizabeth E; Ni, Pengsheng; Haley, Stephen M; Jette, Alan M; Bogusz, Kara; Meterko, Mark; McDonough, Christine M; Chan, Leighton; Brandt, Diane E; Rasch, Elizabeth K
To develop a broad set of claimant-reported items to assess behavioral health functioning relevant to the Social Security disability determination processes, and to evaluate the underlying structure of behavioral health functioning for use in development of a new functional assessment instrument. Cross-sectional. Community. Item pools of behavioral health functioning were developed, refined, and field tested in a sample of persons applying for Social Security disability benefits (N=1015) who reported difficulties working because of mental or both mental and physical conditions. None. Social Security Administration Behavioral Health (SSA-BH) measurement instrument. Confirmatory factor analysis (CFA) specified that a 4-factor model (self-efficacy, mood and emotions, behavioral control, social interactions) had the optimal fit with the data and was also consistent with our hypothesized conceptual framework for characterizing behavioral health functioning. When the items within each of the 4 scales were tested in CFA, the fit statistics indicated adequate support for characterizing behavioral health as a unidimensional construct along these 4 distinct scales of function. This work represents a significant advance both conceptually and psychometrically in assessment methodologies for work-related behavioral health. The measurement of behavioral health functioning relevant to the context of work requires the assessment of multiple dimensions of behavioral health functioning. Specifically, we identified a 4-factor model solution that represented key domains of work-related behavioral health functioning. These results guided the development and scale formation of a new SSA-BH instrument. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Sachse, Karoline A.; Haag, Nicole
Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment's (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are…
Dirven, Linda; Groenvold, Mogens; Taphoorn, Martin J B; Conroy, Thierry; Tomaszewski, Krzysztof A; Young, Teresa; Petersen, Morten Aa
The European Organisation of Research and Treatment of Cancer (EORTC) Quality of Life Group is developing computerized adaptive testing (CAT) versions of all EORTC Quality of Life Questionnaire (QLQ-C30) scales with the aim to enhance measurement precision. Here we present the results on the field-testing and psychometric evaluation of the item bank for cognitive functioning (CF). In previous phases (I-III), 44 candidate items were developed measuring CF in cancer patients. In phase IV, these items were psychometrically evaluated in a large sample of international cancer patients. This evaluation included an assessment of dimensionality, fit to the item response theory (IRT) model, differential item functioning (DIF), and measurement properties. A total of 1030 cancer patients completed the 44 candidate items on CF. Of these, 34 items could be included in a unidimensional IRT model, showing an acceptable fit. Although several items showed DIF, these had a negligible impact on CF estimation. Measurement precision of the item bank was much higher than the two original QLQ-C30 CF items alone, across the whole continuum. Moreover, CAT measurement may on average reduce study sample sizes with about 35-40% compared to the original QLQ-C30 CF scale, without loss of power. A CF item bank for CAT measurement consisting of 34 items was established, applicable to various cancer patients across countries. This CAT measurement system will facilitate precise and efficient assessment of HRQOL of cancer patients, without loss of comparability of results.
Sheldon, Signy; Levine, Brian
During autobiographical memory retrieval, the medial temporal lobes (MTL) relate together multiple event elements, including object (within-item relations) and context (item-context relations) information, to create a cohesive memory. There is consistent support for a functional specialization within the MTL according to these relational processes, much of which comes from recognition memory experiments. In this study, we compared brain activation patterns associated with retrieving within-item relations (i.e., associating conceptual and sensory-perceptual object features) and item-context relations (i.e., spatial relations among objects) with respect to naturalistic autobiographical retrieval. We developed a novel paradigm that cued participants to retrieve information about past autobiographical events, non-episodic within-item relations, and non-episodic item-context relations with the perceptuomotor aspects of retrieval equated across these conditions. We used multivariate analysis techniques to extract common and distinct patterns of activity among these conditions within the MTL and across the whole brain, both in terms of spatial and temporal patterns of activity. The anterior MTL (perirhinal cortex and anterior hippocampus) was preferentially recruited for generating within-item relations later in retrieval whereas the posterior MTL (posterior parahippocampal cortex and posterior hippocampus) was preferentially recruited for generating item-context relations across the retrieval phase. These findings provide novel evidence for functional specialization within the MTL with respect to naturalistic memory retrieval. © 2015 Wiley Periodicals, Inc.
Brown, K.F.; Rankin, W.N.
Yellow items used in Radiologically Controlled Areas (RCAs) that could contain hazardous metals were identified. X-ray fluorescence analyses indicated that thirty of the fifty-two items do contain hazardous metals. It is important to minimize the hazardous metals put into the wastes. The authors recommend that the specifications for all yellow items stocked in Stores be changed to specify that they contain no hazardous metals
Mueller, Michael M.; Nkosi, Ajamu; Hine, Jeffrey F.
Several review and epidemiological studies have been conducted over recent years to inform behavior analysts of functional analysis outcomes. None to date have closely examined demographic and clinical data for functional analyses conducted exclusively in public school settings. The current paper presents a data-based summary of 90 functional…
Polikoff, Morgan S.; May, Henry; Porter, Andrew C.; Elliott, Stephen N.; Goldring, Ellen; Murphy, Joseph
The Vanderbilt Assessment of Leadership in Education is a 360-degree assessment of the effectiveness of principals' learning-centered leadership behaviors. In this report, we present results from a differential item functioning (DIF) study of the assessment. Using data from a national field trial, we searched for evidence of DIF on school level,…
Messinger, H B; Messinger, M I
Recently in this journal Peters and Murphy challenged the validity of factor analyses done on bimodal handedness data, suggesting instead that right- and left-handers be studied separately. But bimodality may be avoidable if attention is paid to Oldfield's questionnaire format and instructions for the subjects. Two characteristics appear crucial: a two-column LEFT-RIGHT format for the body of the instrument and what we call Oldfield's Admonition: not to indicate strong preference for handedness item, such as write, unless "... the preference is so strong that you would never try to use the other hand unless absolutely forced to...". Attaining unimodality of an item distribution would seem to overcome the objections of Peters and Murphy. In a 1984 survey in Boston we used Oldfield's ten-item questionnaire exactly as published. This produced unimodal item distributions. With reflection of the five-point item scale and a logarithmic transformation, we achieved a degree of normalization for the items. Two surveys elsewhere based on Oldfield's 20-item list but with changes in the questionnaire format and the instructions, yielded markedly different item distributions with peaks at each extreme and sometimes in the middle as well.
Rahmani, B. D.
The purpose of this paper is to evaluate Indonesian senior high school teacher’s pedagogical content knowledge also their perception toward curriculum changing in West Java Indonesia. The data used in this study were derived from a questionnaire survey conducted among teachers in Bandung, West Java. A total of 61 usable responses were collected. The Differential Item Functioning (DIFF) was used to analyze the data whether the item had a difference or not toward gender, education background also on school location. However, the result showed that there was no any significant difference on gender and school location toward the item response but educational background. As a conclusion, the teacher’s educational background influence on giving the response to the questionnaire. Therefore, it is suggested in the future to construct the items on the questionnaire which is coped the differences of the participant particularly the educational background.
Guns, Raf; Lioma, Christina; Larsen, Birger
One of the best known measures of information retrieval (IR) performance is the F-score, the harmonic mean of precision and recall. In this article we show that the curve of the F-score as a function of the number of retrieved items is always of the same shape: a fast concave increase to a maximu...
Big Five Inventory (BFI) is one of personality test had been adapted into Indonesia language. More research had been developed to adapt the Indonesian Big Five Inventory. The purpose of this research is to check whether BFI’s personality test is fair if apply to ethnic of Batak Toba and Java. Therefore, examination of BFI’s items is needed. In psychology, especially in psychometric study, it is called Differential Item Functioning (DIF). Subject in this research is 327 people around 18 to 40 ...
Mulcahey, Mary Jane; Kozin, Scott; Merenda, Lisa; Gaughan, John; Tian, Feng; Gogola, Gloria; James, Michelle A; Ni, Pengsheng
One of the greatest limitations to measuring outcomes in pediatric orthopaedics is the lack of effective instruments. Computer adaptive testing, which uses large item banks, select only items that are relevant to a child's function based on a previous response and filters items that are too easy or too hard or simply not relevant to the child. In this way, computer adaptive testing provides for a meaningful, efficient, and precise method to evaluate patient-reported outcomes. Banks of items that assess activity and upper extremity (UE) function have been developed for children with cerebral palsy and have enabled computer adaptive tests that showed strong reliability, strong validity, and broader content range when compared with traditional instruments. Because of the void in instruments for children with brachial plexus birth palsy (BPBP) and the importance of having an UE and activity scale, we were interested in how well these items worked in this population. Cross-sectional, multicenter study involving 200 children with BPBP was conducted. The box and block test (BBT) and Stereognosis tests were administered and patient reports of UE function and activity were obtained with the cerebral palsy item banks. Differential item functioning (DIF) was examined. Predictive ability of the BBT and stereognosis was evaluated with proportional odds logistic regression model. Spearman correlations coefficients (rs) were calculated to examine correlation between stereognosis and the BBT and between individual stereognosis items and the total stereognosis score. Six of the 86 items showed DIF, indicating that the activity and UE item banks may be useful for computer adaptive tests for children with BPBP. The penny and the button were strongest predictors of impairment level (odds ratio=0.34 to 0.40]. There was a good positive relationship between total stereognosis and BBT scores (rs=0.60). The BBT had a good negative (rs=-0.55) and good positive (rs=0.55) relationship with
Zampetakis, Leonidas A; Bakatsaki, Maria; Litos, Charalambos; Kafetsios, Konstantinos G; Moustakis, Vassilis
Over the past years the percentage of female entrepreneurs has increased, yet it is still far below of that for males. Although various attempts have been made to explain differences in mens' and women's entrepreneurial attitudes and intentions, the extent to which those differences are due to self-report biases has not been yet considered. The present study utilized Differential Item Functioning (DIF) to compare men and women's reporting on entrepreneurial intentions. DIF occurs in situations where members of different groups show differing probabilities of endorsing an item despite possessing the same level of the ability that the item is intended to measure. Drawing on the theory of planned behavior (TPB), the present study investigated whether constructs such as entrepreneurial attitudes, perceived behavioral control, subjective norms and intention would show gender differences and whether these gender differences could be explained by DIF. Using DIF methods on a dataset of 1800 Greek participants (50.4% female) indicated that differences at the item-level are almost non-existent. Moreover, the differential test functioning (DTF) analysis, which allows assessing the overall impact of DIF effects with all items being taken into account simultaneously, suggested that the effect of DIF across all the items for each scale was negligible. Future research should consider that measurement invariance can be assumed when using TPB constructs for the study of entrepreneurial motivation independent of gender.
Portela-Buelvas, Katherin; Oviedo, Heidi C.; Herazo, Edwin; Campo-Arias, Adalberto
Introduction. Quality of life could be quantified with the Menopause Rating Scale (MRS), which evaluates the severity of somatic, psychological, and urogenital symptoms in menopause. However, differential item functioning (DIF) analysis has not been applied previously. Objective. To establish the DIF of the psychological domain of the MRS in Colombian women. Methods. 4,009 women aged between 40 and 59 years, who participated in the CAVIMEC (Calidad de Vida en la Menopausia y Etnias Colombianas) project, were included. Average age was 49.0 ± 5.9 years. Women were classified in mestizo, Afro-Colombian, and indigenous. The results were presented as averages and standard deviation (X ± SD). A p value <0.001 was considered statistically significant. Results. In mestizo women, the highest X ± SD were obtained in physical and mental exhaustion (PME) (0.86 ± 0.93) and the lowest ones in anxiety (0.44 ± 0.79). In Afro-Colombian women, an average score of 0.99 ± 1.07 for PME and 0.63 ± 0.88 for anxiety was gotten. Indigenous women obtained an increased average score for PME (1.33 ± 0.93). The lowest score was evidenced in depressive mood (0.50 ± 0.81), which is different from other Colombian women (p < 0.001). Conclusions. The psychological items of the MRS show differential functioning according to the ethnic group, which may induce systematic error in the measurement of the construct. PMID:27847825
Monterrosa-Castro, Alvaro; Portela-Buelvas, Katherin; Oviedo, Heidi C; Herazo, Edwin; Campo-Arias, Adalberto
Introduction. Quality of life could be quantified with the Menopause Rating Scale (MRS), which evaluates the severity of somatic, psychological, and urogenital symptoms in menopause. However, differential item functioning (DIF) analysis has not been applied previously. Objective . To establish the DIF of the psychological domain of the MRS in Colombian women. Methods . 4,009 women aged between 40 and 59 years, who participated in the CAVIMEC (Calidad de Vida en la Menopausia y Etnias Colombianas) project, were included. Average age was 49.0 ± 5.9 years. Women were classified in mestizo, Afro-Colombian, and indigenous. The results were presented as averages and standard deviation ( X ± SD). A p value <0.001 was considered statistically significant. Results . In mestizo women, the highest X ± SD were obtained in physical and mental exhaustion (PME) (0.86 ± 0.93) and the lowest ones in anxiety (0.44 ± 0.79). In Afro-Colombian women, an average score of 0.99 ± 1.07 for PME and 0.63 ± 0.88 for anxiety was gotten. Indigenous women obtained an increased average score for PME (1.33 ± 0.93). The lowest score was evidenced in depressive mood (0.50 ± 0.81), which is different from other Colombian women ( p < 0.001). Conclusions . The psychological items of the MRS show differential functioning according to the ethnic group, which may induce systematic error in the measurement of the construct.
Full Text Available The role of response time in completing an item can have very different interpretations. Responding more slowly could be positively related to success as the item is answered more carefully. However, the association may be negative if working faster indicates higher ability. The objective of this study was to clarify the validity of each assumption for reasoning items considering the mode of processing. A total of 230 persons completed a computerized version of Raven’s Advanced Progressive Matrices test. Results revealed that response time overall had a negative effect. However, this effect was moderated by items and persons. For easy items and able persons the effect was strongly negative, for difficult items and less able persons it was less negative or even positive. The number of rules involved in a matrix problem proved to explain item difficulty significantly. Most importantly, a positive interaction effect between the number of rules and item response time indicated that the response time effect became less negative with an increasing number of rules. Moreover, exploratory analyses suggested that the error type influenced the response time effect.
Schiltz, L; Schiltz, J
We present the general structure of a multi-annual research project. Our general expectancy concerns the possibilities of arts psychotherapy as a means of launching the blocked process of subjectivation with people suffering from exclusion, precarity and marginalization. The research project follows a complex research design with a sequential strategy, the first part consisting in an integrated psychosocial and clinical study using a mixed methodology. We constructed special rating scales for the analysis of the data of a semi-structured biographical interview and also for the holistic interpretation of the Rotter Blank Sentences Test, separating the associations to sentences beginning with the third and first person. The correlations between two sets of variables (biographical interview and Rotter test) were computed for the total experimental group (N=206), and for clinical subgroups. We shall analyse the matrices of correlations (Spearman's Rho) with the help of optimal scaling procedures (OVERALS). The links between traumatic biographical events and responses to the 3rd, respectively 1st person items of the Rotter test are interpreted in terms of unconscious versus conscious psychological processes and allow us analysing the expression of defence mechanisms and coping strategies. The results of the study are discussed in the light of the recent traumatogenic hypothesis of borderline functioning.
Reise, Steven P.; Ventura, Joseph; Keefe, Richard S. E.; Baade, Lyle E.; Gold, James M.; Green, Michael F.; Kern, Robert S.; Mesholam-Gately, Raquelle; Nuechterlein, Keith H.; Seidman, Larry J.; Bilder, Robert
A psychometric analysis of 2 interview-based measures of cognitive deficits was conducted: the 21-item Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS; Ventura et al., 2008), and the 20-item Schizophrenia Cognition Rating Scale (SCoRS; Keefe et al., 2006), which were administered on 2 occasions to a sample of people with…
Mielenz, Thelma J; Callahan, Leigh F; Edwards, Michael C
Examine the feasibility of performing an item response theory (IRT) analysis on two of the Centers for Disease Control and Prevention health-related quality of life (CDC HRQOL) modules - the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy days Symptoms Module (HDSM). Previous principal components analyses confirm that the two scales both assess a mix of mental (CDC-MH) and physical health (CDC-PH). The purpose is to conduct item response theory (IRT) analysis on the CDC-MH and CDC-PH scales separately. 2182 patients with self-reported or physician-diagnosed arthritis completed a cross-sectional survey including HDCM and HDSM items. Besides global health, the other 8 items ask the number of days that some statement was true; we chose to recode the data into 8 categories based on observed clustering. The IRT assumptions were assessed using confirmatory factor analysis and the data could be modeled using an unidimensional IRT model. The graded response model was used for IRT analyses and CDC-MH and CDC-PH scales were analyzed separately in flexMIRT. The IRT parameter estimates for the five-item CDC-PH all appeared reasonable. The three-item CDC-MH did not have reasonable parameter estimates. The CDC-PH scale is amenable to IRT analysis but the existing The CDC-MH scale is not. We suggest either using the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy days Symptoms Module (HDSM) as they currently stand or the CDC-PH scale alone if the primary goal is to measure physical health related HRQOL.
Wang, Jing-Jing; Chen, Tzu-An; Baranowski, Tom; Lau, Patrick W C
This study aimed to evaluate the psychometric properties of four self-efficacy scales (i.e., self-efficacy for fruit (FSE), vegetable (VSE), and water (WSE) intakes, and physical activity (PASE)) and to investigate their differences in item functioning across sex, age, and body weight status groups using item response modeling (IRM) and differential item functioning (DIF). Four self-efficacy scales were administrated to 763 Hong Kong Chinese children (55.2% boys) aged 8-13 years. Classical test theory (CTT) was used to examine the reliability and factorial validity of scales. IRM was conducted and DIF analyses were performed to assess the characteristics of item parameter estimates on the basis of children's sex, age and body weight status. All self-efficacy scales demonstrated adequate to excellent internal consistency reliability (Cronbach's α: 0.79-0.91). One FSE misfit item and one PASE misfit item were detected. Small DIF were found for all the scale items across children's age groups. Items with medium to large DIF were detected in different sex and body weight status groups, which will require modification. A Wright map revealed that items covered the range of the distribution of participants' self-efficacy for each scale except VSE. Several self-efficacy scales' items functioned differently by children's sex and body weight status. Additional research is required to modify the four self-efficacy scales to minimize these moderating influences for application.
Lim, Bee Chiu; Kueh, Yee Cheng; Arifin, Wan Nor; Ng, Kok Huan
Heart disease knowledge is an important concept for health education, yet there is lack of evidence on proper validated instruments used to measure levels of heart disease knowledge in the Malaysian context. A cross-sectional, survey design was conducted to examine the psychometric properties of the adapted English version of the Heart Disease Knowledge Questionnaire (HDKQ). Using proportionate cluster sampling, 788 undergraduate students at Universiti Sains Malaysia, Malaysia, were recruited and completed the HDKQ. Item analysis and confirmatory factor analysis (CFA) were used for the psychometric evaluation. Construct validity of the measurement model was included. Most of the students were Malay (48%), female (71%), and from the field of science (51%). An acceptable range was obtained with respect to both the difficulty and discrimination indices in the item analysis results. The difficulty index ranged from 0.12-0.91 and a discrimination index of ≥ 0.20 were reported for the final retained 23 items. The final CFA model showed an adequate fit to the data, yielding a 23-item, one-factor model [weighted least squares mean and variance adjusted scaled chi-square difference = 1.22, degrees of freedom = 2, P-value = 0.544, the root mean square error of approximation = 0.03 (90% confidence interval = 0.03, 0.04); close-fit P-value = > 0.950]. Adequate psychometric values were obtained for Malaysian undergraduate university students using the 23-item, one-factor model of the adapted HDKQ.
Hong, Ickpyo; Velozo, Craig A; Li, Chih-Ying; Romero, Sergio; Gruber-Baldini, Ann L; Shulman, Lisa M
The aim of this study is to investigate the psychometrics of the Patient-Reported Outcomes Measurement Information System self-efficacy for managing daily activities item bank. The item pool was field tested on a sample of 1087 participants via internet (n = 250) and in-clinic (n = 837) surveys. All participants reported having at least one chronic health condition. The 35 item pool was investigated for dimensionality (confirmatory factor analyses, CFA and exploratory factor analysis, EFA), item-total correlations, local independence, precision, and differential item functioning (DIF) across gender, race, ethnicity, age groups, data collection modes, and neurological chronic conditions (McFadden Pseudo R (2) less than 10 %). The item pool met two of the four CFA fit criteria (CFI = 0.952 and SRMR = 0.07). EFA analysis found a dominant first factor (eigenvalue = 24.34) and the ratio of first to second eigenvalue was 12.4. The item pool demonstrated good item-total correlations (0.59-0.85) and acceptable internal consistency (Cronbach's alpha = 0.97). The item pool maintained its precision (reliability over 0.90) across a wide range of theta (3.70), and there was no significant DIF. The findings indicated the item pool has sound psychometric properties and the test items are eligible for development of computerized adaptive testing and short forms.
Zampetakis, Leonidas A.; Bakatsaki, Maria; Litos, Charalambos; Kafetsios, Konstantinos G.; Moustakis, Vassilis
Over the past years the percentage of female entrepreneurs has increased, yet it is still far below of that for males. Although various attempts have been made to explain differences in mens’ and women’s entrepreneurial attitudes and intentions, the extent to which those differences are due to self-report biases has not been yet considered. The present study utilized Differential Item Functioning (DIF) to compare men and women’s reporting on entrepreneurial intentions. DIF occurs in situations where members of different groups show differing probabilities of endorsing an item despite possessing the same level of the ability that the item is intended to measure. Drawing on the theory of planned behavior (TPB), the present study investigated whether constructs such as entrepreneurial attitudes, perceived behavioral control, subjective norms and intention would show gender differences and whether these gender differences could be explained by DIF. Using DIF methods on a dataset of 1800 Greek participants (50.4% female) indicated that differences at the item-level are almost non-existent. Moreover, the differential test functioning (DTF) analysis, which allows assessing the overall impact of DIF effects with all items being taken into account simultaneously, suggested that the effect of DIF across all the items for each scale was negligible. Future research should consider that measurement invariance can be assumed when using TPB constructs for the study of entrepreneurial motivation independent of gender. PMID:28386244
With reference to a questionnaire that aimed to assess the quality of life for dysarthric speakers, we investigate the usefulness of a model-based procedure for reducing the number of items. We propose a mixed cumulative logit model, which is known in the psychometrics literature as the graded response model: responses to different items are modelled as a function of individual latent traits and as a function of item characteristics, such as their difficulty and their discrimination power. We jointly model the discrimination and the difficulty parameters by using a k-component mixture of normal distributions. Mixture components correspond to disjoint groups of items. Items that belong to the same groups can be considered equivalent in terms of both difficulty and discrimination power. According to decision criteria, we select a subset of items such that the reduced questionnaire is able to provide the same information that the complete questionnaire provides. The model is estimated by using a Bayesian approach, and the choice of the number of mixture components is justified according to information criteria. We illustrate the proposed approach on the basis of data that are collected for 104 dysarthric patients by local health authorities in Lecce and in Milan. Copyright © 2014 John Wiley & Sons, Ltd.
Dirven, Linda; Groenvold, Mogens; Taphoorn, Martin J. B.
on the field-testing and psychometric evaluation of the item bank for cognitive functioning (CF). METHODS: In previous phases (I-III), 44 candidate items were developed measuring CF in cancer patients. In phase IV, these items were psychometrically evaluated in a large sample of international cancer patients...... model, showing an acceptable fit. Although several items showed DIF, these had a negligible impact on CF estimation. Measurement precision of the item bank was much higher than the two original QLQ-C30 CF items alone, across the whole continuum. Moreover, CAT measurement may on average reduce study...... sample sizes with about 35-40% compared to the original QLQ-C30 CF scale, without loss of power. CONCLUSION: A CF item bank for CAT measurement consisting of 34 items was established, applicable to various cancer patients across countries. This CAT measurement system will facilitate precise and efficient...
Full Text Available 12106784 Structural and functional analyses of bacterial lipopolysaccharides. Carof...html) (.csml) Show Structural and functional analyses of bacterial lipopolysaccharides. PubmedID 12106784 Title Structural and functi...onal analyses of bacterial lipopolysaccharides. Authors
Recent research highlights the multiple steps to preparing and injecting drugs and the resultant viral threats faced by drug users. This research suggests that more sensitive measurement of injection drug HIV risk behavior is required. In addition, growing evidence suggests there are gender differences in injection risk behavior. However, the potential for differential item functioning between genders has not been explored. To explore item response theory as an improved measurement modeling technique that provides empirically justified scaling of injection risk behavior and to examine for potential gender-based differential item functioning. Data is used from three studies in the National Institute on Drug Abuse's Criminal Justice Drug Abuse Treatment Studies. A two-parameter item response theory model was used to scale injection risk behavior and logistic regression was used to examine for differential item functioning. Item fit statistics suggest that item response theory can be used to scale injection risk behavior and these models can provide more sensitive estimates of risk behavior. Additionally, gender-based differential item functioning is present in the current data. Improved measurement of injection risk behavior using item response theory should be encouraged as these models provide increased congruence between construct measurement and the complexity of injection-related HIV risk. Suggestions are made to further improve injection risk behavior measurement. Furthermore, results suggest direct comparisons of composite scores between males and females may be misleading and future work should account for differential item functioning before comparing levels of injection risk behavior.
Chundi, Parvathi; Rosenkrantz, Daniel J.
We propose a special type of time series, which we call an item-set time series, to facilitate the temporal analysis of software version histories, email logs, stock market data, etc. In an item-set time series, each observed data value is a set of discrete items. We formalize the concept of an item-set time series and present efficient algorithms for segmenting a given item-set time series. Segmentation of a time series partitions the time series into a sequence of segments where each segment is constructed by combining consecutive time points of the time series. Each segment is associated with an item set that is computed from the item sets of the time points in that segment, using a function which we call a measure function. We then define a concept called the segment difference, which measures the difference between the item set of a segment and the item sets of the time points in that segment. The segment difference values are required to construct an optimal segmentation of the time series. We describe novel and efficient algorithms to compute segment difference values for each of the measure functions described in the paper. We outline a dynamic programming based scheme to construct an optimal segmentation of the given item-set time series. We use the item-set time series segmentation techniques to analyze the temporal content of three different data sets—Enron email, stock market data, and a synthetic data set. The experimental results show that an optimal segmentation of item-set time series data captures much more temporal content than a segmentation constructed based on the number of time points in each segment, without examining the item set data at the time points, and can be used to analyze different types of temporal data.
Verdam, M.G.E.; Oort, F.J.; Sprangers, M.A.G.
Purpose Comparison of patient-reported outcomes may be invalidated by the occurrence of item bias, also known as differential item functioning. We show two ways of using structural equation modeling (SEM) to detect item bias: (1) multigroup SEM, which enables the detection of both uniform and
Waid-Ebbs, J Kay; Wen, Pey-Shan; Heaton, Shelley C; Donovan, Neila J; Velozo, Craig
To determine whether the psychometrics of the BRIEF-A are adequate for individuals diagnosed with TBI. A prospective observational study in which the BRIEF-A was collected as part of a larger study. Informant ratings of the 75-item BRIEF-A on 89 individuals diagnosed with TBI were examined to determine items level psychometrics for each of the two BRIEF-A indexes: Behaviour Rating Index (BRI) and Metacognitive Index (MI). Patients were either outpatients or at least 1 year post-injury. Each index measured a latent trait, separating individuals into five-to-six ability levels and demonstrated good reliability (0.94 and 0.96). Four items were identified that did not meet the infit criteria. The results provide support for the use of the BRIEF-A as a supplemental assessment of executive function in TBI populations. However, further validation is needed with other measures of executive function. Recommendations include use of the index scores over the Global Executive Composite score and use of the difficulty hierarchy for setting therapy goals.
Schnohr, Christina W.; Rasmussen, Charlotte L.; Langberg, Henning
of the Physical Function item bank into Danish. METHODS: We followed the PROMIS standard procedure, including: 1) two independent translations, 2) back translation, 3) independent reviews of translation quality, and 4) cognitive interviews with a representative sample of the adult population from the municipality...
Thomas, Michael L; Brown, Gregory G; Gur, Ruben C; Moore, Tyler M; Patt, Virginie M; Risbrough, Victoria B; Baker, Dewleen G
Models from signal detection theory are commonly used to score neuropsychological test data, especially tests of recognition memory. Here we show that certain item response theory models can be formulated as signal detection theory models, thus linking two complementary but distinct methodologies. We then use the approach to evaluate the validity (construct representation) of commonly used research measures, demonstrate the impact of conditional error on neuropsychological outcomes, and evaluate measurement bias. Signal detection-item response theory (SD-IRT) models were fitted to recognition memory data for words, faces, and objects. The sample consisted of U.S. Infantry Marines and Navy Corpsmen participating in the Marine Resiliency Study. Data comprised item responses to the Penn Face Memory Test (PFMT; N = 1,338), Penn Word Memory Test (PWMT; N = 1,331), and Visual Object Learning Test (VOLT; N = 1,249), and self-report of past head injury with loss of consciousness. SD-IRT models adequately fitted recognition memory item data across all modalities. Error varied systematically with ability estimates, and distributions of residuals from the regression of memory discrimination onto self-report of past head injury were positively skewed towards regions of larger measurement error. Analyses of differential item functioning revealed little evidence of systematic bias by level of education. SD-IRT models benefit from the measurement rigor of item response theory-which permits the modeling of item difficulty and examinee ability-and from signal detection theory-which provides an interpretive framework encompassing the experimentally validated constructs of memory discrimination and response bias. We used this approach to validate the construct representation of commonly used research measures and to demonstrate how nonoptimized item parameters can lead to erroneous conclusions when interpreting neuropsychological test data. Future work might include the
Hong, Quan Nha; Coutu, Marie-France; Berbiche, Djamal
The Work Role Functioning Questionnaire (WRFQ) was developed to assess workers' perceived ability to perform job demands and is used to monitor presenteeism. Still few studies on its validity can be found in the literature. The purpose of this study was to assess the items and factorial composition of the Canadian French version of the WRFQ (WRFQ-CF). Two measurement approaches were used to test the WRFQ-CF: Classical Test Theory (CTT) and non-parametric Item Response Theory (IRT). A total of 352 completed questionnaires were analyzed. A four-factor and three-factor model models were tested and shown respectively good fit with 14 items (Root Mean Square Error of Approximation (RMSEA) = 0.06, Standardized Root Mean Square Residual (SRMR) = 0.04, Bentler Comparative Fit Index (CFI) = 0.98) and with 17 items (RMSEA = 0.059, SRMR = 0.048, CFI = 0.98). Using IRT, 13 problematic items were identified, of which 9 were common with CTT. This study tested different models with fewer problematic items found in a three-factor model. Using a non-parametric IRT and CTT for item purification gave complementary results. IRT is still scarcely used and can be an interesting alternative method to enhance the quality of a measurement instrument. More studies are needed on the WRFQ-CF to refine its items and factorial composition.
Full Text Available Background. The purpose of this study was to evaluate the effectiveness of two methods of detecting differential item functioning (DIF in the presence of multilevel data and polytomously scored items. The assessment of DIF with multilevel data (e.g., patients nested within hospitals, hospitals nested within districts from large-scale assessment programs has received considerable attention but very few studies evaluated the effect of hierarchical structure of data on DIF detection for polytomously scored items. Methods. The ordinal logistic regression (OLR and hierarchical ordinal logistic regression (HOLR were utilized to assess DIF in simulated and real multilevel polytomous data. Six factors (DIF magnitude, grouping variable, intraclass correlation coefficient, number of clusters, number of participants per cluster, and item discrimination parameter with a fully crossed design were considered in the simulation study. Furthermore, data of Pediatric Quality of Life Inventory™ (PedsQL™ 4.0 collected from 576 healthy school children were analyzed. Results. Overall, results indicate that both methods performed equivalently in terms of controlling Type I error and detection power rates. Conclusions. The current study showed negligible difference between OLR and HOLR in detecting DIF with polytomously scored items in a hierarchical structure. Implications and considerations while analyzing real data were also discussed.
Full Text Available Item response theory (IRT becomes an increasingly important tool when analyzing “big data” gathered from online educational venues. However, the mechanism was originally developed in traditional exam settings, and several of its assumptions are infringed upon when deployed in the online realm. For a large-enrollment physics course for scientists and engineers, the study compares outcomes from IRT analyses of exam and homework data, and then proceeds to investigate the effects of each confounding factor introduced in the online realm. It is found that IRT yields the correct trends for learner ability and meaningful item parameters, yet overall agreement with exam data is moderate. It is also found that learner ability and item discrimination is robust over a wide range with respect to model assumptions and introduced noise. Item difficulty is also robust, but over a narrower range.
Lesser, Lenard I; Wu, Leslie; Matthiessen, Timothy B; Luft, Harold S
To develop a technology-based method for evaluating the nutritional quality of chain-restaurant menus to increase the efficiency and lower the cost of large-scale data analysis of food items. Using a Modified Nutrient Profiling Index (MNPI), we assessed chain-restaurant items from the MenuStat database with a process involving three steps: (i) testing 'extreme' scores; (ii) crowdsourcing to analyse fruit, nut and vegetable (FNV) amounts; and (iii) analysis of the ambiguous items by a registered dietitian. In applying the approach to assess 22 422 foods, only 3566 could not be scored automatically based on MenuStat data and required further evaluation to determine healthiness. Items for which there was low agreement between trusted crowd workers, or where the FNV amount was estimated to be >40 %, were sent to a registered dietitian. Crowdsourcing was able to evaluate 3199, leaving only 367 to be reviewed by the registered dietitian. Overall, 7 % of items were categorized as healthy. The healthiest category was soups (26 % healthy), while desserts were the least healthy (2 % healthy). An algorithm incorporating crowdsourcing and a dietitian can quickly and efficiently analyse restaurant menus, allowing public health researchers to analyse the healthiness of menu items.
Anderson, George M.; Montazeri, Farhad; de Bildt, Annelies
A network conceptualization might contribute to understanding the occurrence and interacting nature of behavioral traits in the autism realm. Networks were constructed based on correlations of item scores of the Autism Diagnostic Observation Schedule for Modules 1, 2 and 3 obtained for a group of
Mitchell, Karen J; Mather, Mara; Johnson, Marcia K; Raye, Carol L; Greene, Erich J
We investigated the hypothesis that arousal recruits attention to item information, thereby disrupting working memory processes that help bind items to context. Using functional magnetic resonance imaging, we compared brain activity when participants remembered negative or neutral picture-location conjunctions (source memory) versus pictures only. Behaviorally, negative trials showed disruption of short-term source, but not picture, memory; long-term picture recognition memory was better for negative than for neutral pictures. Activity in areas involved in working memory and feature integration (precentral gyrus and its intersect with superior temporal gyrus) was attenuated on negative compared with neutral source trials relative to picture-only trials. Visual processing areas (middle occipital and lingual gyri) showed greater activity for negative than for neutral trials, especially on picture-only trials.
Chong Ho Yu
Full Text Available This paper aims to illustrate how data visualization could be utilized to identify errors prior to modeling, using an example with multi-dimensional item response theory (MIRT. MIRT combines item response theory and factor analysis to identify a psychometric model that investigates two or more latent traits. While it may seem convenient to accomplish two tasks by employing one procedure, users should be cautious of problematic items that affect both factor analysis and IRT. When sample sizes are extremely large, reliability analyses can misidentify even random numbers as meaningful patterns. Data visualization, such as median smoothing, can be used to identify problematic items in preliminary data cleaning.
Fermino Fernandes Sisto
Full Text Available Nesta pesquisa buscou-se evidência de validade de construto relacionada ao funcionamento dos itens para diferenciar sexos em um instrumento de agressividade. Participaram 445 universitários, de ambos os sexos, dos cursos de Engenharia, Computação e Psicologia. A escala de agressividade composta por 81 itens foi aplicada coletivamente, em sala de aula, nos estudantes que consentiram em participar do estudo. Os itens do instrumento foram analisados por meio do modelo Rasch. Vinte e oito itens apresentaram funcionamento diferencial, sendo 15 condutas mais características de pessoas do sexo feminino e outras 13 mais características do masculino. Os índices de precisão foram de 0,99 para os itens e 0,86 para as pessoas. Conclui-se que a agressividade pode ser medida separadamente em razão do sexo.In this research evidences of construct validity were searched analyzing the differential functioning items related to aggressiveness. The participants were 445 college students of both genders, attending the courses of Engineering, Computing and Psychology. The scale of aggressiveness composed by 81 items was collectively applied, in the classroom, to the students who consented to participate in the study. The items of the instrument were studied by means of the Rasch model. Twenty-eight items presented differential functioning item, 15 were characterized as typical for females and 13 for males. The reliability coefficients were 0.99 to the items and 0.86 to the persons. It was concluded that the aggressiveness can be measured separately on the basis of gender.
Moreau, Nicolas; Hassan, Ghayda; Rousseau, Cécile; Chenguiti, Khalid
This brief report illustrates how the migration context can affect specific item validity of mental health measures. The SCL-25 was administered to 432 recently settled immigrants (220 Haitian and 212 Arabs). We performed descriptive analyses, as well as Infit and Outfit statistics analyses using WINSTEPS Rasch Measurement Software based on Item Response Theory. The participants' comments about the item You feel everything requires a lot of effort in the SCL-25 were also qualitatively analyzed. Results revealed that the item You feel everything requires a lot of effort is an outlier and does not adjust in an expected and valid fashion with its cluster items, as it is over-endorsed by Haitian and Arab healthy participants. Our study thus shows that, in transcultural mental health research, the cultural and migratory contexts may interact and significantly influence the meaning of some symptom items and consequently, the validity of symptom scales.
Sacco, Paul; Torres, Luis R.; Cunningham-Williams, Renee M.; Woods, Carol; Unick, G. Jay
This study tested for the presence of differential item functioning (DIF) in DSM-IV Pathological Gambling Disorder (PGD) criteria based on gender, race/ethnicity and age. Using a nationally representative sample of adults from the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC), indicating current gambling (n = 10,899), Multiple Indicator-Multiple Cause (MIMIC) models tested for DIF, controlling for income, education, and marital status. Compared to the reference grou...
Harpole, Jared K; Levinson, Cheri A; Woods, Carol M; Rodebaugh, Thomas L; Weeks, Justin W; Brown, Patrick J; Heimberg, Richard G; Menatti, Andrew R; Blanco, Carlos; Schneier, Franklin; Liebowitz, Michael
The Brief Fear of Negative Evaluation Scale (BFNE; Leary Personality and Social Psychology Bulletin , 9, 371-375, 1983) assesses fear and worry about receiving negative evaluation from others. Rodebaugh et al. Psychological Assessment, 16 , 169-181, (2004) found that the BFNE is composed of a reverse-worded factor (BFNE-R) and straightforwardly-worded factor (BFNE-S). Further, they found the BFNE-S to have better psychometric properties and provide more information than the BFNE-R. Currently there is a lack of research regarding the measurement invariance of the BFNE-S across gender and ethnicity with respect to item thresholds. The present study uses item response theory (IRT) to test the BFNE-S for differential item functioning (DIF) related to gender and ethnicity (White, Asian, and Black). Six data sets consisting of clinical, community, and undergraduate participants were utilized ( N =2,109). The factor structure of the BFNE-S was confirmed using categorical confirmatory factor analysis, IRT model assumptions were tested, and the BFNE-S was evaluated for DIF. Item nine demonstrated significant non-uniform DIF between White and Black participants. No other items showed significant uniform or non-uniform DIF across gender or ethnicity. Results suggest the BFNE-S can be used reliably with men and women and Asian and White participants. More research is needed to understand the implications of using the BFNE-S with Black participants.
Anderson, George M.; Montazeri, Farhad; de Bildt, Annelies
A network conceptualization might contribute to understanding the occurrence and interacting nature of behavioral traits in the autism realm. Networks were constructed based on correlations of item scores of the Autism Diagnostic Observation Schedule for Modules 1, 2 and 3 obtained for a group of 477 Dutch individuals with developmental disorders.…
Fossati, Andrea; Widiger, Thomas A; Borroni, Serena; Maffei, Cesare; Somma, Antonella
To extend the evidence on the reliability and construct validity of the Five-Factor Model Rating Form (FFMRF) in its self-report version, two independent samples of Italian participants, which were composed of 510 adolescent high school students and 457 community-dwelling adults, respectively, were administered the FFMRF in its Italian translation. Adolescent participants were also administered the Italian translation of the Borderline Personality Features Scale for Children-11 (BPFSC-11), whereas adult participants were administered the Italian translation of the Triarchic Psychopathy Measure (TriPM). Cronbach α values were consistent with previous findings; in both samples, average interitem r values indicated acceptable internal consistency for all FFMRF scales. A multidimensional graded item response theory model indicated that the majority of FFMRF items had adequate discrimination parameters; information indices supported the reliability of the FFMRF scales. Both categorical (i.e., item-level) and scale-level regression analyses suggested that the FFMRF scores may predict a nonnegligible amount of variance in the BPFSC-11 total score in adolescent participants, and in the TriPM scale scores in adult participants.
Interpreting Mini-Mental State Examination Performance in Highly Proficient Bilingual Spanish-English and Asian Indian-English Speakers: Demographic Adjustments, Item Analyses, and Supplemental Measures.
Milman, Lisa H; Faroqi-Shah, Yasmeen; Corcoran, Chris D; Damele, Deanna M
Performance on the Mini-Mental State Examination (MMSE), among the most widely used global screens of adult cognitive status, is affected by demographic variables including age, education, and ethnicity. This study extends prior research by examining the specific effects of bilingualism on MMSE performance. Sixty independent community-dwelling monolingual and bilingual adults were recruited from eastern and western regions of the United States in this cross-sectional group study. Independent sample t tests were used to compare 2 bilingual groups (Spanish-English and Asian Indian-English) with matched monolingual speakers on the MMSE, demographically adjusted MMSE scores, MMSE item scores, and a nonverbal cognitive measure. Regression analyses were also performed to determine whether language proficiency predicted MMSE performance in both groups of bilingual speakers. Group differences were evident on the MMSE, on demographically adjusted MMSE scores, and on a small subset of individual MMSE items. Scores on a standardized screen of language proficiency predicted a significant proportion of the variance in the MMSE scores of both bilingual groups. Bilingual speakers demonstrated distinct performance profiles on the MMSE. Results suggest that supplementing the MMSE with a language screen, administering a nonverbal measure, and/or evaluating item-based patterns of performance may assist with test interpretation for this population.
Veerkamp, W.J.J.; Veerkamp, Wim J.J.; Berger, Martijn P.F.; Berger, Martijn
Items with the highest discrimination parameter values in a logistic item response theory model do not necessarily give maximum information. This paper derives discrimination parameter values, as functions of the guessing parameter and distances between person parameters and item difficulty, that
Ballert, Carolina S; Stucki, Gerold; Biering-Sørensen, Fin
item dependency was observed between ICF categories of the same chapters. Group effects for age and sex were observed only to a small extent. CONCLUSIONS: The validity of ICF categories to develop measures of functioning in SCI for clinical practice and research is to some extent supported. Model......OBJECTIVES: To determine whether the International Classification of Functioning, Disability and Health (ICF) categories relevant to spinal cord injury (SCI) can be integrated in clinical measures and to obtain insights to guide their future operationalization. Specific aims are to find out whether...... in specialized centers within 15 countries from 2006 through 2008. SETTING: Secondary data analysis. PARTICIPANTS: Adults (N=1048) with SCI from the early postacute and long-term living context. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Two unidimensional Rasch analyses: one for the ICF categories...
Construct validity of the items on the Stroke Specific Quality of Life (SS-QOL) questionnaire that evaluate the participation component of the International Classification of Functioning, Disability and Health.
Silva, Soraia Micaela; Corrêa, Fernanda Ishida; Pereira, Gabriela Santos; Faria, Christina Danielli Coelho de Morais; Corrêa, João Carlos Ferrari
Analyze the construct validity and internal consistency of the Stroke Specific Quality of Life (SS-QOL) items that address the participation component of the ICF as well as analyze the ceiling and floor effects. One hundred subjects were analyzed: 85 community-dwelling and 15 institutionalized individuals. The analysis of construct validity was performed using classic psychometrics: (1) the comparison of known groups (individuals without restriction to participation vs. those with restriction to participation) using the Mann-Whitney test and (2) convergent validity - correlation between the scores on the SS-QOL items that address participation and the subscale scores of measures used to evaluate the similar constructs and concepts [the Short-Form Health Survey (SF-36), Functional Independence Measure (FIM) and grip strength test]. Spearman's correlation coefficients were calculated for this analysis. Cronbach's α was used for the analysis of internal consistency and both the ceiling and floor effects were analyzed. The level of significance for all analyses was α = 0.05. The a priori hypotheses regarding construct validity were partially demonstrated, as only five of the eight domains exhibited positive moderate to strong correlations (r > 0.40) with measures that address constructs similar to those addressed on the SS-QOL questionnaire. The items demonstrated adequate internal consistency and are capable of differentiating individuals with and without restriction to participation. The ceiling and floor effects were considered adequate for the total SS-QOL score, but beyond acceptable standards for some domains. The 26 items of the SS-QOL questionnaire measure a multidimensional construct and therefore do not only address participation. However, the items demonstrated adequate internal consistency and are capable of differentiating individuals with and without restriction to participation. Implications for rehabilitation The 26 items of the SS
Veerkamp, W.J.J.; Veerkamp, Wim J.J.; Berger, Martijn; Berger, Martijn P.F.
Items with the highest discrimination parameter values in a logistic item response theory (IRT) model do not necessarily give maximum information. This paper shows which discrimination parameter values (as a function of the guessing parameter and the distance between person ability and item
Gierl, Mark J; Lai, Hollis; Turner, Simon R
Many tests of medical knowledge, from the undergraduate level to the level of certification and licensure, contain multiple-choice items. Although these are efficient in measuring examinees' knowledge and skills across diverse content areas, multiple-choice items are time-consuming and expensive to create. Changes in student assessment brought about by new forms of computer-based testing have created the demand for large numbers of multiple-choice items. Our current approaches to item development cannot meet this demand. We present a methodology for developing multiple-choice items based on automatic item generation (AIG) concepts and procedures. We describe a three-stage approach to AIG and we illustrate this approach by generating multiple-choice items for a medical licensure test in the content area of surgery. To generate multiple-choice items, our method requires a three-stage process. Firstly, a cognitive model is created by content specialists. Secondly, item models are developed using the content from the cognitive model. Thirdly, items are generated from the item models using computer software. Using this methodology, we generated 1248 multiple-choice items from one item model. Automatic item generation is a process that involves using models to generate items using computer technology. With our method, content specialists identify and structure the content for the test items, and computer technology systematically combines the content to generate new test items. By combining these outcomes, items can be generated automatically. © Blackwell Publishing Ltd 2012.
Ni, Pengsheng; McDonough, Christine M; Jette, Alan M; Bogusz, Kara; Marfeo, Elizabeth E; Rasch, Elizabeth K; Brandt, Diane E; Meterko, Mark; Haley, Stephen M; Chan, Leighton
To develop and test an instrument to assess physical function for Social Security Administration (SSA) disability programs, the SSA-Physical Function (SSA-PF) instrument. Item response theory (IRT) analyses were used to (1) create a calibrated item bank for each of the factors identified in prior factor analyses, (2) assess the fit of the items within each scale, (3) develop separate computer-adaptive testing (CAT) instruments for each scale, and (4) conduct initial psychometric testing. Cross-sectional data collection; IRT analyses; CAT simulation. Telephone and Internet survey. Two samples: SSA claimants (n=1017) and adults from the U.S. general population (n=999). None. Model fit statistics, correlation, and reliability coefficients. IRT analyses resulted in 5 unidimensional SSA-PF scales: Changing & Maintaining Body Position, Whole Body Mobility, Upper Body Function, Upper Extremity Fine Motor, and Wheelchair Mobility for a total of 102 items. High CAT accuracy was demonstrated by strong correlations between simulated CAT scores and those from the full item banks. On comparing the simulated CATs with the full item banks, very little loss of reliability or precision was noted, except at the lower and upper ranges of each scale. No difference in response patterns by age or sex was noted. The distributions of claimant scores were shifted to the lower end of each scale compared with those of a sample of U.S. adults. The SSA-PF instrument contributes important new methodology for measuring the physical function of adults applying to the SSA disability programs. Initial evaluation revealed that the SSA-PF instrument achieved considerable breadth of coverage in each content domain and demonstrated noteworthy psychometric properties. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Shaw, Amanda M; Rogge, Ronald D
This study took a critical look at the construct of sexual quality. The 65 items of four well-validated self-report measures of sexual satisfaction (the Index of Sexual Satisfaction [ISS], Hudson, Harrison, & Crosscup, 1981; the Global Measure of Sexual Satisfaction [GMSEX], Lawrance & Byers, 1995; the Pinney Sexual Satisfaction Inventory [PSSI], Pinney, Gerrard, & Denney, 1987; the Young Sexual Satisfaction Scale [YSSS], Young, Denny, Luquis, & Young, 1998) and an additional 74 potential sexual quality items were given to 3060 online participants. Using Item Response Theory (IRT), we demonstrated that the ISS, YSSS, and PSSI scales provided suboptimal levels of precision in assessing sexual quality, particularly given the length of those scales. Exploratory factor analyses, IRT, differential item functioning analyses, and longitudinal responsiveness analyses were used to develop and evaluate the Quality of Sex Inventory. Results suggested that, in comparison to existing scales, the QSI (1) offers investigators and clinicians more theoretically focused scales, (2) distinguishes sexual satisfaction from sexual dissatisfaction, and (3) offers greater precision and power for detecting differences with (4) comparably high levels of responsiveness for detecting change over time despite being notably shorter than most of the existing scales. The QSI-satisfaction subscales demonstrated strong convergent validity with other measures of sexual satisfaction and excellent construct validity with anchor scales from the nomological net surrounding that construct, suggesting that they continue to assess the same theoretical construct as prior scales. Implications for research are discussed.
Qian, Xiaoyu; Nandakumar, Ratna; Glutting, Joseoph; Ford, Danielle; Fifield, Steve
In this study, we investigated gender and minority achievement gaps on 8th-grade science items employing a multilevel item response methodology. Both gaps were wider on physics and earth science items than on biology and chemistry items. Larger gender gaps were found on items with specific topics favoring male students than other items, for…
Martinková, Patrícia; Drabinová, Adéla; Liaw, Y.L.; Sanders, E.A.; McFarland, J.L.; Price, R.M.
Roč. 16, č. 2 (2017), č. článku rm2. ISSN 1931-7913 R&D Projects: GA ČR GJ15-15856Y Grant - others:NSF(US) DUE-1043443 Institutional support: RVO:67985807 Keywords : differential item functioning * fairness * conceptual assessments * concept inventory * undergraduate education * bias Subject RIV: AM - Education OBOR OECD: Education , special (to gifted persons, those with learning disabilities) Impact factor: 3.930, year: 2016
Svicher, Andrea; Cosci, Fiammetta; Giannini, Marco; Pistelli, Francesco; Fagerström, Karl
The Fagerström Test for Cigarette Dependence (FTCD) and the Heaviness of Smoking Index (HSI) are the gold standard measures to assess cigarette dependence. However, FTCD reliability and factor structure have been questioned and HSI psychometric properties are in need of further investigations. The present study examined the psychometrics properties of the FTCD and the HSI via the Item Response Theory. The study was a secondary analysis of data collected in 862 Italian daily smokers. Confirmatory factor analysis was run to evaluate the dimensionality of FTCD. A Grade Response Model was applied to FTCD and HSI to verify the fit to the data. Both item and test functioning were analyzed and item statistics, Test Information Function, and scale reliabilities were calculated. Mokken Scale Analysis was applied to estimate homogeneity and Loevinger's coefficients were calculated. The FTCD showed unidimensionality and homogeneity for most of the items and for the total score. It also showed high sensitivity and good reliability from medium to high levels of cigarette dependence, although problems related to some items (i.e., items 3 and 5) were evident. HSI had good homogeneity, adequate item functioning, and high reliability from medium to high levels of cigarette dependence. Significant Differential Item Functioning was found for items 1, 4, 5 of the FTCD and for both items of HSI. HSI seems highly recommended in clinical settings addressed to heavy smokers while FTCD would be better used in smokers with a level of cigarette dependence ranging between low and high. Copyright © 2017 Elsevier Ltd. All rights reserved.
Tulsky, David S; Kisala, Pamela A; Victorson, David; Choi, Seung W; Gershon, Richard; Heinemann, Allen W; Cella, David
To develop a comprehensive, psychometrically sound, and conceptually grounded patient reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Individual interviews (n=44) and focus groups (n=65 individuals with SCI and n=42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e. PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n=877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n=245) to assess test-retest reliability and stability. A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury--Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment CenterSM.
Marié De Beer
Opsomming Waar differensiële itemfunksioneringsprosedures (DIF-prosedures vir itemontleding gebaseer op itemresponsteorie (IRT tydens toetskonstruksie gebruik word, is dit moontlik om itemkarakteristiekekrommes vir dieselfde item vir verskillende subgroepe voor te stel. Hierdie krommes dui aan hoe elke item vir die verskillende subgroepe op verskillende vermoënsvlakke te funksioneer. DIF word aangetoon deur die area tussen die krommes. DIF is in die konstruksie van die 'Learning Potential Computerised Adaptive test (LPCAT' gebruik om die items te identifiseer wat sydigheid ten opsigte van geslag, kultuur, taal of opleidingspeil geopenbaar het. Items wat ’n voorafbepaalde vlak van DIF oorskry het, is uit die finale itembank weggelaat, ongeag die subgroep wat bevoordeel of benadeel is. Die proses en resultate van die DIF-ontleding word bespreek.
Kelderman, Henk; Rijkes, Carl P.M.; Rijkes, Carl
A loglinear IRT model is proposed that relates polytomously scored item responses to a multidimensional latent space. The analyst may specify a response function for each response, indicating which latent abilities are necessary to arrive at that response. Each item may have a different number of
Zhang, Wei; van Ast, Vanessa A; Klumpers, Floris; Roelofs, Karin; Hermans, Erno J
Memory recall is facilitated when retrieval occurs in the original encoding context. This context dependency effect likely results from the automatic binding of central elements of an experience with contextual features (i.e., memory "contextualization") during encoding. However, despite a vast body of research investigating the neural correlates of explicit associative memory, the neural interactions during encoding that predict implicit context-dependent memory remain unknown. Twenty-six participants underwent fMRI during encoding of salient stimuli (faces), which were overlaid onto unique background images (contexts). To index subsequent context-dependent memory, face recognition was tested either in intact or rearranged contexts, after scanning. Enhanced face recognition in intact relative to rearranged contexts evidenced successful memory contextualization. Overall subsequent memory effects (brain activity predicting whether items were later remembered vs. forgotten) were found in the left inferior frontal gyrus (IFG) and right amygdala. Effective connectivity analyses showed that stronger context-dependent memory was associated with stronger coupling of the left IFG with face- and place-responsive areas, both within and between participants. Our findings indicate an important role for the IFG in integrating information across widespread regions involved in the representation of salient items and contextual features.
Lei, Pui-Wa; Wu, Qiong
This article describes the functions of a SAS macro and an SPSS syntax that produce common statistics for conventional item analysis including Cronbach's alpha, item difficulty index (p-value or item mean), and item discrimination indices (D-index, point biserial and biserial correlations for dichotomous items and item-total correlation for polytomous items). These programs represent an improvement over the existing SAS and SPSS item analysis routines in terms of completeness and user-friendliness. To promote routine evaluations of item qualities in instrument development of any scale, the programs are available at no charge for interested users. The program codes along with a brief user's manual that contains instructions and examples are downloadable from suen.ed.psu.edu/-pwlei/plei.htm.
Full Text Available Abstract Background A new paradigm of biological investigation takes advantage of technologies that produce large high throughput datasets, including genome sequences, interactions of proteins, and gene expression. The ability of biologists to analyze and interpret such data relies on functional annotation of the included proteins, but even in highly characterized organisms many proteins can lack the functional evidence necessary to infer their biological relevance. Results Here we have applied high confidence function predictions from our automated prediction system, PFP, to three genome sequences, Escherichia coli, Saccharomyces cerevisiae, and Plasmodium falciparum (malaria. The number of annotated genes is increased by PFP to over 90% for all of the genomes. Using the large coverage of the function annotation, we introduced the functional similarity networks which represent the functional space of the proteomes. Four different functional similarity networks are constructed for each proteome, one each by considering similarity in a single Gene Ontology (GO category, i.e. Biological Process, Cellular Component, and Molecular Function, and another one by considering overall similarity with the funSim score. The functional similarity networks are shown to have higher modularity than the protein-protein interaction network. Moreover, the funSim score network is distinct from the single GO-score networks by showing a higher clustering degree exponent value and thus has a higher tendency to be hierarchical. In addition, examining function assignments to the protein-protein interaction network and local regions of genomes has identified numerous cases where subnetworks or local regions have functionally coherent proteins. These results will help interpreting interactions of proteins and gene orders in a genome. Several examples of both analyses are highlighted. Conclusion The analyses demonstrate that applying high confidence predictions from PFP
Sison, Jo Ann G; Mather, Mara
In the part-set cuing effect, cuing a subset of previously studied items impairs recall of the remaining noncued items. This experiment reveals that cuing participants with previously-studied emotional pictures (e.g., fear-evoking pictures of people) can impair recall of pictures involving the same emotion but different content (e.g., fear-evoking pictures of animals). This indicates that new events can be organized in memory using emotion as a grouping function to create associations. However, whether new information is organized in memory along emotional or nonemotional lines appears to be a flexible process that depends on people's current focus. Mentioning in the instructions that the pictures were either amusement- or fear-related led to memory impairment for pictures with the same emotion as cued pictures, whereas mentioning that the pictures depicted either animals or people led to memory impairment for pictures with the same type of actor.
Gierl, Mark J.; Lai, Hollis
Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists create item models, which are comparable to templates…
Kisala, Pamela A; Tulsky, David S; Pace, Natalie; Victorson, David; Choi, Seung W; Heinemann, Allen W
To develop a calibrated item bank and computer adaptive test (CAT) to assess the effects of stigma on health-related quality of life in individuals with spinal cord injury (SCI). Grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, and item response theory (IRT)-based psychometric analyses. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. SCI-QOL Stigma Item Bank A sample of 611 individuals with traumatic SCI completed 30 items assessing SCI-related stigma. After 7 items were iteratively removed, factor analyses confirmed a unidimensional pool of items. Graded Response Model IRT analyses were used to estimate slopes and thresholds for the final 23 items. The SCI-QOL Stigma item bank is unique not only in the assessment of SCI-related stigma but also in the inclusion of individuals with SCI in all phases of its development. Use of confirmatory factor analytic and IRT methods provide flexibility and precision of measurement. The item bank may be administered as a CAT or as a 10-item fixed-length short form and can be used for research and clinical applications.
Wang, Wen-Chung; Chen, Hui-Fang; Jin, Kuan-Yu
Many scales contain both positively and negatively worded items. Reverse recoding of negatively worded items might not be enough for them to function as positively worded items do. In this study, we commented on the drawbacks of existing approaches to wording effect in mixed-format scales and used bi-factor item response theory (IRT) models to…
Mazefsky, Carla A; Yu, Lan; White, Susan W; Siegel, Matthew; Pilkonis, Paul A
Individuals with autism spectrum disorder (ASD) often present with prominent emotion dysregulation that requires treatment but can be difficult to measure. The Emotion Dysregulation Inventory (EDI) was created using methods developed by the Patient-Reported Outcomes Measurement Information System (PROMIS ® ) to capture observable indicators of poor emotion regulation. Caregivers of 1,755 youth with ASD completed 66 candidate EDI items, and the final 30 items were selected based on classical test theory and item response theory (IRT) analyses. The analyses identified two factors: (a) Reactivity, characterized by intense, rapidly escalating, sustained, and poorly regulated negative emotional reactions, and (b) Dysphoria, characterized by anhedonia, sadness, and nervousness. The final items did not show differential item functioning (DIF) based on gender, age, intellectual ability, or verbal ability. Because the final items were calibrated using IRT, even a small number of items offers high precision, minimizing respondent burden. IRT co-calibration of the EDI with related measures demonstrated its superiority in assessing the severity of emotion dysregulation with as few as seven items. Validity of the EDI was supported by expert review, its association with related constructs (e.g., anxiety and depression symptoms, aggression), higher scores in psychiatric inpatients with ASD compared to a community ASD sample, and demonstration of test-retest stability and sensitivity to change. In sum, the EDI provides an efficient and sensitive method to measure emotion dysregulation for clinical assessment, monitoring, and research in youth with ASD of any level of cognitive or verbal ability. Autism Res 2018. © 2018 International Society for Autism Research, Wiley Periodicals, Inc. This paper describes a new measure of poor emotional control called the Emotion Dysregulation Inventory (EDI). Caregivers of 1,755 youth with ASD completed candidate items, and advanced statistical
de Sá Junior, Antonio Reis; de Andrade, Arthur Guerra; Andrade, Laura Helena; Gorenstein, Clarice; Wang, Yuan-Pang
This study examines the response pattern of depressive symptoms in a nationwide student sample, through item analyses of a rating scale by both classical test theory (CTT) and item response theory (IRT). The 21-item Beck Depression Inventory-II (BDI-II) was administered to 12,711 college students. First, the psychometric properties of the scale were described. Thereafter, the endorsement probability of depressive symptom in each scale item was analyzed through CTT and IRT. Graphical plots depicted the endorsement probability of scale items and intensity of depression. Three items of different difficulty level were compared through CTT and IRT approach. Four in five students reported the presence of depressive symptoms. The BDI-II items presented good reliability and were distributed along the symptomatic continuum of depression. Similarly, in both CTT and IRT approaches, the item 'changes in sleep' was easily endorsed, 'loss of interest' moderately and 'suicidal thoughts' hardly. Graphical representation of BDI-II of both methods showed much equivalence in terms of item discrimination and item difficulty. The item characteristic curve of the IRT method provided informative evaluation of item performance. The inventory was applied only in college students. Depressive symptoms were frequent psychopathological manifestations among college students. The performance of the BDI-II items indicated convergent results from both methods of analysis. While the CTT was easy to understand and to apply, the IRT was more complex to understand and to implement. Comprehensive assessment of the functioning of each BDI-II item might be helpful in efficient detection of depressive conditions in college students. Copyright © 2018 Elsevier B.V. All rights reserved.
Full Text Available The study was to identify the load, the type and the significance of differential item functioning (DIF in constructed response item using the partial credit model (PCM. The data in the study were the students’ instruments and the students’ responses toward the PISA-like test items that had been completed by 386 ninth grade students and 460 tenth grade students who had been about 15 years old in the Province of Yogyakarta Special Region in Indonesia. The analysis toward the item characteristics through the student categorization based on their class was conducted toward the PCM using CONQUEST software. Furthermore, by applying these items characteristics, the researcher draw the category response function (CRF graphic in order to identify whether the type of DIF content had been in uniform or non-uniform. The significance of DIF was identified by comparing the discrepancy between the difficulty level parameter and the error in the CONQUEST output results. The results of the analysis showed that from 18 items that had been analyzed there were 4 items which had not been identified load DIF, there were 5 items that had been identified containing DIF but not statistically significant and there were 9 items that had been identified containing DIF significantly. The causes of items containing DIF were discussed.
Babiar, Tasha Calvert
Traditionally, women and minorities have not been fully represented in science and engineering. Numerous studies have attributed these differences to gaps in science achievement as measured by various standardized tests. Rather than describe mean group differences in science achievement across multiple cultures, this study focused on an in-depth item-level analysis across two countries: Spain and the United States. This study investigated eighth-grade gender differences on science items across the two countries. A secondary purpose of the study was to explore the nature of gender differences using the many-faceted Rasch Model as a way to estimate gender DIF. A secondary analysis of data from the Third International Mathematics and Science Study (TIMSS) was used to address three questions: 1) Does gender DIF in science achievement exist? 2) Is there a relationship between gender DIF and characteristics of the science items? 3) Do the relationships between item characteristics and gender DIF in science items replicate across countries. Participants included 7,087 eight grade students from the United States and 3,855 students from Spain who participated in TIMSS. The Facets program (Linacre and Wright, 1992) was used to estimate gender DIF. The results of the analysis indicate that the content of the item seemed to be related to gender DIF. The analysis also suggests that there is a relationship between gender DIF and item format. No pattern of gender DIF related to cognitive demand was found. The general pattern of gender DIF was similar across the two countries used in the analysis. The strength of item-level analysis as opposed to group mean difference analysis is that gender differences can be detected at the item level, even when no mean differences can be detected at the group level.
Marfeo, Elizabeth E; Ni, Pengsheng; McDonough, Christine; Peterik, Kara; Marino, Molly; Meterko, Mark; Rasch, Elizabeth K; Chan, Leighton; Brandt, Diane; Jette, Alan M
Purpose To improve the mental health component of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration's (SSA) disability determination process. Specifically our goal was to expand the WD-FAB scales of mood & emotions, resilience, social interactions, and behavioral control to improve the depth and breadth of the current scales and expand the content coverage to include aspects of cognition & communication function. Methods Data were collected from a random, stratified sample of 1695 claimants applying for the SSA work disability benefits, and a general population sample of 2025 working age adults. 169 new items were developed to replenish the WD-FAB scales and analyzed using factor analysis and item response theory (IRT) analysis to construct unidimensional scales. We conducted computer adaptive test (CAT) simulations to examine the psychometric properties of the WD-FAB. Results Analyses supported the inclusion of four mental health subdomains: Cognition & Communication (68 items), Self-Regulation (34 items), Resilience & Sociability (29 items) and Mood & Emotions (34 items). All scales yielded acceptable psychometric properties. Conclusions IRT methods were effective in expanding the WD-FAB to assess mental health function. The WD-FAB has the potential to enhance work disability assessment both within the context of the SSA disability programs as well as other clinical and vocational rehabilitation settings.
Adema, Jos J.; van der Linden, Willem J.
Recently, linear programming models for test construction were developed. These models were based on the information function from item response theory. In this paper another approach is followed. Two 0-1 linear programming models for the construction of tests using classical item and test
Chung, Sang M; Lee, David J; Hand, Austin; Young, Philip; Vaidyanathan, Jayabharathi; Sahajwalla, Chandrahas
The study evaluated whether the renal function decline rate per year with age in adults varies based on two primary statistical analyses: cross-section (CS), using one observation per subject, and longitudinal (LT), using multiple observations per subject over time. A total of 16628 records (3946 subjects; age range 30-92 years) of creatinine clearance and relevant demographic data were used. On average, four samples per subject were collected for up to 2364 days (mean: 793 days). A simple linear regression and random coefficient models were selected for CS and LT analyses, respectively. The renal function decline rates per year were 1.33 and 0.95 ml/min/year for CS and LT analyses, respectively, and were slower when the repeated individual measurements were considered. The study confirms that rates are different based on statistical analyses, and that a statistically robust longitudinal model with a proper sampling design provides reliable individual as well as population estimates of the renal function decline rates per year with age in adults. In conclusion, our findings indicated that one should be cautious in interpreting the renal function decline rate with aging information because its estimation was highly dependent on the statistical analyses. From our analyses, a population longitudinal analysis (e.g. random coefficient model) is recommended if individualization is critical, such as a dose adjustment based on renal function during a chronic therapy. Copyright © 2015 John Wiley & Sons, Ltd.
Item response theory (IRT) is a framework for modeling and analyzing item response data. Item-level modeling gives IRT advantages over classical test theory. The fit of an item score pattern to an item response theory (IRT) models is a necessary condition that must be assessed for further use of item and models that best fit ...
Freeman, Emily; Heathcote, Andrew; Chalmers, Kerry; Hockley, William
We investigate the effects of word characteristics on episodic recognition memory using analyses that avoid Clark's (1973) "language-as-a-fixed-effect" fallacy. Our results demonstrate the importance of modeling word variability and show that episodic memory for words is strongly affected by item noise (Criss & Shiffrin, 2004), as measured by the…
Elif Bengi ÜNSAL ÖZBERK
Full Text Available The purpose of this study is to investigate potential gender and socio-economic status bias in theWechler Intelligence Scale for Children: Fourth Edition (WISC-4 by using several differential item functioning detection techniques. In this study, WISC-4 Turkish standardization test pilot data including 817 children were used. In accordance with the purpose of the study, 315 items were used both in polytomously scored subtests such as Block Design, Similarities, Digit Span, Vocabulary, Letter-Number Sequencing, Comprehension, and dichotomously scored subtests such as Picture Concepts, Matrix Reasoning, Picture Completion, Information, Arithmetic, and Word Reasoning. While Rasch Model, Mantel-Haenszel, and SIBTEST DIF detection techniques were used for dichotomously scored items, Partila Credit Model, Mantel, and Poly-SIBTEST techniques were used for polytomously scored items. In terms of DIF techniques, Mantel-Haenszel, SIBTEST and Mantel Test, Poly-SIBTEST analyses provided similar results when DIF based on gender was investigated. In addition Mantel-Haenszel, Rasch estimations and Partial Credit Model, Mantel Test results were similar while investigating DIF according to socioeconomic status.
Magis, David; Facon, Bruno
Item purification is an iterative process that is often advocated as improving the identification of items affected by differential item functioning (DIF). With test-score-based DIF detection methods, item purification iteratively removes the items currently flagged as DIF from the test scores to get purified sets of items, unaffected by DIF. The…
Azevedo, Caio L. N.; Migon, Helio S.
In this paper we introduce a new item response theory (IRT) model with a generalized Student t-link function with unknown degrees of freedom (df), named generalized t-link (GtL) IRT model. In this model we consider only the difficulty parameter in the item response function. GtL is an alternative to the two parameter logit and probit models, since the degrees of freedom (df) play a similar role to the discrimination parameter. However, the behavior of the curves of the GtL is different from those of the two parameter models and the usual Student t link, since in GtL the curve obtained from different df's can cross the probit curves in more than one latent trait level. The GtL model has similar proprieties to the generalized linear mixed models, such as the existence of sufficient statistics and easy parameter interpretation. Also, many techniques of parameter estimation, model fit assessment and residual analysis developed for that models can be used for the GtL model. We develop fully Bayesian estimation and model fit assessment tools through a Metropolis-Hastings step within Gibbs sampling algorithm. We consider a prior sensitivity choice concerning the degrees of freedom. The simulation study indicates that the algorithm recovers all parameters properly. In addition, some Bayesian model fit assessment tools are considered. Finally, a real data set is analyzed using our approach and other usual models. The results indicate that our model fits the data better than the two parameter models.
Khan, Anzalee; Lindenmayer, Jean-Pierre; Opler, Mark; Yavorsky, Christian; Rothman, Brian; Lucic, Luka
Debate persists with regard to how best to categorize the syndromal dimension of negative symptoms in schizophrenia. The aim was to first review published Principle Components Analysis (PCA) of the PANSS, and extract items most frequently included in the negative domain, and secondly, to examine the quality of items using Item Response Theory (IRT) to select items that best represent a measurable dimension (or dimensions) of negative symptoms. First, 22 factor analyses and PCA met were included. Second, using a large dataset (n=7187) of participants in clinical trials with chronic schizophrenia, we extracted items loading on one or more PCA. Third, items not loading with a value of ≥ 0.5, or loading on more than one component with values of ≥ 0.5 were discarded. Fourth, resulting items were included in a non-parametric IRT and retained based on Option Characteristic Curves (OCCs) and Item Characteristic Curves (ICCs). 15 items loaded on a negative domain in at least one study, with Emotional Withdrawal loading on all studies. Non-parametric IRT retained nine items as an Integrated Negative Factor: Emotional Withdrawal, Blunted Affect, Passive/Apathetic Social Withdrawal, Poor Rapport, Lack of Spontaneity/Conversation Flow, Active Social Avoidance, Disturbance of Volition, Stereotyped Thinking and Difficulty in Abstract Thinking. This is the first study to use a psychometric IRT process to arrive at a set of negative symptom items. Future steps will include further examination of these nine items in terms of their stability, sensitivity to change, and correlations with functional and cognitive outcomes. © 2013 Elsevier B.V. All rights reserved.
Cordier, Reinie; Speyer, Renée; Schindler, Antonio; Michou, Emilia; Heijnen, Bas Joris; Baijens, Laura; Karaduman, Ayşe; Swan, Katina; Clavé, Pere; Joosten, Annette Veronica
The Swallowing Quality of Life questionnaire (SWAL-QOL) is widely used clinically and in research to evaluate quality of life related to swallowing difficulties. It has been described as a valid and reliable tool, but was developed and tested using classic test theory. This study describes the reliability and validity of the SWAL-QOL using item response theory (IRT; Rasch analysis). SWAL-QOL data were gathered from 507 participants at risk of oropharyngeal dysphagia (OD) across four European countries. OD was confirmed in 75.7% of participants via videofluoroscopy and/or fiberoptic endoscopic evaluation, or a clinical diagnosis based on meeting selected criteria. Patients with esophageal dysphagia were excluded. Data were analysed using Rasch analysis. Item and person reliability was good for all the items combined. However, person reliability was poor for 8 subscales and item reliability was poor for one subscale. Eight subscales exhibited poor person separation and two exhibited poor item separation. Overall item and person fit statistics were acceptable. However, at an individual item fit level results indicated unpredictable item responses for 28 items, and item redundancy for 10 items. The item-person dimensionality map confirmed these findings. Results from the overall Rasch model fit and Principal Component Analysis were suggestive of a second dimension. For all the items combined, none of the item categories were 'category', 'threshold' or 'step' disordered; however, all subscales demonstrated category disordered functioning. Findings suggest an urgent need to further investigate the underlying structure of the SWAL-QOL and its psychometric characteristics using IRT.
J.E.M. van Nierop; D. Fok (Dennis); Ph.H.B.F. Franses (Philip Hans)
textabstractSales models are mainly used to analyze markets with a fairly small number of items, obtained after aggregating to the brand level. In practice one may require analyses at a more disaggregate level. For example, brand managers may be interested in a comparison across product
Hu Jinhui; Xu Ziyan; Sun Naifeng; Yan Aoshuang; Gao Wei; Wang Binglin
Statistical analyses on initial contamination of 624 disposal medical infusion items are made. The normal distribution of the initial contamination, the relation of initial contamination of inner and outer walls of disposal medical infusion items and the changes of initial contamination before irradiation are shown. The sterilized dose for disposal infusion is determined as 17.2 kGy using bioburden information. The SAL (sterility assurance level) dose is 10 6 . The SIP (device sample item proportion) is 1 and the average initial contamination is 7 CFU/item
Doolittle, Allen E.; Cleary, T. Anne
Eight randomly equivalent samples of high school seniors were each given a unique form of the ACT Assessment Mathematics Usage Test (ACTM). Signed measures of differential item performance (DIP) were obtained for each item in the eight ACTM forms. DIP estimates were analyzed and a significant item category effect was found. (Author/LMO)
Multani, Namita; Rudzicz, Frank; Wong, Wing Yiu Stephanie; Namasivayam, Aravind Kumar; van Lieshout, Pascal
Purpose: Random item generation (RIG) involves central executive functioning. Measuring aspects of random sequences can therefore provide a simple method to complement other tools for cognitive assessment. We examine the extent to which RIG relates to specific measures of cognitive function, and whether those measures can be estimated using RIG…
Camos, Valérie; Lagner, Prune; Loaiza, Vanessa M
Although verbal recall of item and order information is well-researched in short-term memory paradigms, there is relatively little research concerning item and order recall from working memory. The following study examined whether manipulating the opportunity for attentional refreshing and articulatory rehearsal in a complex span task differently affected the recall of item- and order-specific information of the memoranda. Five experiments varied the opportunity for articulatory rehearsal and attentional refreshing in a complex span task, but the type of recall was manipulated between experiments (item and order, order only, and item only recall). The results showed that impairing attentional refreshing and articulatory rehearsal similarly affected recall regardless of whether the scoring procedure (Experiments 1 and 4) or recall requirements (Experiments 2, 3, and 5) reflected item- or order-specific recall. This implies that both mechanisms sustain the maintenance of item and order information, and suggests that the common cumulative functioning of these two mechanisms to maintain items could be at the root of order maintenance.
Reeve, Bryce B; Stover, Angela M; Alfano, Catherine M; Smith, Ashley Wilder; Ballard-Barbash, Rachel; Bernstein, Leslie; McTiernan, Anne; Baumgartner, Kathy B; Piper, Barbara F
Brief, valid measures of fatigue, a prevalent and distressing cancer symptom, are needed for use in research. This study's primary aim was to create a shortened version of the revised Piper Fatigue Scale (PFS-R) based on data from a diverse cohort of breast cancer survivors. A secondary aim was to determine whether the PFS captured multiple distinct aspects of fatigue (a multidimensional model) or a single overall fatigue factor (a unidimensional model). Breast cancer survivors (n = 799; stages in situ through IIIa; ages 29-86 years) were recruited through three SEER registries (New Mexico, Western Washington, and Los Angeles, CA) as part of the Health, Eating, Activity, and Lifestyle (HEAL) study. Fatigue was measured approximately 3 years post-diagnosis using the 22-item PFS-R that has four subscales (Behavior, Affect, Sensory, and Cognition). Confirmatory factor analysis was used to compare unidimensional and multidimensional models. Six criteria were used to make item selections to shorten the PFS-R: scale's content validity, items' relationship with fatigue, content redundancy, differential item functioning by race and/or education, scale reliability, and literacy demand. Factor analyses supported the original 4-factor structure. There was also evidence from the bi-factor model for a dominant underlying fatigue factor. Six items tested positive for differential item functioning between African-American and Caucasian survivors. Four additional items either showed poor association, local dependence, or content validity concerns. After removing these 10 items, the reliability of the PFS-12 subscales ranged from 0.87 to 0.89, compared to 0.90-0.94 prior to item removal. The newly developed PFS-12 can be used to assess fatigue in African-American and Caucasian breast cancer survivors and reduces response burden without compromising reliability or validity. This is the first study to determine PFS literacy demand and to compare PFS-R responses in African
This article investigates a version of the International English Language Testing System (IELTS) listening test for evidence of differential item functioning (DIF) based on gender, nationality, age, and degree of previous exposure to the test. Overall, the listening construct was found to be underrepresented, which is probably an important cause…
Effects of memantine on cognition in patients with moderate to severe Alzheimer's disease: post-hoc analyses of ADAS-cog and SIB total and single-item scores from six randomized, double-blind, placebo-controlled studies.
Mecocci, Patrizia; Bladström, Anna; Stender, Karina
The post-hoc analyses reported here evaluate the specific effects of memantine treatment on ADAS-cog single-items or SIB subscales for patients with moderate to severe AD. Data from six multicentre, randomised, placebo-controlled, parallel-group, double-blind, 6-month studies were used as the basis for these post-hoc analyses. All patients with a Mini-Mental State Examination (MMSE) score of less than 20 were included. Analyses of patients with moderate AD (MMSE: 10-19), evaluated with the Alzheimer's disease Assessment Scale (ADAS-cog) and analyses of patients with moderate to severe AD (MMSE: 3-14), evaluated using the Severe Impairment Battery (SIB), were performed separately. The mean change from baseline showed a significant benefit of memantine treatment on both the ADAS-cog (p ADAS-cog single-item analyses showed significant benefits of memantine treatment, compared to placebo, for mean change from baseline for commands (p < 0.001), ideational praxis (p < 0.05), orientation (p < 0.01), comprehension (p < 0.05), and remembering test instructions (p < 0.05) for observed cases (OC). The SIB subscale analyses showed significant benefits of memantine, compared to placebo, for mean change from baseline for language (p < 0.05), memory (p < 0.05), orientation (p < 0.01), praxis (p < 0.001), and visuospatial ability (p < 0.01) for OC. Memantine shows significant benefits on overall cognitive abilities as well as on specific key cognitive domains for patients with moderate to severe AD. (c) 2009 John Wiley & Sons, Ltd.
Yu, Tzu-Chieh; Jowsey, Tanisha; Henning, Marcus
The Readiness for Interprofessional Learning Scale (RIPLS) was developed to assess undergraduate readiness for engaging in interprofessional education (IPE). It has become an accepted and commonly used instrument. To determine utility of a modified 16-item RIPLS instrument, exploratory and confirmatory factor analyses were performed. Data used were collected from a pre- and post-intervention study involving 360 New Zealand undergraduate students from one university. Just over half of the participants were enrolled in medicine (51%) while the remainder were in pharmacy (27%) and nursing (22%). The intervention was a two-day simulation-based IPE course focused on managing unplanned acute medical problems in hospital wards ("ward calls"). Immediately prior to the course, 288 RIPLS were collected and immediately afterwards, 322 (response rates 80% and 89%, respectively). Exploratory factor analysis involving principal axis factoring with an oblique rotation method was conducted using pre-course data. The scree plot suggested a three-factor solution over two- and four-factor solutions. Subsequent confirmatory factor analysis performed using post-course data demonstrated partial goodness-of-fit for this suggested three-factor model. Based on these findings, further robust psychometric testing of the RIPLS or modified versions of it is recommended before embarking on its use in evaluative research in various healthcare education settings.
Friedman, Bruce; Heisel, Marnin; Delavan, Rachel
To examine criterion and construct validity of the five-item Mental Health Index (MHI-5) of the 36-item Short Form health survey (SF-36) in relation to the presence of major depression in functionally impaired, community-dwelling elderly patients and of eight subsamples defined by cognitive functioning, levels of functional impairment, and proxy report versus self-report. Cross-sectional observational. Nineteen counties in western New York, West Virginia, and Ohio. One thousand four hundred forty-four functionally impaired, community-dwelling Medicare beneficiaries aged 65 and older who participated in the Medicare Primary and Consumer-Directed Care Demonstration. MHI-5, Mini-International Neuropsychiatric Interview Major Depressive Episode (MINI-MDE) module. The MHI-5 demonstrated sufficient criterion validity (area under the receiver operating characteristic curve=0.837; sensitivity=78.7% and specificity=72.1% using a cutpoint of 59/60) with respect to the presence of depression for the entire sample. A significant correlation between MHI-5 scores and presence of major depression as identified using the MINI-MDE (Spearman correlation=-0.426, Pvalidity. Additional evidence is provided by decline in mean MHI-5 score as level of formal education and number of close friends and relatives decreased. All eight subsamples demonstrated similar criterion and construct validity. A Cronbach alpha of 0.794 demonstrated internal consistency reliability. This study provides evidence for adequate criterion and construct validity of the MHI-5 in relation to the presence of major depression among functionally impaired, community-dwelling elderly Medicare patients.
The study was to identify the load, the type and the significance of differential item functioning (DIF) in constructed response item using the partial credit model (PCM). The data in the study were the students’ instruments and the students’ responses toward the PISA-like test items that had been completed by 386 ninth grade students and 460 tenth grade students who had been about 15 years old in the Province of Yogyakarta Special Region in Indonesia. The analysis toward the item characteris...
The comparability of English, French and Dutch scores on the Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F: an assessment of differential item functioning in patients with systemic sclerosis.
Full Text Available The Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F is commonly used to assess fatigue in rheumatic diseases, and has shown to discriminate better across levels of the fatigue spectrum than other commonly used measures. The aim of this study was to assess the cross-language measurement equivalence of the English, French, and Dutch versions of the FACIT-F in systemic sclerosis (SSc patients.The FACIT-F was completed by 871 English-speaking Canadian, 238 French-speaking Canadian and 230 Dutch SSc patients. Confirmatory factor analysis was used to assess the factor structure in the three samples. The Multiple-Indicator Multiple-Cause (MIMIC model was utilized to assess differential item functioning (DIF, comparing English versus French and versus Dutch patient responses separately.A unidimensional factor model showed good fit in all samples. Comparing French versus English patients, statistically significant, but small-magnitude DIF was found for 3 of 13 items. French patients had 0.04 of a standard deviation (SD lower latent fatigue scores than English patients and there was an increase of only 0.03 SD after accounting for DIF. For the Dutch versus English comparison, 4 items showed small, but statistically significant, DIF. Dutch patients had 0.20 SD lower latent fatigue scores than English patients. After correcting for DIF, there was a reduction of 0.16 SD in this difference.There was statistically significant DIF in several items, but the overall effect on fatigue scores was minimal. English, French and Dutch versions of the FACIT-F can be reasonably treated as having equivalent scoring metrics.
Bjorner, Jakob B; Rose, Matthias; Gandek, Barbara
assistant (PDA), or personal computer (PC) on the Internet, and a second form by PC, in the same administration. Structural invariance, equivalence of item responses, and measurement precision were evaluated using confirmatory factor analysis and item response theory methods. RESULTS: Multigroup...... levels in IVR, PQ, or PDA administration as compared to PC. Availability of large item response theory-calibrated PROMIS item banks allowed for innovations in study design and analysis.......PURPOSE: To test the impact of method of administration (MOA) on the measurement characteristics of items developed in the Patient-Reported Outcomes Measurement Information System (PROMIS). METHODS: Two non-overlapping parallel 8-item forms from each of three PROMIS domains (physical function...
Measurement Equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Pain Interference Short Form Items: Application to Ethnically Diverse Cancer and Palliative Care Populations.
Teresi, Jeanne A; Ocepek-Welikson, Katja; Cook, Karon F; Kleinman, Marjorie; Ramirez, Mildred; Reid, M Carrington; Siu, Albert
Reducing the response burden of standardized pain measures is desirable, particularly for individuals who are frail or live with chronic illness, e.g., those suffering from cancer and those in palliative care. The Patient Reported Outcome Measurement Information System ® (PROMIS ® ) project addressed this issue with the provision of computerized adaptive tests (CAT) and short form measures that can be used clinically and in research. Although there has been substantial evaluation of PROMIS item banks, little is known about the performance of PROMIS short forms, particularly in ethnically diverse groups. Reviewed in this article are findings related to the differential item functioning (DIF) and reliability of the PROMIS pain interference short forms across diverse sociodemographic groups. DIF hypotheses were generated for the PROMIS short form pain interference items. Initial analyses tested item response theory (IRT) model assumptions of unidimensionality and local independence. Dimensionality was evaluated using factor analytic methods; local dependence (LD) was tested using IRT-based LD indices. Wald tests were used to examine group differences in IRT parameters, and to test DIF hypotheses. A second DIF-detection method used in sensitivity analyses was based on ordinal logistic regression with a latent IRT-derived conditioning variable. Magnitude and impact of DIF were investigated, and reliability and item and scale information statistics were estimated. The reliability of the short form item set was excellent. However, there were a few items with high local dependency, which affected the estimation of the final discrimination parameters. As a result, the item, "How much did pain interfere with enjoyment of social activities?" was excluded in the DIF analyses for all subgroup comparisons. No items were hypothesized to show DIF for race and ethnicity; however, five items showed DIF after adjustment for multiple comparisons in both primary and sensitivity
Measurement Equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Pain Interference Short Form Items: Application to Ethnically Diverse Cancer and Palliative Care Populations
Teresi, Jeanne A.; Ocepek-Welikson, Katja; Cook, Karon F.; Kleinman, Marjorie; Ramirez, Mildred; Reid, M. Carrington; Siu, Albert
Reducing the response burden of standardized pain measures is desirable, particularly for individuals who are frail or live with chronic illness, e.g., those suffering from cancer and those in palliative care. The Patient Reported Outcome Measurement Information System® (PROMIS®) project addressed this issue with the provision of computerized adaptive tests (CAT) and short form measures that can be used clinically and in research. Although there has been substantial evaluation of PROMIS item banks, little is known about the performance of PROMIS short forms, particularly in ethnically diverse groups. Reviewed in this article are findings related to the differential item functioning (DIF) and reliability of the PROMIS pain interference short forms across diverse sociodemographic groups. Methods DIF hypotheses were generated for the PROMIS short form pain interference items. Initial analyses tested item response theory (IRT) model assumptions of unidimensionality and local independence. Dimensionality was evaluated using factor analytic methods; local dependence (LD) was tested using IRT-based LD indices. Wald tests were used to examine group differences in IRT parameters, and to test DIF hypotheses. A second DIF-detection method used in sensitivity analyses was based on ordinal logistic regression with a latent IRT-derived conditioning variable. Magnitude and impact of DIF were investigated, and reliability and item and scale information statistics were estimated. Results The reliability of the short form item set was excellent. However, there were a few items with high local dependency, which affected the estimation of the final discrimination parameters. As a result, the item, “How much did pain interfere with enjoyment of social activities?” was excluded in the DIF analyses for all subgroup comparisons. No items were hypothesized to show DIF for race and ethnicity; however, five items showed DIF after adjustment for multiple comparisons in both primary and
Van der Elst, Wim; Ouwehand, Carolijn; van Rijn, Peter; Lee, Nikki; Van Boxtel, Martin; Jolles, Jelle
The purpose of the present study was to evaluate the psychometric properties of a shortened version of the Raven Standard Progressive Matrices (SPM) under an item response theory framework (the one- and two-parameter logistic models). The shortened Raven SPM was administered to N = 453 cognitively healthy adults aged between 24 and 83 years. The…
KLAUS D. KUBINGER
Full Text Available Multiple choice response formats are problematical as an item is often scored as solved simply because the test-taker is a lucky guesser. Instead of applying pertinent IRT models which take guessing effects into account, a pragmatic approach of re-conceptualizing multiple choice response formats to reduce the chance of lucky guessing is considered. This paper compares the free response format with two different multiple choice formats. A common multiple choice format with a single correct response option and five distractors (“1 of 6” is used, as well as a multiple choice format with five response options, of which any number of the five is correct and the item is only scored as mastered if all the correct response options and none of the wrong ones are marked (“x of 5”. An experiment was designed, using pairs of items with exactly the same content but different response formats. 173 test-takers were randomly assigned to two test booklets of 150 items altogether. Rasch model analyses adduced a fitting item pool, after the deletion of 39 items. The resulting item difficulty parameters were used for the comparison of the different formats. The multiple choice format “1 of 6” differs significantly from “x of 5”, with a relative effect of 1.63, while the multiple choice format “x of 5” does not significantly differ from the free response format. Therefore, the lower degree of difficulty of items with the “1 of 6” multiple choice format is an indicator of relevant guessing effects. In contrast the “x of 5” multiple choice format can be seen as an appropriate substitute for free response format.
Dziak, Renata; Peneder, Amy; Buetter, Alicia; Hageman, Cecilia
Trace DNA analysis is a significant part of a forensic laboratory's workload. Knowing optimal sampling strategies and item success rates for particular item types can assist in evidence selection and examination processes and shorten turnaround times. In this study, forensic short tandem repeat (STR) casework results were reviewed to determine how often STR profiles suitable for comparison were obtained from "handler" and "wearer" areas of 764 items commonly submitted for examination. One hundred and fifty-five (155) items obtained from volunteers were also sampled. Items were analyzed for best sampling location and strategy. For casework items, headwear and gloves provided the highest success rates. Experimentally, eyeglasses and earphones, T-shirts, fabric gloves and watches provided the highest success rates. Eyeglasses and latex gloves provided optimal results if the entire surfaces were swabbed. In general, at least 10%, and up to 88% of all trace DNA analyses resulted in suitable STR profiles for comparison. © 2017 American Academy of Forensic Sciences.
McGrory, Sarah; Shenkin, Susan D; Austin, Elizabeth J; Starr, John M
impairment of functional abilities represents a crucial component of dementia diagnosis. Current functional measures rely on the traditional aggregate method of summing raw scores. While this summary score provides a quick representation of a person's ability, it disregards useful information on the item level. to use item response theory (IRT) methods to increase the interpretive power of the Lawton Instrumental Activities of Daily Living (IADL) scale by establishing a hierarchy of item 'difficulty' and 'discrimination'. this cross-sectional study applied IRT methods to the analysis of IADL outcomes. Participants were 202 members of the Scottish Dementia Research Interest Register (mean age = 76.39, range = 56-93, SD = 7.89 years) with complete itemised data available. a Mokken scale with good reliability (Molenaar Sijtsama statistic 0.79) was obtained, satisfying the IRT assumption that the items comprise a single unidimensional scale. The eight items in the scale could be placed on a hierarchy of 'difficulty' (H coefficient = 0.55), with 'Shopping' being the most 'difficult' item and 'Telephone use' being the least 'difficult' item. 'Shopping' was the most discriminatory item differentiating well between patients of different levels of ability. IRT methods are capable of providing more information about functional impairment than a summed score. 'Shopping' and 'Telephone use' were identified as items that reveal key information about a patient's level of ability, and could be useful screening questions for clinicians. © The Author 2013. Published by Oxford University Press on behalf of the British Geriatrics Society. All rights reserved. For Permissions, please email: journals.permissions@ oup.com.
Ashenberg, Orr; Laub, Michael T
Determining which residues of a protein contribute to a specific function is a difficult problem. Analyses of amino acid covariation within a protein family can serve as a useful guide by identifying residues that are functionally coupled. Covariation analyses have been successfully used on several different protein families to identify residues that work together to promote folding, enable protein-protein interactions, or contribute to an enzymatic activity. Covariation is a statistical signal that can be measured in a multiple sequence alignment of homologous proteins. As sequence databases have expanded dramatically, covariation analyses have become easier and more powerful. In this chapter, we describe how functional covariation arises during the evolution of proteins and how this signal can be distinguished from various background signals. We discuss the basic methodology for performing amino acid covariation analysis, using bacterial two-component signal transduction proteins as an example. We provide practical suggestions for each step of the process including assembly of protein sequences, construction of a multiple sequence alignment, measurement of covariation, and analysis of results. Copyright © 2013 Elsevier Inc. All rights reserved.
Elosua, Paula; Wells, Craig
The purpose of the present study was to compare the Type I error rate and power of two model-based procedures, the mean and covariance structure model (MACS) and the item response theory (IRT), and an observed-score based procedure, ordinal logistic regression, for detecting differential item functioning (DIF) in polytomous items. A simulation…
Polak, Marike; De Rooij, Mark; Heiser, Willem J.
In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) "criterion…
Inamura, Patricia Y.; Uehara, Vanessa B.; Teixeira, Christian A.H.M.; Mastro, Nelida L. del
For most of prepackaged foods a 10 kGy radiation dose is considered the maximum dose needed; however, the commercially available and practically accepted packaging materials must be suitable for such application. This work describes the application of ionizing radiation on several packaged food items, using 5 dehydrated food items, 5 ready-to-eat meals and 5 ready-to-eat food items irradiated in a 60 Co gamma source with a 3 kGy dose. The quality evaluation of the irradiated samples was performed 2 and 8 months after irradiation. Microbiological analysis (bacteria, fungus and yeast load) was performed. The sensory characteristics were established for appearance, aroma, texture and flavor attributes were also established. From these data, the acceptability of all irradiated items was obtained. All ready-to-eat food items assayed like manioc flour, some pâtés and blocks of raw brown sugar and most of ready-to-eat meals like sausages and chicken with legumes were considered acceptable for microbial and sensory characteristics. On the other hand, the dehydrated food items chosen for this study, such as dehydrated bacon potatoes or pea soups were not accepted by the sensory analysis. A careful dose choice and special irradiation conditions must be used in order to achieve sensory acceptability needed for the commercialization of specific irradiated food items. - Highlights: ► We applied gamma radiation on several kinds of packaged food items. ► Microbiological and sensory analyses were performed 2 and 8 months after irradiation. ► All ready-to-eat food items assayed were approved for microbial and sensory characteristics. ► Most ready-to-eat meals like sausages and chicken with legumes were also acceptable. ► Dehydrated bacon potatoes or pea soups were considered not acceptable.
Baylor, Carolyn R.; Yorkston, Kathryn M.; Eadie, Tanya L.; Miller, Robert M.; Amtmann, Dagmar
Purpose: The purpose of this study was to conduct the initial psychometric analyses of the Communicative Participation Item Bank--a new self-report instrument designed to measure the extent to which communication disorders interfere with communicative participation. This item bank is intended for community-dwelling adults across a range of…
Kubinger, Klaus D.; Reif, Manuel; Yanagida, Takuya
Item position effects provoke serious problems within adaptive testing. This is because different testees are necessarily presented with the same item at different presentation positions, as a consequence of which comparing their ability parameter estimations in the case of such effects would not at all be fair. In this article, a specific…
The Comparability of English, French and Dutch Scores on the Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F): An Assessment of Differential Item Functioning in Patients with Systemic Sclerosis
Kwakkenbos, Linda; Willems, Linda M.; Baron, Murray; Hudson, Marie; Cella, David; van den Ende, Cornelia H. M.; Thombs, Brett D.
Objective The Functional Assessment of Chronic Illness Therapy- Fatigue (FACIT-F) is commonly used to assess fatigue in rheumatic diseases, and has shown to discriminate better across levels of the fatigue spectrum than other commonly used measures. The aim of this study was to assess the cross-language measurement equivalence of the English, French, and Dutch versions of the FACIT-F in systemic sclerosis (SSc) patients. Methods The FACIT-F was completed by 871 English-speaking Canadian, 238 French-speaking Canadian and 230 Dutch SSc patients. Confirmatory factor analysis was used to assess the factor structure in the three samples. The Multiple-Indicator Multiple-Cause (MIMIC) model was utilized to assess differential item functioning (DIF), comparing English versus French and versus Dutch patient responses separately. Results A unidimensional factor model showed good fit in all samples. Comparing French versus English patients, statistically significant, but small-magnitude DIF was found for 3 of 13 items. French patients had 0.04 of a standard deviation (SD) lower latent fatigue scores than English patients and there was an increase of only 0.03 SD after accounting for DIF. For the Dutch versus English comparison, 4 items showed small, but statistically significant, DIF. Dutch patients had 0.20 SD lower latent fatigue scores than English patients. After correcting for DIF, there was a reduction of 0.16 SD in this difference. Conclusions There was statistically significant DIF in several items, but the overall effect on fatigue scores was minimal. English, French and Dutch versions of the FACIT-F can be reasonably treated as having equivalent scoring metrics. PMID:24638101
Irwin, Debra E; Gross, Heather E; Stucky, Brian D; Thissen, David; DeWitt, Esi Morgan; Lai, Jin Shei; Amtmann, Dagmar; Khastou, Leyla; Varni, James W; DeWalt, Darren A
Pediatric self-report should be considered the standard for measuring patient reported outcomes (PRO) among children. However, circumstances exist when the child is too young, cognitively impaired, or too ill to complete a PRO instrument and a proxy-report is needed. This paper describes the development process including the proxy cognitive interviews and large-field-test survey methods and sample characteristics employed to produce item parameters for the Patient Reported Outcomes Measurement Information System (PROMIS) pediatric proxy-report item banks. The PROMIS pediatric self-report items were converted into proxy-report items before undergoing cognitive interviews. These items covered six domains (physical function, emotional distress, social peer relationships, fatigue, pain interference, and asthma impact). Caregivers (n = 25) of children ages of 5 and 17 years provided qualitative feedback on proxy-report items to assess any major issues with these items. From May 2008 to March 2009, the large-scale survey enrolled children ages 8-17 years to complete the self-report version and caregivers to complete the proxy-report version of the survey (n = 1548 dyads). Caregivers of children ages 5 to 7 years completed the proxy report survey (n = 432). In addition, caregivers completed other proxy instruments, PedsQL™ 4.0 Generic Core Scales Parent Proxy-Report version, PedsQL™ Asthma Module Parent Proxy-Report version, and KIDSCREEN Parent-Proxy-52. Item content was well understood by proxies and did not require item revisions but some proxies clearly noted that determining an answer on behalf of their child was difficult for some items. Dyads and caregivers of children ages 5-17 years old were enrolled in the large-scale testing. The majority were female (85%), married (70%), Caucasian (64%) and had at least a high school education (94%). Approximately 50% had children with a chronic health condition, primarily asthma, which was diagnosed or treated within 6
M. Kleppe (Martijn); L. Hollink (Laura); M.J. Kemman (Max); D. Juric (Damir); H.J.G. Beunders (Henri); J. Blom (Jaap); J. Oomen (Johan); G.J. Houben (Geert Jan)
markdownabstractStudents and researchers of media and communication sciences study the role of media in our society. They frequently search through media archives to manually select items that cover a certain event. When this is done for large time spans and across media-outlets, this task can
Kisala, Pamela A; Tulsky, David S; Kalpakjian, Claire Z; Heinemann, Allen W; Pohlig, Ryan T; Carle, Adam; Choi, Seung W
To develop a calibrated item bank and computer adaptive test to assess anxiety symptoms in individuals with spinal cord injury (SCI), transform scores to the Patient Reported Outcomes Measurement Information System (PROMIS) metric, and create a statistical linkage with the Generalized Anxiety Disorder (GAD)-7, a widely used anxiety measure. Grounded-theory based qualitative item development methods; large-scale item calibration field testing; confirmatory factor analysis; graded response model item response theory analyses; statistical linking techniques to transform scores to a PROMIS metric; and linkage with the GAD-7. Setting Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants Adults with traumatic SCI. Spinal Cord Injury-Quality of Life (SCI-QOL) Anxiety Item Bank Seven hundred sixteen individuals with traumatic SCI completed 38 items assessing anxiety, 17 of which were PROMIS items. After 13 items (including 2 PROMIS items) were removed, factor analyses confirmed unidimensionality. Item response theory analyses were used to estimate slopes and thresholds for the final 25 items (15 from PROMIS). The observed Pearson correlation between the SCI-QOL Anxiety and GAD-7 scores was 0.67. The SCI-QOL Anxiety item bank demonstrates excellent psychometric properties and is available as a computer adaptive test or short form for research and clinical applications. SCI-QOL Anxiety scores have been transformed to the PROMIS metric and we provide a method to link SCI-QOL Anxiety scores with those of the GAD-7.
Thyssen, Jacob P; Menné, Torkil; Johansen, Jeanne D
Nickel allergy is prevalent as assessed by epidemiological studies. In an attempt to further identify and characterize sources that may result in nickel allergy and dermatitis, we analysed items identified by nickel-allergic dermatitis patients as causative of nickel dermatitis by using the dimethylglyoxime (DMG) test. Dermatitis patients with nickel allergy of current relevance were identified over a 2-year period in a tertiary referral patch test centre. When possible, their work tools and personal items were examined with the DMG test. Among 95 nickel-allergic dermatitis patients, 70 (73.7%) had metallic items investigated for nickel release. A total of 151 items were investigated, and 66 (43.7%) gave positive DMG test reactions. Objects were nearly all purchased or acquired after the introduction of the EU Nickel Directive. Only one object had been inherited, and only two objects had been purchased outside of Denmark. DMG testing is valuable as a screening test for nickel release and should be used to identify relevant exposures in nickel-allergic patients. Mainly consumer items, but also work tools used in an occupational setting, released nickel in dermatitis patients. This study confirmed 'risk items' from previous studies, including mobile phones.
Qiu, Tian; Chen, Guang; Zhang, Zi-Ke; Zhou, Tao
Based on a hybrid algorithm incorporating the heat conduction and probability spreading processes (Proc. Natl. Acad. Sci. U.S.A., 107 (2010) 4511), in this letter, we propose an improved method by introducing an item-oriented function, focusing on solving the dilemma of the recommendation accuracy between the cold and popular items. Differently from previous works, the present algorithm does not require any additional information (e.g., tags). Further experimental results obtained in three real datasets, RYM, Netflix and MovieLens, show that, compared with the original hybrid method, the proposed algorithm significantly enhances the recommendation accuracy of the cold items, while it keeps the recommendation accuracy of the overall and the popular items. This work might shed some light on both understanding and designing effective methods for long-tailed online applications of recommender systems.
Razani, Jill; Wong, Jennifer T; Dafaeeboini, Natalia; Edwards-Lee, Terri; Lu, Po; Alessi, Cathy; Josephson, Karen
The Mini-Mental State Examination is a widely used cognitive screening measure. The purpose of the present study was to assess how 5 specific clusters of Mini-Mental State Examination items (ie, subscores) correlate with and predict specific areas of daily functioning in dementia patients, 61 patients with varied forms of dementia were administered the Mini-Mental State Examination and an observation-based daily functional test (the Direct Assessment of Functional Status). The results revealed that the orientation and attention subscores of the Mini-Mental State Examination correlated most significantly with most functional domains. The Mini-Mental State Examination language items correlated with all but the shopping and time orientation tasks, while the Mini-Mental State Examination recall items correlated with the Direct Assessment of Functional Status time orientation and shopping tasks. Stepwise regression analyses found that among the Mini-Mental State Examination subscores, orientation was the single, best independent predictor of daily functioning.
Mesic, Vanes; Muratovic, Hasnija
Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinary difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge
Full Text Available Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinary difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal
Hidalgo, María D; López-Martínez, María D; Gómez-Benito, Juana; Guilera, Georgina
Short scales are typically used in the social, behavioural and health sciences. This is relevant since test length can influence whether items showing DIF are correctly flagged. This paper compares the relative effectiveness of discriminant logistic regression (DLR) and IRTLRDIF for detecting DIF in polytomous short tests. A simulation study was designed. Test length, sample size, DIF amount and item response categories number were manipulated. Type I error and power were evaluated. IRTLRDIF and DLR yielded Type I error rates close to nominal level in no-DIF conditions. Under DIF conditions, Type I error rates were affected by test length DIF amount, degree of test contamination, sample size and number of item response categories. DLR showed a higher Type I error rate than did IRTLRDIF. Power rates were affected by DIF amount and sample size, but not by test length. DLR achieved higher power rates than did IRTLRDIF in very short tests, although the high Type I error rate involved means that this result cannot be taken into account. Test length had an important impact on the Type I error rate. IRTLRDIF and DLR showed a low power rate in short tests and with small sample sizes.
Tiger, Jeffrey H.; Fisher, Wayne W.; Toussaint, Karen A.; Kodak, Tiffany
Most often functional analyses are initiated using a standard set of test conditions, similar to those described by Iwata, Dorsey, Slifer, Bauman, and Richman (1982/1994). These test conditions involve the careful manipulation of motivating operations, discriminative stimuli, and reinforcement contingencies to determine the events related to the occurrence and maintenance of problem behavior. Some individuals display problem behavior that is occasioned and reinforced by idiosyncratic or other...
Fenwick, Eva K; Pesudovs, Konrad; Khadka, Jyoti; Rees, Gwyn; Wong, Tien Y; Lamoureux, Ecosse L
We are developing an item bank assessing the impact of diabetic retinopathy (DR) on quality of life (QoL) using a rigorous multi-staged process combining qualitative and quantitative methods. We describe here the first two qualitative phases: content development and item evaluation. After a comprehensive literature review, items were generated from four sources: (1) 34 previously validated patient-reported outcome measures; (2) five published qualitative articles; (3) eight focus groups and 18 semi-structured interviews with 57 DR patients; and (4) seven semi-structured interviews with diabetes or ophthalmic experts. Items were then evaluated during 3 stages, namely binning (grouping) and winnowing (reduction) based on key criteria and panel consensus; development of item stems and response options; and pre-testing of items via cognitive interviews with patients. The content development phase yielded 1,165 unique items across 7 QoL domains. After 3 sessions of binning and winnowing, items were reduced to a minimally representative set (n = 312) across 9 domains of QoL: visual symptoms; ocular surface symptoms; activity limitation; mobility; emotional; health concerns; social; convenience; and economic. After 8 cognitive interviews, 42 items were amended resulting in a final set of 314 items. We have employed a systematic approach to develop items for a DR-specific QoL item bank. The psychometric properties of the nine QoL subscales will be assessed using Rasch analysis. The resulting validated item bank will allow clinicians and researchers to better understand the QoL impact of DR and DR therapies from the patient's perspective.
Peterson, Dwight J; Naveh-Benjamin, Moshe
An important yet unresolved question regarding visual working memory (VWM) relates to whether or not binding processes within VWM require additional attentional resources compared with processing solely the individual components comprising these bindings. Previous findings indicate that binding of surface features (e.g., colored shapes) within VWM is not demanding of resources beyond what is required for single features. However, it is possible that other types of binding, such as the binding of complex, distinct items (e.g., faces and scenes), in VWM may require additional resources. In 3 experiments, we examined VWM item-item binding performance under no load, articulatory suppression, and backward counting using a modified change detection task. Binding performance declined to a greater extent than single-item performance under higher compared with lower levels of concurrent load. The findings from each of these experiments indicate that processing item-item bindings within VWM requires a greater amount of attentional resources compared with single items. These findings also highlight an important distinction between the role of attention in item-item binding within VWM and previous studies of long-term memory (LTM) where declines in single-item and binding test performance are similar under divided attention. The current findings provide novel evidence that the specific type of binding is an important determining factor regarding whether or not VWM binding processes require attention. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Brandl, Simon J; Bellwood, David R
Detailed knowledge of a species' functional niche is crucial for the study of ecological communities and processes. The extent of niche overlap, functional redundancy and functional complementarity is of particular importance if we are to understand ecosystem processes and their vulnerability to disturbances. Coral reefs are among the most threatened marine systems, and anthropogenic activity is changing the functional composition of reefs. The loss of herbivorous fishes is particularly concerning as the removal of algae is crucial for the growth and survival of corals. Yet, the foraging patterns of the various herbivorous fish species are poorly understood. Using a multidimensional framework, we present novel individual-based analyses of species' realized functional niches, which we apply to a herbivorous coral reef fish community. In calculating niche volumes for 21 species, based on their microhabitat utilization patterns during foraging, and computing functional overlaps, we provide a measurement of functional redundancy or complementarity. Complementarity is the inverse of redundancy and is defined as less than 50% overlap in niche volumes. The analyses reveal extensive complementarity with an average functional overlap of just 15.2%. Furthermore, the analyses divide herbivorous reef fishes into two broad groups. The first group (predominantly surgeonfishes and parrotfishes) comprises species feeding on exposed surfaces and predominantly open reef matrix or sandy substrata, resulting in small niche volumes and extensive complementarity. In contrast, the second group consists of species (predominantly rabbitfishes) that feed over a wider range of microhabitats, penetrating the reef matrix to exploit concealed surfaces of various substratum types. These species show high variation among individuals, leading to large niche volumes, more overlap and less complementarity. These results may have crucial consequences for our understanding of herbivorous processes on
Hospers, J. Mirjam Boeschen; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B.; Kramer, Sophia E.
Purpose: We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Method: Cross-sectional data from 2,352 adults with and without hearing…
Liegl, Gregor; Gandek, Barbara; Fischer, H. Felix
precision between the short forms using different item formats. Results: Sufficient unidimensionality of all short-form items and the original PF item bank was supported. Compared to formats A and B, format C increased the range of reliable measurement by about 0.5 standard deviations on the positive side...
Conclusion: Item response models are useful for medical test analyses and provide valuable information about model comparisons and identification of differential items other than test reliability, item difficulty, and examinee's ability.
Irwin Debra E
Full Text Available Abstract Background Pediatric self-report should be considered the standard for measuring patient reported outcomes (PRO among children. However, circumstances exist when the child is too young, cognitively impaired, or too ill to complete a PRO instrument and a proxy-report is needed. This paper describes the development process including the proxy cognitive interviews and large-field-test survey methods and sample characteristics employed to produce item parameters for the Patient Reported Outcomes Measurement Information System (PROMIS pediatric proxy-report item banks. Methods The PROMIS pediatric self-report items were converted into proxy-report items before undergoing cognitive interviews. These items covered six domains (physical function, emotional distress, social peer relationships, fatigue, pain interference, and asthma impact. Caregivers (n = 25 of children ages of 5 and 17 years provided qualitative feedback on proxy-report items to assess any major issues with these items. From May 2008 to March 2009, the large-scale survey enrolled children ages 8-17 years to complete the self-report version and caregivers to complete the proxy-report version of the survey (n = 1548 dyads. Caregivers of children ages 5 to 7 years completed the proxy report survey (n = 432. In addition, caregivers completed other proxy instruments, PedsQL™ 4.0 Generic Core Scales Parent Proxy-Report version, PedsQL™ Asthma Module Parent Proxy-Report version, and KIDSCREEN Parent-Proxy-52. Results Item content was well understood by proxies and did not require item revisions but some proxies clearly noted that determining an answer on behalf of their child was difficult for some items. Dyads and caregivers of children ages 5-17 years old were enrolled in the large-scale testing. The majority were female (85%, married (70%, Caucasian (64% and had at least a high school education (94%. Approximately 50% had children with a chronic health condition, primarily
Full Text Available This paper deals with a deterministic inventory model for deteriorating items under the condition of permissible delay in payments with constant demand rate is a function of time which differs from before and after deterioration for a single item. Shortages are allowed and completely backlogged which is a function of time. Under these assumptions, this paper develops a retailer's model for obtaining an optimal cycle length and ordering quantity in deteriorating items of an inventory model. Thus, our objective is retailer's cost minimization problem to nd an optimal replenishment policy under various parameters. The convexity of the objective function is derived and the numerical examples are provided to support the proposed model. Sensitivity analysis of the optimal solution with respect to major parameters of the model is included and the implications are discussed.
Lebedeva, Elena; Huang, Mei; Koski, Lisa
The Montreal Cognitive Assessment (MoCA) is a screening tool for mild cognitive impairment (MCI) in elderly individuals. We hypothesized that measurement error when using the new alternate MoCA versions to monitor change over time could be related to the use of items that are not of comparable difficulty to their corresponding originals of similar content. The objective of this study was to compare the difficulty of the alternate MoCA items to the original ones. Five selected items from alternate versions of the MoCA were included with items from the original MoCA administered adaptively to geriatric outpatients (N = 78). Rasch analysis was used to estimate the difficulty level of the items. None of the five items from the alternate versions matched the difficulty level of their corresponding original items. This study demonstrates the potential benefits of a Rasch analysis-based approach for selecting items during the process of development of parallel forms. The results suggest that better match of the items from different MoCA forms by their difficulty would result in higher sensitivity to changes in cognitive function over time.
DISC Predictive Scales (DPS): Factor structure and uniform differential item functioning across gender and three racial/ethnic groups for ADHD, conduct disorder, and oppositional defiant disorder symptoms.
Wiesner, Margit; Windle, Michael; Kanouse, David E; Elliott, Marc N; Schuster, Mark A
The factor structure and potential uniform differential item functioning (DIF) among gender and three racial/ethnic groups of adolescents (African American, Latino, White) were evaluated for attention deficit/hyperactivity disorder (ADHD), conduct disorder (CD), and oppositional defiant disorder (ODD) symptom scores of the DISC Predictive Scales (DPS; Leung et al., 2005; Lucas et al., 2001). Primary caregivers reported on DSM-IV ADHD, CD, and ODD symptoms for a probability sample of 4,491 children from three geographical regions who took part in the Healthy Passages study (mean age = 12.60 years, SD = 0.66). Confirmatory factor analysis indicated that the expected 3-factor structure was tenable for the data. Multiple indicators multiple causes (MIMIC) modeling revealed uniform DIF for three ADHD and 9 ODD item scores, but not for any of the CD item scores. Uniform DIF was observed predominantly as a function of child race/ethnicity, but minimally as a function of child gender. On the positive side, uniform DIF had little impact on latent mean differences of ADHD, CD, and ODD symptomatology among gender and racial/ethnic groups. Implications of the findings for researchers and practitioners are discussed. (c) 2015 APA, all rights reserved).
Lang, R.B.; Sigafoos, J.; Lancioni, G.E.; Didden, H.C.M.; Rispoli, M.
Analogue functional analyses are widely used to identify the operant function of problem behavior in individuals with developmental disabilities. Because problem behavior often occurs across multiple settings (e.g., homes, schools, outpatient clinics), it is important to determine whether the
Recent testing has identified the presence of hydrogen and oxygen in MIS Item 011589A. This isolated observation has effectuated concern regarding the potential for flammable gas mixtures in containers in the storage inventory. This study examines the known physicochemical characteristics of MIS Item 011589A and queries the ISP Database for items that are most similar or potentially similar. Items identified as most similar are believed to have the highest probability of being chemically and structurally identical to MIS Item 011589A. Items identified as potentially like MIS Item 011589A have some attributes in common, have the potential to generate gases, but have a lower probability of having similar gas generating characteristics. MIS Item 011589A is an oxide that was generated prior to 1990 at Rocky Flats in Building 707. It was associated with foundry processing and had an actinide assay of approximately 77%. Prompt gamma analysis of MIS Item 011589A indicated the presence of chloride, fluorine, magnesium, sodium, and aluminum. Queries based on MIS representation classification and process of origin were applied to the ISP Database. Evaluation criteria included binning classification (i.e., innocuous, pressure, or pressure and corrosion), availability of prompt gamma analyses, presence of chlorine and magnesium, percentage of chlorine by weight, peak ratios (i.e., Na:Cl and Mg:Na), moisture, and percent assay. These queries identified 15 items that were most similar and 106 items that were potentially like MIS Item 011589A. Although these queries identified containers that could potentially generate flammable gases, verification and confirmation can only be accomplished by destructive evaluation and testing of containers from the storage inventory.
Maki, Yoichi; Kawasaki, Yayoi; Demiray, Burcu; Janssen, Steve M J
The present study examined whether the three major functions of autobiographical memory observed in Western societies (i.e., directing-behaviour, social-bonding and self-continuity) also exist in an East Asian society. Two self-report measures were used to assess the autobiographical memory functions of Japanese men and women. Japanese young adults (N = 451, ages 17-28 years) first completed the original Thinking About Life Experiences (TALE) Questionnaire. They subsequently received three TALE items that represented memory functions and attempted to recall a specific instance of memory recall for each item. Confirmatory factor analyses on the TALE showed that the three functions were replicated in the current sample. However, Japanese participants reported lower levels of all three functions than American participants in a previous study. We also explored whether there was an effect of gender in this Japanese sample. Women reported higher levels of the self-continuity and social-bonding functions than men. Finally, participants recalled more specific instances of memory recall for the TALE items that had received higher ratings on the TALE, suggesting that the findings on the first measure were supported by the second measure. Results are discussed in relation to the functional approach to autobiographical memory in a cross-cultural context.
Alsadaawi, Abdullah Saleh
The Saudi National Assessment Centre administers the Computer Science Teacher Test for teacher certification. The aim of this study is to explore gender differences in candidates' scores, and investigate dimensionality, reliability, and differential item functioning using confirmatory factor analysis and item response theory. The confirmatory…
Pal, Brojeswar; Sankar Sana, Shib; Chaudhuri, Kripasindhu
This article deals with an economic production quantity (EPQ) model in an imperfect production system. The production system may undergo in 'out-of-control' state from 'in-control' state, after a certain time that follows a probability density function. The density function varies with reliability of the machinery system that may be controlled by new technologies, investing more costs. The defective items produced in 'out-of-control' state are reworked at a cost just after the regular production time. Occurrence of the 'out-of-control' state during or after regular production-run time is analysed and also graphically illustrated separately. Finally, an expected profit function regarding the inventory cost, unit production cost and selling price is maximised analytically. Sensitivity analysis of the model with respect to key parameters of the system is carried out. Two numerical examples are considered to test the model and one of them is illustrated graphically.
Item response theory (IRT) is a framework for modeling and analyzing item response ... data. Though, there is an argument that the evaluation of fit in IRT modeling has been ... National Council on Measurement in Education ... model data fit should be based on three types of ... prediction should be assessed through the.
Jafari, Peyman; Allahyari, Elahe; Salarzadeh, Mina; Bagheri, Zahra
Child obesity has become a major health concern worldwide. In order to provide successful intervention strategies, it is necessary to understand how obese-overweight children and their parents perceive obesity and its consequences on child's health-related quality of life (HRQoL). This study aimed to assess measurement equivalence of the PedsQL™ 4.0 across obese-overweight children and their parents. The items in the PedsQL™ 4.0 were analysed for differential item functioning (DIF) across obese-overweight children and their parents using an iterative hybrid ordinal logistic regression/item response theory approach. The sample included 647 overweight-obese children and their parents, who completed child and parent reports of the PedsQL™ 4.0, respectively. Overall, 17 out of 23 (74%) items were flagged with DIF across two groups: eight items exhibited uniform DIF and nine items non-uniform DIF. In addition, parents of obese children rated the child's HRQoL significantly lower than their children in all domains of the PedsQL™ 4.0, and this finding did not change whether or not items with uniform DIF were included. Although obese-overweight children and their parents interpret items of the PedsQL™ 4.0 in a conceptually different manner, removing or retaining DIF items in the subscales had no significant effects on group differences. Accordingly, it appears that observed differences in HRQoL scores across child and parent reports are a true difference and not a reflection of measurement artefact.
Vishkaei, Behzad Maleki; Niaki, S. T. A.; Farhangi, Milad; Rashti, Mehdi Ebrahimnezhad Moghadam
This paper is an extension of Hsu and Hsu (Int J Ind Eng Comput 3(5):939-948, 2012) aiming to determine the optimal order quantity of product batches that contain defective items with percentage nonconforming following a known probability density function. The orders are subject to 100 % screening process at a rate higher than the demand rate. Shortage is backordered, and defective items in each ordering cycle are stored in a warehouse to be returned to the supplier when a new order is received. Although the retailer does not sell defective items at a lower price and only trades perfect items (to avoid loss), a higher holding cost incurs to store defective items. Using the renewal-reward theorem, the optimal order and shortage quantities are determined. Some numerical examples are solved at the end to clarify the applicability of the proposed model and to compare the new policy to an existing one. The results show that the new policy provides better expected profit per time.
Amber D. Dumford
Full Text Available As society’s needs for quantitative skills become more prevalent, college graduates require quantitative skills regardless of their career choices. Therefore, it is important that institutions assess students’ engagement in quantitative activities during college. This study chronicles the process taken by the National Survey of Student Engagement (NSSE to develop items that measure students’ participation in quantitative reasoning (QR activities. On the whole, findings across the quantitative and qualitative analyses suggest good overall properties for the developed QR items. The items show great promise to explore and evaluate the frequency with which college students participate in QR-related activities. Each year, hundreds of institutions across the United States and Canada participate in NSSE, and, with the addition of these new items on the core survey, every participating institution will have information on this topic. Our hope is that these items will spur conversations on campuses about students’ use of quantitative reasoning activities.
Gu, Chenyang; Gutman, Roee
The assessment of patients' functional status across the continuum of care requires a common patient assessment tool. However, assessment tools that are used in various health care settings differ and cannot be easily contrasted. For example, the Functional Independence Measure (FIM) is used to evaluate the functional status of patients who stay in inpatient rehabilitation facilities, the Minimum Data Set (MDS) is collected for all patients who stay in skilled nursing facilities, and the Outcome and Assessment Information Set (OASIS) is collected if they choose home health care provided by home health agencies. All three instruments or questionnaires include functional status items, but the specific items, rating scales, and instructions for scoring different activities vary between the different settings. We consider equating different health assessment questionnaires as a missing data problem, and propose a variant of predictive mean matching method that relies on Item Response Theory (IRT) models to impute unmeasured item responses. Using real data sets, we simulated missing measurements and compared our proposed approach to existing methods for missing data imputation. We show that, for all of the estimands considered, and in most of the experimental conditions that were examined, the proposed approach provides valid inferences, and generally has better coverages, relatively smaller biases, and shorter interval estimates. The proposed method is further illustrated using a real data set. © 2016, The International Biometric Society.
Shono, Yusuke; Grenard, Jerry L.; Ames, Susan L.; Stacy, Alan W.
A substance-related word association test (WAT) is one of the commonly used indirect tests of substance-related implicit associative memory and has been shown to predict substance use. This study applied an item response theory (IRT) modeling approach to evaluate psychometric properties of the alcohol- and marijuana-related WATs and their items among 775 ethnically diverse at-risk adolescents. After examining the IRT assumptions, item fit, and differential item functioning (DIF) across gender and age groups, the original 18 WAT items were reduced to 14- and 15-items in the alcohol- and marijuana-related WAT, respectively. Thereafter, unidimensional one- and two-parameter logistic models (1PL and 2PL models) were fitted to the revised WAT items. The results demonstrated that both alcohol- and marijuana-related WATs have good psychometric properties. These results were discussed in light of the framework of a unified concept of construct validity (Messick, 1975, 1989, 1995). PMID:25134051
Full Text Available Abstract Background The purpose of the study was to determine whether the Persian version of the KIDSCREEN-27 has the optimal number of response category to measure health-related quality of life (HRQoL in children and adolescents. Moreover, we aimed to determine if all the items contributed adequately to their own domain. Findings The Persian version of the KIDSCREEN-27 was completed by 1083 school children and 1070 of their parents. The Rasch partial credit model (PCM was used to investigate item statistics and ordering of response categories. The PCM showed that no item was misfitting. The PCM also revealed that, successive response categories for all items were located in the expected order except for category 1 in self- and proxy-reports. Conclusions Although Rasch analysis confirms that all the items belong to their own underlying construct, response categories should be reorganized and evaluated in further studies, especially in children with chronic conditions.
This article follows the development of test items (see "Language Assessment Quarterly", Volume 3 Issue 1, pp. 71-79 for the article "Test and Item Specifications Development"), beginning with a review of test and item specifications, then proceeding to writing and editing of items, pretesting and analysis, and finally selection of an item for a…
Yoshioka, Naoki; Kishida, Masako; Yamada, Yumi
In this work, analyses on sodium-water reaction temperature in the new SWAT-1(SWAT-1R) test were completed by the CHAMPAGNE code in order to understand void and velocity distribution in sodium system, which was difficult to be measured in experiments. The application method of the RELAP5/Mod2 code was investigated to LMFBR steam generator (SG) blow down analysis, too. The following results were obtained. (1) Analyses on sodium-water reaction temperature in the SWAT-1R test. 1) Analyses were carried out for the SWAT-1R test under the condition water leak rate 600 g/s by treating the pressure loss coefficient, the interface friction coefficient and the coefficient related to reaction rate as parameters. The effect and mechanism of each parameter on the shape of reaction zone were well understood by these analyses. 2) The void and velocity distribution in sodium system were estimated by use of the most suitable parameters. These analytical results are expected to be useful for planning of the SWAT-1R test and evaluation of test result. (2) Investigation of the RELAP5/Mod2 code. 1) The items to be improved in the RELAP5/Mod2 code were clarified to apply this code to the FBR SG blow down analysis. 2) One of these items was an addition of the shell-side (sodium-side) model. A sodium-side model was designed and added to the RELAP5/Mod2 code. Test calculations were carried out by this improved code and the basic function of this code was confirmed. (author)
Ku, Ja Hyeon; Park, Dal Woo; Kim, Soo Woong; Paick, Jae-Seung
To assess whether the translated Korean version of the International Index of Erectile Function (IIEF-5) developed by Rosen et al. (RIIEF-5) may be adapted for a Korean population to have cross-cultural equivalency to the original version. A total of 151 patients with erectile dysfunction (ED) and 156 controls were prospectively studied. All the patients and controls had had sexual activity or attempted sexual intercourse within the 4-week period before completing the questionnaire. The Classification and Regression Trees program was used to select an optimal set of five items from the IIEF-15 (KIIEF-5) to discriminate between men with and without ED. Then, the optimal cutoff score for the diagnosis of ED was determined using the receiver operating characteristic curve. The optimal cutoff score, sensitivity, and specificity were also calculated using the RIIEF-5. The KIIEF-5 consisted, in order of importance, of items 15, 5, 13, 4, and 2 from the IIEF-15. Item 7 in the original RIIEF-5 was replaced with item 13 in the new KIIEF-5. The optimal cutoff score proved to be 21, with a corresponding sensitivity and specificity of 0.97 and 0.91, respectively. For the original RIIEF-5, the optimal cutoff score was 21 and the corresponding sensitivity and specificity was 0.94 and 0.90, respectively. Although the RIIEF-5 may be adapted for a Korean population, the KIIEF-5 can aid in decreasing the incidence of an incorrect diagnosis of ED and decreasing the number of undiagnosed cases of ED in this population. In addition, our findings suggest that the equivalence of psychometric properties does not imply cross-cultural equivalence.
Jensen, Pernille T; Klee, Marianne C; Thranov, Ingrid
The Sexual function-Vaginal changes Questionnaire (SVQ), was developed to investigate sexual and vaginal problems in gynaecological cancer patients. The instrument consists of 20 core items, measuring sexual interest, lubrication, orgasm, dyspareunia, vaginal dimensions, intimacy, sexual problems...... of partner, sexual activity, sexual satisfaction, and body image. Seven additional items assessing current levels of sexual and vaginal problems compared to pre-diagnosis are intended to be used only once in longitudinal studies. The SVQ was validated in two ways: first, the comprehensibility of each item...... was investigated through combined quantitative and qualitative assessment of patient-observer agreement in 75 gynaecological cancer patients, second, multitrait analyses and principal component analyses were applied to responses from 257 patients with cervical cancer to investigate the scale properties. The level...
Diana, Rachel A.; Yonelinas, Andrew P.; Ranganath, Charan
The medial temporal lobes (MTL) play an essential role in episodic memory, and accumulating evidence indicates that two MTL subregions--the perirhinal (PRc) and parahippocampal (PHc) cortices--might have different functions. According to the binding of item and context theory (  and ), PRc is involved in processing item information, the…
Tsang, Siny; Schmidt, Karen M.; Vincent, Gina M.; Salekin, Randall T.; Moretti, Marlene M.; Odgers, Candice L.
This study used an item response theory (IRT) model and a large adolescent sample of justice involved youth (N = 1,007, 38% female) to examine the item functioning of the Psychopathy Checklist – Youth Version (PCL: YV). Items that were most discriminating (or most sensitive to changes) of the latent trait (thought to be psychopathy) among adolescents included “Glibness/superficial charm”, “Lack of remorse”, and “Need for stimulation”, whereas items that were least discriminating included “Pathological lying”, “Failure to accept responsibility”, and “Lacks goals.” The items “Impulsivity” and “Irresponsibility” were the most likely to be rated high among adolescents, whereas “Parasitic lifestyle”, and “Glibness/superficial charm” were the most likely to be rated low. Evidence of differential item functioning (DIF) on four of the 13 items was found between boys and girls. “Failure to accept responsibility” and “Impulsivity” were endorsed more frequently to describe adolescent girls than boys at similar levels of the latent trait, and vice versa for “Grandiose sense of self-worth” and “Lacks goals.” The DIF findings suggest that four PCL: YV items function differently between boys and girls. PMID:25580672
Full Text Available Objectives: The current study aimed to carry out a post-validation item analysis of multiple choice questions (MCQs in medical examinations in order to evaluate correlations between item difficulty, item discrimination and distraction effectiveness so as to determine whether questions should be included, modified or discarded. In addition, the optimal number of options per MCQ was analysed. Methods: This cross-sectional study was performed in the Department of Paediatrics, Arabian Gulf University, Manama, Bahrain. A total of 800 MCQs and 4,000 distractors were analysed between November 2013 and June 2016. Results: The mean difficulty index ranged from 36.70–73.14%. The mean discrimination index ranged from 0.20–0.34. The mean distractor efficiency ranged from 66.50–90.00%. Of the items, 48.4%, 35.3%, 11.4%, 3.9% and 1.1% had zero, one, two, three and four nonfunctional distractors (NFDs, respectively. Using three or four rather than five options in each MCQ resulted in 95% or 83.6% of items having zero NFDs, respectively. The distractor efficiency was 91.87%, 85.83% and 64.13% for difficult, acceptable and easy items, respectively (P <0.005. Distractor efficiency was 83.33%, 83.24% and 77.56% for items with excellent, acceptable and poor discrimination, respectively (P <0.005. The average Kuder-Richardson formula 20 reliability coefficient was 0.76. Conclusion: A considerable number of the MCQ items were within acceptable ranges. However, some items needed to be discarded or revised. Using three or four rather than five options in MCQs is recommended to reduce the number of NFDs and improve the overall quality of the examination.
Full Text Available A distinction is made between visual declaration and virtual usage of artificial items within a physical environment, such as a street. Visual declaration is a formal pictorial designation, or a function, e.g. “decoration,” of an item, such as a “planter.” Virtual usage refers to the item when it is used in lieu of another item. The formal designation, “sitting,” customarily designated to an item such as “bench,” could also be a virtual usage of the item “planter.” The question asked is, “What is the relationship between items, given their formal, visual declaration and their informal, virtual, usage?” An artificial item, according to its visual declaration, is referred to as a ‘visual’ or ‘real item’. Each visual item has the property of being used as another item by virtue of its undeclared usage. Pending on the item's design and configuration, a visual item can be then substituted for another visual item. An artificial item, thus, attains deliberate ambiguity between its formal designation and its virtual usage. This ambiguity between visual declaration and virtual usage can be quantified. Within the full domain of n possible usages, this relationship can be conveniently presented in a nonnegative matrix. It is shown that the inverse of this matrix belongs to a class of well-known matrices. This being the case, the relationship between visual and virtual properties of items within the environment can be formalized. The formalization throws further light on the emerging opportunities in streetscape design.
Cohn, Amy M.; Hagman, Brett T.; Graff, Fiona S.; Noel, Nora E.
Objective: The present study examined the latent continuum of alcohol-related negative consequences among first-year college women using methods from item response theory and classical test theory. Method: Participants (N = 315) were college women in their freshman year who reported consuming any alcohol in the past 90 days and who completed assessments of alcohol consumption and alcohol-related negative consequences using the Rutgers Alcohol Problem Index. Results: Item response theory analyses showed poor model fit for five items identified in the Rutgers Alcohol Problem Index. Two-parameter item response theory logistic models were applied to the remaining 18 items to examine estimates of item difficulty (i.e., severity) and discrimination parameters. The item difficulty parameters ranged from 0.591 to 2.031, and the discrimination parameters ranged from 0.321 to 2.371. Classical test theory analyses indicated that the omission of the five misfit items did not significantly alter the psychometric properties of the construct. Conclusions: Findings suggest that those consequences that had greater severity and discrimination parameters may be used as screening items to identify female problem drinkers at risk for an alcohol use disorder. PMID:22051212
Bluck, Susan; Alea, Nicole
Theory suggests that autobiographical remembering serves several functions. This research builds on previous empirical efforts (Bluck, Alea, Habermas, & Rubin, 2005) with the aim of constructing a brief, valid measure of three functions of autobiographical memory. Participants (N=306) completed 28 theoretically derived items concerning the frequency with which they use autobiographical memory to serve a variety of functions. To examine convergent and discriminant validity, participants rated their tendency to think about and talk about the past, and measures of future time orientation, self-concept clarity, and trait personality. Confirmatory factor analysis of the function items resulted in a respecified model with 15 items in three factors. The newly developed Thinking about Life Experiences scale (TALE) shows good internal consistency as well as convergent validity for three subscales: Self-Continuity, Social-Bonding, and Directing-Behaviour. Analyses demonstrate factorial equivalence across age and gender groups. Potential use and limitations of the TALE are discussed.
Preciado, Izaskun; Cartes, Joan E.; Punzón, Antonio; Frutos, Inmaculada; López-López, Lucía; Serrano, Alberto
Trophic interactions in the deep-sea fish community of the Galicia Bank seamount (NE Atlantic) were inferred by using stomach contents analyses (SCA) and stable isotope analyses (SIA) of 27 fish species and their main prey items. Samples were collected during three surveys performed in 2009, 2010 and 2011 between 625 and 1800 m depth. Three main trophic guilds were determined using SCA data: pelagic, benthopelagic and benthic feeders, respectively. Vertically migrating macrozooplankton and meso-bathypelagic shrimps were identified to play a key role as pelagic prey for the deep sea fish community of the Galicia Bank. Habitat overlap was hardly detected; as a matter of fact, when species coexisted most of them evidenced a low dietary overlap, indicating a high degree of resource partitioning. A high potential competition, however, was observed among benthopelagic feeders, i.e.: Etmopterus spinax, Hoplostethus mediterraneus and Epigonus telescopus. A significant correlation was found between δ15N and δ13C for all the analysed species. When calculating Trophic Levels (TLs) for the main fish species, using both the SCA and SIA approaches, some discrepancies arose: TLs calculated from SIA were significantly higher than those obtained from SCA, probably indicating a higher consumption of benthic-suprabenthic prey in the previous months. During the summer, food web functioning in the Galicia Bank was more influenced by the assemblages dwelling in the water column than by deep-sea benthos, which was rather scarce in the summer samples. These discrepancies demonstrate the importance of using both approaches, SCA (snapshot of diet) and SIA (assimilated food in previous months), when attempting trophic studies, if an overview of food web dynamics in different compartments of the ecosystem is to be obtained.
A transfer function analyser software package has been produced which is believed to constitute a significant advance over others reported in the literature. The main advantages of the system are its operating speed, especially at low frequencies, which is due to its use of part-cycle integration and its high degree of interactive operator control. The driving sine wave, the return signals and the computed vector diagrams are displayed on TV type visual display units. Data output is by means of an incremental graph plotter or an IBM typewriter. (author)
Gierl, Mark J.; Lai, Hollis
Changes to the design and development of our educational assessments are resulting in the unprecedented demand for a large and continuous supply of content-specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer…
The graded response model (GRM), which is based on item response theory (IRT), was used to evaluate the psychometric properties of the inattention and hyperactivity/impulsivity symptoms in an ADHD rating scale. To accomplish this, parents and teachers completed the DSM-IV ADHD Rating Scale (DARS; Gomez et al., "Journal of Child Psychology and…
Eutalia Aparecida Candido de Araujo
Full Text Available A preocupação com medidas de traços psicológicos é antiga, sendo que muitos estudos e propostas de métodos foram desenvolvidos no sentido de alcançar este objetivo. Entre os trabalhos propostos, destaca-se a Teoria da Resposta ao Item (TRI que, a princípio, veio completar limitações da Teoria Clássica de Medidas, empregada em larga escala até hoje na medida de traços psicológicos. O ponto principal da TRI é que ela leva em consideração o item particularmente, sem relevar os escores totais; portanto, as conclusões não dependem apenas do teste ou questionário, mas de cada item que o compõe. Este artigo propõe-se a apresentar esta Teoria que revolucionou a teoria de medidas.La preocupación con las medidas de los rasgos psicológicos es antigua y muchos estudios y propuestas de métodos fueron desarrollados para lograr este objetivo. Entre estas propuestas de trabajo se incluye la Teoría de la Respuesta al Ítem (TRI que, en principio, vino a completar las limitaciones de la Teoría Clásica de los Tests, ampliamente utilizada hasta hoy en la medida de los rasgos psicológicos. El punto principal de la TRI es que se tiene en cuenta el punto concreto, sin relevar las puntuaciones totales; por lo tanto, los resultados no sólo dependen de la prueba o cuestionario, sino que de cada ítem que lo compone. En este artículo se propone presentar la Teoría que revolucionó la teoría de medidas.The concern with measures of psychological traits is old and many studies and proposals of methods were developed to achieve this goal. Among these proposed methods highlights the Item Response Theory (IRT that, in principle, came to complete limitations of the Classical Test Theory, which is widely used until nowadays in the measurement of psychological traits. The main point of IRT is that it takes into account the item in particular, not relieving the total scores; therefore, the findings do not only depend on the test or questionnaire
Jessen, Annika; Ho, Andrew D; Corrales, C Eduardo; Yueh, Bevan; Shin, Jennifer J
Objectives (1) To assess the 11-item Inner Effectiveness of Auditory Rehabilitation (Inner EAR) instrument with item response theory (IRT). (2) To determine whether the underlying latent ability could also be accurately represented by a subset of the items for use in high-volume clinical scenarios. (3) To determine whether the Inner EAR instrument correlates with pure tone thresholds and word recognition scores. Design IRT evaluation of prospective cohort data. Setting Tertiary care academic ambulatory otolaryngology clinic. Subjects and Methods Modern psychometric methods, including factor analysis and IRT, were used to assess unidimensionality and item properties. Regression methods were used to assess prediction of word recognition and pure tone audiometry scores. Results The Inner EAR scale is unidimensional, and items varied in their location and information. Information parameter estimates ranged from 1.63 to 4.52, with higher values indicating more useful items. The IRT model provided a basis for identifying 2 sets of items with relatively lower information parameters. Item information functions demonstrated which items added insubstantial value over and above other items and were removed in stages, creating a 8- and 3-item Inner EAR scale for more efficient assessment. The 8-item version accurately reflected the underlying construct. All versions correlated moderately with word recognition scores and pure tone averages. Conclusion The 11-, 8-, and 3-item versions of the Inner EAR scale have strong psychometric properties, and there is correlational validity evidence for the observed scores. Modern psychometric methods can help streamline care delivery by maximizing relevant information per item administered.
Arendasy, Martin E.; Sommer, Markus
There is a heated debate on whether observed gender differences in some figural matrices in adults can be attributed to gender differences in inductive reasoning/G[subscript f] or differential item functioning and/or test bias. Based on previous studies we hypothesized that three specific item design features moderate the effect size of the gender…
Katz Jeffrey N
Full Text Available Abstract Background While musculoskeletal problems are leading sources of disability, there has been little research on measuring the number of functionally limiting musculoskeletal problems for use as predictor of outcome in studies of chronic disease. This paper reports on the development and preliminary validation of a self administered musculoskeletal functional limitations index. Methods We developed a summary musculoskeletal functional limitations index based upon a six-item self administered questionnaire in which subjects indicate whether they are limited a lot, a little or not at all because of problems in six anatomic regions (knees, hips, ankles and feet, back, neck, upper extremities. Responses are summed into an index score. The index was completed by a sample of total knee replacement recipients from four US states. Our analyses examined convergent validity at the item and at the index level as well as discriminant validity and the independence of the index from other correlates of quality of life. Results 782 subjects completed all items of the musculoskeletal functional limitations index and were included in the analyses. The mean age of the sample was 75 years and 64% were female. The index demonstrated anticipated associations with self-reported quality of life, activities of daily living, WOMAC functional status score, use of walking support, frequency of usual exercise, frequency of falls and dependence upon another person for assistance with chores. The index was strongly and independently associated with self-reported overall health. Conclusion The self-reported musculoskeletal functional limitations index appears to be a valid measure of musculoskeletal functional limitations, in the aspects of validity assessed in this study. It is useful for outcome studies following TKR and shows promise as a covariate in studies of chronic disease outcomes.
Lindwall, Magnus; Barkoukis, Vassilis; Grano, Caterina; Lucidi, Fabio; Raudsepp, Lennart; Liukkonen, Jarmo; Thøgersen-Ntoumani, Cecilie
Using confirmatory factor analyses, we examined method effects on Rosenberg's Self-Esteem Scale (RSES; Rosenberg, 1965) in a sample of older European adults. Nine hundred forty nine community-dwelling adults 60 years of age or older from 5 European countries completed the RSES as well as measures of depression and life satisfaction. The 2 models that had an acceptable fit with the data included method effects. The method effects were associated with both positively and negatively worded items. Method effects models were invariant across gender and age, but not across countries. Both depression and life satisfaction predicted method effects. Individuals with higher depression scores and lower life satisfaction scores were more likely to endorse negatively phrased items.
Benítez, Isabel; Padilla, José-Luis
Differential item functioning (DIF) can undermine the validity of cross-lingual comparisons. While a lot of efficient statistics for detecting DIF are available, few general findings have been found to explain DIF results. The objective of the article was to study DIF sources by using a mixed method design. The design involves a quantitative phase…
Lange, Elke B; Verhaeghen, Paul; Cerella, John
Memory sets of N = 1~5 digits were exposed sequentially from left-to-right across the screen, followed by N recognition probes. Probes had to be compared to memory list items on identity only (Sternberg task) or conditional on list position. Positions were probed randomly or in left-to-right order. Search functions related probe response times to set size. Random probing led to ramped, "Sternbergian" functions whose intercepts were elevated by the location requirement. Sequential probing led to flat search functions-fast responses unaffected by set size. These results suggested that items in STM could be accessed either by a slow search-on-identity followed by recovery of an associated location tag, or in a single step by following item-to-item links in study order. It is argued that this dual coding of location information occurs spontaneously at study, and that either code can be utilised at retrieval depending on test demands.
Full Text Available Abstract Background The Brief Pain Inventory (BPI is recommended as a pain measurement tool by the Expert Working Group of the European Association of Palliative Care. The BPI is designed to assess both pain severity and interference with functions caused by pain. The purpose of this study was to investigate if pain interference items are influenced by other factors than pain. Methods We asked adult cancer patients to complete the original and a revised BPI on two study days. In the original version of the BPI the patients were asked how, during the last 24 hours, pain has interfered with functions. In the revised BPI this question was changed to how, during the last 24 hours, these functions are affected in general. Heath related quality of life was assessed at both study days applying the European Organization for Research and Treatment of Cancer quality of life questionnaire. Results Forty-eight of the 55 included patients completed both assessments. The BPI pain intensities scores and the health related quality of life scores were similar at the two study days. Except for mood this study observed no significant distinctions between the patients' BPI interference items scores in the original (pain influence on function and the revised BPI (function in general. Seventeen patients reported higher influence from pain on functions than the total influence on function from all causes. Conclusion We observed similar scores in the original BPI interference scores (pain influence on function compared with the revised BPI interference scores (decreased function in general. This finding might imply that the BPI interference scale measures are partly responded to as more of a global interference measure.
Peters, J.C.; Roelfsema, P.R.; Goebel, R.
During visual search, the working memory (WM) representation of the search target guides attention to matching items in the visual scene. However, we can hold multiple items in WM. Do all these items guide attention at the same time? Using a new functional magnetic resonance imaging visual search paradigm, we found that items in WM can attain two different states that influence activity in extrastriate visual cortex in opposite directions: whereas the target item in WM enhanced processing of ...
Chenault, Michelene; Berger, Martijn; Kremer, Bernd; Anteunis, Lucien
To develop a tool for use in hearing screening and to evaluate the patient journey towards hearing rehabilitation, responses to the hearing aid rehabilitation questionnaire scales aid stigma, pressure, and aid unwanted addressing respectively hearing aid stigma, experienced pressure from others; perceived hearing aid benefit were evaluated with item response theory. The sample was comprised of 212 persons aged 55 years or more; 63 were hearing aid users, 64 with and 85 persons without hearing impairment according to guidelines for hearing aid reimbursement in the Netherlands. Bias was investigated relative to hearing aid use and hearing impairment within the differential test functioning framework. Items compromising model fit or demonstrating differential item functioning were dropped. The aid stigma scale was reduced from 6 to 4, the pressure scale from 7 to 4, and the aid unwanted scale from 5 to 4 items. This procedure resulted in bias-free scales ready for screening purposes and application to further understand the help-seeking process of the hearing impaired. PMID:28028428
Sevigny, Jeffrey J; Peng, Yahong; Liu, Lian; Lines, Christopher R
We explored the association of Alzheimer's disease (AD) Assessment Scale (ADAS-Cog) item scores with AD severity using cross-sectional and longitudinal data from the same study. Post hoc analyses were performed using placebo data from a 12-month trial of patients with mild-to-moderate AD (N =281 randomized, N =209 completed). Baseline distributions of ADAS-Cog item scores by Mini-Mental State Examination (MMSE) score and Clinical Dementia Rating (CDR) sum of boxes score (measures of dementia severity) were estimated using local and nonparametric regressions. Mixed-effect models were used to characterize ADAS-Cog item score changes over time by dementia severity (MMSE: mild =21-26, moderate =14-20; global CDR: mild =0.5-1, moderate =2). In the cross-sectional analysis of baseline ADAS-Cog item scores, orientation was the most sensitive item to differentiate patients across levels of cognitive impairment. Several items showed a ceiling effect, particularly in milder AD. In the longitudinal analysis of change scores over 12 months, orientation was the only item with noticeable decline (8%-10%) in mild AD. Most items showed modest declines (5%-20%) in moderate AD.
Chan, Kitty S; Gross, Alden L; Pezzin, Liliana E; Brandt, Jason; Kasper, Judith D
To harmonize measures of cognitive performance using item response theory (IRT) across two international aging studies. Data for persons ≥65 years from the Health and Retirement Study (HRS, N = 9,471) and the English Longitudinal Study of Aging (ELSA, N = 5,444). Cognitive performance measures varied (HRS fielded 25, ELSA 13); 9 were in common. Measurement precision was examined for IRT scores based on (a) common items, (b) common items adjusted for differential item functioning (DIF), and (c) DIF-adjusted all items. Three common items (day of date, immediate word recall, and delayed word recall) demonstrated DIF by survey. Adding survey-specific items improved precision but mainly for HRS respondents at lower cognitive levels. IRT offers a feasible strategy for harmonizing cognitive performance measures across other surveys and for other multi-item constructs of interest in studies of aging. Practical implications depend on sample distribution and the difficulty mix of in-common and survey-specific items. © The Author(s) 2015.
Aybek, Eren Can; Demirtasli, R. Nukhet
This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. Besides that, it aims to introduce the simulation and live CAT software to the related researchers. Computerized adaptive test algorithm, assumptions of item response theory models, nominal response…
Aaro, Leif E.; Breivik, Kyrre; Klepp, Knut-Inge; Kaaya, Sylvia; Onya, Hans E.; Wubs, Annegreet; Helleve, Arnfinn; Flisher, Alan J.
A 14-item human immunodeficiency virus/acquired immunodeficiency syndrome knowledge scale was used among school students in 80 schools in 3 sites in Sub-Saharan Africa (Cape Town and Mankweng, South Africa, and Dar es Salaam, Tanzania). For each item, an incorrect or don't know response was coded as 0 and correct response as 1. Exploratory factor…
Method: A cross-sectional tuck shop survey. Nutritional analyses were conducted using the ... Results: Savoury pies were the most popular lunch item for all learners for both breaks (n = 5, 45%, and n = 3, 27.3%), selling the most number of units (43) per day at eight schools (72.7%). Iced popsicles were sold at almost every ...
Tarrant, Marie; Ware, James; Mohammed, Ahmed M
Four- or five-option multiple choice questions (MCQs) are the standard in health-science disciplines, both on certification-level examinations and on in-house developed tests. Previous research has shown, however, that few MCQs have three or four functioning distractors. The purpose of this study was to investigate non-functioning distractors in teacher-developed tests in one nursing program in an English-language university in Hong Kong. Using item-analysis data, we assessed the proportion of non-functioning distractors on a sample of seven test papers administered to undergraduate nursing students. A total of 514 items were reviewed, including 2056 options (1542 distractors and 514 correct responses). Non-functioning options were defined as ones that were chosen by fewer than 5% of examinees and those with a positive option discrimination statistic. The proportion of items containing 0, 1, 2, and 3 functioning distractors was 12.3%, 34.8%, 39.1%, and 13.8% respectively. Overall, items contained an average of 1.54 (SD = 0.88) functioning distractors. Only 52.2% (n = 805) of all distractors were functioning effectively and 10.2% (n = 158) had a choice frequency of 0. Items with more functioning distractors were more difficult and more discriminating. The low frequency of items with three functioning distractors in the four-option items in this study suggests that teachers have difficulty developing plausible distractors for most MCQs. Test items should consist of as many options as is feasible given the item content and the number of plausible distractors; in most cases this would be three. Item analysis results can be used to identify and remove non-functioning distractors from MCQs that have been used in previous tests.
Mellenbergh, Gideon J.; van der Linden, Wim J.
Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed. (CM)
Cher Wong, Cheow
Building on previous works by Lord and Ogasawara for dichotomous items, this article proposes an approach to derive the asymptotic standard errors of item response theory true score equating involving polytomous items, for equivalent and nonequivalent groups of examinees. This analytical approach could be used in place of empirical methods like…
Cheung, Felix; Lucas, Richard E
The present paper assessed the validity of single-item life satisfaction measures by comparing single-item measures to the Satisfaction with Life Scale (SWLS)-a more psychometrically established measure. Two large samples from Washington (N = 13,064) and Oregon (N = 2,277) recruited by the Behavioral Risk Factor Surveillance System and a representative German sample (N = 1,312) recruited by the Germany Socio-Economic Panel were included in the present analyses. Single-item life satisfaction measures and the SWLS were correlated with theoretically relevant variables, such as demographics, subjective health, domain satisfaction, and affect. The correlations between the two life satisfaction measures and these variables were examined to assess the construct validity of single-item life satisfaction measures. Consistent across three samples, single-item life satisfaction measures demonstrated substantial degree of criterion validity with the SWLS (zero-order r = 0.62-0.64; disattenuated r = 0.78-0.80). Patterns of statistical significance for correlations with theoretically relevant variables were the same across single-item measures and the SWLS. Single-item measures did not produce systematically different correlations compared to the SWLS (average difference = 0.001-0.005). The average absolute difference in the magnitudes of the correlations produced by single-item measures and the SWLS was very small (average absolute difference = 0.015-0.042). Single-item life satisfaction measures performed very similarly compared to the multiple-item SWLS. Social scientists would get virtually identical answer to substantive questions regardless of which measure they use.
Coffman, Donna L.; BeLue, Rhonda
The sense of community index (SCI) has been widely used to measure psychological sense of community (SOC). Furthermore, SOC has been found to differ among racial groups. Because different ethnic groups have different cultural and historical experiences that may lead to different interpretations of measurement items, it is important to know whether…
Lin, Chung-Ying; Ku, Li-Jung Elizabeth; Pakpour, Amir H
The Zarit Burden Interview (ZBI) is a commonly used self-report to assess caregiver burden. A 12-item short form of the ZBI has been developed; however, its measurement invariance has not been examined across some different demographics. It is unclear whether different genders and educational levels of a population interpret the ZBI items similarly. Therefore, this study aimed to examine the measurement invariance of the 12-item ZBI across gender and educational levels in a Taiwanese sample. Caregivers who had a family member with dementia (n = 270) completed the ZBI through telephone interviews. Three confirmatory factor analysis (CFA) models were conducted: Model 1 was the configural model, Model 2 constrained all factor loadings, Model 3 constrained all factor loadings and item intercepts. Multiple group CFAs and the differential item functioning (DIF) contrast under Rasch analyses were used to detect measurement invariance across males (n = 100) and females (n = 170) and across educational levels of junior high schools and below (n = 86) and senior high schools and above (n = 183). The fit index differences between models supported the measurement invariance across gender and across educational levels (∆ comparative fit index (CFI) = -0.010 and 0.003; ∆ root mean square error of approximation (RMSEA) = -0.006 to 0.004). No substantial DIF contrast was found across gender and educational levels (value = -0.36 to 0.29). The ZBI is appropriate for combined use and for comparisons in caregivers across gender and different educational levels in Taiwan.
Yang, Ji Seung; Zheng, Xiaying
The purpose of this article is to introduce and review the capability and performance of the Stata item response theory (IRT) package that is available from Stata v.14, 2015. Using a simulated data set and a publicly available item response data set extracted from Programme of International Student Assessment, we review the IRT package from…
MacCann, Robert G.; Stanley, Gordon
An item banking method that does not use Item Response Theory (IRT) is described. This method provides a comparable grading system across schools that would be suitable for low-stakes testing. It uses the Angoff standard-setting method to obtain item ratings that are stored with each item. An example of such a grading system is given, showing how…
The principle of integrality, moderation and equilibrium should be considered in the safety classification of items in nuclear power plant. The basic ways for safety classification of items is to classify the safety function based on the effect of the outside enclosure damage of the items (parts) on the safety. Tianwan Nuclear Power Plant adopts Russian VVER-1000/428 type reactor, it safety classification mainly refers to Russian Guidelines and standards. The safety classification of the electric equipment refers to IEEE-308(80) standard, including 1E and Non 1E classification. The safety classification of the instrumentation and control equipment refers to GB/T 15474-1995 standard, including safety 1E, safety-related SR and NC non-safety classification. The safety classification of Tianwan Nuclear Power Plant has to be approved by NNSA and satisfy Chinese Nuclear Safety Guidelines. (authors)
Mohammed Ahmed M
Full Text Available Abstract Background Four- or five-option multiple choice questions (MCQs are the standard in health-science disciplines, both on certification-level examinations and on in-house developed tests. Previous research has shown, however, that few MCQs have three or four functioning distractors. The purpose of this study was to investigate non-functioning distractors in teacher-developed tests in one nursing program in an English-language university in Hong Kong. Methods Using item-analysis data, we assessed the proportion of non-functioning distractors on a sample of seven test papers administered to undergraduate nursing students. A total of 514 items were reviewed, including 2056 options (1542 distractors and 514 correct responses. Non-functioning options were defined as ones that were chosen by fewer than 5% of examinees and those with a positive option discrimination statistic. Results The proportion of items containing 0, 1, 2, and 3 functioning distractors was 12.3%, 34.8%, 39.1%, and 13.8% respectively. Overall, items contained an average of 1.54 (SD = 0.88 functioning distractors. Only 52.2% (n = 805 of all distractors were functioning effectively and 10.2% (n = 158 had a choice frequency of 0. Items with more functioning distractors were more difficult and more discriminating. Conclusion The low frequency of items with three functioning distractors in the four-option items in this study suggests that teachers have difficulty developing plausible distractors for most MCQs. Test items should consist of as many options as is feasible given the item content and the number of plausible distractors; in most cases this would be three. Item analysis results can be used to identify and remove non-functioning distractors from MCQs that have been used in previous tests.
van Kooten, Jojanneke A M C; van Litsenburg, Raphaёle R L; Yoder, Whitney R; Kaspers, Gertjan J L; Terwee, Caroline B
Sleep problems are common in adolescents and have a negative impact on daytime functioning. However, there is a lack of well-validated adolescent sleep questionnaires. The Patient-Reported Outcomes Measurement Information System (PROMIS) Sleep Disturbance and Sleep-Related Impairment item banks are well-validated instruments developed for and tested in adults. The aim of this study was to evaluate their structural validity in adolescents. Test and retest data were collected for the Dutch-Flemish V1.0 PROMIS Sleep Disturbance (27) and Sleep-Related Impairment (16 items) item banks from 1046 adolescents (11-19 years). Cross-validation methods, Confirmatory (CFA), and Exploratory Factor Analyses (EFA) were used. Fit indices and factor loadings were used to improve the models. The final models were assessed for model fit using retest data. The one-factor Sleep Disturbance (CFI = 0.795, TLI = 0.778, RMSEA = 0.117) and Sleep-Related Impairment (CFI = 0.897, TLI = 0.882, RMSEA = 0.156) models could not be replicated in adolescents. Cross-validation resulted in a final Sleep Disturbance model of 23 and a Sleep-Related Impairment model of 11 items. Retest data CFA showed adequate fit for the Sleep-Related Impairment-11 (CFI = 0.981, TLI = 0.976, RMSEA = 0.116). The Sleep Disturbance-23 model fit indices stayed below the recommended values (CFI = 0.895, TLI = 0.885, RMSEA = 0.105). While the PROMIS Sleep Disturbance-23 for adolescents and PROMIS Sleep-Related Impairment-11 for adolescents provide a framework to assess adolescent sleep, additional research is needed to replicate these findings in a larger and more diverse sample.
Full Text Available Previous studies have reported conflicting findings on whether item repetition has beneficial or detrimental effects on source memory. To reconcile such contradictions, we investigated whether the degree of pre-exposure of items can be a potential modulating factor. The experimental procedures spanned two consecutive days. On Day 1, participants were exposed to a set of unfamiliar faces. On Day 2, the same faces presented on the previous day were used again in half of the participants, whereas novel faces were used for the other half. Day 2 procedures consisted of three successive phases: item repetition, source association, and source memory test. In the item repetition phase, half of the face stimuli were repeatedly presented while participants were making male/female judgments. During the source association phase, both the repeated and the unrepeated faces appeared in one of the four locations on the screen. Finally, participants were tested on the location in which a given face was presented during the previous phase and reported the confidence of their memory. Source memory accuracy was measured as the percentage of correct non-guess trials. As results, we found a significant interaction between prior exposure and repetition. Repetition impaired source memory when the items had been pre-exposed on Day 1, while it led to greater accuracy in novel ones. These results show that pre-experimental exposure can modulate the effects of repetition on associative binding between an item and its contextual information, suggesting that pre-existing representation and novelty signal interact to form new episodic memory.
Guàrdia-Olmos, J; Carbó-Carreté, M; Peró-Cebollero, M; Giné, C
The study of measurements of quality of life (QoL) is one of the great challenges of modern psychology and psychometric approaches. This issue has greater importance when examining QoL in populations that were historically treated on the basis of their deficiency, and recently, the focus has shifted to what each person values and desires in their life, as in cases of people with intellectual disability (ID). Many studies of QoL scales applied in this area have attempted to improve the validity and reliability of their components by incorporating various sources of information to achieve consistency in the data obtained. The adaptation of the Personal Outcomes Scale (POS) in Spanish has shown excellent psychometric attributes, and its administration has three sources of information: self-assessment, practitioner and family. The study of possible congruence or incongruence of observed distributions of each item between sources is therefore essential to ensure a correct interpretation of the measure. The aim of this paper was to analyse the observed distribution of items and dimensions from the three Spanish POS information sources cited earlier, using the item response theory. We studied a sample of 529 people with ID and their respective practitioners and family member, and in each case, we analysed items and factors using Samejima's model of polytomic ordinal scales. The results indicated an important number of items with differential effects regarding sources, and in some cases, they indicated significant differences in the distribution of items, factors and sources of information. As a result of this analysis, we must affirm that the administration of the POS, considering three sources of information, was adequate overall, but a correct interpretation of the results requires that it obtain much more information to consider, as well as some specific items in specific dimensions. The overall ratings, if these comments are considered, could result in bias. © 2017
Yoon Soo ePark
Full Text Available This study investigates the impact of item parameter drift (IPD on parameter and ability estimation when the underlying measurement model fits a mixture distribution, thereby violating the item invariance property of unidimensional item response theory (IRT models. An empirical study was conducted to demonstrate the occurrence of both IPD and an underlying mixture distribution using real-world data. Twenty-one trended anchor items from the 1999, 2003, and 2007 administrations of Trends in International Mathematics and Science Study (TIMSS were analyzed using unidimensional and mixture IRT models. TIMSS treats trended anchor items as invariant over testing administrations and uses pre-calibrated item parameters based on unidimensional IRT. However, empirical results showed evidence of two latent subgroups with IPD. Results showed changes in the distribution of examinee ability between latent classes over the three administrations. A simulation study was conducted to examine the impact of IPD on the estimation of ability and item parameters, when data have underlying mixture distributions. Simulations used data generated from a mixture IRT model and estimated using unidimensional IRT. Results showed that data reflecting IPD using mixture IRT model led to IPD in the unidimensional IRT model. Changes in the distribution of examinee ability also affected item parameters. Moreover, drift with respect to item discrimination and distribution of examinee ability affected estimates of examinee ability. These findings demonstrate the need to caution and evaluate IPD using a mixture IRT framework to understand its effect on item parameters and examinee ability.
Park, Yoon Soo; Lee, Young-Sun; Xing, Kuan
This study investigates the impact of item parameter drift (IPD) on parameter and ability estimation when the underlying measurement model fits a mixture distribution, thereby violating the item invariance property of unidimensional item response theory (IRT) models. An empirical study was conducted to demonstrate the occurrence of both IPD and an underlying mixture distribution using real-world data. Twenty-one trended anchor items from the 1999, 2003, and 2007 administrations of Trends in International Mathematics and Science Study (TIMSS) were analyzed using unidimensional and mixture IRT models. TIMSS treats trended anchor items as invariant over testing administrations and uses pre-calibrated item parameters based on unidimensional IRT. However, empirical results showed evidence of two latent subgroups with IPD. Results also showed changes in the distribution of examinee ability between latent classes over the three administrations. A simulation study was conducted to examine the impact of IPD on the estimation of ability and item parameters, when data have underlying mixture distributions. Simulations used data generated from a mixture IRT model and estimated using unidimensional IRT. Results showed that data reflecting IPD using mixture IRT model led to IPD in the unidimensional IRT model. Changes in the distribution of examinee ability also affected item parameters. Moreover, drift with respect to item discrimination and distribution of examinee ability affected estimates of examinee ability. These findings demonstrate the need to caution and evaluate IPD using a mixture IRT framework to understand its effects on item parameters and examinee ability.
Doyle, F; Watson, R; Morgan, K; McBride, O
Invariant item ordering (IIO) is defined as the extent to which items have the same ordering (in terms of item difficulty/severity - i.e. demonstrating whether items are difficult [rare] or less difficult [common]) for each respondent who completes a scale. IIO is therefore crucial for establishing a scale hierarchy that is replicable across samples, but no research has demonstrated IIO in scales of psychological distress. We aimed to determine if a hierarchy of distress with IIO exists in a large general population sample who completed a scale measuring distress. Data from 4107 participants who completed the 12-item General Health Questionnaire (GHQ-12) from the Northern Ireland Health and Social Wellbeing Survey 2005-6 were analysed. Mokken scaling was used to determine the dimensionality and hierarchy of the GHQ-12, and items were investigated for IIO. All items of the GHQ-12 formed a single, strong unidimensional scale (H=0.58). IIO was found for six of the 12 items (H-trans=0.55), and these symptoms reflected the following hierarchy: anhedonia, concentration, participation, coping, decision-making and worthlessness. The cross-sectional analysis needs replication. The GHQ-12 showed a hierarchy of distress, but IIO is only demonstrated for six of the items, and the scale could therefore be shortened. Adopting brief, hierarchical scales with IIO may be beneficial in both clinical and research contexts. Copyright © 2011 Elsevier B.V. All rights reserved.
Betancourt, Theresa S; Yang, Frances; Bolton, Paul; Normand, Sharon-Lise
This study aimed to refine a dimensional scale for measuring psychosocial adjustment in African youth using item response theory (IRT). A 60-item scale derived from qualitative data was administered to 667 war-affected adolescents (55% female). Exploratory factor analysis (EFA) determined the dimensionality of items based on goodness-of-fit indices. Items with loadings less than 0.4 were dropped. Confirmatory factor analysis (CFA) was used to confirm the scale's dimensionality found under the EFA. Item discrimination and difficulty were estimated using a graded response model for each subscale using weighted least squares means and variances. Predictive validity was examined through correlations between IRT scores (θ) for each subscale and ratings of functional impairment. All models were assessed using goodness-of-fit and comparative fit indices. Fisher's Information curves examined item precision at different underlying ranges of each trait. Original scale items were optimized and reconfigured into an empirically-robust 41-item scale, the African Youth Psychosocial Assessment (AYPA). Refined subscales assess internalizing and externalizing problems, prosocial attitudes/behaviors and somatic complaints without medical cause. The AYPA is a refined dimensional assessment of emotional and behavioral problems in African youth with good psychometric properties. Validation studies in other cultures are recommended. Copyright © 2014 John Wiley & Sons, Ltd.
Maicon Rodrigues Albuquerque
Full Text Available Abstract Introduction: Factor analysis of the Edinburgh Postnatal Depression Scale (EPDS could result in a shorter and easier to handle screening tool. Therefore, the aim of this study was to check and compare the metrics of two different 6-item EPDS subscales. Methods: We administered the EPDS to a total of 3,891 women who had given birth between 1 and 3 months previously. We conducted confirmatory and exploratory factor analyses and plotted receiver-operating characteristics (ROC curves to, respectively, determine construct validity, scale items' fit to the data, and ideal cutoff scores for the short versions. Results: A previously defined 6-item scale did not exhibit construct validity for our sample. Nevertheless, we used exploratory factor analysis to derive a new 6-item scale with very good construct validity. The area under the ROC curve of the new 6-item scale was 0.986 and the ideal cutoff score was ≥ 6. Conclusions: The new 6-item scale has adequate psychometric properties and similar ROC curve values to the10-item version and offers a means of reducing the cost and time taken to administer the instrument.
Soo, Jackie; Harris, Jennifer L; Davison, Kirsten K; Williams, David R; Roberto, Christina A
To examine the nutritional quality of menu items promoted in four (US) fast-food restaurant chains (McDonald's, Burger King, Wendy's, Taco Bell) in 2010 and 2013. Menu items pictured on signs and menu boards were recorded at 400 fast-food restaurants across the USA. The Nutrient Profile Index (NPI) was used to calculate overall nutrition scores for items (higher scores indicate greater nutritional quality) and was dichotomized to denote healthier v. less healthy items. Changes over time in NPI scores and energy of promoted foods and beverages were analysed using linear regression. Four hundred fast-food restaurants (McDonald's, Burger King, Wendy's, Taco Bell; 100 locations per chain). NPI of fast-food items marketed at fast-food restaurants. Promoted foods and beverages on general menu boards and signs remained below the 'healthier' cut-off at both time points. On general menu boards, pictured items became modestly healthier from 2010 to 2013, increasing (mean (se)) by 3·08 (0·16) NPI score points (Prestaurants showed limited improvements in nutritional quality in 2013 v. 2010.
Full Text Available Background: The Dark Triad (i.e., Machiavellianism, narcissism, and psychopathy can be captured quickly with 12 items using the Dark Triad Dirty Dozen (Jonason and Webster, 2010. Previous Item Response Theory (IRT analyses of the original English Dark Triad Dirty Dozen have shown that all three subscales adequately tap into the dark domains of personality. The aim of the present study was to analyze the Swedish version of the Dark Triad Dirty Dozen using IRT. Method: 570 individuals (nmales = 326, nfemales = 242, and 2 unreported, including university students and white-collar workers with an age range between 19 and 65 years, responded to the Swedish version of the Dark Triad Dirty Dozen (Garcia et al., 2017a,b. Results: Contrary to previous research, we found that the narcissism scale provided most information, followed by psychopathy, and finally Machiavellianism. Moreover, the psychopathy scale required a higher level of the latent trait for endorsement of its items than the narcissism and Machiavellianism scales. Overall, all items provided reasonable amounts of information and are thus effective for discriminating between individuals. The mean item discriminations (alphas were 1.92 for Machiavellianism, 2.31 for narcissism, and 1.99 for psychopathy. Conclusion: This is the first study to provide IRT analyses of the Swedish version of the Dark Triad Dirty Dozen. Our findings add to a growing literature on the Dark Triad Dirty Dozen scale in different cultures and highlight psychometric characteristics, which can be used for comparative studies. Items tapping into psychopathy showed higher thresholds for endorsement than the other two scales. Importantly, the narcissism scale seems to provide more information about a lack of narcissism, perhaps mirroring cultural conditions. Keywords: Psychology, Psychiatry, Clinical psychology
Cheung, Felix; Lucas, Richard E.
Purpose The present paper assessed the validity of single-item life satisfaction measures by comparing single-item measures to the Satisfaction with Life Scale (SWLS) - a more psychometrically established measure. Methods Two large samples from Washington (N=13,064) and Oregon (N=2,277) recruited by the Behavioral Risk Factor Surveillance System (BRFSS) and a representative German sample (N=1,312) recruited by the Germany Socio-Economic Panel (GSOEP) were included in the present analyses. Single-item life satisfaction measures and the SWLS were correlated with theoretically relevant variables, such as demographics, subjective health, domain satisfaction, and affect. The correlations between the two life satisfaction measures and these variables were examined to assess the construct validity of single-item life satisfaction measures. Results Consistent across three samples, single-item life satisfaction measures demonstrated substantial degree of criterion validity with the SWLS (zero-order r = 0.62 – 0.64; disattenuated r = 0.78 – 0.80). Patterns of statistical significance for correlations with theoretically relevant variables were the same across single-item measures and the SWLS. Single-item measures did not produce systematically different correlations compared to the SWLS (average difference = 0.001 – 0.005). The average absolute difference in the magnitudes of the correlations produced by single-item measures and the SWLS were very small (average absolute difference = 0.015 −0.042). Conclusions Single-item life satisfaction measures performed very similarly compared to the multiple-item SWLS. Social scientists would get virtually identical answer to substantive questions regardless of which measure they use. PMID:24890827
Teresi, Jeanne A; Ocepek-Welikson, Katja; Lichtenberg, Peter A
The focus of these analyses was to examine the psychometric properties of the Lichtenberg Financial Decision Screening Scale (LFDSS). The purpose of the screen was to evaluate the decisional abilities and vulnerability to exploitation of older adults. Adults aged 60 and over were interviewed by social, legal, financial, or health services professionals who underwent in-person training on the administration and scoring of the scale. Professionals provided a rating of the decision-making abilities of the older adult. The analytic sample included 213 individuals with an average age of 76.9 (SD = 10.1). The majority (57%) were female. Data were analyzed using item response theory (IRT) methodology. The results supported the unidimensionality of the item set. Several IRT models were tested. Ten ordinal and binary items evidenced a slightly higher reliability estimate (0.85) than other versions and better coverage in terms of the range of reliable measurement across the continuum of financial incapacity.
Sanchez, Edgar Isaac
To enhance test security of high stakes tests, it is vital to understand the way various exposure control strategies function under various IRT models. To that end the present dissertation focused on the performance of several exposure control strategies under the generalized partial credit model with an item pool of 100 and 200 items. These…
Full Text Available To develop a tool for use in hearing screening and to evaluate the patient journey towards hearing rehabilitation, responses to the hearing aid rehabilitation questionnaire scales aid stigma, pressure, and aid unwanted addressing respectively hearing aid stigma, experienced pressure from others; perceived hearing aid benefit were evaluated with item response theory. The sample was comprised of 212 persons aged 55 years or more; 63 were hearing aid users, 64 with and 85 persons without hearing impairment according to guidelines for hearing aid reimbursement in the Netherlands. Bias was investigated relative to hearing aid use and hearing impairment within the differential test functioning framework. Items compromising model fit or demonstrating differential item functioning were dropped. The aid stigma scale was reduced from 6 to 4, the pressure scale from 7 to 4, and the aid unwanted scale from 5 to 4 items. This procedure resulted in bias-free scales ready for screening purposes and application to further understand the help-seeking process of the hearing impaired.
Fayers, Tessa; Dolman, Peter J
To develop and test a user-friendly questionnaire for rapidly assessing quality of life (QOL) in thyroid eye disease (TED). A three-item questionnaire, the TED-QOL, was designed and compared to the 16-item Graves Ophthalmopathy (GO)-QOL and the nine-item GO-Quality of Life Scale (QLS). 100 patients with TED were administered all three questionnaires on two occasions. Results were compared to clinical severity scores (Vision, Inflammation, Strabismus, Appearance (VISA) classification). Main outcomes were construct and criterion validity, test-retest reliability, duration, comprehension and completion rates. TED-QOL correlated strongly with the other questionnaires for corresponding items (Pearson correlation: appearance 0.71, 0.62; functioning 0.69, 0.66; overall QOL 0.53). Test-retest analysis demonstrated good reliability for all three questionnaires (intraclass correlations: TED-QOL 0.81, 0.74, 0.87; GO-QOL 0.81, 0.82; GO-QLS 0.74, 0.86, 0.67). TED-QOL was significantly faster to complete (1.6 min vs GO-QOL 3.1 min, GO-QLS 2.7 min, p<0.0001) and had a higher completion rate (100% vs GO-QOL 78%, GO-QLS 94%). There was only moderate correlation between items on all three questionnaires and VISA scores. The TED-QOL is rapid and easy to complete and analyse and has similar validity and reliability to longer questionnaires. All questionnaires showed only moderate correlation with disease severity, emphasising the discrepancy between objective and subjective assessments and the importance of measuring both.
Wellman, Robert J; Edelen, Maria Orlando; DiFranza, Joseph R
The Autonomy over Tobacco Scale (AUTOS) is composed of 12-symptoms of nicotine dependence. While it has demonstrated excellent reliability and validity, several psychometric properties have yet to be investigated. We aimed to determine (1) whether items functioned differently across demographic groups, (2) the likelihood that individual symptoms would be endorsed by smokers at different levels of diminished autonomy, and (3) the degree of information provided by each item and the reliability of the full AUTOS across the range of diminished autonomy. Data for this study come from two convenience samples of American adult current smokers (n=777; 69% female; 88% white; Mage=34 years, range: 18-78), of whom 66% were daily smokers (Mcigarettes/smoking day=10.1, range: AUTOS online as part of "a research study about the experiences people have when they smoke." After p value correction, items remained invariant across sex and minority status, while two items functioned differently according to age, with minimal impact on the total AUTOS score. Discriminative power of the items was high. The greatest amount of information is provided at just under one-half SD above the mean and the least at the extremes of diminished autonomy. The AUTOS maintains acceptable reliability (>0.70) across the range of diminished autonomy within which more than 95% of smokers' scores could be anticipated to fall. The AUTOS is a versatile and psychometrically sound instrument for measuring the loss of autonomy over tobacco use. Copyright © 2015 Elsevier Ltd. All rights reserved.
Tulsky, David S; Kisala, Pamela A; Kalpakjian, Claire Z; Bombardier, Charles H; Pohlig, Ryan T; Heinemann, Allen W; Carle, Adam; Choi, Seung W
To develop a calibrated spinal cord injury-quality of life (SCI-QOL) item bank, computer adaptive test (CAT), and short form to assess depressive symptoms experienced by individuals with SCI, transform scores to the Patient Reported Outcomes Measurement Information System (PROMIS) metric, and create a crosswalk to the Patient Health Questionnaire (PHQ)-9. We used grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, item response theory (IRT) analyses, and statistical linking techniques to transform scores to a PROMIS metric and to provide a crosswalk with the PHQ-9. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. Spinal Cord Injury--Quality of Life (SCI-QOL) Depression Item Bank Individuals with SCI were involved in all phases of SCI-QOL development. A sample of 716 individuals with traumatic SCI completed 35 items assessing depression, 18 of which were PROMIS items. After removing 7 non-PROMIS items, factor analyses confirmed a unidimensional pool of items. We used a graded response IRT model to estimate slopes and thresholds for the 28 retained items. The SCI-QOL Depression measure correlated 0.76 with the PHQ-9. The SCI-QOL Depression item bank provides a reliable and sensitive measure of depressive symptoms with scores reported in terms of general population norms. We provide a crosswalk to the PHQ-9 to facilitate comparisons between measures. The item bank may be administered as a CAT or as a short form and is suitable for research and clinical applications.
Jeon, Minjeong; De Boeck, Paul; van der Linden, Wim
We present a novel application of a generalized item response tree model to investigate test takers' answer change behavior. The model allows us to simultaneously model the observed patterns of the initial and final responses after an answer change as a function of a set of latent traits and item parameters. The proposed application is illustrated…
Waiyavutti, Chakadee; Johnson, Wendy; Deary, Ian J.
Intelligence differences might contribute to true differences in personality traits. It is also possible that intelligence might contribute to differences in understanding and interpreting personality items. Previous studies have not distinguished clearly between these possibilities. Before it can be accepted that scale score differences actually…
Full Text Available There is a surge of studies confirming that old age spares the ability to bind in visual working memory (VWM multiple features within singular object representations. Furthermore, it has been suggested that such ability may also be independent of the cultural background of the assessed individual. However, this evidence has been gathered with tasks that use arbitrary bindings of unfamiliar features. Whether age spares memory binding functions when the memoranda are features of everyday life objects remains less well explored. The present study investigated the influence of age, memory delay, and education, on conjunctive binding functions responsible for representing everyday items in VWM. We asked 32 healthy young and 41 healthy older adults to perform a memory binding task. During the task, participants saw visual arrays of objects, colours, or coloured objects presented for 6 s. Immediately after they were asked either to select the objects or the colours that were presented during the study display from larger sets of objects or colours, or to recombine them by selecting from such sets the objects and their corresponding colours. This procedure was repeated immediately after but this time providing a 30 s unfiled delay. We manipulated familiarity by presenting congruent and incongruent object-colour pairings. The results showed that the ability to bind intrinsic features in VWM does not decline with age even when these features belong to everyday items and form novel or well-known associations. Such preserved memory binding abilities held across memory delays. The impact of feature congruency on item-recognition appears to be greater in older than in younger adults. This suggests that long-term memory (LTM supports binding functions carried out in VWM for familiar everyday items and older adults still benefit from this LTM support. We have expanded the evidence supporting the lack of age effects on VWM binding functions to new feature and
Siskind, Theresa G.; Anderson, Lorin W.
The study was designed to examine the similarity of response options generated by different item writers using a systematic approach to item writing. The similarity of response options to student responses for the same item stems presented in an open-ended format was also examined. A non-systematic (subject matter expertise) approach and a…
Franken, Ingmar H. A.; Stam, Cornelis J.; Hendriks, Vincent M.; van den Brink, Wim
Previous studies have shown that drug abuse is associated with altered brain function. However, studies of heroin abuse-related brain dysfunctions are scarce. Electroencephalographic ( EEG) power and coherence analyses are two important tools for examining the effects of drugs on brain function. In
Fox, Jean-Paul; Mulder, Joris; Sinharay, Sandip
Two marginal one-parameter item response theory models are introduced, by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning the underlying covariance structure are evaluated using (fractional) Bayes factor tests. The support for a unidimensional factor (i.e., assumption of local independence) and differential item functioning are evaluated by testing the covariance components. The posterior distribution of common covariance components is obtained in closed form by transforming latent responses with an orthogonal (Helmert) matrix. This posterior distribution is defined as a shifted-inverse-gamma, thereby introducing a default prior and a balanced prior distribution. Based on that, an MCMC algorithm is described to estimate all model parameters and to compute (fractional) Bayes factor tests. Simulation studies are used to show that the (fractional) Bayes factor tests have good properties for testing the underlying covariance structure of binary response data. The method is illustrated with two real data studies.
Glas, Cornelis A.W.; Eggen, T.J.H.M.; Veldkamp, B.P.
Item response theory is usually applied to items with a selected-response format, such as multiple choice items, whereas generalizability theory is usually applied to constructed-response tasks assessed by raters. However, in many situations, raters may use rating scales consisting of items with a
Liu, Pengfei; Qiu, Yuanyu; Qian, Yuting; Chen, Xiao; Wang, Yiran; Cui, Jin; Zhai, Xiao
To appraise the current reporting methodological quality of meta-analyses in five leading gastroenterology and hepatology journals, and to identify the variables associated with the reporting quality. We systematically searched the literature of meta-analyses in Gastroenterology, Gut, Hepatology, Journal of Hepatology (J HEPATOL) and American Journal of Gastroenterology (AM J GASTROENTEROL) from 2006 to 2008 and from 2012 to 2014. Characteristics were extracted based on the PRISMA statement and the AMSTAR tool. Country, number of patients, funding source were also revealed and descriptively reported. A total of 127 meta-analyses were enrolled in this study and were compared among journals, study years, and other characters. Compliances with the PRISMA statement and the AMSTAR checklist were 20.8 ± 4.2 out of a maximum of 27 and 7.6 ± 2.4 out of a maximum of 11, respectively. Some domains were poorly reported including describing a protocol and/or registration (item 5, 0.0%), describing methods, and giving results of additional analyses (item 16, 45.7% and item 23, 48.0%) for PRISMA and duplicating study selection and data extraction (item 2, 53.5%), and providing a list of included and excluded studies (item 5, 14.2%) for AMSTAR. Publication in recent years showed a significantly better methodological quality than those published in previous years. This study shows that methodological reporting quality of MAs in the major gastroenterology and hepatology journals has improved in recent years after the publication of the developed PRISMA statement, and it can be further improved. © 2016 Journal of Gastroenterology and Hepatology Foundation and John Wiley & Sons Australia, Ltd.
Hougaard, Jens Leth; Moulin, Hervé
We ask how to share the cost of finitely many public goods (items) among users with different needs: some smaller subsets of items are enough to serve the needs of each user, yet the cost of all items must be covered, even if this entails inefficiently paying for redundant items. Typical examples...... are network connectivity problems when an existing (possibly inefficient) network must be maintained. We axiomatize a family cost ratios based on simple liability indices, one for each agent and for each item, measuring the relative worth of this item across agents, and generating cost allocation rules...... additive in costs....
Sacco, Paul; Torres, Luis R; Cunningham-Williams, Renee M; Woods, Carol; Unick, G Jay
This study tested for the presence of differential item functioning (DIF) in DSM-IV Pathological Gambling Disorder (PGD) criteria based on gender, race/ethnicity and age. Using a nationally representative sample of adults from the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC), indicating current gambling (n = 10,899), Multiple Indicator-Multiple Cause (MIMIC) models tested for DIF, controlling for income, education, and marital status. Compared to the reference groups (i.e., Male, Caucasian, and ages 25-59 years), women (OR = 0.62; P gambling to escape (Criterion 5) (OR = 2.22; P < .001) but young adults (OR = 0.62; P < .05) were less likely to endorse it. African Americans (OR = 2.50; P < .001) and Hispanics were more likely to endorse trying to cut back (Criterion 3) (OR = 2.01; P < .01). African Americans were more likely to endorse the suffering losses (OR = 2.27; P < .01) criterion. Young adults were more likely to endorse chasing losses (Criterion 9) (OR = 1.81; P < .01) while older adults were less likely to endorse this criterion (OR = 0.76; P < .05). Further research is needed to identify factors contributing to DIF, address criteria level bias, and examine differential test functioning.
Edmonds, Lisa A; Donovan, Neila J
There is a pressing need for psychometrically sound naming materials for Spanish/English bilingual adults. To address this need, in this study the authors examined the psychometric properties of An Object and Action Naming Battery (An O&A Battery; Druks & Masterson, 2000) in bilingual speakers. Ninety-one Spanish/English bilinguals named O&A Battery items in English and Spanish. Responses underwent a Rasch analysis. Using correlation and regression analyses, the authors evaluated the effect of psycholinguistic (e.g., imageability) and participant (e.g., proficiency ratings) variables on accuracy. Rasch analysis determined unidimensionality across English and Spanish nouns and verbs and robust item-level psychometric properties, evidence for content validity. Few items did not fit the model, there were no ceiling or floor effects after uninformative and misfit items were removed, and items reflected a range of difficulty. Reliability coefficients were high, and the number of statistically different ability levels provided indices of sensitivity. Regression analyses revealed significant correlations between psycholinguistic variables and accuracy, providing preliminary construct validity. The participant variables that contributed most to accuracy were proficiency ratings and time of language use. Results suggest adequate content and construct validity of O&A items retained in the analysis for Spanish/English bilingual adults and support future efforts to evaluate naming in older bilinguals and persons with bilingual aphasia.
McKillop, Ashley B; Carroll, Linda J; Dick, Bruce D; Battié, Michele C
Of the three broad outcome domains of body functions and structures, activities, and participation (eg, engaging in valued social roles) outlined in the World Health Organization's (WHO) International Classification of Functioning, Disability and Health (ICF), it has been argued that participation is the most important to individuals, particularly those with chronic health problems. Yet, participation is not commonly measured in back pain research. The aim of this study was to investigate the construct validity of a modified 5-Item Pain Disability Index (PDI) score as a measure of participation in people with chronic back pain. A validation study was conducted using cross-sectional data. Participants with chronic back pain were recruited from a multidisciplinary pain center in Alberta, Canada. The outcome measure of interest is the 5-Item PDI. Each study participant was given a questionnaire package containing measures of participation, resilience, anxiety and depression, pain intensity, and pain-related disability, in addition to the PDI. The first five items of the PDI deal with social roles involving family responsibilities, recreation, social activities with friends, work, and sexual behavior, and comprised the 5-Item PDI seeking to measure participation. The last two items of the PDI deal with self-care and life support functions and were excluded. Construct validity of the 5-Item PDI as a measure of participation was examined using Pearson correlations or point-biserial correlations to test each hypothesized association. Participants were 70 people with chronic back pain and a mean age of 48.1 years. Forty-four (62.9%) were women. As hypothesized, the 5-Item PDI was associated with all measures of participation, including the Participation Assessment with Recombined Tools-Objective (r=-0.61), Late-Life Function and Disability Instrument: Disability Component (frequency: r=-0.66; limitation: r=-0.65), Work and Social Adjustment Scale (r=0.85), a global
Li, Chih-Ying; Waid-Ebbs, Julia; Velozo, Craig A; Heaton, Shelley C
Social problem-solving deficits characterise individuals with traumatic brain injury (TBI), and poor social problem solving interferes with daily functioning and productive lifestyles. Therefore, it is of vital importance to use the appropriate instrument to identify deficits in social problem solving for individuals with TBI. This study investigates factor structure and item-level psychometrics of the Social Problem Solving Inventory-Revised: Short Form (SPSI-R:S), for adults with moderate and severe TBI. Secondary analysis of 90 adults with moderate and severe TBI who completed the SPSI-R:S was performed. An exploratory factor analysis (EFA), principal components analysis (PCA) and Rasch analysis examined the factor structure and item-level psychometrics of the SPSI-R:S. The EFA showed three dominant factors, with positively worded items represented as the most definite factor. The other two factors are negative problem-solving orientation and skills; and negative problem-solving emotion. Rasch analyses confirmed the three factors are each unidimensional constructs. It was concluded that the total score interpretability of the SPSI-R:S may be challenging due to the multidimensional structure of the total measure. Instead, we propose using three separate SPSI-R:S subscores to measure social problem solving for the TBI population.
Li, Chih-Ying; Waid-Ebbs, Julia; Velozo, Craig A.; Heaton, Shelley C.
Primary Objective Social problem solving deficits characterize individuals with traumatic brain injury (TBI). Poor social problem solving interferes with daily functioning and productive lifestyles. Therefore, it is of vital importance to use the appropriate instrument to identify deficits in social problem solving for individuals with TBI. This study investigates factor structure and item-level psychometrics of the Social Problem Solving Inventory-Revised Short Form (SPSI-R:S), for adults with moderate and severe TBI. Research Design Secondary analysis of 90 adults with moderate and severe TBI who completed the SPSI-R:S. Methods and Procedures An exploratory factor analysis (EFA), principal components analysis (PCA) and Rasch analysis examined the factor structure and item-level psychometrics of the SPSI-R:S. Main Outcomes and Results The EFA showed three dominant factors, with positively worded items represented as the most definite factor. The other two factors are negative problem solving orientation and skills; and negative problem solving emotion. Rasch analyses confirmed the three factors are each unidimensional constructs. Conclusions The total score interpretability of the SPSI-R:S may be challenging due to the multidimensional structure of the total measure. Instead, we propose using three separate SPSI-R:S subscores to measure social problem solving for the TBI population. PMID:26052731
Full Text Available BACKGROUND: Integration of information streams into a unitary representation is an important task of our cognitive system. Within working memory, the medial temporal lobe (MTL has been conceptually linked to the maintenance of bound representations. In a previous fMRI study, we have shown that the MTL is indeed more active during working-memory maintenance of spatial associations as compared to non-spatial associations or single items. There are two explanations for this result, the mere presence of the spatial component activates the MTL, or the MTL is recruited to bind associations between neurally non-overlapping representations. METHODOLOGY/PRINCIPAL FINDINGS: The current fMRI study investigates this issue further by directly comparing intrinsic intra-item binding (object/colour, extrinsic intra-item binding (object/location, and inter-item binding (object/object. The three binding conditions resulted in differential activation of brain regions. Specifically, we show that the MTL is important for establishing extrinsic intra-item associations and inter-item associations, in line with the notion that binding of information processed in different brain regions depends on the MTL. CONCLUSIONS/SIGNIFICANCE: Our findings indicate that different forms of working-memory binding rely on specific neural structures. In addition, these results extend previous reports indicating that the MTL is implicated in working-memory maintenance, challenging the classic distinction between short-term and long-term memory systems.
Glas, Cornelis A.W.; Eggen, T.J.H.M.; Veldkamp, B.P.
Item response theory is usually applied to items with a selected-response format, such as multiple choice items, whereas generalizability theory is usually applied to constructed-response tasks assessed by raters. However, in many situations, raters may use rating scales consisting of items with a selected-response format. This chapter presents a short overview of how item response theory and generalizability theory were integrated to model such assessments. Further, the precision of the esti...
Bai, Mei; Dixon, Jane K
The purpose of this study was to reexamine the factor pattern of the 12-item Functional Assessment of Chronic Illness Therapy-Spiritual Well-Being Scale (FACIT-Sp-12) using exploratory factor analysis in people newly diagnosed with advanced cancer. Principal components analysis (PCA) and 3 common factor analysis methods were used to explore the factor pattern of the FACIT-Sp-12. Factorial validity was assessed in association with quality of life (QOL). Principal factor analysis (PFA), iterative PFA, and maximum likelihood suggested retrieving 3 factors: Peace, Meaning, and Faith. Both Peace and Meaning positively related to QOL, whereas only Peace uniquely contributed to QOL. This study supported the 3-factor model of the FACIT-Sp-12. Suggestions for revision of items and further validation of the identified factor pattern were provided.
Jung, Jaejoon; Philippot, Laurent
The relationship between microbial biodiversity and soil function is an important issue in ecology, yet most studies have been performed in pristine ecosystems. Here, we assess the role of microbial diversity in ecological function and remediation strategies in diesel-contaminated soils. Soil microbial diversity was manipulated using a removal by dilution approach and microbial functions were determined using both metagenomic analyses and enzymatic assays. A shift from Proteobacteria- to Acti...
Magistà, D; Ferrara, M; Del Nobile, M A; Gammariello, D; Conte, A; Perrone, G
Traditional sausages are often considered of superior quality to sausages inoculated with commercial starter cultures and this is partially due to the action of the typical house microflora. Penicillium nalgiovense is the species commonly used as starter culture for dry-cured meat production. Recently a new species, Penicillium salamii, was described as typical colonizer during salami seasoning. In order to understand its contribution to the seasoning process, two different experiments on curing of fresh pork sausages were conducted using P. salamii ITEM 15302 in comparison with P. nalgiovense ITEM 15292 at small and industrial scale, and the dry-cured sausages were subjected to sensory analyses. Additionally, proteolytic and lipolytic in vitro assays were performed on both strains. P. salamii ITEM 15302 proved to be a fast growing mould on dry-cured sausage casings, well adapted to the seasoning process, with high lipolytic and proteolytic enzymatic activity that confers typical sensory characteristics to meat products. Therefore, P. salamii ITEM 15302 was shown to be a good candidate as new starter for meat industry. Copyright © 2015 Elsevier B.V. All rights reserved.
The present study uses a mixture Rasch model to examine latent differential item functioning in English as a foreign language listening tests. Participants (n = 250) took a listening and lexico-grammatical test and completed the metacognitive awareness listening questionnaire comprising problem solving (PS), planning and evaluation (PE), mental…
Mallett, Remington; Lewis-Peacock, Jarrod A
How we attend to our thoughts affects how we attend to our environment. Holding information in working memory can automatically bias visual attention toward matching information. By observing attentional biases on reaction times to visual search during a memory delay, it is possible to reconstruct the source of that bias using machine learning techniques and thereby behaviorally decode the content of working memory. Can this be done when more than one item is held in working memory? There is some evidence that multiple items can simultaneously bias attention, but the effects have been inconsistent. One explanation may be that items are stored in different states depending on the current task demands. Recent models propose functionally distinct states of representation for items inside versus outside the focus of attention. Here, we use behavioral decoding to evaluate whether multiple memory items-including temporarily irrelevant items outside the focus of attention-exert biases on visual attention. Only the single item in the focus of attention was decodable. The other item showed a brief attentional bias that dissipated until it returned to the focus of attention. These results support the idea of dynamic, flexible states of working memory across time and priority. © 2018 New York Academy of Sciences.
For a randomly renewed item the probability distributions of the time to failure and of the duration of down time and the expectations of these random variables are determined. Moreover, it is shown that the same theory applies to randomly checked items with exponential probability distribution of life such as electronic items. The case of periodic renewals is treated as an example. (orig.) [de
Full Text Available Abstract Background The randomized placebo-controlled IFIGENIA-trial demonstrated that therapy with high-dose N-acetylcysteine (NAC given for one year, added to prednisone and azathioprine, significantly ameliorates (i.e. slows down disease progression in terms of vital capacity (VC (+9% and diffusing capacity (DLco (+24% in idiopathic pulmonary fibrosis (IPF. To better understand the clinical implications of these findings we performed additional, explorative analyses of the IFGENIA data set. Methods We analysed effects of NAC on VC, DLco, a composite physiologic index (CPI, and mortality in the 155 study-patients. Results In trial completers the functional indices did not change significantly with NAC, whereas most indices deteriorated with placebo; in non-completers the majority of indices worsened but decline was generally less pronounced in most indices with NAC than with placebo. Most categorical analyses of VC, DLco and CPI also showed favourable changes with NAC. The effects of NAC on VC, DLco and CPI were significantly better if the baseline CPI was 50 points or lower. Conclusion This descriptive analysis confirms and extends the favourable effects of NAC on lung function in IPF and emphasizes the usefulness of VC, DLco, and the CPI for the evaluation of a therapeutic effect. Most importantly, less progressed disease as indicated by a CPI of 50 points or lower at baseline was more responsive to therapy in this study. Trial Registration Registered at http://www.ClinicalTrials.gov; number NCT00639496.
Full Text Available Reverse worded (RW items are often used to reduce or eliminate acquiescence bias, but there is a rising concern about their harmful effects on the covariance structure of the scale. Therefore, results obtained via traditional covariance analyses may be distorted. This study examined the effect of the RW items on the factor structure of the abbreviated 18-item Need for Cognition (NFC scale using confirmatory factor analysis. We modified the scale to create three revised versions, varying from no RW items to all RW items. We also manipulated the type of the RW items (polar opposite vs. negated. To each of the four scales, we fit four previously developed models. The four models included a 1-factor model, a 2-factor model distinguishing between positively worded (PW items and RW items, and two 2-factor models, each with one substantive factor and one method factor. Results showed that the number and type of the RW items affected the factor structure of the NFC scale. Consistent with previous research findings, for the original NFC scale, which contains both PW and RW items, the 1-factor model did not have good fit. In contrast, for the revised scales that had no RW items or all RW items, the 1-factor model had reasonably good fit. In addition, for the scale with polar opposite and negated RW items, the factor model with a method factor among the polar opposite items had considerably better fit than the 1-factor model.
朱小姝 Xiao-Shu Zhu
Full Text Available 大型教育評量研究常採用多階段抽樣的設計（multi-stage sampling design），透過對母群體之抽樣單位進行分層以抽取受測者。此外，還會採用複雜題本設計（complex booklet design）的方式將題目組成多份測驗題本。在此情況下，欲確保公正測量出不同受測群體的能力，關鍵在於能夠有效偵測所採用的題目是否具差別試題功能（differential item functioning, DIF）。本文旨在介紹探討在大型教育評量複雜設計之下能用以偵測差別試題功能的建模方法，並應用六種可用於偵測DIF 的多階層廣義線性模式（hierarchical generalized linear models, HGLMs），再透過電腦模擬比較它們偵測DIF 的效力。接著又將這些模式應用到國際數學與科學教育成就趨勢調查研究（TIMSS）的實證數據上，藉以探測是否存在一致性的性別DIF（uniform gender DIF）。 Many educational surveys employ a multi-stage sampling design for students, which makes use of stratification and/or clustering of population units, as well as a complex booklet design for items from an item pool. In these surveys, the reliable detection of item bias or differential item functioning (DIF across student groups is a key component for ensuring fair representations of different student groups. In this paper, we describe several modeling approaches that can be useful for detecting DIF in educational surveys. We illustrate the key ideas by investigating the performance of six hierarchical generalized linear models (HGLMs using a small simulation study and by applying them to real data from the Trends in Mathematics and Science Study (TIMSS study where we use them to investigate potential uniform gender DIF.
... 17 Commodity and Securities Exchanges 3 2010-04-01 2010-04-01 false Inclusion of items, differentiation between items and answers, omission of instructions. 260.7a-16 Section 260.7a-16 Commodity and... INDENTURE ACT OF 1939 Formal Requirements § 260.7a-16 Inclusion of items, differentiation between items and...
Full Text Available The Multiple Sclerosis Quality of Life-54 (MSQOL-54, 52 items grouped in 12 subscales plus two single items is the most used MS specific health related quality of life inventory.To develop a shortened version of the MSQOL-54.MSQOL-54 dimensionality and metric properties were investigated by confirmatory factor analysis (CFA and Rasch modelling (Partial Credit Model, PCM on MSQOL-54s completed by 473 MS patients. Their mean age was 41 years, 65% were women, and median Expanded Disability Status Scale (EDSS score was 2.0 (range 0-9.5. Differential item functioning (DIF was evaluated for gender, age and EDSS. Dimensionality of the resulting short version was assessed by exploratory factor analysis (EFA and CFA. Cognitive debriefing of the short instrument (vs. the original was then performed on 12 MS patients.CFA of MSQOL-54 subscales showed that the data fitted the overall model well. Two subscales (Role Limitations--Physical, Role Limitations--Emotional did not fit the PCM, and were removed; two other subscales (Health Perceptions, Social Function did not fit the model, but were retained as single items. Sexual Satisfaction (single-item subscale was also removed. The resulting MSQOL-29 consisted of 25 items grouped in 7 subscales, plus 4 single items. PCM fit statistics were within the acceptability range for all MSQOL-29 items except one which had significant DIF by age. EFA and CFA indicated adequate fit to the original two-factor (Physical and Mental Health Composites hypothesis. Cognitive debriefing confirmed that MSQOL-29 was acceptable and had lost no key items.The proposed MSQOL-29 is 50% shorter than MSQOL-54, yet preserves key quality of life dimensions. Prospective validation on a large, independent MS patient sample is ongoing.
Heruc, Z.; Pozar, J.
CGI procurement is a process whereby parts are brought without imposing Appendix B Quality Assurance requirements on the supplier, and than dedicated for use in safety-related applications. The dedication process involves 1) based upon required safety function, an engineering evaluation to identify critical characteristic of the item and specification of acceptance criteria; and 2) quality control activities to ensure the item(s) supplied meets the acceptance criteria specified. CGI Dedication supports the supply of certified components/parts for the plant operation in an environment where the number of nuclear qualified suppliers diminishes. It requires a more active role of the plant personnel, therefore presenting an additional burden on human resources, but at the same time increases the technical KNOW-HOW and improves the confidence of test and inspection data presented in the certificates. Very often it is also cost beneficial. This paper is a continuation to last year presentation of the introduction of this method into NEK's procurement process and presents the current approach and some practical examples. (author)
Tsutsumi, A.; Iwata, N.; Watanabe, N.; Jonge, de J.; Pikhart, H.; Férnandez-López, J.A.; Xu, Liying; Peter, R.; Knutsson, A.; Niedhammer, I.; Kawakami, N.; Siegrist, J.
Our objective was to examine cross-cultural comparability of standard scales of the Effort-Reward Imbalance occupational stress scales by item response theory (IRT) analyses. Data were from 20,256 Japanese employees, 1464 Dutch nurses and nurses' aides, 2128 representative employees from
Khedlekar, Uttam Kumar; Shukla, Diwakar; Namdeo, Anubhav
We have designed an inventory model for seasonal products in which deterioration can be controlled by item preservation technology investment. Demand for the product is considered price sensitive and decreases linearly. This study has shown that the profit is a concave function of optimal selling price, replenishment time and preservation cost parameter. We simultaneously determined the optimal selling price of the product, the replenishment cycle and the cost of item preservation technology. Additionally, this study has shown that there exists an optimal selling price and optimal preservation investment to maximize the profit for every business set-up. Finally, the model is illustrated by numerical examples and sensitive analysis of the optimal solution with respect to major parameters.
Marsh, Herbert W; Lüdtke, Oliver; Nagengast, Benjamin; Morin, Alexandre J S; Von Davier, Matthias
The present investigation has a dual focus: to evaluate problematic practice in the use of item parcels and to suggest exploratory structural equation models (ESEMs) as a viable alternative to the traditional independent clusters confirmatory factor analysis (ICM-CFA) model (with no cross-loadings, subsidiary factors, or correlated uniquenesses). Typically, it is ill-advised to (a) use item parcels when ICM-CFA models do not fit the data, and (b) retain ICM-CFA models when items cross-load on multiple factors. However, the combined use of (a) and (b) is widespread and often provides such misleadingly good fit indexes that applied researchers might believe that misspecification problems are resolved--that 2 wrongs really do make a right. Taking a pragmatist perspective, in 4 studies we demonstrate with responses to the Rosenberg Self-Esteem Inventory (Rosenberg, 1965), Big Five personality factors, and simulated data that even small cross-loadings seriously distort relations among ICM-CFA constructs or even decisions on the number of factors; although obvious in item-level analyses, this is camouflaged by the use of parcels. ESEMs provide a viable alternative to ICM-CFAs and a test for the appropriateness of parcels. The use of parcels with an ICM-CFA model is most justifiable when the fit of both ICM-CFA and ESEM models is acceptable and equally good, and when substantively important interpretations are similar. However, if the ESEM model fits the data better than the ICM-CFA model, then the use of parcels with an ICM-CFA model typically is ill-advised--particularly in studies that are also interested in scale development, latent means, and measurement invariance.
Full Text Available Background: The hospital anxiety and depression scale (HADS is a common screening tool designed to measure the level of anxiety and depression in different factor structures and has been extensively used in non-psychiatric populations and individuals experiencing fertility problems. Objective: The aims of this study were to evaluate the factor structure, item analyses, and internal consistency of HADS in Iranian infertile patients. Materials and Methods: This cross-sectional study included 651 infertile patients (248 men and 403 women referred to a referral infertility Center in Tehran, Iran between January 2014 and January 2015. Confirmatory factor analysis was used to determine the underlying factor structure of the HADS among one, two, and threefactor models. Several goodness of fit indices were utilized such as comparative, normed and goodness of fit indices, Akaike information criterion, and the root mean squared error of approximation. In addition to HADS, the Satisfaction with Life Scale questionnaires as well as demographic and clinical information were administered to all patients. Results: The goodness of fit indices through CFAs exposed that three and onefactor model provided the best and worst fit to the total, male and female datasets compared to the other factor structure models for the infertile patients. The Cronbach’s alpha for anxiety and depression subscales were 0.866 and 0.753 respectively. The HADS subscales significantly correlated with SWLS, indicating an acceptable convergent validity. Conclusion: The HADS was found to be a three-factor structure screening instrument in the field of infertility.
Sahin, Alper; Anil, Duygu
This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…
Hamane, Ryoso; Itoh, Toshiya; Tomita, Kouhei
When a store sells items to customers, the store wishes to determine the prices of the items to maximize its profit. Intuitively, if the store sells the items with low (resp. high) prices, the customers buy more (resp. less) items, which provides less profit to the store. So it would be hard for the store to decide the prices of items. Assume that the store has a set V of n items and there is a set E of m customers who wish to buy those items, and also assume that each item i ∈ V has the production cost di and each customer ej ∈ E has the valuation vj on the bundle ej ⊆ V of items. When the store sells an item i ∈ V at the price ri, the profit for the item i is pi = ri - di. The goal of the store is to decide the price of each item to maximize its total profit. We refer to this maximization problem as the item pricing problem. In most of the previous works, the item pricing problem was considered under the assumption that pi ≥ 0 for each i ∈ V, however, Balcan, et al. [In Proc. of WINE, LNCS 4858, 2007] introduced the notion of “loss-leader, ” and showed that the seller can get more total profit in the case that pi < 0 is allowed than in the case that pi < 0 is not allowed. In this paper, we derive approximation preserving reductions among several item pricing problems and show that all of them have algorithms with good approximation ratio.
Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. Belov suggested the posterior shift statistic for detection of item preknowledge and showed its performance to be better on average than that of seven other statistics for detection of item preknowledge for a known set of compromised items. Sinharay suggested a statistic based on the likelihood ratio test for detection of item preknowledge; the advantage of the statistic is that its null distribution is known. Results from simulated and real data and adaptive and nonadaptive tests are used to demonstrate that the Type I error rate and power of the statistic based on the likelihood ratio test are very similar to those of the posterior shift statistic. Thus, the statistic based on the likelihood ratio test appears promising in detecting item preknowledge when the set of compromised items is known.
Liu, Yu; Taber, Keith S.
Symbolic expressions are essential resources for producing knowledge, yet they are a source of learning difficulties in chemistry education. This study aims to employ social semiotics to analyse the symbolic representation of chemistry from two complementary perspectives, referred to here as contextual (i.e., historical) and functional. First, the…
As the largest international study ever taken in history, the Trend in Mathematics and Science Study (TIMSS) has been held as a benchmark to measure U.S. student performance in the global context. In-depth analyses of the TIMSS project are conducted in this study to examine key issues of the comparative investigation: (1) item flaws in mathematics…
Full Text Available In this paper a modern item design framework for computer based assessment based on Flash authoring environment will be introduced. Question design will be discussed as well as the multimedia authoring environment used for item modeling emphasized. Item type templates are a structured means of collecting and storing item information that can be used to improve the efficiency and security of the innovative item design process. Templates can modernize the item design, enhance and speed up the development process. Along with content creation, multimedia has vast potential for use in innovative testing. The introduced item design template is based on taxonomy of innovative items which have great potential for expanding the content areas and construct coverage of an assessment. The presented item design approach is based on GUI's – one for question design based on implemented item design templates and one for user interaction tracking/retrieval. The concept of user interfaces based on Flash technology will be discussed as well as implementation of the innovative approach of the item design forms with multimedia authoring. Also an innovative method for user interaction storage/retrieval based on PHP extending Flash capabilities in the proposed framework will be introduced.
The 12-item general health questionnaire (GHQ-12) was used in the Longitudinal Study of Young People in England (LSYPE; N = 15,770) to collect measures on adolescent mental health. Given the debate in current literature regarding the dimensionality of the GHQ-12, this study examined the cultural sensitivity of the instrument at the item level for each of the 7 major ethnic groups within the database. This study used a hybrid approach of ordinal logistic regression and item response theory (IRT) to examine the presence of differential item functioning (DIF) on the questionnaire. Results demonstrated that uniform, nonuniform, and overall DIF were present on items between White and Asian adolescents (7 items), White and Black Caribbean adolescents (1 item), and White and Black African adolescents (7 items), however all McFadden's pseudo R² effect size estimates indicated that the DIF was negligible. Overall, there were cumulative small scale level effects for the Mixed/Biracial, Asian, and Black African groups, but in each case the bias was only marginal. Findings demonstrate that the GHQ-12 can be considered culturally sensitive for adolescents from diverse ethnic groups in England, but follow-up studies are necessary. Implications for future education and health policies as well as the use of IR-based approaches for psychological instruments are discussed. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
the defined boundaries. This ensures the accuracy of the classification. Third, when item threshold parameters vary a bit, the scoring rubrics and the items need to be reviewed to make the threshold parameters similar across items. This is because one important design criterion of the learning progression-based items is that ideally, a student should be at the same level across items, which means that the item threshold parameters (d1, d 2 and d3) should be similar across items. To design a learning progression-based science assessment, we need to understand whether the assessment measures a single construct or several constructs and how items are associated with the constructs being measured. Results from dimension analyses indicate that items of different carbon transforming processes measure different aspects of the carbon cycle construct. However, items of different practices assess the same construct. In general, there are high correlations among different processes or practices. It is not clear whether the strong correlations are due to the inherent links among these process/practice dimensions or due to the fact that the student sample does not show much variation in these process/practice dimensions. Future data are needed to examine the dimensionalities in terms of process/practice in detail. Finally, based on item characteristics analysis, recommendations are made to write more discriminative CR items and better OMC, MTF options. Item writers can follow these recommendations to write better learning progression-based items.
Chemla, Emmanuel; Mintz, Toben H; Bernal, Savita; Christophe, Anne
Mintz (2003) described a distributional environment called a frame, defined as the co-occurrence of two context words with one intervening target word. Analyses of English child-directed speech showed that words that fell within any frequently occurring frame consistently belonged to the same grammatical category (e.g. noun, verb, adjective, etc.). In this paper, we first generalize this result to French, a language in which the function word system allows patterns that are potentially detrimental to a frame-based analysis procedure. Second, we show that the discontinuity of the chosen environments (i.e. the fact that target words are framed by the context words) is crucial for the mechanism to be efficient. This property might be relevant for any computational approach to grammatical categorization. Finally, we investigate a recursive application of the procedure and observe that the categorization is paradoxically worse when context elements are categories rather than actual lexical items. Item-specificity is thus also a core computational principle for this type of algorithm. Our analysis, along with results from behavioural studies (Gómez, 2002; Gómez and Maye, 2005; Mintz, 2006), provides strong support for frames as a basis for the acquisition of grammatical categories by infants. Discontinuity and item-specificity appear to be crucial features.
Hays, Ron D; Revicki, Dennis A; Feeny, David; Fayers, Peter; Spritzer, Karen L; Cella, David
Preference-based health-related quality of life (HR-QOL) scores are useful as outcome measures in clinical studies, for monitoring the health of populations, and for estimating quality-adjusted life-years. This was a secondary analysis of data collected in an internet survey as part of the Patient-Reported Outcomes Measurement Information System (PROMIS(®)) project. To estimate Health Utilities Index Mark 3 (HUI-3) preference scores, we used the ten PROMIS(®) global health items, the PROMIS-29 V2.0 single pain intensity item and seven multi-item scales (physical functioning, fatigue, pain interference, depressive symptoms, anxiety, ability to participate in social roles and activities, sleep disturbance), and the PROMIS-29 V2.0 items. Linear regression analyses were used to identify significant predictors, followed by simple linear equating to avoid regression to the mean. The regression models explained 48 % (global health items), 61 % (PROMIS-29 V2.0 scales), and 64 % (PROMIS-29 V2.0 items) of the variance in the HUI-3 preference score. Linear equated scores were similar to observed scores, although differences tended to be larger for older study participants. HUI-3 preference scores can be estimated from the PROMIS(®) global health items or PROMIS-29 V2.0. The estimated HUI-3 scores from the PROMIS(®) health measures can be used for economic applications and as a measure of overall HR-QOL in research.
Milanzi, Elasma; Molenberghs, Geert; Alonso, Ariel; Verbeke, Geert; De Boeck, Paul
For item response theory (IRT) models, which belong to the class of generalized linear or non-linear mixed models, reliability at the scale of observed scores (i.e., manifest correlation) is more difficult to calculate than latent correlation based reliability, but usually of greater scientific interest. This is not least because it cannot be calculated explicitly when the logit link is used in conjunction with normal random effects. As such, approximations such as Fisher's information coefficient, Cronbach's α, or the latent correlation are calculated, allegedly because it is easy to do so. Cronbach's α has well-known and serious drawbacks, Fisher's information is not meaningful under certain circumstances, and there is an important but often overlooked difference between latent and manifest correlations. Here, manifest correlation refers to correlation between observed scores, while latent correlation refers to correlation between scores at the latent (e.g., logit or probit) scale. Thus, using one in place of the other can lead to erroneous conclusions. Taylor series based reliability measures, which are based on manifest correlation functions, are derived and a careful comparison of reliability measures based on latent correlations, Fisher's information, and exact reliability is carried out. The latent correlations are virtually always considerably higher than their manifest counterparts, Fisher's information measure shows no coherent behaviour (it is even negative in some cases), while the newly introduced Taylor series based approximations reflect the exact reliability very closely. Comparisons among the various types of correlations, for various IRT models, are made using algebraic expressions, Monte Carlo simulations, and data analysis. Given the light computational burden and the performance of Taylor series based reliability measures, their use is recommended. © 2014 The British Psychological Society.
Bodenburg, Sebastian; Dopslaff, Nina
The Dysexecutive Questionnaire (DEX, , Behavioral assessment of the dysexecutive syndrome, 1996) is a standardized instrument to measure possible behavioral changes as a result of the dysexecutive syndrome. Although initially intended only as a qualitative instrument, the DEX has also been used increasingly to address quantitative problems. Until now there have not been more fundamental statistical analyses of the questionnaire's testing quality. The present study is based on an unselected sample of 191 patients with acquired brain injury and reports on the data relating to the quality of the items, the reliability and the factorial structure of the DEX. Item 3 displayed too great an item difficulty, whereas item 11 was not sufficiently discriminating. The DEX's reliability in self-rating is r = 0.85. In addition to presenting the statistical values of the tests, a clinical severity classification of the overall scores of the 4 found factors and of the questionnaire as a whole is carried out on the basis of quartile standards.
Ariel, A.; van der Linden, Willem J.; Veldkamp, Bernard P.
Item-pool management requires a balancing act between the input of new items into the pool and the output of tests assembled from it. A strategy for optimizing item-pool management is presented that is based on the idea of a periodic update of an optimal blueprint for the item pool to tune item
Measuring impairments of functioning and health in patients with axial spondyloarthritis by using the ASAS Health Index and the Environmental Item Set: translation and cross-cultural adaptation into 15 languages
Kiltz, U; van der Heijde, D; Boonen, A; Bautista-Molano, W; Burgos-Vargas, R; Chiowchanwisawakit, P; Duruoz, T; El-Zorkany, B; Essers, I; Gaydukova, I; Géher, P; Gossec, L; Grazio, S; Gu, J; Khan, M A; Kim, T J; Maksymowych, W P; Marzo-Ortega, H; Navarro-Compán, V; Olivieri, I; Patrikos, D; Pimentel-Santos, F M; Schirmer, M; van den Bosch, F; Weber, U; Zochling, J; Braun, J
Introduction The Assessments of SpondyloArthritis international society Health Index (ASAS HI) measures functioning and health in patients with spondyloarthritis (SpA) across 17 aspects of health and 9 environmental factors (EF). The objective was to translate and adapt the original English version of the ASAS HI, including the EF Item Set, cross-culturally into 15 languages. Methods Translation and cross-cultural adaptation has been carried out following the forward–backward procedure. In the cognitive debriefing, 10 patients/country across a broad spectrum of sociodemographic background, were included. Results The ASAS HI and the EF Item Set were translated into Arabic, Chinese, Croatian, Dutch, French, German, Greek, Hungarian, Italian, Korean, Portuguese, Russian, Spanish, Thai and Turkish. Some difficulties were experienced with translation of the contextual factors indicating that these concepts may be more culturally-dependent. A total of 215 patients with axial SpA across 23 countries (62.3% men, mean (SD) age 42.4 (13.9) years) participated in the field test. Cognitive debriefing showed that items of the ASAS HI and EF Item Set are clear, relevant and comprehensive. All versions were accepted with minor modifications with respect to item wording and response option. The wording of three items had to be adapted to improve clarity. As a result of cognitive debriefing, a new response option ‘not applicable’ was added to two items of the ASAS HI to improve appropriateness. Discussion This study showed that the items of the ASAS HI including the EFs were readily adaptable throughout all countries, indicating that the concepts covered were comprehensive, clear and meaningful in different cultures. PMID:27752358
Tay, Louis; Drasgow, Fritz
Two Monte Carlo simulation studies investigated the effectiveness of the mean adjusted X[superscript 2]/df statistic proposed by Drasgow and colleagues and, because of problems with the method, a new approach for assessing the goodness of fit of an item response theory model was developed. It has been previously recommended that mean adjusted…
Saher, Marieke; Arvola, Anne; Lindeman, Marjaana; Lähteenmäki, Liisa
Functional foods provide a new way of expressing healthiness in food choices. The objective of this study was to apply an indirect measure to explore what kind of impressions people form of users of functional foods. Respondents (n=350) received one of eight versions of a shopping list and rated the buyer of the foods on 66 bipolar attributes on 7-point scales. The shopping lists had either healthy or neutral background items, conventional or functional target items and the buyer was described either as a 40-year-old woman or man. The attribute ratings revealed three factors: disciplined, innovative and gentle. Buyers with healthy background items were perceived as more disciplined than those having neutral items on the list, users of functional foods were rated as more disciplined than users of conventional target items only when the background list consisted of neutral items. Buyers of functional foods were regarded as more innovative and less gentle, but gender affected the ratings on gentle dimension. The impressions of functional food users clearly differ from those formed of users of conventional foods with a healthy image. The shopping list method performed well as an indirect method, but further studies are required to test its feasibility in measuring other food-related impressions.
Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar
The Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Responsible Models available to measurement analysis has increased considerably in the last fifteen years due to increasing computer power and due to a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related with developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also implied numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (Van der Lindem & Hambleton, 1997). As stated before the Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
Pilkonis, Paul A; Kim, Yookyung; Yu, Lan; Morse, Jennifer Q
The Adult Attachment Ratings (AAR) include 3 scales for anxious, ambivalent attachment (excessive dependency, interpersonal ambivalence, and compulsive care-giving), 3 for avoidant attachment (rigid self-control, defensive separation, and emotional detachment), and 1 for secure attachment. The scales include items (ranging from 6-16 in their original form) scored by raters using a 3-point format (0 = absent, 1 = present, and 2 = strongly present) and summed to produce a total score. Item response theory (IRT) analyses were conducted with data from 414 participants recruited from psychiatric outpatient, medical, and community settings to identify the most informative items from each scale. The IRT results allowed us to shorten the scales to 5-item versions that are more precise and easier to rate because of their brevity. In general, the effective range of measurement for the scales was 0 to +2 SDs for each of the attachment constructs; that is, from average to high levels of attachment problems. Evidence for convergent and discriminant validity of the scales was investigated by comparing them with the Experiences of Close Relationships-Revised (ECR-R) scale and the Kobak Attachment Q-sort. The best consensus among self-reports on the ECR-R, informant ratings on the ECR-R, and expert judgments on the Q-sort and the AAR emerged for anxious, ambivalent attachment. Given the good psychometric characteristics of the scale for secure attachment, however, this measure alone might provide a simple alternative to more elaborate procedures for some measurement purposes. Conversion tables are provided for the 7 scales to facilitate transformation from raw scores to IRT-calibrated (theta) scores.
Phillips Charles D
Full Text Available Abstract Background To test the validity and reliability of scales intended to measure activity limitations faced by children with chronic illnesses living in the community. The scales were based on information provided by caregivers to service program personnel almost exclusively trained as social workers. The items used to measure activity limitations were interRAI items supplemented so that they were more applicable to activity limitations in children with chronic illnesses. In addition, these analyses may shed light on the possibility of gathering functional information that can span the life course as well as spanning different care settings. Methods Analyses included testing the internal consistency, predictive, concurrent, discriminant and construct validity of two activity limitation scales. The scales were developed using assessment data gathered in the United States of America (USA from over 2,700 assessments of children aged 4 to 20 receiving Medicaid Early and Periodic Screening, Diagnostic and Treatment (EPSDT services, specifically Personal Care Services to assist children in overcoming activity limitations. The Medicaid program in the USA pays for health care services provided to children in low-income households. Data were collected in a single, large state in the southwestern USA in late 2008 and early 2009. A similar sample of children was assessed in 2010, and the analyses were replicated using this sample. Results The two scales exhibited excellent internal consistency. Evidence on the concurrent, predictive, discriminant, and construct validity of the proposed scales was strong. Quite importantly, scale scores were not correlated with (confounded with a child's developmental stage or age. The results for these scales and items were consistent across the two independent samples. Conclusions Unpaid caregivers, usually parents, can provide assessors lacking either medical or nursing training with reliable and valid information
Phillips, Charles D; Patnaik, Ashweeta; Moudouni, Darcy K; Naiser, Emily; Dyer, James A; Hawes, Catherine; Fournier, Constance J; Miller, Thomas R; Elliott, Timothy R
To test the validity and reliability of scales intended to measure activity limitations faced by children with chronic illnesses living in the community. The scales were based on information provided by caregivers to service program personnel almost exclusively trained as social workers. The items used to measure activity limitations were interRAI items supplemented so that they were more applicable to activity limitations in children with chronic illnesses. In addition, these analyses may shed light on the possibility of gathering functional information that can span the life course as well as spanning different care settings. Analyses included testing the internal consistency, predictive, concurrent, discriminant and construct validity of two activity limitation scales. The scales were developed using assessment data gathered in the United States of America (USA) from over 2,700 assessments of children aged 4 to 20 receiving Medicaid Early and Periodic Screening, Diagnostic and Treatment (EPSDT) services, specifically Personal Care Services to assist children in overcoming activity limitations. The Medicaid program in the USA pays for health care services provided to children in low-income households. Data were collected in a single, large state in the southwestern USA in late 2008 and early 2009. A similar sample of children was assessed in 2010, and the analyses were replicated using this sample. The two scales exhibited excellent internal consistency. Evidence on the concurrent, predictive, discriminant, and construct validity of the proposed scales was strong. Quite importantly, scale scores were not correlated with (confounded with) a child's developmental stage or age. The results for these scales and items were consistent across the two independent samples. Unpaid caregivers, usually parents, can provide assessors lacking either medical or nursing training with reliable and valid information on the activity limitations of children. One can summarize these
Jäger, B; Schmid-Ott, G; Ernst, G; Dölle-Lange, E; Sack, M
The aim of this study was to construct and validate a short self-rating questionnaire for the assessment of ego functions and ability of self regulation. An item pool of 120 items covering 6 postulated dimensions was reduced by two steps in independent samples (n = 136 + 470) via factor and item analyses to the final version consisting of 35 items. The 5 resulting questionnaire scales "interpersonal disturbances", "frustration tolerance and impulse control", "identity disturbances", "affect differentiation and affect tolerance" and "self-esteem" were well interpretable and showed in confirmatory factor analysis the best fit to the data (CHI²/df = 3.48; RMSEA = 0.73). Total scores were found to differentiate well between diagnostic groups of patients with more or less ego pathology (FANOVA = 9.8; df = 11; p self-regulation questionnaire" (HSRQ) evidently is an appropriate and reliable screening instrument in order to assess ego functions and capacities of self regulation in an economic and user-friendly means. The scale structure allows differentiated diagnostics of weak vs. stable ego functions and may be used for detailed therapy planning. © Georg Thieme Verlag KG Stuttgart · New York.
Arce-Ferrer, Alvaro J.; Bulut, Okan
This study examines separate and concurrent approaches to combine the detection of item parameter drift (IPD) and the estimation of scale transformation coefficients in the context of the common item nonequivalent groups design with the three-parameter item response theory equating. The study uses real and synthetic data sets to compare the two…
Oltra-Cucarella, J; Pérez-Elvira, R; Duque, P
the aim of this study is to test the encoding deficit hypothesis in Alzheimer disease (AD) using a recent method for correcting memory tests. To this end, a Spanish-language adaptation of the Free and Cued Selective Reminding Test was interpreted using the Item Specific Deficit Approach (ISDA), which provides three indices: Encoding Deficit Index, Consolidation Deficit Index, and Retrieval Deficit Index. We compared the performances of 15 patients with AD and 20 healthy control subjects and analysed results using either the task instructions or the ISDA approach. patients with AD displayed deficient encoding of more than half the information, but items that were encoded properly could be retrieved later with the help of the same semantic clues provided individually during encoding. Virtually all the information retained over the long-term was retrieved by using semantic clues. Encoding was shown to be the most impaired process, followed by retrieval and consolidation. Discriminant function analyses showed that ISDA indices are more sensitive and specific for detecting memory impairments in AD than are raw scores. These results indicate that patients with AD present impaired information encoding, but they benefit from semantic hints that help them recover previously learned information. This should be taken into account for intervention techniques focusing on memory impairments in AD. Copyright © 2013 Sociedad Española de Neurología. Published by Elsevier Espana. All rights reserved.
Full Text Available Laura Kelly, Crispin Jenkinson, Sarah Dummett, Jill Dawson, Ray Fitzpatrick, David Morley Health Services Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK Purpose: The Oxford Participation and Activities Questionnaire is a patient-reported outcome measure in development that is grounded on the World Health Organization International Classification of Functioning, Disability, and Health (ICF. The study reported here aimed to inform and generate an item pool for the new measure, which is specifically designed for the assessment of participation and activity in patients experiencing a range of health conditions. Methods: Items were informed through in-depth interviews conducted with 37 participants spanning a range of conditions. Interviews aimed to identify how their condition impacted their ability to participate in meaningful activities. Conditions included arthritis, cancer, chronic back pain, diabetes, motor neuron disease, multiple sclerosis, Parkinson's disease, and spinal cord injury. Transcripts were analyzed using the framework method. Statements relating to ICF themes were recast as questionnaire items and shown for review to an expert panel. Cognitive debrief interviews (n=13 were used to assess items for face and content validity. Results: ICF themes relevant to activities and participation in everyday life were explored, and a total of 222 items formed the initial item pool. This item pool was refined by the research team and 28 generic items were mapped onto all nine chapters of the ICF construct, detailing activity and participation. Cognitive interviewing confirmed the questionnaire instructions, items, and response options were acceptable to participants. Conclusion: Using a clear conceptual basis to inform item generation, 28 items have been identified as suitable to undergo further psychometric testing. A large-scale postal survey will follow in order to refine the instrument further and
McInnes, Matthew D F; Moher, David; Thombs, Brett D; McGrath, Trevor A; Bossuyt, Patrick M; Clifford, Tammy; Cohen, Jérémie F; Deeks, Jonathan J; Gatsonis, Constantine; Hooft, Lotty; Hunt, Harriet A; Hyde, Christopher J; Korevaar, Daniël A; Leeflang, Mariska M G; Macaskill, Petra; Reitsma, Johannes B; Rodin, Rachel; Rutjes, Anne W S; Salameh, Jean-Paul; Stevens, Adrienne; Takwoingi, Yemisi; Tonelli, Marcello; Weeks, Laura; Whiting, Penny; Willis, Brian H
Systematic reviews of diagnostic test accuracy synthesize data from primary diagnostic studies that have evaluated the accuracy of 1 or more index tests against a reference standard, provide estimates of test performance, allow comparisons of the accuracy of different tests, and facilitate the identification of sources of variability in test accuracy. To develop the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagnostic test accuracy guideline as a stand-alone extension of the PRISMA statement. Modifications to the PRISMA statement reflect the specific requirements for reporting of systematic reviews and meta-analyses of diagnostic test accuracy studies and the abstracts for these reviews. Established standards from the Enhancing the Quality and Transparency of Health Research (EQUATOR) Network were followed for the development of the guideline. The original PRISMA statement was used as a framework on which to modify and add items. A group of 24 multidisciplinary experts used a systematic review of articles on existing reporting guidelines and methods, a 3-round Delphi process, a consensus meeting, pilot testing, and iterative refinement to develop the PRISMA diagnostic test accuracy guideline. The final version of the PRISMA diagnostic test accuracy guideline checklist was approved by the group. The systematic review (produced 64 items) and the Delphi process (provided feedback on 7 proposed items; 1 item was later split into 2 items) identified 71 potentially relevant items for consideration. The Delphi process reduced these to 60 items that were discussed at the consensus meeting. Following the meeting, pilot testing and iterative feedback were used to generate the 27-item PRISMA diagnostic test accuracy checklist. To reflect specific or optimal contemporary systematic review methods for diagnostic test accuracy, 8 of the 27 original PRISMA items were left unchanged, 17 were modified, 2 were added, and 2 were omitted. The 27-item
Beard, R.L.; Rosch, F.C.; Sanwarwalla, M.H.; Tjernlund, R.M.
Nuclear utilities are embarking on innovative approaches for reducing costs in all aspects of engineering, operation, maintenance, and procurement to produce power cheaply and efficiently and remain competitive with other power producers. In the area of procurement, utilities are increasingly obtaining commercial-grade items for use in safety-related applications. This trend is occurring because of lack of suppliers capable or willing to meet 10 CFR 21 and 10 CFR 50 App. B requirements and because of the absence of original equipment suppliers or the spiraling cost associated with procuring items' basic components safety-related from original suppliers. Utilities have been looking at ways to reduce procurement costs. One promising means to reduce costs is to utilize information provided in other nonnuclear industry standards regarding the specification, control, manufacture, and acceptance of the critical characteristics required of the item to perform its design function. A task force was instituted under the sponsorship of the Electric Power Research Institute to investigate the feasibility of using items manufactured to other industry standards in nuclear safety-related applications. This investigation looked at a broad spectrum of available industry standards pertaining to the design, function, manufacture, and testing of items and determined that some standards are more useful than others. This paper discusses the results of this investigation and how credit from the controls exercised for items manufactured to certain existing industry standards can be taken to minimize dedication costs
A multi-level differential item functioning analysis of trends in international mathematics and science study: Potential sources of gender and minority difference among U.S. eighth graders' science achievement
Science is an area where a large achievement gap has been observed between White and minority, and between male and female students. The science minority gap has continued as indicated by the National Assessment of Educational Progress and the Trends in International Mathematics and Science Studies (TIMSS). TIMSS also shows a gender gap favoring males emerging at the eighth grade. Both gaps continue to be wider in the number of doctoral degrees and full professorships awarded (NSF, 2008). The current study investigated both minority and gender achievement gaps in science utilizing a multi-level differential item functioning (DIF) methodology (Kamata, 2001) within fully Bayesian framework. All dichotomously coded items from TIMSS 2007 science assessment at eighth grade were analyzed. Both gender DIF and minority DIF were studied. Multi-level models were employed to identify DIF items and sources of DIF at both student and teacher levels. The study found that several student variables were potential sources of achievement gaps. It was also found that gender DIF favoring male students was more noticeable in the content areas of physics and earth science than biology and chemistry. In terms of item type, the majority of these gender DIF items were multiple choice than constructed response items. Female students also performed less well on items requiring visual-spatial ability. Minority students performed significantly worse on physics and earth science items as well. A higher percentage of minority DIF items in earth science and biology were constructed response than multiple choice items, indicating that literacy may be the cause of minority DIF. Three-level model results suggested that some teacher variables may be the cause of DIF variations from teacher to teacher. It is essential for both middle school science teachers and science educators to find instructional methods that work more effectively to improve science achievement of both female and minority students
... DEPARTMENT OF DEFENSE Defense Acquisition Regulations System Commercial Item Handbook AGENCY.... SUMMARY: DoD has updated its Commercial Item Handbook. The purpose of the Handbook is to help acquisition personnel develop sound business strategies for procuring commercial items. DoD is seeking industry input on...
Löwe, Bernd; Spitzer, Robert L; Williams, Janet B W; Mussell, Monika; Schellberg, Dieter; Kroenke, Kurt
To determine diagnostic overlap of depression, anxiety and somatization as well as their unique and overlapping contribution to functional impairment. Two thousand ninety-one consecutive primary care clinic patients participated in a multicenter cross-sectional survey in 15 primary care clinics in the United States (participation rate, 92%). Depression, anxiety, somatization and functional impairment were assessed using validated scales from the Patient Health Questionnaire (PHQ) (PHQ-8, eight-item depression module; GAD-7, seven-item Generalized Anxiety Disorder Scale; and PHQ-15, 15-item somatic symptom scale) and the Short-Form General Health Survey (SF-20). Multiple linear regression analyses were used to investigate unique and overlapping associations of depression, anxiety and somatization with functional impairment. In over 50% of cases, comorbidities existed between depression, anxiety and somatization. The contribution of the commonalities of depression, anxiety and somatization to functional impairment substantially exceeded the contribution of their independent parts. Nevertheless, depression, anxiety and somatization did have important and individual effects (i.e., separate from their overlap effect) on certain areas of functional impairment. Given the large syndrome overlap, a potential consideration for future diagnostic classification would be to describe basic diagnostic criteria for a single overarching disorder and to optionally code additional diagnostic features that allow a more detailed classification into specific depressive, anxiety and somatoform subtypes.
Chandra K. Jaggi
Full Text Available In today’s competition inherited business world, managing inventory of goods is a major challenge in all the sectors of economy. The demand of an item plays a significant role while managing the stock of goods, as it may depend on several factors viz., inflation, selling price, advertisement, etc. Among these, selling price of an item is a decisive factor for the organization; because in this competitive world of business one is constantly on the lookout for the ways to beat the competition. It is a well-known accepted fact that keeping a reasonable price helps in attracting more customers, which in turn increases the aggregate demand. Thus in order to improve efficiency of business performance organization needs to stock a higher inventory, which needs an additional storage space. Moreover, in today’s unstable global economy there is consequent decline in the real value of money, because the general level of prices of goods and services is rising (i.e., inflation. And since inventories represent a considerable investment for every organization, it is inevitable to consider the effects of inflation and time value of money while determining the optimal inventory policy. With this motivation, this paper is aimed at developing a two-warehouse inventory model for deteriorating items where the demand rate is a decreasing function of the selling price under inflationary conditions. In addition, shortages are allowed and partially backlogged, and the backlogging rate has been considered as an exponentially decreasing function of the waiting time. The model jointly optimizes the initial inventory and the price for the product, so as to maximize the total average profit. Finally, the model is analysed and validated with the help of numerical examples, and a comprehensive sensitivity analysis has been performed which provides some important managerial implications.
Englehard, George, Jr.
The methods used by E. L. Thorndike, L. L. Thurstone, and G. Rasch to address issues related to item-invariant measurement and the scoring of individual performance are compared. The analyses highlight the close connection among the three methods, and suggest that progress in measurement theory reflects the movement from essentially ad hoc methods…
Fernandez Carratala, L.
There is an increasing difficulty for purchasing safety related spare items, with certifications by manufacturers for maintaining the original qualifications of the equipment of destination. The main reasons are, on the top of the logical evolution of technology, applied to the new manufactured components, the quitting of nuclear specific production lines and the evolution of manufacturers quality systems, originally based on nuclear codes and standards, to conventional industry standards. To face this problem, for many years different Dedication processes have been implemented to verify whether a commercial grade element is acceptable to be used in safety related applications. In the same way, due to our particular position regarding the spare part supplies, mainly from markets others than the american, C.N. Trillo has developed a methodology called Spare Items Validation. This methodology, which is originally based on dedication processes, is not a single process but a group of coordinated processes involving engineering, quality and management activities. These are to be performed on the spare item itself, its design control, its fabrication and its supply for allowing its use in destinations with specific requirements. The scope of application is not only focussed on safety related items, but also to complex design, high cost or plant reliability related components. The implementation in C.N. Trillo has been mainly curried out by merging, modifying and making the most of processes and activities which were already being performed in the company. (Author)
Full Text Available In temperate ecosystems, acidic forest soils are among the most nutrient-poor terrestrial environments. In this context, the long-term differentiation of the forest soils into horizons may impact the assembly and the functions of the soil microbial communities. To gain a more comprehensive understanding of the ecology and functional potentials of these microbial communities, a suite of analyses including comparative metagenomics was applied on independent soil samples from a spruce plantation (Breuil-Chenue, France. The objectives were to assess whether the decreasing nutrient bioavailability and pH variations that naturally occurs between the organic and mineral horizons affects the soil microbial functional biodiversity. The 14 Gbp of pyrosequencing and Illumina sequences generated in this study revealed complex microbial communities dominated by bacteria. Detailed analyses showed that the organic soil horizon was significantly enriched in sequences related to Bacteria, Chordata, Arthropoda and Ascomycota. On the contrary the mineral horizon was significantly enriched in sequences related to Archaea. Our analyses also highlighted that the microbial communities inhabiting the two soil horizons differed significantly in their functional potentials according to functional assays and MG-RAST analyses, suggesting a functional specialisation of these microbial communities. Consistent with this specialisation, our shotgun metagenomic approach revealed a significant increase in the relative abundance of sequences related glycoside hydrolases in the organic horizon compared to the mineral horizon that was significantly enriched in glycoside transferases. This functional stratification according to the soil horizon was also confirmed by a significant correlation between the functional assays performed in this study and the functional metagenomic analyses. Together, our results suggest that the soil stratification and particularly the soil resource
Gamper, Eva-Maria; Petersen, Morten Aa.; Aaronson, Neil
a computer-adaptive test (CAT) for RF. This was part of a larger project whose objective is to develop a CAT version of the EORTC QLQ-C30 which is one of the most widely used HRQOL instruments in oncology. METHODS: In accordance with EORTC guidelines, the development of the RF-CAT comprised four phases...... with good psychometric properties. The resulting item bank exhibits excellent reliability (mean reliability = 0.85, median = 0.95). Using the RF-CAT may allow sample size savings from 11 % up to 50 % compared to using the QLQ-C30 RF scale. CONCLUSIONS: The RF-CAT item bank improves the precision...
Sánchez, S C; Chedraui, P; Pérez-López, F R; Ortiz-Benegas, M E; Palacios-De Franco, Y
Background There are scant data related to sexuality assessed among mid-aged women from Paraguay. Objective To assess sexual function in a sample of mid-aged Paraguayan women. Methods This was a cross-sectional study in which 265 urban-living women from Asunción (Paraguay) aged 40-65 years were surveyed with the six-item version of the Female Sexual Function Index (FSFI-6) and a questionnaire containing personal and partner data. Results The median age of the sample was 48 years, 48.2% were postmenopausal (median/interquartile range age at menopause 46/13 years), 11.3% used hormone therapy, 37.0% used psychotropic drugs, 44.5% had hypertension, 7.2% diabetes, 46.1% abdominal obesity and 89.4% had a partner (n = 237). Overall, 84.1% (223/265) of surveyed women were sexually active, presenting a median total FSFI-6 score of 23.0, and 25.6% obtained a total score of 19 or less, suggestive of sexual dysfunction (lower sexual function). Upon bivariate analysis, several factors were associated with lower total FSFI-6 scores; however, multiple linear regression analysis found that lower total FSFI-6 scores (worse sexual function) were significantly correlated to the postmenopausal status and having an older partner, whereas coital frequency was positively correlated to higher scores (better sexual function). Conclusion In this pilot sample of urban-living, mid-aged Paraguayan women, as determined with the FSFI-6, lower sexual function was related to menopausal status, coital frequency and partner age. There is a need for more research in this regard in this population.
Tinari, Frank D.
Computerized analysis of multiple choice test items is explained. Examples of item analysis applications in the introductory economics course are discussed with respect to three objectives: to evaluate learning; to improve test items; and to help improve classroom instruction. Problems, costs and benefits of the procedures are identified. (JMD)
Michalis P Michaelides
Full Text Available Many studies have investigated the topic of change or drift in item parameter estimates in the context of Item Response Theory. Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items.
Michaelides, Michalis P
Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items.
Liou, Pey-Yan; Bulut, Okan
The purpose of this study was to examine eighth-grade students' science performance in terms of two test design components, item format, and cognitive domain. The portion of Taiwanese data came from the 2011 administration of the Trends in International Mathematics and Science Study (TIMSS), one of the major international large-scale assessments in science. The item difficulty analysis was initially applied to show the proportion of correct items. A regression-based cumulative link mixed modeling (CLMM) approach was further utilized to estimate the impact of item format, cognitive domain, and their interaction on the students' science scores. The results of the proportion-correct statistics showed that constructed-response items were more difficult than multiple-choice items, and that the reasoning cognitive domain items were more difficult compared to the items in the applying and knowing domains. In terms of the CLMM results, students tended to obtain higher scores when answering constructed-response items as well as items in the applying cognitive domain. When the two predictors and the interaction term were included together, the directions and magnitudes of the predictors on student science performance changed substantially. Plausible explanations for the complex nature of the effects of the two test-design predictors on student science performance are discussed. The results provide practical, empirical-based evidence for test developers, teachers, and stakeholders to be aware of the differential function of item format, cognitive domain, and their interaction in students' science performance.
Kim eDe Roover
Full Text Available The issue of measurement invariance is ubiquitous in the behavioral sciences nowadays as more and more studies yield multivariate multigroup data. When measurement invariance cannot be established across groups, this is often due to different loadings on only a few items. Within the multigroup CFA framework, methods have been proposed to trace such non-invariant items, but these methods have some disadvantages in that they require researchers to run a multitude of analyses and in that they imply assumptions that are often questionable. In this paper, we propose an alternative strategy which builds on clusterwise simultaneous component analysis (SCA. Clusterwise SCA, being an exploratory technique, assigns the groups under study to a few clusters based on differences and similarities in the covariance matrices, and thus based on the component structure of the items. Non-invariant items can then be traced by comparing the cluster-specific component loadings via congruence coefficients, which is far more parsimonious than comparing the component structure of all separate groups. In this paper we present a heuristic for this procedure. Afterwards, one can return to the multigroup CFA framework and check whether removing the non-invariant items or removing some of the equality restrictions for these items, yields satisfactory invariance test results. An empirical application concerning cross-cultural emotion data is used to demonstrate that this novel approach is useful and can co-exist with the traditional CFA approaches.
This document specifies the critical characteristics for Commercial Grade Items (CGI) procured for PFP's criticality alarm system as required by HNF-PRO-268 and HNF-PRO-1819. These are the minimum specifications that the equipment must meet in order to properly perform its safety function. There may be several manufacturers or models that meet the critical characteristics for any one item. PFP's Criticality Alarm System includes the nine criticality alarm system panels and their associated hardware. This includes all parts up to the first breaker in the electrical distribution system. Specific system boundaries and justifications are contained in HNF-SD-CP-SDD-003, ''Definition and Means of Maintaining the Criticality Detectors and Alarms Portion of the PFP Safety Envelope.'' The procurement requirements associated with the system necessitates procurement of some system equipment as Commercial Grade Items in accordance with HNF-PRO-268, ''Control of Purchased Items and Services.''
Tovel, Hava; Carmel, Sara
This article describes the development and validation of the Function Self-Efficacy Scale (FSES) for assessing the degree of confidence in self-functioning while facing decline in health and function (DHF). The FSES was evaluated in two studies of older Israelis, aged 75+ years. Data were collected by structured home interviews. Exploratory factor analyses conducted in both studies clearly revealed two underlying factors: emotion self-efficacy and action self-efficacy. Confirmatory factor analyses resulted in acceptable model fit criteria. The shortened final 13-item FSES had good internal consistency and satisfactory criterion and convergent validity. Multiple regression analyses, conducted to predict subjective well-being in each of the studies, showed that function self-efficacy had a positive and significant contribution to the explanation of well-being, while controlling for general self-efficacy, self-rated health, and sociodemographic variables. We propose that appropriate interventions can strengthen function self-efficacy, thus improving the well-being of elderly persons and their ability to cope with DHF. © The Author(s) 2015.
Jahn, Danielle R; Dressel, Jeffrey A; Gavett, Brandon E; O'Bryant, Sid E
The Executive Interview (EXIT25) is an effective measure of executive dysfunction, but may be inefficient due to the time it takes to complete 25 interview-based items. The current study aimed to examine psychometric properties of the EXIT25, with a specific focus on determining whether a briefer version of the measure could comprehensively assess executive dysfunction. The current study applied a graded response model (a type of item response theory model for polytomous categorical data) to identify items that were most closely related to the underlying construct of executive functioning and best discriminated between varying levels of executive functioning. Participants were 660 adults ages 40 to 96 years living in West Texas, who were recruited through an ongoing epidemiological study of rural health and aging, called Project FRONTIER. The EXIT25 was the primary measure examined. Participants also completed the Trail Making Test and Controlled Oral Word Association Test, among other measures, to examine the convergent validity of a brief form of the EXIT25. Eight items were identified that provided the majority of the information about the underlying construct of executive functioning; total scores on these items were associated with total scores on other measures of executive functioning and were able to differentiate between cognitively healthy, mildly cognitively impaired, and demented participants. In addition, cutoff scores were recommended based on sensitivity and specificity of scores. A brief, eight-item version of the EXIT25 may be an effective and efficient screening for executive dysfunction among older adults.
Khawaja, Nigar G.; Yu, Lai Ngo Heidi
The 27-item Intolerance of Uncertainty Scale (IUS) has become one of the most frequently used measures of Intolerance of Uncertainty. More recently, an abridged, 12-item version of the IUS has been developed. The current research used clinical (n = 50) and non-clinical (n = 56) samples to examine and compare the psychometric properties of both…
Full Text Available Numerous items, small order, and frequent delivery are the characteristics of many distribution centers. Such characteristics generally increase the operating costs of the distribution center. To remedy this problem, this study employs the Entry-Item-Quantity (EIQ method to identify the characteristic of the cigarette distribution center and further analyzes the importance degree of customers and the frequently ordered products by means of EQ/EN/IQ-B/IK statistic charts. Based on these analyses as well as the total replenishment cost optimization model, multipicking strategies and combined multitype picking equipment allocation is then formulated accordingly. With such design scheme, the cigarette picking costs of the distribution center are expected to reduce. Finally, the specific number of equipment is figured out in order to meet the capability demand of the case cigarette distribution center.
Fahmie, Tara A; Iwata, Brian A; Harper, Jill M; Querim, Angie C
A common condition included in most functional analyses (FAs) is the attention condition, in which the therapist ignores the client by engaging in a solitary activity (antecedent event) but delivers attention to the client contingent on problem behavior (consequent event). The divided attention condition is similar, except that the antecedent event consists of the therapist conversing with an adult confederate. We compared the typical and divided attention conditions to determine whether behavior in general (Study 1) and problem behavior in particular (Study 2) were more sensitive to one of the test conditions. Results showed that the divided attention condition resulted in faster acquisition or more efficient FA results for 2 of 9 subjects, suggesting that the divided attention condition could be considered a preferred condition when resources are available. © Society for the Experimental Analysis of Behavior.
Crins, M H P; Roorda, L D; Smits, N; de Vet, H C W; Westhovens, R; Cella, D; Cook, K F; Revicki, D; van Leeuwen, J; Boers, M; Dekker, J; Terwee, C B
The aims of the current study were to calibrate the item parameters of the Dutch-Flemish PROMIS Pain Behavior item bank using a sample of Dutch patients with chronic pain and to evaluate cross-cultural validity between the Dutch-Flemish and the US PROMIS Pain Behavior item banks. Furthermore, reliability and construct validity of the Dutch-Flemish PROMIS Pain Behavior item bank were evaluated. The 39 items in the bank were completed by 1042 Dutch patients with chronic pain. To evaluate unidimensionality, a one-factor confirmatory factor analysis (CFA) was performed. A graded response model (GRM) was used to calibrate the items. To evaluate cross-cultural validity, Differential item functioning (DIF) for language (Dutch vs. English) was evaluated. Reliability of the item bank was also examined and construct validity was studied using several legacy instruments, e.g. the Roland Morris Disability Questionnaire. CFA supported the unidimensionality of the Dutch-Flemish PROMIS Pain Behavior item bank (CFI = 0.960, TLI = 0.958), the data also fit the GRM, and demonstrated good coverage across the pain behavior construct (threshold parameters range: -3.42 to 3.54). Analysis showed good cross-cultural validity (only six DIF items), reliability (Cronbach's α = 0.95) and construct validity (all correlations ≥0.53). The Dutch-Flemish PROMIS Pain Behavior item bank was found to have good cross-cultural validity, reliability and construct validity. The development of the Dutch-Flemish PROMIS Pain Behavior item bank will serve as the basis for Dutch-Flemish PROMIS short forms and computer adaptive testing (CAT). © 2015 European Pain Federation - EFIC®
A comparison of three methods of assessing differential item functioning (DIF) in the Hospital Anxiety Depression Scale: ordinal logistic regression, Rasch analysis and the Mantel chi-square procedure.
Cameron, Isobel M; Scott, Neil W; Adler, Mats; Reid, Ian C
It is important for clinical practice and research that measurement scales of well-being and quality of life exhibit only minimal differential item functioning (DIF). DIF occurs where different groups of people endorse items in a scale to different extents after being matched by the intended scale attribute. We investigate the equivalence or otherwise of common methods of assessing DIF. Three methods of measuring age- and sex-related DIF (ordinal logistic regression, Rasch analysis and Mantel χ(2) procedure) were applied to Hospital Anxiety Depression Scale (HADS) data pertaining to a sample of 1,068 patients consulting primary care practitioners. Three items were flagged by all three approaches as having either age- or sex-related DIF with a consistent direction of effect; a further three items identified did not meet stricter criteria for important DIF using at least one method. When applying strict criteria for significant DIF, ordinal logistic regression was slightly less sensitive. Ordinal logistic regression, Rasch analysis and contingency table methods yielded consistent results when identifying DIF in the HADS depression and HADS anxiety scales. Regardless of methods applied, investigators should use a combination of statistical significance, magnitude of the DIF effect and investigator judgement when interpreting the results.
Full Text Available In the food industry, composition, size, and shape of items are much less regular than in other commodities sectors. In addition, a wide variety of packaging, composed by different materials, is employed. As material, size and shape of items to which the tag should be attached strongly influence the minimum power requested for tag functioning, performance improvements can be achieved only selecting suitable radio frequency (RF identifiers for the specific combination of food product and packaging. When dealing with logistics units, the dynamic reading of a vast number of tags could originate simultaneous broadcasting of signals (tag-to-tag collisions that could affect reading rates and the overall reliability of the identification procedure. This paper reports the results of an analysis of the reading performance of ultra high frequency radio frequency identification systems for multiple static and dynamic electronic identification of food packed products in controlled conditions. Products were considered when arranged on a logistics pallet. The effects on reading rate of different factors, among which the product type, the gate configuration, the field polarisation, the power output of the RF reader, the interrogation protocol configuration as well as the transit speed, the number of tags and their interactions were statistically analysed and compared.
This document specifies the critical characteristics for Commercial Grade Items (CGI) procured for use in the Plutonium Finishing Plant as required by HNF-PRO-268 and HNF-PRO-1819. These are the minimum specifications that the equipment must meet in order to properly perform its safety function. There may be several manufacturers or models that meet the critical characteristics of any one item
Kim, Kyungmi; Yi, Do-Joon; Raye, Carol L; Johnson, Marcia K
In the present study, we explored how item repetition affects source memory for new item-feature associations (picture-location or picture-color). We presented line drawings varying numbers of times in Phase 1. In Phase 2, each drawing was presented once with a critical new feature. In Phase 3, we tested memory for the new source feature of each item from Phase 2. Experiments 1 and 2 demonstrated and replicated the negative effects of item repetition on incidental source memory. Prior item repetition also had a negative effect on source memory when different source dimensions were used in Phases 1 and 2 (Experiment 3) and when participants were explicitly instructed to learn source information in Phase 2 (Experiments 4 and 5). Importantly, when the order between Phases 1 and 2 was reversed, such that item repetition occurred after the encoding of critical item-source combinations, item repetition no longer affected source memory (Experiment 6). Overall, our findings did not support predictions based on item predifferentiation, within-dimension source interference, or general interference from multiple traces of an item. Rather, the findings were consistent with the idea that prior item repetition reduces attention to subsequent presentations of the item, decreasing the likelihood that critical item-source associations will be encoded.
Huggins-Manley, Anne Corinne
This study defines subpopulation item parameter drift (SIPD) as a change in item parameters over time that is dependent on subpopulations of examinees, and hypothesizes that the presence of SIPD in anchor items is associated with bias and/or lack of invariance in three psychometric outcomes. Results show that SIPD in anchor items is associated…
Ali, Usama S.; Chang, Hua-Hua; Anderson, Carolyn J.
Polytomous items are typically described by multiple category-related parameters; situations, however, arise in which a single index is needed to describe an item's location along a latent trait continuum. Situations in which a single index would be needed include item selection in computerized adaptive testing or test assembly. Therefore single…
Current research on autobiographical memory distinguishes between a self function, a directive function, and a social function of autobiographical memory. From a lifespan perspective, the use of autobiographical memory for these functions is expected to decrease with age. The present study extended these functions by the function of nostalgia: Often triggered by negative emotions, remembering personal and positive experiences might, among others, enhance positive effects. This emotion-regulating function is expected to become more important in old age. In the present study 273 adults (aged between 19 and 90 years) completed the Thinking About Life Experiences Questionnaire (TALE) as well as 11 newly developed items to assess the nostalgia function. Exploratory and confirmatory factor analyses supported a four-factor model reflecting the presumed self, directive, social, and nostalgia functions of autobiographical memory. The results showed a decrease in the use of autobiographical memory for self, directive and social functions with increasing age, whereas the nostalgia function followed a U-shaped pattern.
Proudfoot, Judith G; Bubner, Tanya; Amoroso, Cheryl; Swan, Edward; Holton, Christine; Winstanley, Julie; Beilby, Justin; Harris, Mark F
At a time when workforce shortages in general practices are leading to greater role substitution and skill-mix diversification, and the demand on general practices for chronic disease care is increasing, the structure and function of the general practice team is taking on heightened importance. To assist general practices and the organizations supporting them to assess the effectiveness of their chronic care teamworking, we developed an interview tool, the Chronic Care Team Profile (CCTP), to measure the structure and function of teams in general practice. This paper describes its properties and potential use. An initial pool of items was derived from guidelines of best-practice for chronic disease care and performance standards for general practices. The items covered staffing, skill-mix, job descriptions and roles, training, protocols and procedures within the practice. The 41-item pool was factor analysed, retained items were measured for internal consistency and the reduced instrument's face, content and construct validity were evaluated. A three-factor solution corresponding to non-general practitioner staff roles in chronic care, administrative functions and management structures provided the best fit to the data and explained 45% of the variance in the CCTP. Further analyses suggested that the CCTP is reliable, valid and has some utility. The CCTP measures aspects of the structure and function of general practices which are independent of team processes. It is associated with the job satisfaction of general practice staff and the quality of care provided to patients with chronic illnesses. As such, the CCTP offers a simple and useful tool for general practices to assess their teamworking in chronic disease care.
Hitomi, Yuki; Tokunaga, Katsushi
Human genome variation may cause differences in traits and disease risks. Disease-causal/susceptible genes and variants for both common and rare diseases can be detected by comprehensive whole-genome analyses, such as whole-genome sequencing (WGS), using next-generation sequencing (NGS) technology and genome-wide association studies (GWAS). Here, in addition to the application of an NGS as a whole-genome analysis method, we summarize approaches for the identification of functional disease-causal/susceptible variants from abundant genetic variants in the human genome and methods for evaluating their functional effects in human diseases, using an NGS and in silico and in vitro functional analyses. We also discuss the clinical applications of the functional disease causal/susceptible variants to personalized medicine.
Semino, Sara; Ring, Melanie; Bowler, Dermot M.; Gaigg, Sebastian B.
Autism Spectrum Disorder (ASD) is generally associated with difficulties in contextual source memory but not single item memory. There are surprising inconsistencies in the literature, however, that the current study seeks to address by examining item and source memory in age and ability matched groups of 22 ASD and 21 comparison adults. Results…
Kazman, Josh B; Scott, Jonathan M; Deuster, Patricia A
The limitations for self-reporting of dietary patterns are widely recognised as a major vulnerability of FFQ and the dietary screeners/scales derived from FFQ. Such instruments can yield inconsistent results to produce questionable interpretations. The present article discusses the value of psychometric approaches and standards in addressing these drawbacks for instruments used to estimate dietary habits and nutrient intake. We argue that a FFQ or screener that treats diet as a 'latent construct' can be optimised for both internal consistency and the value of the research results. Latent constructs, a foundation for item response theory (IRT)-based scales (e.g. Patient Reported Outcomes Measurement Information System) are typically introduced in the design stage of an instrument to elicit critical factors that cannot be observed or measured directly. We propose an iterative approach that uses such modelling to refine FFQ and similar instruments. To that end, we illustrate the benefits of psychometric modelling by using items and data from a sample of 12 370 Soldiers who completed the 2012 US Army Global Assessment Tool (GAT). We used factor analysis to build the scale incorporating five out of eleven survey items. An IRT-driven assessment of response category properties indicates likely problems in the ordering or wording of several response categories. Group comparisons, examined with differential item functioning (DIF), provided evidence of scale validity across each Army sub-population (sex, service component and officer status). Such an approach holds promise for future FFQ.
... AND FORMS SOLICITATION PROVISIONS AND CONTRACT CLAUSES Texts of Provisions and Clauses 852.214-72... 2008) Bids on * will be given equal consideration along with bids on ** and any such bids received... .** * Contracting officer will insert an alternate item that is considered acceptable. ** Contracting officer will...
Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Ying, Zhiliang
Item response theory (IRT) plays an important role in psychological and educational measurement. Unlike the classical testing theory, IRT models aggregate the item level information, yielding more accurate measurements. Most IRT models assume local independence, an assumption not likely to be satisfied in practice, especially when the number of items is large. Results in the literature and simulation studies in this paper reveal that misspecifying the local independence assumption may result in inaccurate measurements and differential item functioning. To provide more robust measurements, we propose an integrated approach by adding a graphical component to a multidimensional IRT model that can offset the effect of unknown local dependence. The new model contains a confirmatory latent variable component, which measures the targeted latent traits, and a graphical component, which captures the local dependence. An efficient proximal algorithm is proposed for the parameter estimation and structure learning of the local dependence. This approach can substantially improve the measurement, given no prior information on the local dependence structure. The model can be applied to measure both a unidimensional latent trait and multidimensional latent traits.
Frändberg, Birgitta; Lincoln, Per; Wallin, Anita
Explanations involving submicro levels of representation are central to science education, but known to be difficult for students in secondary school. This study examines students' written explanations of physical and chemical phenomena regarding matter and changes in matter, in a large-scale test. This is done in order to understand linguistic challenges in constructing submicro level explanations involving the particle model of matter. Drawing from systemic functional linguistics, the lexicogrammatics used in explanations for realising experiential meaning in student explanations were analysed. We used answers to two partly constructed response items from the Swedish part of Trends in International Mathematics and Science Studies 2007, grade 8, to sort out explanations referring to the particle model of matter. These answers (86 from 954) were analysed regarding choices of vocabulary and grammar to distinguish between macro and submicro level of representation. The results show that students use a wide variety of lexicogrammatical resources to realise what happens on both macro and submicro level of representation, with greater diversity of verbs on the submicro level of explanation. The results suggest an uncertainty about the distinction between macro and submicro level of explanation.
Full Text Available The paper discusses the macrostructural treatment of multi-word lexical items in mono- and bilingual dictionaries. First, the classification of multi-word lexical items is presented, and special attention is paid to the discussion of compounds – a specific group of multi-word lexical items that is most commonly afforded headword status but whose inclusion in the headword list may also depend on spelling. Then the inclusion of multi-word lexical items in monolingual dictionaries is dealt with in greater detail, while the results of a short survey on the inclusion of five randomly chosen multi-word lexical items in seven English monolingual dictionaries are presented. The proposals as to how to treat these five multi-word lexical items in bilingual dictionaries are presented in the section about the inclusion of multi-word lexical items in bilingual dictionaries. The conclusion is that it is most important to take the users’ needs into consideration and to make any dictionary as user friendly as possible.
J. van Hoof PhD
Full Text Available Introduction: Losing items is a time-consuming occurrence in nursing homes that is ill described. An explorative study was conducted to investigate which items got lost by nursing home residents, and how this affects the residents and family caregivers. Method: Semi-structured interviews and card sorting tasks were conducted with 12 residents with early-stage dementia and 12 family caregivers. Thematic analysis was applied to the outcomes of the sessions. Results: The participants stated that numerous personal items and assistive devices get lost in the nursing home environment, which had various emotional, practical, and financial implications. Significant amounts of time are spent on trying to find items, varying from 1 hr up to a couple of weeks. Numerous potential solutions were identified by the interviewees. Discussion: Losing items often goes together with limitations to the participation of residents. Many family caregivers are reluctant to replace lost items, as these items may get lost again.
Full Text Available Humans need to be able to selectively control their memories. Here, we investigate the underlying processes in item-method directed forgetting and compare the classic active memory cues in this paradigm with a passive instruction. Typically, individual items are presented and each is followed by either a forget- or remember-instruction. On a surprise test of all items, memory is then worse for to-be-forgotten items (TBF compared to to-be-remembered items (TBR. This is thought to result from selective rehearsal of TBR, or from active inhibition of TBF, or from both. However, evidence suggests that if a forget instruction initiates active processing, paradoxical effects may also arise. To investigate the underlying mechanisms, four experiments were conducted where un-cued items (UI were introduced and recognition performance was compared between TBR, TBF and UI stimuli. Accuracy was encouraged via a performance-dependent monetary bonus. Across all experiments, including perceptually fully matched variants, memory accuracy for TBF was reduced compared to TBR, but better than for UI. Moreover, participants used a more conservative response criterion when responding to TBF stimuli. Thus, ironically, the F cue results in active processing, but this does not have inhibitory effects that would impair recognition memory beyond a un-cued baseline condition. This casts doubts on inhibitory accounts of item-method directed forgetting and is also difficult to reconcile with pure selective rehearsal of TBR. While the F-cue does induce active processing, this does not result in particularly successful forgetting. The pattern seems most consistent with the notion of ironic processing.
In recent years, tagging system has become a building block o summarize the content of items for further functions like retrieval or personalized recommendation in various web applications. One nontrivial requirement is to precisely deliver a list of suitable items when users interact with the systems via inputing a specific tag (i.e. a query term). Different from traditional recommender systems, we need deal with a collaborative retrieval (CR) problem, where both characteristics of retrieval and recommendation should be considered to model a ternary relationship involved with query× user× item. Recently, several works are proposed to study CR task from users’ perspective. However, they miss a significant challenge raising from the sparse content of items. In this work, we argue that items will suffer from the sparsity problem more severely than users, since items are usually observed with fewer features to support a feature-based or content-based algorithm. To tackle this problem, we aim to sufficiently explore the sophisticated relationship of each query× user× item triple from items’ perspective. By integrating item-based collaborative information for this joint task, we present an alternative factorized model that could better evaluate the ranks of those items with sparse information for the given query-user pair. In addition, we suggest to employ a recently proposed bayesian personalized ranking (BPR) algorithm to optimize latent collaborative retrieval problem from pairwise learning perspective. The experimental results on two real-world datasets, (i.e. Last.fm, Yelp), verified the efficiency and effectiveness of our proposed approach at top-k ranking metric.
Yu, Lu; Huang, Junming; Zhou, Ge; Liu, Chuang; Zhang, Zi-Ke
In recent years, tagging system has become a building block o summarize the content of items for further functions like retrieval or personalized recommendation in various web applications. One nontrivial requirement is to precisely deliver a list of suitable items when users interact with the systems via inputing a specific tag (i.e. a query term). Different from traditional recommender systems, we need deal with a collaborative retrieval (CR) problem, where both characteristics of retrieval and recommendation should be considered to model a ternary relationship involved with query× user× item. Recently, several works are proposed to study CR task from users’ perspective. However, they miss a significant challenge raising from the sparse content of items. In this work, we argue that items will suffer from the sparsity problem more severely than users, since items are usually observed with fewer features to support a feature-based or content-based algorithm. To tackle this problem, we aim to sufficiently explore the sophisticated relationship of each query× user× item triple from items’ perspective. By integrating item-based collaborative information for this joint task, we present an alternative factorized model that could better evaluate the ranks of those items with sparse information for the given query-user pair. In addition, we suggest to employ a recently proposed bayesian personalized ranking (BPR) algorithm to optimize latent collaborative retrieval problem from pairwise learning perspective. The experimental results on two real-world datasets, (i.e. Last.fm, Yelp), verified the efficiency and effectiveness of our proposed approach at top-k ranking metric.
Peterson, Alexander C; Sutherland, Jason M; Liu, Guiping; Crump, R Trafford; Karimuddin, Ahmer A
The Fecal Incontinence Quality of Life Scale (FIQL) is a commonly used patient-reported outcome measure for fecal incontinence, often used in clinical trials, yet has not been validated in English since its initial development. This study uses modern methods to thoroughly evaluate the psychometric characteristics of the FIQL and its potential for differential functioning by gender. This study analyzed prospectively collected patient-reported outcome data from a sample of patients prior to colorectal surgery. Patients were recruited from 14 general and colorectal surgeons in Vancouver Coastal Health hospitals in Vancouver, Canada. Confirmatory factor analysis was used to assess construct validity. Item response theory was used to evaluate test reliability, describe item-level characteristics, identify local item dependence, and test for differential functioning by gender. 236 patients were included for analysis, with mean age 58 and approximately half female. Factor analysis failed to identify the lifestyle, coping, depression, and embarrassment domains, suggesting lack of construct validity. Items demonstrated low difficulty, indicating that the test has the highest reliability among individuals who have low quality of life. Five items are suggested for removal or replacement. Differential test functioning was minimal. This study has identified specific improvements that can be made to each domain of the Fecal Incontinence Quality of Life Scale and to the instrument overall. Formatting, scoring, and instructions may be simplified, and items with higher difficulty developed. The lifestyle domain can be used as is. The embarrassment domain should be significantly revised before use.
Thompson, Trevor; Lloyd, Andrew; Joseph, Alain; Weiss, Margaret
Purpose The Weiss Functional Impairment Rating Scale-Parent Form (WFIRS-P) is a 50-item scale that assesses functional impairment on six clinically relevant domains typically affected in attention-deficit/hyperactivity disorder (ADHD). As functional impairment is central to ADHD, the WFIRS-P offers potential as a tool for assessing functional impairment in ADHD. These analyses were designed to examine the overall performance of WFIRS-P in differentiating ADHD and non-ADHD cases using receiver...
Davis-Stober, Clintin P; Doignon, Jean-Paul; Suck, Reinhard
We consider the covariance matrix for dichotomous Guttman items under a set of uniformity conditions, and obtain closed-form expressions for the eigenvalues and eigenvectors of the matrix. In particular, we describe the eigenvalues and eigenvectors of the matrix in terms of trigonometric functions of the number of items. Our results parallel those of Zwick (1987) for the correlation matrix under the same uniformity conditions. We provide an explanation for certain properties of principal components under Guttman scalability which have been first reported by Guttman (1950).
Wells, Katharine M; Boyd, Matthew J; Thornley, Tracey; Boardman, Helen F
The payment structure for the New Medicine Service (NMS) in England is based on the assumption that 0.5% of prescription items dispensed in community pharmacies are eligible for the service. This assumption is based on a theoretical calculation. This study aimed to find out the actual proportion of prescription items eligible for the NMS dispensed in community pharmacies in order to compare this with the theoretical assumption. The study also aimed to investigate whether the proportion of prescription items eligible for the NMS is affected by pharmacies' proximity to GP practices. The study collected data from eight pharmacies in Nottingham belonging to the same large chain of pharmacies. Pharmacies were grouped by distance from the nearest GP practice and sampled to reflect the distribution by distance of all pharmacies in Nottingham. Data on one thousand consecutive prescription items were collected from each pharmacy and the number of NMS eligible items recorded. All NHS prescriptions were included in the sample. Data were analysed and proportions calculated with 95% confidence intervals used to compare the study results against the theoretical figure of 0.5% of prescription items being eligible for the NMS. A total of 8005 prescription items were collected (a minimum of 1000 items per pharmacy) of which 17 items were eligible to receive the service. The study found that 0.25% (95% confidence intervals: 0.14% to 0.36%) of prescription items were eligible for the NMS which differs significantly from the theoretical assumption of 0.5%. The opportunity rate for the service was lower, 0.21% (95% confidence intervals: 0.10% to 0.32%) of items, as some items eligible for the NMS did not translate into opportunities to offer the service. Of all the prescription items collected in the pharmacies, 28% were collected by patient representatives. The results of this study show that the proportion of items eligible for the NMS dispensed in community pharmacies is lower than
Baylor, Carolyn; McAuliffe, Megan J.; Hughes, Louise E.; Yorkston, Kathryn; Anderson, Tim; Jiseon, Kim; Amtmann, Dagmar
Purpose: To examine the cross-cultural applicability of the Communicative Participation Item Bank (CPIB) through a comparison of respondents with Parkinson's disease (PD) from the United States and New Zealand. Method: A total of 428 respondents--218 from the United States and 210 from New Zealand-completed the self-report CPIB and a series of…
Matthew S. Johnson
Full Text Available Item response theory (IRT models are a class of statistical models used by researchers to describe the response behaviors of individuals to a set of categorically scored items. The most common IRT models can be classified as generalized linear fixed- and/or mixed-effect models. Although IRT models appear most often in the psychological testing literature, researchers in other fields have successfully utilized IRT-like models in a wide variety of applications. This paper discusses the three major methods of estimation in IRT and develops R functions utilizing the built-in capabilities of the R environment to find the marginal maximum likelihood estimates of the generalized partial credit model. The currently available R packages ltm is also discussed.
Lorber, Michael F; Xu, Shu; Slep, Amy M Smith; Bulling, Lisanne; O'Leary, Susan G
The psychometrics of the Parenting Scale's Overreactivity and Laxness subscales were evaluated using item response theory (IRT) techniques. The IRT analyses were based on 2 community samples of cohabiting parents of 3- to 8-year-old children, combined to yield a total sample size of 852 families. The results supported the utility of the Overreactivity and Laxness subscales, particularly in discriminating among parents in the mid to upper reaches of each construct. The original versions of the Overreactivity and Laxness subscales were more reliable than alternative, shorter versions identified in replicated factor analyses from previously published research and in IRT analyses in the present research. Moreover, in several cases, the original versions of these subscales, in comparison with the shortened versions, exhibited greater 6-month stabilities and correlations with child externalizing behavior and couple relationship satisfaction. Reliability was greater for the Laxness than for the Overreactivity subscale. Item performance on each subscale was highly variable. Together, the present findings are generally supportive of the psychometrics of the Parenting Scale, particularly for clinical research and practice. They also suggest areas for further development.
Uthayakumar, R.; Tharani, S.
Recently, much emphasis has given to study the control and maintenance of production inventories of the deteriorating items. Rework is one of the main issues in reverse logistic and green supply chain, since it can reduce production cost and the environmental problem. Many researchers have focused on developing rework model, but few of them have developed model for deteriorating items. Due to this fact, we take up productivity and rework with deterioration as the major concern in this paper. In this paper, a production-inventory model with deteriorative items in which one cycle has n production setups and one rework setup (n, 1) policy is considered for deteriorating items with stock-dependent demand in case 1 and exponential demand in case 2. An effective iterative solution procedure is developed to achieve optimal time, so that the total cost of the system is minimized. Numerical and sensitivity analyses are discussed to examine the outcome of the proposed solution procedure presented in this research.
Chang, C.K., E-mail: firstname.lastname@example.org [KEPCO International Nuclear Graduate School, Ulsan (Korea, Republic of); Lee, K.J., E-mail: email@example.com [Kyung Hee Univ., Seoul (Korea, Republic of)
Qualification of equipment essential to safety in nuclear power plants (NPPs) ensures its capability to perform designated safety functions on demand under postulated service conditions. However, a number of incidents identified by the NRC since 1980s catalysed the US nuclear industry to adopt standard precautions to guard against counterfeit items. The purpose of this paper is to suggest the NFC (Near Field Communication) based equipment qualification management system preventing counterfeit and fraudulent items. The NEQM (NFC based Equipment Qualification Management) system work with the support of legacy systems such as PMS (Procurement Management System) and FMS (Facility management System). (author)
Chang, C.K.; Lee, K.J.
Qualification of equipment essential to safety in nuclear power plants (NPPs) ensures its capability to perform designated safety functions on demand under postulated service conditions. However, a number of incidents identified by the NRC since 1980s catalysed the US nuclear industry to adopt standard precautions to guard against counterfeit items. The purpose of this paper is to suggest the NFC (Near Field Communication) based equipment qualification management system preventing counterfeit and fraudulent items. The NEQM (NFC based Equipment Qualification Management) system work with the support of legacy systems such as PMS (Procurement Management System) and FMS (Facility management System). (author)
Valentin A Schriever
Full Text Available Tools for measuring olfactory function in adults have been well established. Although studies have shown that olfactory impairment in children may occur as a consequence of a number of diseases or head trauma, until today no consensus on how to evaluate the sense of smell in children exists in Europe. Aim of the study was to develop a modified "Sniffin' Sticks" odor identification test, the "Sniffin' Kids" test for the use in children. In this study 537 children between 6-17 years of age were included. Fourteen odors, which were identified at a high rate by children, were selected from the "Sniffin' Sticks" 16-item odor identification test. Normative date for the 14-item "Sniffin' Kids" odor identification test was obtained. The test was validated by including a group of congenital anosmic children. Results show that the "Sniffin' Kids" test is able to discriminate between normosmia and anosmia with a cutoff value of >7 points on the odor identification test. In addition the test-retest reliability was investigated in a group of 31 healthy children and shown to be ρ = 0.44. With the 14-item odor identification "Sniffin' Kids" test we present a valid and reliable test for measuring olfactory function in children between ages 6-17 years.
Amide local anaesthetics impair blood clotting in a concentration-dependent manner by inhibition of platelet function and enhanced fibrinolysis. We hypothesised that the presence of ropivacaine in the epidural space could decrease the efficacy of an epidural blood patch, as this technique requires that the injected blood can clot in order to be effective. Ropivacaine is an aminoamide local anaesthetic used increasingly for epidural analgesia during labour. The concentration of local anaesthetic in blood achieved in the epidural space during the performance of an epidural blood patch is likely to be the greatest which occurs (intentionally) in any clinical setting. This study was undertaken to investigate whether concentrations of ropivacaine in blood, which could occur: (i) clinically in the epidural space and (ii) in plasma during an epidural infusion of ropivacaine, alter platelet function. A platelet function analyser (Dade PFA-100, Miami) was employed to assess the effects of ropivacaine-treated blood on platelet function. The greater concentrations of ropivacaine studied (3.75 and 1.88 mg x ml(-1)), which correspond to those which could occur in the epidural space, produced significant inhibition of platelet aggregation. We conclude that the presence of ropivacaine in the epidural space may decrease the efficacy of an early or prophylactic epidural blood patch.
Bregnbak, David; Johansen, Jeanne D.; Hamann, Dathan
We recently evaluated and validated a diphenylcarbazide(DPC)-based screening spot test that can detect the release of chromium(VI) ions (≥0.5 ppm) from various metallic items and leather goods (1). We then screened a selection of metal screws, leather shoes, and gloves, as well as 50 earrings......, and identified chromium(VI) release from one earring. In the present study, we used the DPC spot test to assess chromium(VI) release in a much larger sample of jewellery items (n=848), 160 (19%) of which had previously be shown to contain chromium when analysed with X-ray fluorescence spectroscopy (2)....
Alexander K. Volkov
Full Text Available The modern approaches to the aviation security screeners’ efficiency have been analyzedand, certain drawbacks have been considered. The main drawback is the complexity of ICAO recommendations implementation concerning taking into account of shadow x-ray image complexity factors during preparation and evaluation of prohibited items detection efficiency by aviation security screeners. Х-ray image based factors are the specific properties of the x-ray image that in- fluence the ability to detect prohibited items by aviation security screeners. The most important complexity factors are: geometric characteristics of a prohibited item; view difficulty of prohibited items; superposition of prohibited items byother objects in the bag; bag content complexity; the color similarity of prohibited and usual items in the luggage.The one-dimensional two-parameter IRT model and the related criterion of aviation security screeners’ qualification have been suggested. Within the suggested model the probabilistic detection characteristics of aviation security screeners are considered as functions of such parameters as the difference between level of qualification and level of x-ray images com- plexity, and also between the aviation security screeners’ responsibility and structure of their professional knowledge. On the basis of the given model it is possible to consider two characteristic functions: first of all, characteristic function of qualifica- tion level which describes multi-complexity level of x-ray image interpretation competency of the aviation security screener; secondly, characteristic function of the x-ray image complexity which describes the range of x-ray image interpretation com- petency of the aviation security screeners having various training levels to interpret the x-ray image of a certain level of com- plexity. The suggested complex criterion to assess the level of the aviation security screener qualification allows to evaluate his or
Psychometric evaluation of the pediatric and parent-proxy Patient-Reported Outcomes Measurement Information System and the Neurology and Traumatic Brain Injury Quality of Life measurement item banks in pediatric traumatic brain injury.
Bertisch, Hilary; Rivara, Frederick P; Kisala, Pamela A; Wang, Jin; Yeates, Keith Owen; Durbin, Dennis; Zonfrillo, Mark R; Bell, Michael J; Temkin, Nancy; Tulsky, David S
The primary objective is to provide evidence of convergent and discriminant validity for the pediatric and parent-proxy versions of the Patient-Reported Outcomes Measurement Information System (PROMIS) Anxiety, Depression, Anger, Peer Relations, Mobility, Pain Interference, and Fatigue item banks, the Neurology Quality of Life measurement system (Neuro-QOL) Cognition-General Concerns and Stigma item banks, and the Traumatic Brain Injury Quality of Life (TBI-QOL) Executive Function and Headache item banks in a pediatric traumatic brain injury (TBI) sample. Participants were 134 parent-child (ages 8-18 years) days. Children all sustained TBI and the dyads completed outcome ratings 6 months after injury at one of six medical centers across the United States. Ratings included PROMIS, Neuro-QOL, and TBI-QOL item banks, as well as the Pediatric Quality of Life inventory (PedsQL), the Health Behavior Inventory (HBI), and the Strengths and Difficulties Questionnaire (SDQ) as legacy criterion measures against which these item banks were validated. The PROMIS, Neuro-QOL, and TBI-QOL item banks demonstrated good convergent validity, as evidenced by moderate to strong correlations with comparable scales on the legacy measures. PROMIS, Neuro-QOL, and TBI-QOL item banks showed weaker correlations with ratings of unrelated constructs on legacy measures, providing evidence of discriminant validity. Our results indicate that the constructs measured by the PROMIS, Neuro-QOL, and TBI-QOL item banks are valid in our pediatric TBI sample and that it is appropriate to use these standardized scores for our primary study analyses.
Heinemann, Allen W; Kisala, Pamela A; Hahn, Elizabeth A; Tulsky, David S
To develop a spinal cord injury (SCI)-focused version of PROMIS and Neuro-QOL social domain item banks; evaluate the psychometric properties of items developed for adults with SCI; and report information to facilitate clinical and research use. We used a mixed-methods design to develop and evaluate Ability to Participate in Social Roles and Activities and Satisfaction with Social Roles and Activities items. Focus groups helped define the constructs; cognitive interviews helped revise items; and confirmatory factor analysis and item response theory methods helped calibrate item banks and evaluate differential item functioning related to demographic and injury characteristics. Five SCI Model System sites and one Veterans Administration medical center. The calibration sample consisted of 641 individuals; a reliability sample consisted of 245 individuals residing in the community. A subset of 27 Ability to Participate and 35 Satisfaction items demonstrated good measurement properties and negligible differential item functioning related to demographic and injury characteristics. The SCI-specific measures correlate strongly with the PROMIS and Neuro-QOL versions. Ten item short forms correlate >0.96 with the full banks. Variable-length CATs with a minimum of 4 items, variable-length CATs with a minimum of 8 items, fixed-length CATs of 10 items, and the 10-item short forms demonstrate construct coverage and measurement error that is comparable to the full item bank. The Ability to Participate and Satisfaction with Social Roles and Activities CATs and short forms demonstrate excellent psychometric properties and are suitable for clinical and research applications.
Irene Fernández Monsalve
Full Text Available During language comprehension, semantic contextual information is used to generate expectations about upcoming items. This has been commonly studied through the N400 event-related potential (ERP, as a measure of facilitated lexical retrieval. However, the associative relationships in multi-word expressions (MWE may enable the generation of a categorical expectation, leading to lexical retrieval before target word onset. Processing of the target word would thus reflect a target-identification mechanism, possibly indexed by a P3 ERP component. However, given their time overlap (200-500 ms post-stimulus onset, differentiating between N400/P3 ERP responses (averaged over multiple linguistically variable trials is problematic. In the present study, we analyzed EEG data from a previous experiment, which compared ERP responses to highly expected words that were placed either in a MWE or a regular non-fixed compositional context, and to low predictability controls. We focused on oscillatory dynamics and regression analyses, in order to dissociate between the two contexts by modeling the electrophysiological response as a function of item-level parameters. A significant interaction between word position and condition was found in the regression model for power in a theta range (~7-9 Hz, providing evidence for the presence of qualitative differences between conditions. Power levels within this band were lower for MWE than compositional contexts then the target word appeared later on in the sentence, confirming that in the former lexical retrieval would have taken place before word onset. On the other hand, gamma-power (~50-70 Hz was also modulated by predictability of the item in all conditions, which is interpreted as an index of a similar `matching' sub-step for both types of contexts, binding an expected representation and the external input.
Clintin P Davis-Stober
Full Text Available We consider the sample covariance matrix for dichotomous Guttman items under a set of uniformity conditions, and obtain closed-form expressions for the eigenvalues and eigenvectors of the matrix. In particular, we describe the eigenvalues and eigenvectors of the matrix in terms of trigonometric functions of the number of items. Our results parallel those of Zwick (1987 for the correlation matrix under the same uniformity conditions. We provide an explanation for certain properties of principal components under Guttman scalability which have been first reported by Guttman (1950.
Germans, Sara; Van Heck, Guus L.; Masthoff, Erik D.; Trompenaars, Fons J. W. M.; Hodiamont, Paul P. G.
This article describes the identification of a 10-item set of the Structured Clinical Interview for DSM-IV Personality Disorders (SCID-II) items, which proved to be effective as a self-report assessment instrument in screening personality disorders. The item selection was based on the retrospective analyses of 495 SCID-II interviews. The…
Wu, Tzu-Yi; Lin, Chung-Ying; Årestedt, Kristofer; Griffiths, Mark D; Broström, Anders; Pakpour, Amir H
Background and aims The nine-item Internet Gaming Disorder Scale - Short Form (IGDS-SF9) is brief and effective to evaluate Internet Gaming Disorder (IGD) severity. Although its scores show promising psychometric properties, less is known about whether different groups of gamers interpret the items similarly. This study aimed to verify the construct validity of the Persian IGDS-SF9 and examine the scores in relation to gender and hours spent online gaming among 2,363 Iranian adolescents. Methods Confirmatory factor analysis (CFA) and Rasch analysis were used to examine the construct validity of the IGDS-SF9. The effects of gender and time spent online gaming per week were investigated by multigroup CFA and Rasch differential item functioning (DIF). Results The unidimensionality of the IGDS-SF9 was supported in both CFA and Rasch. However, Item 4 (fail to control or cease gaming activities) displayed DIF (DIF contrast = 0.55) slightly over the recommended cutoff in Rasch but was invariant in multigroup CFA across gender. Items 4 (DIF contrast = -0.67) and 9 (jeopardize or lose an important thing because of gaming activity; DIF contrast = 0.61) displayed DIF in Rasch and were non-invariant in multigroup CFA across time spent online gaming. Conclusions Given the Persian IGDS-SF9 was unidimensional, it is concluded that the instrument can be used to assess IGD severity. However, users of the instrument are cautioned concerning the comparisons of the sum scores of the IGDS-SF9 across gender and across adolescents spending different amounts of time online gaming.
Aydogdu, K.; Boss, C. R.
heavily on experience and engineering judgement, consistent with the ALARA philosophy. Special care is taken to ensure that the best estimate dose rates are used to the extent possible when applying ALARA. Provisions for safeguards equipment are made throughout the fuel-handling route in CANDU and ACR reactors. For example, the fuel bundle counters rely on the decay gammas from the fission products in spent-fuel bundles to record the number of fuel movements. The International Atomic Energy Agency (IAEA) Safeguards system for CANDU and ACR reactors is based on item (fuel bundle) accounting. It involves a combination of IAEA inspection with containment and surveillance, and continuous unattended monitoring. The spent fuel bundle counter monitors spent fuel bundles as they are transferred from the fuelling machine to the spent fuel bay. The shielding and dose-rate analysis need to be carried out so that the bundle counter functions properly. This paper includes two codes used in criticality safety analyses. Criticality safety is a unique phenomenon and codes that address criticality issues will demand specific validations. However, it is recognised that some of the codes used in radiation physics will also be used in criticality safety assessments. (authors)
DeMars, Christine E.; Jurich, Daniel P.
The nonequivalent groups anchor test (NEAT) design is often used to scale item parameters from two different test forms. A subset of items, called the anchor items or common items, are administered as part of both test forms. These items are used to adjust the item calibrations for any differences in the ability distributions of the groups taking…
Although a GUI largely replaces textual descriptions by graphical icons, the textual items are not completely removed. The textual items are inevitably used in window titles, message boxes, help items, menu items and popup items. Textual items are necessary for communicating messages that are beyond the limitation of graphical messages. However, it is necessary to harness the textual items on the graphical interface in such a way that they complement each other to produce the best effect. One...
New South Wales Dept. of Education, Sydney (Australia).
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
New South Wales Dept. of Education, Sydney (Australia).
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
New South Wales Dept. of Education, Sydney (Australia).
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Davis, A M; Perruccio, A V; Canizares, M
: The daily activity and sports and recreational items of the HOOS were reduced to five items achieving a feasible, short measure of physical function with interval level properties. This tool has potential for use as the function component of an OA severity scoring system. Further testing of this measure......OBJECTIVE: To derive a cross-culturally valid, short measure of physical function using function subscales (daily living and sports and recreation) of the Hip disability and Osteoarthritis Outcome Score (HOOS). METHODS: Rasch analysis was conducted on data from individuals from multiple countries...... who had hip osteoarthritis (OA). Fit of the data to the Rasch model was evaluated by model chi(2) and item fit statistics (chi(2), size of residual, and F-test). Differential item functioning was evaluated by gender, age and country. Unidimensionality was evaluated by factor analysis of residuals...
Stringer, Timothy Kent [Bucyrus, KS; Yerganian, Simon Scott [Lee's Summit, MO
A feeding mechanism and method for feeding minute items, such as capacitors, resistors, or solder preforms. The mechanism is adapted to receive a plurality of the randomly-positioned and randomly-oriented extremely small or minute items, and to isolate, orient, and position one or more of the items in a specific repeatable pickup location wherefrom they may be removed for use by, for example, a computer-controlled automated assembly machine. The mechanism comprises a sliding shelf adapted to receive and support the items; a wiper arm adapted to achieve a single even layer of the items; and a pushing arm adapted to push the items into the pickup location. The mechanism can be adapted for providing the items with a more exact orientation, and can also be adapted for use in a liquid environment.
Hammer-Helmich, Lene; Haro, Josep Maria; Jönsson, Bengt; Tanguy Melac, Audrey; Di Nicola, Sylvie; Chollet, Julien; Milea, Dominique; Rive, Benoît; Saragoussi, Delphine
The Prospective Epidemiological Research on Functioning Outcomes Related to Major depressive disorder (PERFORM) study describes the course of depressive symptoms, perceived cognitive symptoms, and functional impairment over 2 years in outpatients with major depressive disorder (MDD) and investigates the patient-related factors associated with functional impairment. This was a 2-year observational study in 1,159 outpatients with MDD aged 18-65 years who were either initiating antidepressant monotherapy or undergoing their first switch of antidepressant. Functional impairment was assessed by the Sheehan Disability Scale and the Work Productivity and Activity Impairment questionnaire. Patients assessed depression severity using the nine-item Patient Health Questionnaire and severity of perceived cognitive symptoms using the five-item Perceived Deficit Questionnaire. To investigate which patient-related factors were associated with functional impairment, univariate analyses of variance were performed to identify relevant factors that were then included in multivariate analyses of covariance at baseline, month 2, months 6 and 12 combined, and months 18 and 24 combined. The greatest improvement in depressive symptoms, perceived cognitive symptoms, and functional impairment was seen immediately (within 2 months) following initiation or switch of antidepressant therapy, followed by more gradual improvement and long-term stabilization. Improvement in perceived cognitive symptoms was less marked than improvement in depressive symptoms during the acute treatment phase. Functional impairment in patients with MDD was not only associated with severity of depressive symptoms but also independently associated with severity of perceived cognitive symptoms when adjusted for depression severity throughout the 2 years of follow-up. These findings highlight the burden of functional impairment in MDD and the importance of recognizing and managing cognitive symptoms in daily practice.
Hannula, Deborah E.; Tranel, Daniel; Allen, John S.; Kirchhoff, Brenda A.; Nickel, Allison E.; Cohen, Neal J.
Objective The objective of this study was to examine the dependence of item memory and relational memory on medial temporal lobe (MTL) structures. Patients with amnesia, who either had extensive MTL damage or damage that was relatively restricted to the hippocampus, were tested, as was a matched comparison group. Disproportionate relational memory impairments were predicted for both patient groups, and those with extensive MTL damage were also expected to have impaired item memory. Method Participants studied scenes, and were tested with interleaved two-alternative forced-choice probe trials. Probe trials were either presented immediately after the corresponding study trial (lag 1), five trials later (lag 5), or nine trials later (lag 9) and consisted of the studied scene along with a manipulated version of that scene in which one item was replaced with a different exemplar (item memory test) or was moved to a new location (relational memory test). Participants were to identify the exact match of the studied scene. Results As predicted, patients were disproportionately impaired on the test of relational memory. Item memory performance was marginally poorer among patients with extensive MTL damage, but both groups were impaired relative to matched comparison participants. Impaired performance was evident at all lags, including the shortest possible lag (lag 1). Conclusions The results are consistent with the proposed role of the hippocampus in relational memory binding and representation, even at short delays, and suggest that the hippocampus may also contribute to successful item memory when items are embedded in complex scenes. PMID:25068665
Craig, W.E.; Fakhar, A.A.; Shulman, M.N.
This document presents guidelines and supporting information for the technical evaluation of replacement items in nuclear power plants. These guidelines contain six major sections which provide the practical knowledge and a programmatic approach to determine the technical and quality requirements necessary to generate purchase documents to procure the proper replacement items. The technical evaluation methodology includes the following steps. (1) Identification of the need for a technical evaluation. (2) Component/part functional classification procedures. (3) Performance of a Failure Modes and Effects Analysis. (4) Selection of Critical Characteristics for Design Determination. (5) Performance of a ''Like-For-Like'' or ''Alternate'' item Evaluation. (6) Preparation of the Technical and Quality Requirements Specification. Work on this document was initiated in response to the increased emphasis by the utilities owning nuclear power plants and nuclear industry on procurement of replacement items for use in safety related applications at nuclear power plants. 20 refs., 9 figs., 14 tabs
Full Text Available Abstract Background A potential problem of low-stakes large-scale assessments such as the Programme for the International Assessment of Adult Competencies (PIAAC is low test-taking engagement. The present study pursued two goals in order to better understand conditioning factors of test-taking disengagement: First, a model-based approach was used to investigate whether item indicators of disengagement constitute a continuous latent person variable by domain. Second, the effects of person and item characteristics were jointly tested using explanatory item response models. Methods Analyses were based on the Canadian sample of Round 1 of the PIAAC, with N = 26,683 participants completing test items in the domains of literacy, numeracy, and problem solving. Binary item disengagement indicators were created by means of item response time thresholds. Results The results showed that disengagement indicators define a latent dimension by domain. Disengagement increased with lower educational attainment, lower cognitive skills, and when the test language was not the participant’s native language. Gender did not exert any effect on disengagement, while age had a positive effect for problem solving only. An item’s location in the second of two assessment modules was positively related to disengagement, as was item difficulty. The latter effect was negatively moderated by cognitive skill, suggesting that poor test-takers are especially likely to disengage with more difficult items. Conclusions The negative effect of cognitive skill, the positive effect of item difficulty, and their negative interaction effect support the assumption that disengagement is the outcome of individual expectations about success (informed disengagement.
Koller, Ingrid; Levenson, Michael R.; Glück, Judith
The valid measurement of latent constructs is crucial for psychological research. Here, we present a mixed-methods procedure for improving the precision of construct definitions, determining the content validity of items, evaluating the representativeness of items for the target construct, generating test items, and analyzing items on a theoretical basis. To illustrate the mixed-methods content-scaling-structure (CSS) procedure, we analyze the Adult Self-Transcendence Inventory, a self-report measure of wisdom (ASTI, Levenson et al., 2005). A content-validity analysis of the ASTI items was used as the basis of psychometric analyses using multidimensional item response models (N = 1215). We found that the new procedure produced important suggestions concerning five subdimensions of the ASTI that were not identifiable using exploratory methods. The study shows that the application of the suggested procedure leads to a deeper understanding of latent constructs. It also demonstrates the advantages of theory-based item analysis. PMID:28270777
Cavanagh, Anna; Wilson, Coralie J; Caputi, Peter; Kavanagh, David J
There is some evidence that, in contrast to depressed women, depressed men tend to report alternative symptoms that are not listed as standard diagnostic criteria. This may possibly lead to an under- or misdiagnosis of depression in men. This study aims to clarify whether depressed men and women report different symptoms. This study used data from the 2007 Australian National Survey of Mental Health and Wellbeing that was collected using the World Health Organization's Composite International Diagnostic Interview. Participants with a diagnosis of a depressive disorder with 12-month symptoms (n = 663) were identified and included in this study. Differential item functioning (DIF) was used to test whether depressed men and women endorse different features associated with their condition. Gender-related DIF was present for three symptoms associated with depression. Depressed women were more likely to report 'appetite/weight disturbance', whereas depressed men were more likely to report 'alcohol misuse' and 'substance misuse'. While the results may reflect a greater risk of co-occurring alcohol and substance misuse in men, inclusion of these features in assessments may improve the detection of depression in men, especially if standard depressive symptoms are under-reported. © The Author(s) 2016.
Williamson, David M.; Johnson, Matthew S.; Sinharay, Sandip; Bejar, Isaac I.
This study explored the application of hierarchical model calibration as a means of reducing, if not eliminating, the need for pretesting of automatically generated items from a common item model prior to operational use. Ultimately the successful development of automatic item generation (AIG) systems capable of producing items with highly similar…
... 41 Public Contracts and Property Management 2 2010-07-01 2010-07-01 true Review of items. 101-27.404 Section 101-27.404 Public Contracts and Property Management Federal Property Management...-Elimination of Items From Inventory § 101-27.404 Review of items. Except for standby or reserve stocks, items...
Rikers, Jos H.A.N.
The process of writing test items is analyzed, and a blueprint is presented for an authoring system for test item writing to reduce invalidity and to structure the process of item writing. The developmental methodology is introduced, and the first steps in the process are reported. A historical
Baghaei, Purya; Ravand, Hamdollah
In this study the magnitudes of local dependence generated by cloze test items and reading comprehension items were compared and their impact on parameter estimates and test precision was investigated. An advanced English as a foreign language reading comprehension test containing three reading passages and a cloze test was analyzed with a…
Chalmers, R Philip; Pek, Jolynn; Liu, Yang
Confidence intervals (CIs) are fundamental inferential devices which quantify the sampling variability of parameter estimates. In item response theory, CIs have been primarily obtained from large-sample Wald-type approaches based on standard error estimates, derived from the observed or expected information matrix, after parameters have been estimated via maximum likelihood. An alternative approach to constructing CIs is to quantify sampling variability directly from the likelihood function with a technique known as profile-likelihood confidence intervals (PL CIs). In this article, we introduce PL CIs for item response theory models, compare PL CIs to classical large-sample Wald-type CIs, and demonstrate important distinctions among these CIs. CIs are then constructed for parameters directly estimated in the specified model and for transformed parameters which are often obtained post-estimation. Monte Carlo simulation results suggest that PL CIs perform consistently better than Wald-type CIs for both non-transformed and transformed parameters.
... 10 Energy 4 2010-01-01 2010-01-01 false Labeling items and containers. 835.605 Section 835.605... items and containers. Except as provided at § 835.606, each item or container of radioactive material... information to permit individuals handling, using, or working in the vicinity of the items or containers to...
Dorn, B.; de Haan, R.; Schlotter, I.; Röthe, J.
We consider the following control problem on fair allocation of indivisible goods. Given a set I of items and a set of agents, each having strict linear preference over the items, we ask for a minimum subset of the items whose deletion guarantees the existence of a proportional allocation in the
basket of items, utilized by many e-commerce sites, cannot take advantage of pre-computed user-to-user similarities. Finally, even though the...not discriminate between items that are present in frequent itemsets and items that are not, while still maintaining the computational advantages of...453219 0.02% 7.74 ccard 42629 68793 398619 0.01% 9.35 ecommerce 6667 17491 91222 0.08% 13.68 em 8002 1648 769311 5.83% 96.14 ml 943 1682 100000 6.31
Haverman, Lotte; Grootenhuis, Martha A; Raat, Hein; van Rossum, Marion A J; van Dulmen-den Broeder, Eline; Hoppenbrouwers, Karel; Correia, Helena; Cella, David; Roorda, Leo D; Terwee, Caroline B
The Patient-Reported Outcomes Measurement Information System (PROMIS(®)) is a new, state-of-the-art assessment system for measuring patient-reported health and well-being of adults and children. It has the potential to be more valid, reliable, and responsive than existing PROMs. The items banks are designed to be self-reported and completed by children aged 8-18 years. The PROMIS items can be administered in short forms or through computerized adaptive testing. This paper describes the translation and cultural adaption of nine PROMIS item banks (151 items) for children in Dutch-Flemish. The translation was performed by FACITtrans using standardized PROMIS methodology and approved by the PROMIS Statistical Center. The translation included four forward translations, two back-translations, three independent reviews (at least two Dutch, one Flemish), and pretesting in 24 children from the Netherlands and Flanders. For some items, it was necessary to have separate translations for Dutch and Flemish: physical function-mobility (three items), anger (one item), pain interference (two items), and asthma impact (one item). Challenges faced in the translation process included scarcity or overabundance of possible translations, unclear item descriptions, constructs broader/smaller in the target language, difficulties in rank ordering items, differences in unit of measurement, irrelevant items, or differences in performance of activities. By addressing these challenges, acceptable translations were obtained for all items. The Dutch-Flemish PROMIS items are linguistically equivalent to the original USA version. Short forms are now available for use, and entire item banks are ready for cross-cultural validation in the Netherlands and Flanders.
Lundin, Andreas; Leijon, Ola; Vaez, Marjan; Hallgren, Mats; Torgén, Margareta
This study assesses the predictive ability of the full Work Ability Index (WAI) as well as its individual items in the general population. The Work, Health and Retirement Study (WHRS) is a stratified random national sample of 25-75-year-olds living in Sweden in 2000 that received a postal questionnaire ( n = 6637, response rate = 53%). Current and subsequent sickness absence was obtained from registers. The ability of the WAI to predict long-term sickness absence (LTSA; ⩾ 90 consecutive days) during a period of four years was analysed by logistic regression, from which the Area Under the Receiver Operating Characteristic curve (AUC) was computed. There were 313 incident LTSA cases among 1786 employed individuals. The full WAI had acceptable ability to predict LTSA during the 4-year follow-up (AUC = 0.79; 95% CI 0.76 to 0.82). Individual items were less stable in their predictive ability. However, three of the individual items: current work ability compared with lifetime best, estimated work impairment due to diseases, and number of diagnosed current diseases, exceeded AUC > 0.70. Excluding the WAI item on number of days on sickness absence did not result in an inferior predictive ability of the WAI. The full WAI has acceptable predictive validity, and is superior to its individual items. For public health surveys, three items may be suitable proxies of the full WAI; current work ability compared with lifetime best, estimated work impairment due to diseases, and number of current diseases diagnosed by a physician.
Mueller, Evelyn A; Bengel, Juergen; Wirtz, Markus A
This study aimed to develop a self-description assessment instrument to measure work performance in patients with musculoskeletal diseases. In terms of the International Classification of Functioning, Disability and Health (ICF), work performance is defined as the degree of meeting the work demands (activities) at the actual workplace (environment). To account for the fact that work performance depends on the work demands of the job, we strived to develop item banks that allow a flexible use of item subgroups depending on the specific work demands of the patients' jobs. Item development included the collection of work tasks from literature and content validation through expert surveys and patient interviews. The resulting 122 items were answered by 621 patients with musculoskeletal diseases. Exploratory factor analysis to ascertain dimensionality and Rasch analysis (partial credit model) for each of the resulting dimensions were performed. Exploratory factor analysis resulted in four dimensions, and subsequent Rasch analysis led to the following item banks: 'impaired productivity' (15 items), 'impaired cognitive performance' (18), 'impaired coping with stress' (13) and 'impaired physical performance' (low physical workload 20 items, high physical workload 10 items). The item banks exhibited person separation indices (reliability) between 0.89 and 0.96. The assessment of work performance adds the activities component to the more commonly employed participation component of the ICF-model. The four item banks can be adapted to specific jobs where necessary without losing comparability of person measures, as the item banks are based on Rasch analysis.
French, Christine L.
Item analysis is a very important consideration in the test development process. It is a statistical procedure to analyze test items that combines methods used to evaluate the important characteristics of test items, such as difficulty, discrimination, and distractibility of the items in a test. This paper reviews some of the classical methods for…
Davis, Diane, Ed.
This document contains 519 criterion-referenced multiple choice and true or false test items for a course in electronics. The test item bank is designed to work with both the Vocational Instructional Management System (VIMS) and the Vocational Administrative Management System (VAMS) in Missouri. The items are grouped into 15 units covering the…
... 26 Internal Revenue 18 2010-04-01 2010-04-01 false Partnership items. 301.6501(o)-3 Section 301... § 301.6501(o)-3 Partnership items. (a) Partnership item defined. For purposes of section 6501(o) (as it..., and § 301.6511(g)-1, the term “partnership item” means— (1) Any item required to be taken into account...
Forrin, Noah D; MacLeod, Colin M
Differences in memory for item order have been used to explain the absence of between-subjects (i.e., pure-list) effects in free recall for several encoding techniques, including the production effect, the finding that reading aloud benefits memory compared with reading silently. Notably, however, evidence in support of the item-order account (Nairne, Riegler, & Serra, 1991) has derived primarily from short-list paradigms. We provide novel evidence that the item-order account also applies when recalling long lists. In Experiment 1, participants studied and then free recalled 3 different long lists of words: pure aloud, pure silent, and mixed (half aloud, half silent). A Bayesian analysis supported a null pure-list production effect, and subsequent order analyses were largely consistent with the item-order account. These findings indicate that order information is retained in long-term memory and is useful in guiding subsequent free recall. In Experiment 2, a distractor task was inserted between the study and test phases, ensuring that only long-term memory processes were involved in recall: The pattern of results remained consistent with the item-order account. Order information can be retained in long-term memory for long lists, and is useful in guiding subsequent free recall, extending the domain of the item-order account. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Oliveira, Carla; Carpinteiro, Goncalo; Correia, Luis M.; Fernandes, Carlos A.; Serralha, Afonso; Marques, Nuno
The ITEM Project is a pioneer project in Portugal, providing public information on exposure to electromagnetic radiation, essentially due to mobile communication systems. The motivation, the main goals and the Project description are presented in this paper, as well as the website that provides the public dissemination of results and further significant information (www.lx.it.pt/item). This site provides information on different issues related to exposure to radiation, namely results of measurement campaigns conducted by a team on several locations in Portugal, and results of continuous measurements performed by autonomous stations located in public places in collaboration with municipal authorities. The global overview of the results from the measurement campaigns carried out up to present shows that all the analysed locations are in compliance with the radiation thresholds, i.e., all the electric field measured values are below the most restrictive threshold established at European level. (author)
Hiscox, Michael D.
Educational item banking presents observers with a considerable paradox. The development of test items from scratch is viewed as wasteful, a luxury in times of declining resources. On the other hand, item banking has failed to become a mature technology despite large amounts of money and the efforts of talented professionals. The question of which…
Liu, Jin-Hu; Zhou, Tao; Zhang, Zi-Ke; Yang, Zimo; Liu, Chuang; Li, Wei-Min
As one of the major challenges, cold-start problem plagues nearly all recommender systems. In particular, new items will be overlooked, impeding the development of new products online. Given limited resources, how to utilize the knowledge of recommender systems and design efficient marketing strategy for new items is extremely important. In this paper, we convert this ticklish issue into a clear mathematical problem based on a bipartite network representation. Under the most widely used algorithm in real e-commerce recommender systems, the so-called item-based collaborative filtering, we show that to simply push new items to active users is not a good strategy. Interestingly, experiments on real recommender systems indicate that to connect new items with some less active users will statistically yield better performance, namely, these new items will have more chance to appear in other users' recommendation lists. Further analysis suggests that the disassortative nature of recommender systems contributes to such observation. In a word, getting in-depth understanding on recommender systems could pave the way for the owners to popularize their cold-start products with low costs.
Liu, Jin-Hu; Zhou, Tao; Zhang, Zi-Ke; Yang, Zimo; Liu, Chuang; Li, Wei-Min
As one of the major challenges, cold-start problem plagues nearly all recommender systems. In particular, new items will be overlooked, impeding the development of new products online. Given limited resources, how to utilize the knowledge of recommender systems and design efficient marketing strategy for new items is extremely important. In this paper, we convert this ticklish issue into a clear mathematical problem based on a bipartite network representation. Under the most widely used algorithm in real e-commerce recommender systems, the so-called item-based collaborative filtering, we show that to simply push new items to active users is not a good strategy. Interestingly, experiments on real recommender systems indicate that to connect new items with some less active users will statistically yield better performance, namely, these new items will have more chance to appear in other users' recommendation lists. Further analysis suggests that the disassortative nature of recommender systems contributes to such observation. In a word, getting in-depth understanding on recommender systems could pave the way for the owners to popularize their cold-start products with low costs. PMID:25479013
Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests often are considered to be superior to tests containing only MC items although the use of multiple item formats leads to measurement challenges in the context of equating conducted under…
Bisby, James A; Burgess, Neil
The formation of associations between items and their context has been proposed to rely on mechanisms distinct from those supporting memory for a single item. Although emotional experiences can profoundly affect memory, our understanding of how it interacts with different aspects of memory remains unclear. We performed three experiments to examine the effects of emotion on memory for items and their associations. By presenting neutral and negative items with background contexts, Experiment 1 demonstrated that item memory was facilitated by emotional affect, whereas memory for an associated context was reduced. In Experiment 2, arousal was manipulated independently of the memoranda, by a threat of shock, whereby encoding trials occurred under conditions of threat or safety. Memory for context was equally impaired by the presence of negative affect, whether induced by threat of shock or a negative item, relative to retrieval of the context of a neutral item in safety. In Experiment 3, participants were presented with neutral and negative items as paired associates, including all combinations of neutral and negative items. The results showed both above effects: compared to a neutral item, memory for the associate of a negative item (a second item here, context in Experiments 1 and 2) is impaired, whereas retrieval of the item itself is enhanced. Our findings suggest that negative affect impairs associative memory while recognition of a negative item is enhanced. They support dual-processing models in which negative affect or stress impairs hippocampal-dependent associative memory while the storage of negative sensory/perceptual representations is spared or even strengthened.
Liu, Chen-Wei; Wang, Wen-Chung
Examinee-selected item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set, always yields incomplete data (i.e., when only the selected items are answered, data are missing for the others) that are likely non-ignorable in likelihood inference. Standard item response theory (IRT) models become infeasible when ESI data are missing not at random (MNAR). To solve this problem, the authors propose a two-dimensional IRT model that posits one unidimensional IRT model for observed data and another for nominal selection patterns. The two latent variables are assumed to follow a bivariate normal distribution. In this study, the mirt freeware package was adopted to estimate parameters. The authors conduct an experiment to demonstrate that ESI data are often non-ignorable and to determine how to apply the new model to the data collected. Two follow-up simulation studies are conducted to assess the parameter recovery of the new model and the consequences for parameter estimation of ignoring MNAR data. The results of the two simulation studies indicate good parameter recovery of the new model and poor parameter recovery when non-ignorable missing data were mistakenly treated as ignorable. © 2017 The British Psychological Society.
Kaller, Christoph P; Loosli, Sandra V; Rahm, Benjamin; Gössel, Astrid; Schieting, Stephan; Hornig, Tobias; Hennig, Jürgen; Tebartz van Elst, Ludger; Weiller, Cornelius; Katzev, Michael
Susceptibility to item-specific proactive interference (PI) contributes to interindividual differences in working memory (WM) capacity and complex cognition relying on WM. Although WM deficits are a well-recognized impairment in schizophrenia, the underlying pathophysiological effects on specific WM control functions, such as the ability to resist item-specific PI, remain unknown. Moreover, opposing hypotheses on increased versus reduced PI susceptibility in schizophrenia are both justifiable by the extant literature. To provide first insights into the behavioral and neural correlates of PI-related WM control in schizophrenia, a functional magnetic resonance imaging experiment was conducted in a sample of 20 patients and 20 well-matched control subjects. Demands on item-specific PI were experimentally manipulated in a recent-probes task (three runs, 64 trials each) requiring subjects to encode and maintain a set of four target items per trial. Compared with healthy control subjects, schizophrenia patients showed a significantly reduced PI susceptibility in both accuracy and latency measures. Notably, reduced PI susceptibility in schizophrenia was not associated with overall WM impairments and thus constituted an independent phenomenon. In addition, PI-related activations in inferior frontal gyrus and anterior insula, typically assumed to support PI resistance, were reduced in schizophrenia, thus ruling out increased neural efforts as a potential cause of the patients' reduced PI susceptibility. The present study provides first evidence for a diminished vulnerability of schizophrenia patients to item-specific PI, which is presumably a consequence of the patients' more efficient clearing of previously relevant WM traces and the accordingly reduced likelihood for item-specific PI to occur. Copyright © 2014 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
Kim, Jiyoung; Chi, Youngshin; Huensch, Amanda; Jun, Heesung; Li, Hongli; Roullion, Vanessa
This article discusses a case study on an item writing process that reflects on our practical experience in an item development project. The purpose of the article is to share our lessons from the experience aiming to demystify item writing process. The study investigated three issues that naturally emerged during the project: how item writers use…
Crins, Martine H P; Roorda, Leo D; Smits, Niels; de Vet, Henrica C W; Westhovens, Rene; Cella, David; Cook, Karon F; Revicki, Dennis; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Terwee, Caroline B
The Dutch-Flemish PROMIS Group translated the adult PROMIS Pain Interference item bank into Dutch-Flemish. The aims of the current study were to calibrate the parameters of these items using an item response theory (IRT) model, to evaluate the cross-cultural validity of the Dutch-Flemish translations compared to the original English items, and to evaluate their reliability and construct validity. The 40 items in the bank were completed by 1085 Dutch chronic pain patients. Before calibrating the items, IRT model assumptions were evaluated using confirmatory factor analysis (CFA). Items were calibrated using the graded response model (GRM), an IRT model appropriate for items with more than two response options. To evaluate cross-cultural validity, differential item functioning (DIF) for language (Dutch vs. English) was examined. Reliability was evaluated based on standard errors and Cronbach's alpha. To evaluate construct validity correlations with scores on legacy instruments (e.g., the Disabilities of the Arm, Shoulder and Hand Questionnaire) were calculated. Unidimensionality of the Dutch-Flemish PROMIS Pain Interference item bank was supported by CFA tests of model fit (CFI = 0.986, TLI = 0.986). Furthermore, the data fit the GRM and showed good coverage across the pain interference continuum (threshold-parameters range: -3.04 to 3.44). The Dutch-Flemish PROMIS Pain Interference item bank has good cross-cultural validity (only two out of 40 items showing DIF), good reliability (Cronbach's alpha = 0.98), and good construct validity (Pearson correlations between 0.62 and 0.75). A computer adaptive test (CAT) and Dutch-Flemish PROMIS short forms of the Dutch-Flemish PROMIS Pain Interference item bank can now be developed.
Martine H P Crins
Full Text Available The Dutch-Flemish PROMIS Group translated the adult PROMIS Pain Interference item bank into Dutch-Flemish. The aims of the current study were to calibrate the parameters of these items using an item response theory (IRT model, to evaluate the cross-cultural validity of the Dutch-Flemish translations compared to the original English items, and to evaluate their reliability and construct validity. The 40 items in the bank were completed by 1085 Dutch chronic pain patients. Before calibrating the items, IRT model assumptions were evaluated using confirmatory factor analysis (CFA. Items were calibrated using the graded response model (GRM, an IRT model appropriate for items with more than two response options. To evaluate cross-cultural validity, differential item functioning (DIF for language (Dutch vs. English was examined. Reliability was evaluated based on standard errors and Cronbach's alpha. To evaluate construct validity correlations with scores on legacy instruments (e.g., the Disabilities of the Arm, Shoulder and Hand Questionnaire were calculated. Unidimensionality of the Dutch-Flemish PROMIS Pain Interference item bank was supported by CFA tests of model fit (CFI = 0.986, TLI = 0.986. Furthermore, the data fit the GRM and showed good coverage across the pain interference continuum (threshold-parameters range: -3.04 to 3.44. The Dutch-Flemish PROMIS Pain Interference item bank has good cross-cultural validity (only two out of 40 items showing DIF, good reliability (Cronbach's alpha = 0.98, and good construct validity (Pearson correlations between 0.62 and 0.75. A computer adaptive test (CAT and Dutch-Flemish PROMIS short forms of the Dutch-Flemish PROMIS Pain Interference item bank can now be developed.
Setiawan, Beni; Sabtiawan, Wahyu Budi
Analyzing test items is a skill that must be mastered by prospective teachers, in order to determine the quality of test questions which have been written. The main aim of this research was to describe the effectiveness of authentic task to foster the student's skill for analyzing test items involving validity, reliability, item discrimination index, level of difficulty, and distractor functioning through the authentic task. The participant of the research is students of science education study program, science and mathematics faculty, Universitas Negeri Surabaya, enrolled for assessment course. The research design was a one-group posttest design. The treatment in this study is that the students were provided an authentic task facilitating the students to develop test items, then they analyze the items like a professional assessor using Microsoft Excel and Anates Software. The data of research obtained were analyzed descriptively, such as the analysis was presented by displaying the data of students' skill, then they were associated with theories or previous empirical studies. The research showed the task facilitated the students to have the skills. Thirty-one students got a perfect score for the analyzing, five students achieved 97% mastery, two students had 92% mastery, and another two students got 89% and 79% of mastery. The implication of the finding was the students who get authentic tasks forcing them to perform like a professional, the possibility of the students for achieving the professional skills will be higher at the end of learning.
Sachs, J; Gao, L
The learning process questionnaire (LPQ) has been the source of intensive cross-cultural study. However, an item-level factor analysis of all the LPQ items simultaneously has never been reported. Rather, items within each subscale have been factor analysed to establish subscale unidimensionality and justify the use of composite subscale scores. It was of major interest to see if the six logically constructed items groups of the LPQ would be supported by empirical evidence. Additionally, it was of interest to compare the consistency of the reliability and correlational structure of the LPQ subscales in our study with those of previous cross-cultural studies. Confirmatory factor analysis was used to fit the six-factor item level model and to fit five representative subscale level factor models. A total of 1070 students between the ages of 15 to 18 years was drawn from a representative selection of 29 classes from within 15 secondary schools in Guangzhou, China. Males and females were almost equally represented. The six-factor item level model of the LPQ seemed to fit reasonably well, thus supporting the six dimensional structure of the LPQ and justifying the use of composite subscale scores for each LPQ dimension. However, the reliability of many of these subscales was low. Furthermore, only two subscale-level factor models showed marginally acceptable fit. Substantive considerations supported an oblique three-factor model. Because the LPQ subscales often show low internal consistency reliability, experimental and correlational studies that have used these subscales as dependent measures have been disappointing. It is suggested that some LPQ items should be revised and other items added to improve the inventory's overall psychometric properties.
Deepak, Kishore K; Al-Umran, Khalid Umran; AI-Sheikh, Mona H; Dkoli, B V; Al-Rubaish, Abdullah
The functionality of distra